Why can spreadsheet references change?
Spreadsheets are full of little details you have to get into the weeds of. For example, everyone has seen what a range looks like in a spreadsheet - “C2:D6”. Seems simple! But looking closer, these references and ranges pack a lot of surprising behavior into them; they change when the cell they’re in is copied or shifted within the spreadsheet!
Can you imagine if you copy and pasted some JS code like let result = x + 4
and as it pasted it became let result = y + 4
?!
To understand why this is the case, let’s first look at what ranges and references do when you move cells:
What’s in a range?
Unpacking a range like C2:D6 gives us the start and end references; C2 and D6. A range reference is inclusive at both ends. So our total height of the range is 5, and width is 2. That reference C2:D6 isn’t actually enough in all cases though; it doesn’t specify what sheet the reference is for! By default it means “the current sheet” but we often need to be more specific. So we can append the sheet name to the start, remembering to quote it if it contains spaces: ‘Summary data’!C2:D6.
Now that we’ve got a definite range, we can do things in the spreadsheet that impact that range. For example, let’s add a new row at row 3. Inserting a full row there will change our reference to be C2:D7. This is one of the main building blocks of spreadsheets and why range based addressing is so powerful - any reference to some range of data will now “do the right thing” and encompass the extra row.
Ok, but what if we took a cell with that range in it, and copied that cell to somewhere else? For example, copying E1 to E8 here. What formula pastes in E8?
Yep, the range gets translated down (you can think of the range being specified as a fixed offset from whatever cell contains the formula).
This does pose some more questions. What if I paste in column A where there’s no room to the left for the offset? (you get a #REF! error). What happens if I paste in another sheet? (you get the same offset, but the sheet name remains, so you’re referring back to the offset cells on the original sheet just in a new offset). What happens if I cut and paste? (the range remains unchanged to point at the original cells). Oof. OK.
Now let’s look at filling operations - you can pull the corner of a cell around to “drag to fill”, but there’s also the ability to do a “fill down” operation if you select some starting cell and where you want it to fill (these are slightly different). Here let’s look at how the range changes if we fill down our formula in the purple area…
The range shifts down here row by row!
So these shifting operations can be controlled by locks. Locks are set by adding “$” characters in front of either the column or row component of a reference, and they prevent shifts in that dimension when copying or filling down. For example, here we lock the row by setting the formula to be =SUM(‘Summary data’!B$2:C$6). Then filling down doesn’t move it:
Similarly, copy pasting in the same column won’t alter the range. However, filling across or pasting in another column would translate the column portion of the range unless that was locked too!
Because the start and end references of a range are treated separately, you can even lock just one half of the range. If we just lock the start reference and fill down then we see how the end of the range grows as the start stays fixed, changing the overall size of the range!
Simple ranges end up being quite complicated to implement correctly!
Why though?
At first, all this behavior seemed insane to me - why was it so complicated? After using spreadsheets in anger for a while the answer is really that programming in a spreadsheet is a different paradigm to how we write code in regular imperative programming languages. It’s 2-dimensional instead of linear, and the formulas are parameterized by their cell’s location.
To see why spreadsheets work like this, we can look at writing something like a map function in a spreadsheet. Where we’d write something like the following in another language:
const arr = [1, 2, 3, 4, 5]
const mapped = arr.map((element) => element * 2)
// [2, 4, 6, 8, 10]
In a spreadsheet we’d write the formula to map a single element once, and then fill down:
We can do all the same stuff - we’re just programming while using the 2D grid as a form of input! Instead of thinking of the reference as pointing to a specific address, instead think of it as pointing to some offset from the current cell. Maybe we can refer to the reference in a format like R[$rowOffset]C[$colOffset]
(this is called “R1C1 notation” in Excel!).
In this case, our formula could be rewritten as =R[0]C[-1] * 2
. Note now that this formula is the same for every value in our mapping column C2:C6 here! That’s an important properity, and enables several kinds of optimization - like those which power our vectorized “calculated columns” in Equals.
Seen through that lens - of references being relative, and the behavior of changing references makes a lot more sense. This spatial nature of formula evaluation also lets you do things like build calculation “templates” in a spreadsheet, that can be copied and pasted around.