Advanced Math/Missing numbers
Dear Mr Oliver,
I have been rescuing spreadsheet workbooks for my uncle (he found them in a cupboard when he bought a business). They only exist on paper, and it most cases it's been completely straightforward.
Unfortunately some of the pages are damaged (it looks like water damage). I've managed to reconstruct some of them where the damage is slight and the mathematics is simple. But some of them are in a very bad state.
It occurred to me that where I have all the summation figures, at the end of each row and the bottom of each column, then perhaps I don't need the information in the cells at all. I drew up a very small example (four cells by four cells) and there was only one way to make the numbers add up to the sum cells.
Is there a way to do this inside a spreadsheet program? Can I get my computer to do it without a spreadsheet? My uncle doesn't expect me to do the impossible, but I'd like to do more than he expects!
Unfortunately, this is a problem that gets worse if you are not looking at a small sample.
Assume the spreadsheet has R rows and C columns. This means there will be RxC entries in the spreadsheet, plus totals along each row and each column.
This gives R+C conditions that must be true (R sums for rows, C for columns). This means you have a system of R+C equations. For example, if your figures were these:
a b c d
e f g h
i j k l
Then you have 3+4=7 equations, which would be:
a+b+c+d = R1
e+f+g+h = R2
i+j+k+l = R3
a+e+i = C1
b+f+j = C2
c+g+k = C3
d+h+l = C4
However, it is possible (in the worst case) that you are missing ALL of the entries. This leads you to a system of 7 equations and 12 unknown quantities. It is impossible to solve for more than seven of the quantities (you are limited by the number of equations). As it turns out, you can't even do that much -- it is only possible to solve for six of them.
The reason you can only recover six is because you only have six equation's worth of information. Any one of these seven equations can be derived from the other six. For example, the last equation is the sum of the first three minus the next three. (In other words, if you know all the row sums, and all the column sums except one column, you can determine the last column -- it won't give you any new information.)
If you had a 100x100 spreadsheet, you would have only 200 equations, but 10,000 variables, which means you would only be able to solve for (at most) 2% of the missing entries.
If you had a spreadsheet that was 100x100 and there were only 199 entries smudged off, you might be able to recover them all, but it would depend. For example, you could easily recover one missing entry per row (that's super easy, just figure out how much is missing from the row total). It would not be so easy if you were missing a big block of entries in the middle of the spreadsheet, even if it were the same number of entries.
In general, if you had R rows and C columns, the best you could do is to recover R+C-1 of them. There is a mathematical way of determining if you could indeed recover R+C-1 of these entries (possibly depending on which entries). Based on some preliminary analysis, I am relatively sure that you can always recover R+C-1 missing entries (but never more, since you only have R+C equations and one of them is redundant).
I do not believe you can do this in a spreadsheet. Spreadsheets usually cannot solve systems of algebraic equations. You may be able to do this with a computer algebra program like Maple or Wolfram Mathematica, but those programs are fairly expensive.