Contents

Checksum

For array of integer numbers, Checksum is some derived value calculated using all the members of an array in some whimsical way. Its main property is that if two arrays are equal, their checksums are equal too. (however, if checksums are equal, it does not necessarily mean they are calculated over the same arrays)

The straightforward use of checksums in programming is for verification of large amounts of data. After receiving some file (from email for example, or unpacking some archive) we are usually interested whether it was corrupted or not. Storing and transferring checksums along with the files in such cases are good idea (archivers do this automatically).

For the practical example we have calculations from the task Weighted sum of digits, where checksum is calculated (regarding initial numer as an array of digits) as the sum of digits multiplied by their position in the number's decimal representation. Such checksum will obviously yield the same values for numbers 1, 10, 100 etc. Though anyway this checksum function is far better than simple sum of digits since it covers greater range of values for the same range of initial numbers.

Of course the mentioned property - the range covered by values of checksum (for any possible sets of initial data not exceeding some determined size) and uniformity of distribution of checksums in this range - these are the most important metrics of whether the method is good or not.

We will often use the other simple algorithm:

  1. choose some arbitrary number (primes are good) as seed - let us use 113;
  2. choose some big number as the limit for checksum - let it be 10000007;
  3. initialize result value to 0;
  4. initialize array index to the start of array;
  5. add the member pointed by index to result;
  6. multiply result by seed;
  7. take the result by modulo limit (i.e. take the remainder of dividing result by limit if necessary);
  8. increment index to point to next element of array;
  9. if array is not exhausted, return to step 4.

Let us have an example - here follows the the calculation of checksum for array a = [3, 1, 4, 1, 5, 9]:

result = 0

result = (result + a[0]) * seed = (0 + 3) * 113 = 339

result = (result + a[1]) * seed = (339 + 1) * 113 = 38420

result = (result + a[2]) * seed = (38420 + 4) * 113 = 4341912

result = (result + a[3]) * seed = (4341912 + 1) * 113 = 490636169
result = result % limit = 490636169 % 10000007 = 635826

result = (result + a[4]) * seed = (635826 + 5) * 113 = 71848903
result = result % limit = 71851163 % 10000007 = 1848854

result = (result + a[5]) * seed = (1848854 + 9) * 113 = 208921519
result = result % limit = 209176899 % 10000007 = 8921379

so result is 8921379. Note that on first few steps we skipped taking the remainder (since while result is less than limit, its remainder after division is just the same).

Task Array Checksum provides an exercise for exactly this algorithm.