Council of Europe - Venice Commission

Report on the Identification of Electoral Irregularities by Statistical Methods


Paragraph 46

As an illustration, suppose we wish to test whether the terminal digit behaves as if it were random: in particular, whether all digits are equally likely to occur and whether there is any connection (dependence) among the final digits in different reporting groups. The statistical question is whether the final digits are “independently and uniformly distributed on {0, 1, ..., 9}.” There are a variety of standard statistical tests, but they are sensitive to different kinds of anomalies. For instance, we might test using the mean of the terminal digits (which is expected to be 4.5 if the digits are indeed uniformly distributed); or using the Kolmogorov test of the empirical distribution against the theoretical distribution that assigns chance 0.1 to each digit; or using a chi-square test for equal frequencies of all digits¹; or using the multinomial range test to compare the most frequent and least frequent digits; or a test for multimodality; or a test based on the frequency with which 0s and 5s occur; or any number of other tests. If we use enough such tests, and especially if we examine the data before choosing which tests to apply, it can be quite likely that at least one of them will classify the election as “tainted,” even when nothing is wrong.
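The effect of applying many tests can be made concrete with a short calculation. The sketch below is illustrative only and rests on a simplifying assumption that is not in the report: that each test is run at the same significance level and that the tests are statistically independent of one another. Tests applied to the same digits are generally correlated, so the true figure will differ, and choosing tests after looking at the data is not captured at all. The function name and the 5% level are hypothetical choices made for the example.

```python
def chance_of_false_alarm(num_tests: int, level: float = 0.05) -> float:
    """Chance that at least one of `num_tests` tests rejects a true hypothesis.

    Assumes (for illustration only) that the tests are independent and that
    each one rejects a true hypothesis with probability `level`.
    """
    # The chance that no test rejects is (1 - level) ** num_tests;
    # the chance that at least one rejects is the complement.
    return 1.0 - (1.0 - level) ** num_tests


if __name__ == "__main__":
    for k in (1, 3, 6, 10, 20):
        print(f"{k:2d} tests: {chance_of_false_alarm(k):.3f}")
```

Under these assumptions, six tests at the 5% level already give roughly a one-in-four chance that a clean election is flagged by at least one of them.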

¹ Consider the hypothesis that the terminal digits are random and independent, and that each of the digits 0, 1, 2, ..., 9 has a 10% chance of occurring. The chi-square test compares the observed frequency of each digit to its expected frequency if that hypothesis were true, namely 10% of the number of measurements. The chi-square statistic is the sum, over the ten digits, of the squared difference between the observed and expected frequencies divided by the expected frequency. That sum is an overall measure of how far the ten observed frequencies lie from their expected values. If the hypothesis is true, the sum is expected to be relatively small. If the sum is surprisingly large compared with what would be expected were the hypothesis true, that is evidence that the hypothesis is not true.
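The calculation described in the footnote can be sketched as follows. This is a minimal illustration, not the procedure of any particular study: the function name and the input list of counts are hypothetical, and the comparison uses the standard upper 5% critical value of the chi-square distribution with 9 degrees of freedom (16.919), which is a reasonable approximation only when the expected count per digit is not too small (a common rule of thumb is at least 5).

```python
from collections import Counter

# Upper 5% critical value of the chi-square distribution with
# 9 degrees of freedom (ten digits minus one).
CRITICAL_VALUE_5_PERCENT_DF9 = 16.919


def terminal_digit_chi_square(counts):
    """Chi-square statistic for uniformity of the last digits of `counts`."""
    if not counts:
        raise ValueError("at least one count is required")
    # Last decimal digit of each reported count.
    digits = [abs(int(c)) % 10 for c in counts]
    observed = Counter(digits)
    # Under the hypothesis, each digit is expected in 10% of the measurements.
    expected = len(digits) / 10
    return sum(
        (observed.get(d, 0) - expected) ** 2 / expected for d in range(10)
    )


if __name__ == "__main__":
    # Hypothetical reported counts, invented purely for illustration.
    reported = [1240, 985, 1310, 770, 1105, 1450, 890, 1020, 1335, 960] * 5
    statistic = terminal_digit_chi_square(reported)
    print(f"chi-square statistic: {statistic:.2f}")
    print("surprisingly large at the 5% level:",
          statistic > CRITICAL_VALUE_5_PERCENT_DF9)
```

A large statistic is evidence against uniformity of the final digits, not by itself evidence of fraud, and the caveat in the paragraph about applying several tests to the same data still applies.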