Many analysts in the fields of audit, forensic and accounting sciences have heard of Benford’s Law. I have not seen many practical cases where this law has been applied to identify fraud or manipulation of data but, in the few cases I have seen, it almost always delivered interesting but somewhat confusing results.
I believe the confusion stems from a lack of understanding of the mathematical basis of Benford’s Law. As an auditor I grasp the concept and how it can assist in the identification of data manipulation, but I lack the mathematical understanding to interpret the results of a Benford analysis effectively. In fact, from a review of articles written by mathematicians, it seems even they believe that the exact reason why certain naturally occurring numbers follow this law and others do not still requires further study and description.
But let us start at the beginning. According to an article published in American Scientist (T.P. Hill, 1998), the astronomer and mathematician Simon Newcomb published a two-page article in the American Journal of Mathematics (1881) reporting a mathematical phenomenon. Newcomb described his observation that books of logarithms in the library were quite dirty at the beginning and progressively cleaner towards the end. From this he inferred that fellow scientists using the logarithm tables were looking up numbers starting with 1 more often than numbers starting with 2, numbers starting with 2 more often than numbers starting with 3, and so on.
After a short heuristic argument, Newcomb concluded that the probability that a number has a particular first significant digit (that is, first non-zero digit) d can be calculated as follows:
Prob(first significant digit = d) = log10(1 + 1/d), for d = 1, 2, …, 9
In particular, his conjecture was that the first digit is 1 about 30 percent of the time and 9 only about 4.6 percent of the time. That the digits are not equally likely to appear comes as something of a surprise, but to claim an exact law describing their distribution is indeed striking.
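To make the formula concrete, the short Python sketch below works out the expected share for each first digit. It is only an illustration of Newcomb's formula as quoted above; the function name benford_probability and the variable expected are my own labels and not part of any audit software.

```python
import math


def benford_probability(d: int) -> float:
    """Expected share of numbers whose first significant digit is d (1 to 9)."""
    if not 1 <= d <= 9:
        raise ValueError("d must be a digit from 1 to 9")
    return math.log10(1 + 1 / d)


# Expected first-digit distribution under Benford's Law (hypothetical helper name).
expected = {d: benford_probability(d) for d in range(1, 10)}

for d, p in expected.items():
    print(f"first digit {d}: {p:.1%}")

# The nine shares add up to 1 (up to floating-point rounding).
print(f"total: {sum(expected.values()):.4f}")
```

Running this gives roughly 30.1 percent for the digit 1, falling steadily to about 4.6 percent for the digit 9, which matches the figures quoted above. The nine shares add up to exactly 100 percent, as they must for a probability distribution over the possible first digits.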
Newcomb’s article went unnoticed, and 57 years later General Electric physicist Frank Benford, apparently unaware of Newcomb’s paper, made exactly the same observation about logarithm books and arrived at the same logarithmic law. Evidence indicates that Benford spent several years gathering data, and the table he published in 1938 in the Proceedings of the American Philosophical Society was based on 20,229 observations from such diverse data sets as areas of rivers, American League baseball statistics, atomic weights of elements and numbers appearing in Reader’s Digest articles. Benford’s article received much wider attention and led to the name “Benford’s Law”.
I first became interested in Benford’s Law at a conference in Germany where I met accounting professor Mark Nigrini, originally from South Africa but based in the United States at the time. He wrote his PhD thesis on Benford’s Law and delivered a very convincing presentation on the practical value of the law in accounting data. Since then I have always looked for opportunities to use Benford’s Law in data analysis. I use Arbutus software, which provides probably the most advanced Benford functionality.