Saturday, 15 March 2014

Stata and Benford's Law

I came across an interesting phenomenon called Benford's Law. It states that in many naturally occurring sets of numbers, the probability that the leading digit is 1 is about 30%, not the ~11% (1 in 9) you might expect with nine possible leading digits (1 through 9). The digit 2 leads about 18% of the time, and the chances keep decreasing all the way down to 9, at about 5%.
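The percentages above match the formula usually given for Benford's law (quoted from the standard statement of the law, not derived in this post): the probability that the leading digit is d is the base-10 logarithm of (d + 1)/d. The product in the last line telescopes, which is why the nine probabilities add up to exactly 1.

```latex
\begin{aligned}
P(d) &= \log_{10}\left(1 + \frac{1}{d}\right) = \log_{10}\frac{d+1}{d}, \qquad d = 1, \dots, 9 \\
P(1) &= \log_{10} 2 \approx 0.301, \quad P(2) = \log_{10}\tfrac{3}{2} \approx 0.176, \quad P(9) = \log_{10}\tfrac{10}{9} \approx 0.046 \\
\sum_{d=1}^{9} P(d) &= \log_{10}\left(\tfrac{2}{1}\cdot\tfrac{3}{2}\cdots\tfrac{10}{9}\right) = \log_{10} 10 = 1
\end{aligned}
```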

One thing to note is that this does not happen in data confined to a narrow range, such as human height, but it does come up everywhere from credit card transaction amounts to the lengths of rivers, and it holds regardless of the units of measurement. It is a difficult phenomenon to understand and involves some mathematics I don't follow, so if you want to find out more, check out the Wikipedia page: https://en.wikipedia.org/wiki/Benford's_law
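One way to check the claim that the pattern does not depend on the units is to simulate some measurements and convert them. This is only a sketch under my own assumptions: the "lengths" are hypothetical log-normal values spread over several orders of magnitude (not real river data), the seed and parameters are arbitrary, and it needs Stata 10.1 or later for rnormal(). The leading digit comes from the fractional part of log10(x), which scales each value into the range 1 to 10.

```stata
clear
set seed 12345
set obs 100000

* Hypothetical lengths in km: log-normal, spanning several orders of magnitude
gen double km = exp(rnormal(5, 3))
* The same lengths in miles (1 km = 0.621371 miles)
gen double miles = km * 0.621371

* Leading digit: 10^(fractional part of log10(x)) lies in [1, 10), floor() keeps 1 to 9
gen byte digit_km    = floor(10^mod(log10(km), 1))
gen byte digit_miles = floor(10^mod(log10(miles), 1))

* Both tables should show roughly 30% for 1, falling to roughly 5% for 9
tabulate digit_km
tabulate digit_miles
```

Rescaling every value by the same constant shifts all the logarithms by the same amount, which is why the two tables should look alike.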

The law can be used to test for fraud, because a data set fabricated by a person or a computer will often fail to follow the distribution. I wanted to test this in Stata, and the code below does that: it produces roughly equal frequencies for every leading digit and so fails the law.

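clear
set obs 1000000
* Pseudo-random integers from 0 to 999 (runiform() is the current name for uniform())
gen num = int(1000*runiform())
* 0 has no leading digit between 1 and 9
drop if num == 0
gen str10 numstring = string(num , "%6.0f")
* Keep the first character and convert it back to a number
gen numstring1 = substr(numstring,1,1)
destring numstring1 , generate(num1)

histogram num1 , discrete bcolor(ebblue) lwidth(vthick)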
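To turn the histogram into an actual test, one option is to compare the observed share of each leading digit with the share Benford's law predicts, log10(1 + 1/d), using Pearson's chi-squared goodness-of-fit statistic with 8 degrees of freedom (nine digits minus one). This is a sketch of that idea, not a standard auditing procedure; the seed and the variable names (digit, n, observed, benford, chi2part) are my own choices, and it regenerates the uniform data so that it runs on its own.

```stata
clear
set seed 2014
set obs 1000000

* Same kind of data as above: integers 0 to 999, each equally likely
gen num = floor(1000*runiform())
* 0 has no leading digit between 1 and 9
drop if num == 0
gen byte digit = real(substr(string(num), 1, 1))

* One row per leading digit, with its count in _freq
contract digit
egen double n = total(_freq)
gen double observed = _freq / n
gen double benford  = log10(1 + 1/digit)
list digit observed benford, noobs

* Pearson chi-squared: sum of (observed count - expected count)^2 / expected count
gen double chi2part = (_freq - n*benford)^2 / (n*benford)
quietly summarize chi2part
display "chi2(8) = " %12.1f r(sum) "   p-value = " chi2tail(8, r(sum))
```

For uniform data every observed share sits near 1/9, far from the 30% Benford predicts for 1, so the statistic is enormous and the p-value is effectively zero.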


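The leading digits come out evenly spread not because Stata, like most computer programs, produces pseudo-random rather than truly random numbers, but because of what the code draws: every integer from 0 to 999 is equally likely, and each leading digit from 1 to 9 covers exactly 111 of them (for 1: the numbers 1, 10 to 19 and 100 to 199). Benford's law shows up in data spread across several orders of magnitude, where the logarithms of the values, not the values themselves, are spread fairly evenly.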
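For comparison, the same pseudo-random generator gives Benford-like leading digits when it is the logarithm that is drawn uniformly. In this sketch (my own illustration, with an arbitrary seed and range), the exponent is uniform between 0 and 6, so the values run from 1 to 1,000,000 and cover exactly six decades on a log scale.

```stata
clear
set seed 2014
set obs 1000000

* Log-uniform data: the exponent, not the number, is uniform on [0, 6)
gen double x = 10^(6*runiform())
* Leading digit via the fractional part of log10(x)
gen byte digit = floor(10^mod(log10(x), 1))

* Expect about 30.1% for 1, 17.6% for 2, down to about 4.6% for 9
tabulate digit
```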
