I was reading xkcd by clicking on the random link when I noticed that the same cartoons kept coming up again and again. I wondered whether this was confirmation bias on my part or a duff random number generator on the server's part. Randall Munroe is a science geek, and I figured that what he would do is test the idea...
One Python script and about 12,000 requests later, I had a file full of numbers and started trying to remember some statistics. I threw together a quick bit of Python to work out the mean and standard deviation. The mean is 171.040683155 and the standard deviation is 97.4155602788. If the comics were uniformly distributed from 1 to 361, the expected mean would be 181.0 and the standard deviation would be 104.211323761. So I was lost in thought for a while.
However, the following quick check threw some light on the difference.
>>> data = [int(x) for x in open('numbers.data')]
>>> for x in xrange(0, 361):
...     if x not in data:
...         print x
...
0
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
So comics numbered 338 and above don't appear in the sample (0 only shows up because the loop starts at 0; there is no comic 0). Recomputing the expected values for a uniform distribution from 1 to 337 gives a mean of 169.0 and a standard deviation of 97.2830920561, which is close to the mean and standard deviation of the data. I'm now waiting for someone who actually knows stats to correct me on where I went wrong.
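The expected figures for a uniform distribution come from closed-form formulas: for comics numbered 1 to n, the mean is (n + 1)/2 and the population standard deviation is sqrt((n^2 - 1)/12). The sketch below, written for Python 3 rather than the Python 2 used above, checks both the 1 to 361 and 1 to 337 cases and shows a set-based way to list missing comic numbers. The function names are illustrative and are not taken from the original script.

```python
import math
import statistics

def expected_uniform_stats(n):
    """Mean and population std. dev. of a discrete uniform distribution on 1..n."""
    mean = (n + 1) / 2
    std = math.sqrt((n * n - 1) / 12)
    return mean, std

def observed_stats(data):
    """Mean and population std. dev. of the sampled comic numbers."""
    return statistics.mean(data), statistics.pstdev(data)

def missing_numbers(data, lo, hi):
    """Comic numbers in lo..hi (inclusive) that never turned up in the sample."""
    return sorted(set(range(lo, hi + 1)) - set(data))

for n in (361, 337):
    mean, std = expected_uniform_stats(n)
    print("1..%d: mean %.4f, std %.4f" % (n, mean, std))

# With the sampled file (one comic number per line) this would be, e.g.:
# data = [int(line) for line in open("numbers.data")]
# print(observed_stats(data), missing_numbers(data, 1, 361))
```

Building a set once avoids rescanning the 12,000-element list for every candidate number, which is what `x not in data` does on a list.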
The conclusion? I suffer from confirmation bias :-(.
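A more formal check than comparing the mean and standard deviation is Pearson's chi-squared goodness-of-fit test: count how often each comic number came up and compare those counts with the count expected if every comic were equally likely. The minimal sketch below computes only the statistic with the standard library. It assumes the valid range is 1 to 337, as the data suggests, and leaves the p-value to a table or to a library routine such as `scipy.stats.chisquare`. The function name is hypothetical.

```python
from collections import Counter

def chi_squared_uniform(data, lo, hi):
    """Pearson's chi-squared statistic for data against a uniform distribution on lo..hi.

    Returns (statistic, degrees_of_freedom). Values outside lo..hi are ignored.
    """
    values = [x for x in data if lo <= x <= hi]
    counts = Counter(values)            # missing comics count as 0
    k = hi - lo + 1                     # number of categories (comic numbers)
    expected = len(values) / k          # expected count per comic if uniform
    stat = sum((counts[x] - expected) ** 2 / expected for x in range(lo, hi + 1))
    return stat, k - 1

# Hypothetical usage with the sampled file:
# data = [int(line) for line in open("numbers.data")]
# stat, dof = chi_squared_uniform(data, 1, 337)
```

With about 12,000 samples spread over 337 comics, each expected count is roughly 36, well above the usual minimum of 5 per category. Under a genuinely uniform generator the statistic has mean equal to its 336 degrees of freedom and a standard deviation of sqrt(2 × 336), about 26, so a value far above that range would point at the random number generator rather than at me.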
For those who like pretty pictures, here is one, courtesy of the Google charts API and pygooglechart:
posted at: 01:04 | path: /computing