While adding the ELO ranking system to LibCapy I wanted to generate pseudo-random floating-point numbers for the unit tests. It's a subject I had overlooked until then, so I felt like learning a bit more about it. As one should expect, it's just another never-ending story: after spending a whole afternoon browsing Wikipedia articles without understanding one-tenth of them, it took me several days to escape from that black hole...
Integer random numbers
The ISO C srand() and rand() functions are implemented in the glibc library as a linear-feedback shift register, which gives better performance than the linear congruential generator suggested by the standard. Happily for me, from reading Wikipedia articles that's the type of pseudo-random generator I thought would fit my needs: playing with procedural generation, generative art, machine learning, etc. (and not serious cryptographic stuff). In particular, the xorshift algorithm developed by Pr. George Marsaglia has the quality of being very simple and fast, so I decided to use it in LibCapy.
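For reference, the core of the algorithm fits in a few lines. Below is a minimal sketch of a 64-bit xorshift step, assuming the (13, 7, 17) shift triplet from Marsaglia's paper; the actual CapyRandom implementation may differ in its details:

```c
#include <stdint.h>

// One step of a 64-bit xorshift generator (Marsaglia's (13, 7, 17)
// triplet). The state must be seeded with a non-zero value.
uint64_t Xorshift64(uint64_t *state) {
  uint64_t x = *state;
  x ^= x << 13;
  x ^= x >> 7;
  x ^= x << 17;
  *state = x;
  return x;
}
```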
With a view to checking my implementation of xorshift, I searched for ways to evaluate the quality of a generator. This is an overly complicated subject, as one could expect, so I thought I would first look at one simple characteristic: the generated integers should be uniformly distributed, i.e. in the long run, every possible integer should be generated approximately the same number of times.
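A minimal sketch of such a test (my reconstruction; the bucket count and the number of draws are arbitrary):

```c
#include <stdio.h>
#include <stdlib.h>

#define NB_BUCKETS 1000
#define NB_DRAWS 100000000L

// Draw pseudo-random integers, map each one to a bucket, and count the
// occurrences per bucket. For a uniform generator all counters should
// end up approximately equal. (The modulo mapping has a small bias when
// NB_BUCKETS doesn't divide RAND_MAX + 1, negligible at this scale.)
int main(void) {
  static long buckets[NB_BUCKETS] = {0};
  srand(0);
  for (long i = 0; i < NB_DRAWS; ++i) ++buckets[rand() % NB_BUCKETS];
  for (int i = 0; i < NB_BUCKETS; ++i) printf("%d %ld\n", i, buckets[i]);
  return 0;
}
```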
The distributions produced look like this: (buckets on the x-axis, number of occurrences on the y-axis)
We can see that both distributions are relatively uniform, as desired. Is that uniform 'enough'? (I would learn later that this is what the chi-squared test measures.) Isn't grouping several values into one bucket hiding some bias? It's just one run; would it repeat consistently? Yes, such a simple test is very weak, but bear with me a second. At least, my implementation of xorshift giving results similar to rand suggests I haven't completely screwed it up.
Floating-point random numbers
Pseudo-random generators produce sequences of bits, usually interpreted as integers. In ISO C, rand returns a 2- or 4-byte integer value between 0 and RAND_MAX. For CapyRandom I'm using 8-byte integer values, from 0 to UINT64_MAX. Now, I need to generate floating-point numbers too. The solution I had always used was to simply divide the pseudo-random integer value by its maximum possible value to get a real in [0.0, 1.0] (and scale/translate it as needed). But thinking about it, I wondered whether a bias is induced by the non-uniformity of the representation of floating-point values. So I modified my test program as follows:
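(A sketch of the modification; the constants are the same placeholders as before, with each draw now converted to a double in [0.0, 1.0] by division before bucketing.)

```c
#include <stdio.h>
#include <stdlib.h>

#define NB_BUCKETS 1000
#define NB_DRAWS 100000000L

// Same histogram as before, but each draw is first converted to a
// double in [0.0, 1.0] by dividing by the generator's maximum value.
int main(void) {
  static long buckets[NB_BUCKETS] = {0};
  srand(0);
  for (long i = 0; i < NB_DRAWS; ++i) {
    double v = (double)rand() / (double)RAND_MAX;
    int iBucket = (int)(v * NB_BUCKETS);
    if (iBucket == NB_BUCKETS) iBucket = NB_BUCKETS - 1;  // v == 1.0
    ++buckets[iBucket];
  }
  for (int i = 0; i < NB_BUCKETS; ++i) printf("%d %ld\n", i, buckets[i]);
  return 0;
}
```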
which produced the following results:
There was still no obvious problem, but I remained skeptical. Searching the subject, I came across this paper by Pr Allen B. Downey. As he explains, generating floating-point numbers by division is good enough for simple use cases, but it has the flaw of missing many possible floating-point values (hidden inside the buckets). The exact number of missed values can be obtained with the following code for float:
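(In outline, the code goes like this; it relies on the fact that the floats in [0.0, 1.0] occupy the contiguous bit patterns 0x00000000 to 0x3F800000, and marks in a bitmap every value reachable by division.)

```c
#include <stdio.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

int main(void) {
  // Floats in [0.0, 1.0] occupy the bit patterns 0x00000000..0x3F800000.
  uint64_t nbRepresentable = 0x3F800001ull;
  // One bit per representable value.
  uint8_t *hit = calloc((size_t)((nbRepresentable + 7) / 8), 1);
  if (hit == NULL) return EXIT_FAILURE;
  // Mark every float reachable by division, over all 32-bit integers.
  for (uint64_t i = 0; i <= UINT32_MAX; ++i) {
    float f = (float)i / (float)UINT32_MAX;
    uint32_t bits;
    memcpy(&bits, &f, sizeof(bits));
    hit[bits / 8] |= (uint8_t)(1u << (bits % 8));
  }
  // Count the representable values that were never produced.
  uint64_t nbMissed = 0;
  for (uint64_t b = 0; b < nbRepresentable; ++b)
    if ((hit[b / 8] & (1u << (b % 8))) == 0) ++nbMissed;
  printf("Nb missed values: %llu/%llu %.4f%%\n",
    (unsigned long long)nbMissed, (unsigned long long)nbRepresentable,
    100.0 * (double)nbMissed / (double)nbRepresentable);
  free(hit);
  return EXIT_SUCCESS;
}
```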
It gives Nb missed values: 981467136/1065353217 92.1260%. Using RAND_MAX instead of UINT32_MAX makes things even worse, of course: Nb missed values: 989855744/1065353217 92.9134%, which matches Pr Downey's calculation. The proportion of missed values is staggering! Doing the same for double would take around 2760 years on my machine, so I won't even try to parallelise it, but for the exact same reason we can expect the same problem.
These missing values were indeed one of my concerns, but another one was about repeated values (when i/UINT32_MAX and (i+n)/UINT32_MAX, n>0, produce the same floating-point value). This can be checked with the code below:
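(A sketch of that check: it counts the integers whose result collides with the previous integer's result after division.)

```c
#include <stdio.h>
#include <stdint.h>

// Count how many integers i > 0 map to the same float as i - 1 after
// division, i.e. draws that merely repeat their predecessor's value.
int main(void) {
  uint64_t nbRepeated = 0;
  uint64_t nbTotal = (uint64_t)UINT32_MAX + 1;
  float prev = -1.0f;
  for (uint64_t i = 0; i <= UINT32_MAX; ++i) {
    float f = (float)i / (float)UINT32_MAX;
    if (f == prev) ++nbRepeated;
    prev = f;
  }
  printf("Nb repeated values: %llu/%llu %.4f%%\n",
    (unsigned long long)nbRepeated, (unsigned long long)nbTotal,
    100.0 * (double)nbRepeated / (double)nbTotal);
  return 0;
}
```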
The results are:
In conclusion, the values generated using integer division form a very small and highly repeated subset of the representable floating-point values. That's really far from satisfactory...
To avoid these two problems, I first thought of using the random sequence of bits directly as a double value. Of course, this generates all possible floating-point values, but the distribution also gets overwhelmingly condensed near 0.0, where most of the representable values are. You also have to take care of the sign bit and scale the result to bring it into [0.0, 1.0]. Pr. Downey gives a very nice solution to that problem in his paper: use a random mantissa and compute the exponent in such a way that the probability for a given number to be chosen is proportional to the distance to its neighbours. His algorithm works on float values, while I was interested in double values, so I've made a double version of it:
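(The version below is a sketch of my adaptation, assuming a Rand64() 64-bit source standing in for the CapyRandom generator; the LibCapy version may differ in its details.)

```c
#include <stdint.h>
#include <string.h>

// Assumed 64-bit pseudo-random source, standing in for CapyRandom.
extern uint64_t Rand64(void);

// Double version of Pr Downey's algorithm: returns a double in
// [0.0, 1.0] where each representable value is drawn with probability
// proportional to the width of the interval it covers.
double RandDoubleDowney(void) {
  // 52 random bits for the mantissa.
  uint64_t mantissa = Rand64() >> 12;
  // Choose the biased exponent geometrically: start at 1022 (i.e. 2^-1)
  // and decrement once per consecutive 'heads' of a fair coin flip.
  int64_t exponent = 1022;
  uint64_t flips = Rand64();
  int nbFlips = 64;
  while (exponent > 0) {
    if (nbFlips == 0) { flips = Rand64(); nbFlips = 64; }
    int heads = (int)(flips & 1);
    flips >>= 1;
    --nbFlips;
    if (!heads) break;
    --exponent;
  }
  // A zero mantissa sits on a binade boundary: move it up to the next
  // exponent half of the time to keep the probabilities right.
  if (mantissa == 0 && (Rand64() & 1)) ++exponent;
  // Assemble the IEEE 754 double (sign bit left at 0).
  uint64_t bits = ((uint64_t)exponent << 52) | mantissa;
  double res;
  memcpy(&res, &bits, sizeof(res));
  return res;
}
```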
Of course, it still passes my crappy test:
Better tests
Excited by Pr Downey's algorithm, I got a bit more motivated and tried again to find some more serious ways to check the generated numbers. In particular, I found a utility program called ent, developed by John Walker, to check sequences of pseudo-random bytes. I generated some data files as follows:
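(Roughly like this; a reconstruction where the helper dumps the low-order size bytes of each draw into a file, the number of draws and the file names being placeholders.)

```c
#include <stdio.h>
#include <stdint.h>
#include <stdlib.h>

#define NB_DRAWS 1000000L

static uint64_t TestRand(void) { return (uint64_t)rand(); }

// Dump the low-order 'size' bytes of each draw from 'fun' into 'path'
// (on my little-endian machine, fwrite on the first bytes of the value
// keeps exactly the low-order bytes).
static void Run(uint64_t (*fun)(void), const char *path, size_t size) {
  FILE *fp = fopen(path, "wb");
  if (fp == NULL) return;
  for (long i = 0; i < NB_DRAWS; ++i) {
    uint64_t v = fun();
    fwrite(&v, size, 1, fp);
  }
  fclose(fp);
}

int main(void) {
  srand(0);
  Run(TestRand, "intRand.dat", sizeof(int));
  // ... and the same with the xorshift generator into its own file.
  return 0;
}
```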
The output of the ent utility on these data is given below.
Int - rand
Int - Capy (xorshift)
Check the page of the ent utility for the explanation of each value. I get rather poor results for the rand function, and much better results for xorshift. However, it is important to understand that the rand function returns a value in the range [0, RAND_MAX] as a signed int. Excluding the negative values implies that half the values of the high-order byte are never used, which induces a bias. It is unfair to compare it to my xorshift implementation, which generates all possible values. Excluding the high-order byte from the data for rand (Run(TestRand, "intRand.dat", sizeof(int) - 1);, as I'm on a little-endian architecture, i.e. the high-order byte is the last byte), I get:
Int - rand (without high-order byte)
Results are immediately much better. The exact results vary from one run to another, so to get a better appreciation I ran them a thousand times and counted how many times the percentage of the chi-squared test falls into the 10%-90% zone.
The results were: 796 for rand without the high-order byte, 781 for xorshift, and... 0 for rand with the high-order byte! That makes xorshift and rand as good as each other, with the big downside that rand generates numbers in a very small range of values compared to my 64-bit implementation of xorshift.
To evaluate the quality of the floating-point number generation, I reused the Monte Carlo calculation of Pi and compared the combinations of rand/xorshift and scaling/Downey.
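(The test looks roughly like this; randDouble stands for whichever generator/conversion combination is under test.)

```c
#include <math.h>

#ifndef M_PI
#define M_PI 3.14159265358979323846
#endif

// Estimate Pi by the Monte Carlo method: the proportion of random
// points of the unit square falling inside the quarter circle of
// radius 1 tends to Pi/4. Return the absolute error of the estimate.
double ErrorMonteCarloPi(double (*randDouble)(void), long nbPoints) {
  long nbInside = 0;
  for (long i = 0; i < nbPoints; ++i) {
    double x = randDouble();
    double y = randDouble();
    if (x * x + y * y <= 1.0) ++nbInside;
  }
  double pi = 4.0 * (double)nbInside / (double)nbPoints;
  return fabs(pi - M_PI);
}
```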
The results were:
The value itself is not very important and varies slightly from run to run, but the smaller the better. These results show no significant difference between the algorithms. That's a bummer! Why is there no difference? In fact, in that test the missing values have no influence: whatever the values are, we only care about the ratio of points inside the circle to points outside it. And, thinking about it carefully, neither do the repeated values: being repeated, the values near 1.0 are generated more often than the values near 0.0, but they also each represent a larger interval, hence the ratio is not affected by them either.
What would be a better test then? It should be sensitive to these missing and repeated values. My thoughts went toward some visualisation of a chaotic system, but I wasn't able to come up with anything useful. Or using the Monte Carlo method to calculate something sensitive to values near 0.0, where most of the missing values are? Or testing a system for which the subset of double values generated by division would be a poor design choice? Pr Downey himself leaves open the question of finding a real application where his algorithm brings an advantage.
Performance
In terms of execution time, generating 64 bits at once and using a buffer gives a boost in performance, which can be measured as follows:
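(A sketch of the measurement; the draw count and the seed are arbitrary.)

```c
#include <stdio.h>
#include <stdint.h>
#include <stdlib.h>
#include <time.h>

static uint64_t state = 0x123456789ABCDEFull;  // must be non-zero

static uint64_t Xorshift64(void) {
  uint64_t x = state;
  x ^= x << 13; x ^= x >> 7; x ^= x << 17;
  return state = x;
}

// Serve one byte at a time out of a buffered 64-bit draw.
static uint8_t XorshiftByte(void) {
  static uint64_t buffer = 0;
  static int nbBytes = 0;
  if (nbBytes == 0) { buffer = Xorshift64(); nbBytes = 8; }
  uint8_t b = (uint8_t)buffer;
  buffer >>= 8;
  --nbBytes;
  return b;
}

int main(void) {
  long n = 100000000L;
  volatile uint8_t sink = 0;  // keep the loops from being optimised away
  srand(0);
  clock_t t0 = clock();
  for (long i = 0; i < n; ++i) sink ^= (uint8_t)rand();
  clock_t t1 = clock();
  for (long i = 0; i < n; ++i) sink ^= XorshiftByte();
  clock_t t2 = clock();
  printf("rand: %.3fs, buffered xorshift: %.3fs\n",
    (double)(t1 - t0) / CLOCKS_PER_SEC,
    (double)(t2 - t1) / CLOCKS_PER_SEC);
  return 0;
}
```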
which gives:
Here too, the value itself is not very important, because it will vary depending on how many bits you use at once (one bit at a time, one byte, two bytes, etc.). The point is that 64-bit xorshift is clearly faster than rand. As for the comparison between the Downey algorithm and scaling, as one would expect, a single integer division is faster than the bit manipulation in the Downey algorithm.
Conclusion
In conclusion, for generating integer numbers, rand and xorshift are as good (or bad, depending on what you do with them) as each other, but an implementation of xorshift generating 64-bit values with buffering ensures better performance in terms of execution time. So I decided to use 64-bit xorshift in LibCapy for integer generation. As for floating-point numbers, even if I couldn't prove the advantage of Pr Downey's algorithm, I can't help thinking that the richness and uniformity it provides will prove helpful one day. The cost in time is not terrible, and it looks so cool! So I decided to use it as the default for double generation in LibCapy, but I also added the division method, in case I want to compare them again one day or need a bit more speed.
If you know of or can think of an application which could prove or disprove the advantage of Pr Downey's algorithm, please let me know. I'll be happy to try it, and if it does provide good results, Pr Downey will surely be happy to hear about it!
Edit on 2022/02/02: I came across this excellent video of Pr Squirrel Eiserloh speaking about random number generation. A definite must-see.
Edit on 2022/12/06: I've learnt more about the Permuted Congruential Generator described by Pr Melissa O'Neill in this paper. Given the simplicity of that algorithm and its performance, it became the default one in LibCapy. I invite you to read her paper.
Edit on 2023/07/26: correction of typos in the percentages.
Edit on 2023/10/21: Another interesting article on corsix.org.
Edit on 2024/04/06: An analysis of the performance of Lemire's algorithm for generation of random integers here.