C++/Very fast Random Number Generator
I am coding a simple simulation program in C++. This simulation requires generation of 410 random numbers (non-sequential and different on each run of the program) in a loop (which can be terminated by certain action from the user's side, otherwise infinite). I tried using the standard rand() and srand() implementations, system time as seed, but they are proving to be slow as a turtle! Please recommend me the fastest RNGs in your knowledge.
Unfortunately for you the subject of PRNG performance has never come up for me but I am not aware that such implementations generally were all that slow for general use - although of course this will vary between platforms and implementations of rand / srand and the meaning of fast and slow in a particular context. You give no hints of the sort of general performance of the computer you are running your simulation on (e.g. 3GHz Intel 64 bit multicore beast, or small 8-bit 4MHz embedded processor, or something in between) nor what sort of time constraints you require other than you think rand is running like a drugged slug. Stating something like requiring 10,000,000,000 random numbers on a 2GHz dual core super scalar machine (which might prove tricky as such a beast can probably only crunch around 8 billion instructions per second tops) would have given me some idea of the performance level you require. It is also a bit unclear why if your simulation requires 410 values you generate values continuously until some user action interrupts the process.
I would first double and triple check your code and that you are using rand and srand correctly (e.g. that you are only calling srand once per program execution - this might seem an obvious mistake but I find many people get this detail wrong at first) and the code is behaving in the way you think it should. Try to get timings for the execution of rand (which should be the only repeated call out of rand, srand and time in the loop). It may not be rand that is dragging down the loop iteration times (e.g. it could be the I/O required to see if the user had interrupted - I/O operations are often quite - or very - slow in comparison with raw computations).
If you find rand is still too slow you might try using an alternative compiler / standard library combination or maybe try the new C++ specific random number support in an up to date C++ library implementation if available for your platform - either a C++11 library or C++03 TR1 library - or the Boost Random library on which they are based. Such support includes different random number engines with different characteristics (speed, storage, length of repeating sequence,...):
- N3337 (PDF) is a just-after-ISO-C++11-standard document working draft of the C++ standard.
See section 26.5 for the C++11 library random number support
-- section 5.1 (reference)
Using the C++11 linear_congruential_engine or subtract_with_carry_engine might give you the performance you require.
Failing that you should take to the Intertubes, spending some time with your search engine of choice, to see if there are any high performance PRNGs available and suitable for your needs. For example the Mathtools site:
Has many entries for various tools on maths, science and engineering topics.
The C/C++ entries section:
Has a Random numbers sub-section with 11 entries:
Some mention a bit about performance, e.g. the blurb for the first entry "C++ Random Number Library" states:
"...the set up time for each type of distribution is relatively
long but it is efficient when generating each new random number."
You might like to start off with a query like:
high performance PRNG
also the opposite:
C slow rand
might give some interesting leads such as:
For some rough idea of the GNU C/C++ rand implementation performance I devised the following program which simply loops summing the returned value of rand and collects crude timings using clock to collect start and end tick timings around the whole loop. I built and ran it with no optimisation - that is -O0 - on a Raspberry Pi running the Raspbian port of Debian Linux. The Raspberry Pi uses an embedded system on a chip with a 32-bit ARM processor running at 700MHz - that is it is not that high speed a machine:
unsigned int const NumberOfSamples(1000000);
long long sum(0);
clock_t start_tics = clock();
for (unsigned int i=0;i<NumberOfSamples;++i)
sum += rand();
clock_t end_tics = clock();
clock_t delta_tics = end_tics - start_tics;
double delta_t = double(delta_tics)/CLOCKS_PER_SEC;
double delta_t_1_sample = delta_t / NumberOfSamples;
std::cout << "Average of " << NumberOfSamples
<< " random integers in range 0 to " << RAND_MAX
<< " is " << sum / NumberOfSamples
<< "\n" << NumberOfSamples << " samples took "
<< delta_t << " secs (" << delta_t_1_sample
<< " secs per sample).\n\n"
The results were similar to:
Average of 1000000 random integers in range 0 to 2147483647 is 1073003163
1000000 samples took 0.24 secs (2.4e-07 secs per sample).
Each run had the same timings and equate to more than 4 million iterations of the loop per second.
However, should we mistakenly place the PRNG seeding line inside the loop like so:
for (unsigned int i=0;i<NumberOfSamples;++i)
srand(time(0)); // ### Oops!!
sum += rand();
Then the results show around a 90 times slowdown:
Average of 1000000 random integers in range 0 to 2147483647 is 1232457843
1000000 samples took 21.65 secs (2.165e-05 secs per sample).
And by the way seems to have skewed the distribution of values upwards somewhat.
(and yes I did have to wait that 21+ seconds for this version of the built program to complete its run!).
Should you require better timing information then I suggest you look into the subject of profiling code both in general and specifically what you need to do for the build tools you are using.
Hope this gives you some clues, points you in some useful directions and gets you moving on your project.