Random points of contention

May 21, 2012

I’m working on some high performance code which needs to scale to many, many CPUs. Against better judgement, I decided to use threads again, but to steer clear from anything that needs locking. I’ve previously found this to be very hard, and this time round proved to be no exception.

In my “lock free code”, the system was still spinning wildly with context switches, plus strace() saw a lot of calls to futex(2), the Linux primitive that underlies most locks & mutexes.

It took me a while to figure it out (why won’t gdb set a breakpoint on futex()?), but it turns out that random(3) is guaranteed to provide the same sequence of numbers if it was seeded identically, and that this promise for some reason is kept even in multithreaded setting. In other words random(3) is serialized using a shared state.

And indeed, in the dungeons of glibc (stdlib/random.c), we find:

long int __random ()
  int32_t retval; 
  __libc_lock_lock (lock); 
  (void) __random_r (&unsafe_state, &retval); 

  __libc_lock_unlock (lock); 
  return retval;

And given the amount of (non-cryptographic) random I required, this was a major slowdown in scaling to multiple threads. There are, by the way, at least 80 other ’embedded locks’ in glibc, but most (except for stdio) do not appear to be in performance critical code.


In a random()-dominated benchmark, the single-threaded case ran in 1.7 seconds, using 4 threads this exploded to 69 seconds and with 8 threads it clocked a whopping 113 seconds. This all on a 4-core i7 system, while incurring 800k context switches/s.

The reason that this behavior from random() is noteworthy is that random() has long been considered ‘not perfect but at least fast’, and that when we go multithreaded, this rapidly turns into ‘not perfect and slow, but keeping a promise you probably don’t value about being predictable for a given seed’. The last bit is especially useless since multi-threaded programs rarely execute in a deterministic manner.

So to keep a promise you don’t value, random() became really slow. And in fact, the very random() manpage mentions the non-reproducibility:

“This function should not be used in cases where multiple threads use random() and the behavior should be reproducible.”

To be honest, I don’t know what else glibc could’ve done here – the seeding srandom() function has never been specified to initialize the random() system for only one thread, so it must necessarily do so for all threads. And this in turn requires that random() accesses the shared random state.

At the very least, the manpage should contain a big warning that random() can be a source of contention, and that the use of random_r() should be considered if performance is required. I’ll be drafting a paragraph for the inestimable Michael Kerris that does just that.

About the author

Bert Hubert

Bert Hubert

Principal, PowerDNS

Related Articles

PowerDNS 2.x End of Life Statement

PowerDNS Authoritative Server 2.9.22 was released more than 6 years ago, in January 2009. Because of its immense and durable...

Peter van Dijk 05/6/15

A surprising discovery on converting IPv6 addresses: we no...

Yesterday, we were contacted by PowerDNS user James Baer who noted strange crashes in PowerDNS (on Linux) upon adding...

Bert Hubert 05/4/14

When DNS is cool and when it is not

Whenever massive query rates are desired for globally distributed data, with high redundancy and built in positive and...

Bert Hubert 11/4/09

Domain security outside of DNS: Getting hacked administratively

This is a brief blogpost on the news that has been sent to us by many people, namely that there is a suspected Iranian group...

Bert Hubert 01/4/19