Random points of contention

May 21, 2012

I’m working on some high performance code which needs to scale to many, many CPUs. Against better judgement, I decided to use threads again, but to steer clear from anything that needs locking. I’ve previously found this to be very hard, and this time round proved to be no exception.

In my “lock free code”, the system was still spinning wildly with context switches, plus strace() saw a lot of calls to futex(2), the Linux primitive that underlies most locks & mutexes.

It took me a while to figure it out (why won’t gdb set a breakpoint on futex()?), but it turns out that random(3) is guaranteed to provide the same sequence of numbers if it was seeded identically, and that this promise for some reason is kept even in multithreaded setting. In other words random(3) is serialized using a shared state.

And indeed, in the dungeons of glibc (stdlib/random.c), we find:

long int __random ()
{ 
  int32_t retval; 
  __libc_lock_lock (lock); 
  (void) __random_r (&unsafe_state, &retval); 

  __libc_lock_unlock (lock); 
  return retval;
}

And given the amount of (non-cryptographic) random I required, this was a major slowdown in scaling to multiple threads. There are, by the way, at least 80 other ’embedded locks’ in glibc, but most (except for stdio) do not appear to be in performance critical code.

 

In a random()-dominated benchmark, the single-threaded case ran in 1.7 seconds, using 4 threads this exploded to 69 seconds and with 8 threads it clocked a whopping 113 seconds. This all on a 4-core i7 system, while incurring 800k context switches/s.

The reason that this behavior from random() is noteworthy is that random() has long been considered ‘not perfect but at least fast’, and that when we go multithreaded, this rapidly turns into ‘not perfect and slow, but keeping a promise you probably don’t value about being predictable for a given seed’. The last bit is especially useless since multi-threaded programs rarely execute in a deterministic manner.

So to keep a promise you don’t value, random() became really slow. And in fact, the very random() manpage mentions the non-reproducibility:

“This function should not be used in cases where multiple threads use random() and the behavior should be reproducible.”

To be honest, I don’t know what else glibc could’ve done here – the seeding srandom() function has never been specified to initialize the random() system for only one thread, so it must necessarily do so for all threads. And this in turn requires that random() accesses the shared random state.

At the very least, the manpage should contain a big warning that random() can be a source of contention, and that the use of random_r() should be considered if performance is required. I’ll be drafting a paragraph for the inestimable Michael Kerris that does just that.

About the author

Bert Hubert

Bert Hubert

Principal, PowerDNS

Related Articles

PowerDNS Authoritative Server 4.9.0

This is release 4.9.0 of the Authoritative Server. It brings a few new features, and a collection of small improvements and...

Peter van Dijk Mar 15, 2024

PowerDNS Recursor: Extended DNS Errors Help You Troubleshooting

This is the seventh episode of a series of blog posts we are publishing, mostly around recent developments with respect to...

Otto Moerbeek Mar 12, 2024

PowerDNS Recursor 4.8.7, 4.9.4 and 5.0.3 Released

Today we have released PowerDNS Recursor 4.8.7, 4.9.4 and 5.0.3. These releases are maintenance releases that fix a few...

Otto Moerbeek Mar 7, 2024

PowerDNS Authoritative Server 4.9.0-beta2

This is release 4.9.0-beta2 (beta1 was not released, due to a tagging mistake) of the Authoritative Server. It brings a few...

Peter van Dijk Feb 16, 2024