Sharing data between threads in PowerDNS Recursor

This is the third part of a series of blog posts we are publishing, mostly around recent developments with respect to PowerDNS Recursor. The first blog post was Refreshing Of Almost Expired Records: Keeping The Cache Hot, the second Probing DoT Support of Authoritative Servers: Just Try It.

In PowerDNS Recursor the actual resolving is done using mthreads: a lightweight cooperative thread switching mechanism. This allows us to write the resolving code in a straightforward manner, we can program it as if we are resolving in a synchronous way. The mthreads abstraction takes care of running another mthread when the resolving process for a particular query has to wait for incoming data from the network. Mthreads are mapped to Posix threads, the thread abstraction the C++ runtime provides. Typically a handful of Posix threads run many mthreads, one for each query in-progress. Mthread switching happens only at specific points in the resolving process, basically whenever I/O is done.

Process and Data

The resolving process uses a lot of data: for example cached DNS records and speed measurements of authoritative servers. When the Recursor was originally written this data was associated to the Posix threads as thread-local data. Other Posix threads use other data (because the data is thread-local) and other mthreads only have a chance to modify the data if the current mthread does network I/O. With some care around the steps doing I/O, an mthread can assume it owns the data, and no protection against concurrent access or modification is needed.

This approach is nice and simplifies things but it has a big drawback: each Posix thread has its own copy of the record cache and other data. Several copies of each data item can exist next to each other. This is inefficient: it uses more memory than needed and an mthread run by one Posix thread cannot take advantage of information learned by another mthread that happens to have been run by another Posix thread. We partially mitigated this drawback using our distributor, making sure that all queries for a given name would be served by the same Posix thread, and thus would have access to the cached data.

Sharing data

A few releases ago, we started to solve the ‘lack of data sharing’ problem by gradually moving to a shared data model. Sharing data is nice, but a lot of coordination has to be taken care of, as Posix threads run concurrently: one thread might change the data while another thread is looking at it or is also changing it. To solve this, some form of coordination has to be implemented. A general approach to do this is using a mutual exclusion primitive called mutex, which protects the accessing and modification of data and allows only one thread at a time to operate on the data. This is nice but there are two potential issues with this:

  1. If multiple threads are accessing the data, others must wait, this is called lock contention
  2. Making sure the code has acquired the right mutex while accessing the data is hard, it is easy to forget this.

Lets take a look how we solved both these issues.

Avoiding lock contention

To avoid lock contention, the record cache is split up in many caches called shards. The name of a record determines in which shard the entry is to be found if present. This has the consequence that two mthreads only experience contention if they happen to access records in the same shard. Even on a busy Recursor this chance is quite low, as the metrics the record cache keeps show. The cache also is more effective: as it does not store multiple copies of a single record, there is more room for unique records, and the cache hit ratio can be larger with the same amount of memory used.

For other data an important observation is that data is mostly used to decide if and which authoritative server to send a query to. At that point we know we are going to do I/O to send out a request and we expect having to wait for a reply. The cost of locking and accessing a shared data structure might be high for CPU bound code, but when already setting things up to do I/O the cost of locking is relatively low while the benefits of sharing the data between Posix threads is big: e.g. if one thread found a particular authoritative server to be unresponsive, it helps a lot if another thread can use that information to its advantage and choose to contact an alternative authoritative server.

Getting locking right

Sharing information can be beneficial, but only if you do it right. Making sure shared data is only accessed and modified while holding the proper mutex is a job you do not want to leave to the developer only as it is too easy to make mistakes. Both forgetting to acquire a mutex as well as forgetting to release a mutex (for example in a not so often executed error path) can happen easily. Tools like ThreadSanitizer (TSAN) can help to detect these errors, but have the drawback they can only detect errors while running the program and exercising the particular error path.

Luckily C++ has some facilities to make locking easier. My colleague Remi Gacogne used these primitives to create a locking mechanism that is easy to use and which prevents locking mistakes.

Lets first look at the basic mechanism provided by C++:

struct Data {
  int field;
  std::mutex mutex;
};

Example code using this class:

void f(Data& data)
{
  const std::lock_guard<std::mutex> lock(data.mutex);
  data.field = 0;
}

The lock guard local variable is initialised with an acquired mutex and when the scope is left the mutex held by the guard will be released so other threads can try to acquire the mutex.

This is nice, but it means we still have to remember to use this construct when accessing the field and we also have to to document or remember which mutex protects which data. With complex data-structures that becomes hard, forgetting to acquire the mutex or using the wrong mutex will happen.

The solution is to use a helper C++ template that takes care of the locking:

struct Data {
  int field;
};
using ProtectedData = LockGuarded<Data>;

void f(ProtectedData& protectedData)
{
  auto guard = protectedData.lock();
  guard->field = 0;
}

The nice trick is here that the fields of protectedData can only be accessed via the guard obtained using the lock() method. The scoping rules of C++ guarantee that. If I would try to access data it guarded without holding the proper lock, the compiler will not like it:

x.cc:12:17: error: no member named 'field' in 'LockGuarded<Data>'
  protectedData.field = 0;
  ~~~~~~~~~~~~~ ^

Compile time errors are much nicer than runtime errors only detected with extra sanitation! The implementation of the LockGuarded template and related classes can be found in lock.hh. We use them everywhere in the PoweDNS code base.

Status

The shared record cache was implemented in version 4.4.0 of PowerDNS Recursor. Implementing sharing of various tables related to authoritative servers was started in 4.7.0 and will be finished in the upcoming 4.8.0 version. In our testing we observe very low lock contention, a more effective cache and lower memory consumption.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s