Introducing dnsdist: DNS, abuse- and DoS-aware query distribution for optimal performance

Over the years, PowerDNS users have frequently asked us about our preferred DNS load balancing solution, and we’ve never had a satisfying answer for that. Users of dedicated hardware often tell us that vendors spend most of their time and effort on balancing HTTP, and frequently deliver substandard or even buggy DNS functionality.

UPDATE: Dnsdist now has its own homepage on http://dnsdist.org/!

In terms of software, one big PowerDNS deployment is happy with OpenBSD relayd, and it indeed does look powerful. Other operators have deployed keepalived, which is very strong from a networking standpoint, but does not offer a lot of DNS specifics.

But all in all, we never found any load balancing solution for DNS that made people truly happy.

Simultaneously, there is now a lot of focus on providing the very best DNS performance, even in the face of the ongoing reflection attacks (one of which has been dubbed the “Chinese water torture attack”).

Putting these three things together (no really satisfying DNS-aware load balancer, drive for the very best performance, ongoing attacks) led us to pollute the waters of the internet with yet another piece of software: dnsdist.

From its README:

“dnsdist is a highly DNS-, DoS- and abuse-aware loadbalancer. Its goal in life is to route traffic to the best server, delivering top performance to legitimate users while shunting or blocking abusive traffic.”

This is quite a mission statement, but we’ve tried to keep things simple. The simplest possible invocation:

# dnsdist --local 0.0.0.0:53 192.168.1.2 192.168.1.3 192.168.1.4

This delivers sensible and smart distribution of queries over three downstream servers (.2, .3 and .4). All three are polled for availability, and traffic is delivered to the server with the fewest outstanding queries. If this is all you need, you are done. However, much more power is available.
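To make the default behaviour concrete, here is a minimal sketch of "pick the available server with the fewest outstanding queries" in plain Lua. It is a model for illustration only, not dnsdist's actual implementation: the servers are ordinary tables, and the field names `up` and `outstanding` and the counts are assumptions made for this example.

```lua
-- Illustrative model only: plain tables stand in for dnsdist's server objects.
-- The fields 'up' and 'outstanding' are assumptions for this sketch.
local function leastOutstanding(servers)
  local best = nil
  for _, s in ipairs(servers) do
    -- Skip servers that failed their availability check.
    if s.up and (best == nil or s.outstanding < best.outstanding) then
      best = s
    end
  end
  return best  -- nil when every server is down
end

local servers = {
  {address = "192.168.1.2", up = true,  outstanding = 12},
  {address = "192.168.1.3", up = true,  outstanding = 3},
  {address = "192.168.1.4", up = false, outstanding = 0},
}

print(leastOutstanding(servers).address)  -- 192.168.1.3
```

Note that .4 has the fewest outstanding queries in this made-up data, but it is skipped because it is marked down.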

For example, the following configuration does something quite special:

newServer{address="192.168.1.2", qps=10000, order=1}
newServer{address="127.0.0.1:5300", qps=5000, order=2}
newServer{address="192.168.1.79:5300", qps=7000, order=3}
setServerPolicy(firstAvailable)

This configuration again defines three downstream servers, but it also configures the preferred QPS limit of each. Finally, it changes the default server selection policy to ‘firstAvailable’. In production, viewed from the dnsdist console, it looks like this:

[Screenshot: qps]

(This console is available either locally or remotely.)

It has frequently been observed that a busy nameserver is a happy nameserver, since this means that all caches are hot and that more cache hits are generated per TTL. The configuration above will send traffic to the first server that has not hit its configured QPS limit. If all have exceeded their limit, it falls back to the standard ‘leastOutstanding’ policy.
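The selection logic described above can be sketched in plain Lua as follows. This is a model under stated assumptions rather than dnsdist's own code: the field names (`order`, `qpsLimit`, `currentQps`, `outstanding`) and the traffic figures are made up for the example, while the addresses and limits mirror the configuration shown earlier.

```lua
-- Illustrative model only: servers are plain tables, and the field names
-- (order, qpsLimit, currentQps, outstanding) are assumptions for this sketch.
local function leastOutstanding(servers)
  local best = nil
  for _, s in ipairs(servers) do
    if best == nil or s.outstanding < best.outstanding then
      best = s
    end
  end
  return best
end

local function firstAvailable(servers)
  -- Work on a copy sorted by ascending 'order' so the caller's list is untouched.
  local sorted = {}
  for i, s in ipairs(servers) do sorted[i] = s end
  table.sort(sorted, function(a, b) return a.order < b.order end)

  for _, s in ipairs(sorted) do
    if s.currentQps < s.qpsLimit then
      return s  -- first server still below its QPS limit
    end
  end
  -- Every server is at or above its limit: fall back to least outstanding.
  return leastOutstanding(servers)
end

local servers = {
  {address = "192.168.1.2",       order = 1, qpsLimit = 10000, currentQps = 10000, outstanding = 40},
  {address = "127.0.0.1:5300",    order = 2, qpsLimit = 5000,  currentQps = 3200,  outstanding = 15},
  {address = "192.168.1.79:5300", order = 3, qpsLimit = 7000,  currentQps = 0,     outstanding = 0},
}

print(firstAvailable(servers).address)  -- 127.0.0.1:5300
```

The first server is saturated, so traffic spills over to the second, and the third stays idle until the second fills up too. This is what keeps the busiest caches hot.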

Two further load balancing policies are built in:

  • wrandom: weighted random, where traffic is distributed by the weight parameter
  • roundrobin: as the name implies
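As a rough illustration of weighted random selection, the sketch below picks a server with probability proportional to its weight. It is a generic version of the technique in plain Lua, not dnsdist's implementation. The `weight` field on a plain table and the optional `rnd` argument (which makes the function testable) are assumptions for this example.

```lua
-- Illustrative model only: a generic weighted random pick over plain tables.
-- 'rnd' is a hypothetical hook returning a number in [0, 1); it defaults to math.random.
local function wrandom(servers, rnd)
  rnd = rnd or math.random
  local total = 0
  for _, s in ipairs(servers) do
    total = total + s.weight
  end

  local pick = rnd() * total  -- uniform in [0, total)
  local acc = 0
  for _, s in ipairs(servers) do
    acc = acc + s.weight
    if pick < acc then
      return s
    end
  end
  return servers[#servers]  -- guard against floating point rounding
end

local servers = {
  {address = "192.168.1.2", weight = 1},
  {address = "192.168.1.3", weight = 3},
}

-- With these weights, roughly three out of four queries go to 192.168.1.3.
print(wrandom(servers).address)
```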

In the configuration, custom policies can be written in Lua (see below).

Traffic inspection

Now, if all traffic were bona fide, and everything worked as planned, we’d be done with the feature set above. In the real world however, we frequently suffer degraded performance from malicious or clearly illegitimate traffic. dnsdist offers quite a few features to inspect live traffic:

[Screenshot: topqueries]

This shows us the ‘top 5’ of incoming queries, showing a rather ‘flat’ profile for this server. Nothing really jumps out. Potentially however, this is because unwanted traffic consists of many unique names, so let’s see:

[Screenshot: topqueries-2]

This shows traffic grouped by the last 2 components of query names, and now ‘t-ipnet.de.’ jumps out. If we were to decide that these queries are unwanted (which would be HIGHLY unlikely in this case), dnsdist offers three options:

  1. Block outright: addDomainBlock("t-ipnet.de.")
  2. Rate limit to (say) 10/s: addQPSLimit("t-ipnet.de.", 10)
  3. Shunt to a dedicated abuse pool:
    • newServer{address="192.168.1.30:5300", pool="abuse"}
    • newPoolRule("t-ipnet.de.", "abuse")

All three options have their merits, but we’re especially excited about this last option. Operators usually face a stark choice with traffic – block it, and potentially upset lots of bona fide users, or allow and degrade performance for everyone. By dedicating a bunch of servers to dubious traffic, it is possible to isolate worrying traffic, and only have it impact the users of the impacted domain names.
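The grouping by the last two components of the query name, used above to make ‘t-ipnet.de.’ stand out, is easy to picture with a small sketch. The code below is plain Lua and only models the idea; it is not how dnsdist computes its statistics, and the helper names `lastLabels` and `countBySuffix` as well as the sample query names are invented for this example.

```lua
-- Illustrative model only: helper names and sample data are hypothetical.

-- Return the last n labels of a fully qualified name,
-- e.g. ("www.example.com.", 2) gives "example.com."
local function lastLabels(qname, n)
  local labels = {}
  for label in qname:gmatch("[^.]+") do
    labels[#labels + 1] = label
  end
  local first = math.max(1, #labels - n + 1)
  return table.concat(labels, ".", first, #labels) .. "."
end

-- Count queries per suffix of n labels, ignoring case.
local function countBySuffix(qnames, n)
  local counts = {}
  for _, q in ipairs(qnames) do
    local key = lastLabels(q:lower(), n)
    counts[key] = (counts[key] or 0) + 1
  end
  return counts
end

local seen = {
  "a81kd.t-ipnet.de.",
  "zq9x2.t-ipnet.de.",
  "M0PLW.T-IPNET.DE.",
  "www.example.com.",
}

local counts = countBySuffix(seen, 2)
print(counts["t-ipnet.de."])   -- 3
print(counts["example.com."])  -- 1
```

Each full name in this sample is unique, so a plain top list stays flat, while the two-label view concentrates the count on one domain.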

Further analysis is possible based on response codes. This for example shows the top-5 servfail responses:

[Screenshot: topresponses]

Analogous inspection based on IP addresses is possible via:

[Screenshot: topclients]

And the three options outlined above for blocking, limiting and shunting traffic also accept IP addresses as parameters.

Full flexibility

While dnsdist natively comes with a broad set of primitives to route, shape and block traffic, ultimate flexibility is possible with the built-in Lua support.

For example, to block all domain names with 5 consecutive numbers in them, try:

function blockFilter(remote, qname, qtype, dh)
  -- Returning true blocks the query: match any 5 consecutive digits in the name
  return string.match(qname:tostring(), "%d%d%d%d%d") ~= nil
end
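The Lua pattern used in this filter can be tried outside dnsdist. In the sketch below, plain strings stand in for the query name object (an assumption made so the snippet runs in a stock Lua interpreter), and the helper name `hasFiveDigits` is made up for the example.

```lua
-- Plain strings stand in for dnsdist's query name objects in this sketch.
-- %d matches one decimal digit, so the pattern needs five digits in a row.
local function hasFiveDigits(name)
  return string.match(name, "%d%d%d%d%d") ~= nil
end

print(hasFiveDigits("host12345.example.com."))  -- true
print(hasFiveDigits("1234.5678.example.com."))  -- false: the dot breaks the run
print(hasFiveDigits("www.example.com."))        -- false
```

Since the pattern is not anchored, the five digits may appear anywhere in the name, including across what a human would read as a longer number.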

Similarly, blockFilter can, for example, return TC=1 answers for all ANY queries or, if you are so inclined, rate limit all ANY queries. An example of this can be found in the sample configuration file.

Server load balancing policies can also be written entirely in Lua. Such a policy can direct traffic to specific servers, or hand it to one of the built-in load balancing policies operating on a different pool. This last option is great for providing split-horizon service, for example.

Practical details

dnsdist is a daemon that can partially be configured from the command line, but otherwise uses a configuration file. It can be queried at runtime via an optionally encrypted TCP/IP connection. At this console, which is readline() based and offers history search etc., every setting can be displayed and changed.

The console command line and the configuration file are both parsed by Lua. In practice, most configurations can run without involving Lua for every packet. However, even in very dynamic setups where this is required, the practical overhead is minimal, especially when compiled against luajit.

Note that dnsdist is not PowerDNS specific, and will happily balance traffic between nameserver implementations or even third party services!

What dnsdist is not

dnsdist tries to be high performance, but is fundamentally limited by the fact that it is a general purpose handler of DNS packets. Practically speaking, this means that dnsdist can’t be used to scale to many hundreds of thousands of forwarded packets per second, as this tends to stress out commodity servers and operating systems. However, care has been taken to allow dnsdist to itself be hooked up to network-based load balancing solutions, for example by allowing non-local binds, and proper behavior on binding to 0.0.0.0.

Similarly, dnsdist is not an alternative to many gigabit capable DoS scrubbing devices or services.

It is however a great way to deal with “the last gigabit” of an attack, or simply smaller scale attacks that algorithmically kill your performance, instead of simply flooding your pipe. And by distributing queries wisely, your users experience better performance.

Some further cool things

  • showResponseLatency(): prints a histogram of response times
  • showServers(): show statistics for all configured downstreams
  • getServer(0):setDown(): force server 0 down administratively
  • showPoolRules(): show configured pool rules
  • showQPSLimiters(): show configured QPS limiters
  • setServerPolicy(wrandom): set weighted random server selection policy
  • setServerPolicyLua(): configure Lua-based server policies
  • .. more in the README

Current status & how to get it

dnsdist 1.0.0 was released on the 21st of April 2016 at UKNOF34. Packages, documentation, source code and news can be found on http://dnsdist.org/

 

2 comments

  1. regs

    UDP is useless?

    dig @{remote_dnsdist_server} domain

    does not work. +tcp is OK:

    dig @{remote_dnsdist_server} +tcp domain

  2. Greg

    Regarding performance. As we have heard, UDP handling in most *x OSs is pretty useless. Switch vendors (the ‘C’s and the ‘J’s for example) developed distributed switching a long time ago, where decision making is done in the slow CPU and then the results of the decisions are downloaded to the interface ASICs to implement the switching in hardware at line rate. Would it be possible to run your code on a distributed switching platform (notwithstanding their proprietary nature) to take advantage of its hardware architecture for much faster performance? Also you do away with the need for a separate box in which to implement the distribution function.

    Just a thought
    Greg Choules
