Probing DoT Support of Authoritative Servers: Just Try It
This is the second part of a series of blog posts we are publishing, mostly around recent developments with respect to PowerDNS Recursor. The first blog post was Refreshing Of Almost Expired Records: Keeping The Cache Hot.
In PowerDNS Recursor 4.6.0 we introduced DNS over TLS (DoT) support for outgoing connections. Starting with that version, DoT is used by default for (forwarded) connections to port 853 and for a configurable list of authoritative servers. On the client-resolver side there have been developments to make DoT discovery easy. In this post we will discuss DoT discovery from the resolver to authoritative server perspective. PowerDNS Recursor 4.7.0 has an experimental feature implementing this.
These days many DNS clients have the ability to switch to a DoT (or DNS over HTTPS, DoH) connection under certain circumstances, providing better security for the client-resolver channel. While DNSSEC provides authenticity proof of data received from an authoritative server, it does not provide confidentiality and in general it does not protect the channel between the resolver and the client: most clients do not support DNSSEC validation. DoT (and DoH) fill parts of this gap. It is less well known that DoT can also be used to secure the channels between a resolver and the authoritative servers. Common to both applications of DoT is the question of when to use DoT instead of regular DNS queries over UDP or TCP to port 53, commonly called Do53.
Client-Resolver DoT Discovery
In the case of client-resolver communication there is a (draft) standard that specifies how a client can discover if a resolver supports DoT: Discovery of Designated Resolvers (DDR). In short, using DNS queries with name _dns.example.net (or _dns.resolver.arpa if it does not know the name) and query type SVCB, a client can learn if a resolver supports DoT. The draft also contains rules to make sure a validated DoT connection can be set up if needed. All this allows a secure upgrade to DoT without user or administrator intervention. The major public resolvers all support DDR and if you run a resolver yourself, it is not difficult to set it up to return the proper record(s) for DDR queries.
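To make the DDR query concrete, here is a minimal sketch that builds the wire format of an SVCB question for _dns.resolver.arpa using only the Python standard library. The function names are ours and purely illustrative. The sketch only constructs the query bytes; sending it to a resolver and parsing the SVCB answer are left out.

```python
import struct

SVCB = 64  # RR type number assigned to SVCB
IN = 1     # Internet class


def encode_name(name: str) -> bytes:
    """Encode a domain name as length-prefixed labels ending in a zero byte."""
    out = b""
    for label in name.rstrip(".").split("."):
        raw = label.encode("ascii")
        out += bytes([len(raw)]) + raw
    return out + b"\x00"


def build_ddr_query(query_id: int, name: str = "_dns.resolver.arpa") -> bytes:
    """Build a wire-format SVCB query with the RD (recursion desired) bit set."""
    # Header: ID, flags (RD only), one question, no other records.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    question = encode_name(name) + struct.pack("!HH", SVCB, IN)
    return header + question


query = build_ddr_query(0x1234)
print(query.hex())
```

The resulting bytes can be sent as-is in a UDP datagram to port 53 of the resolver the client is already using.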
Resolver-Authoritative DoT Discovery
Sadly, the case for resolver to authoritative server traffic is not as simple. At the time of writing, the committee concerned with these things was not able to come up with anything better than Unilateral Opportunistic Deployment of Recursive-to-Authoritative DNS, which in essence boils down to: let the resolver find out if DoT is available by trying a DoT connection to the IP address of the authoritative server and remembering the results for a while. At this moment most authoritative servers do not support DoT, so a DoT connection is likely to time out, using resources during the attempt. There are also no guidelines on certificate checking of the DoT connection: the draft says any certificate must be accepted. Even if this is not very satisfying, we set out to implement this mechanism as an experimental feature in PowerDNS Recursor 4.7.0. We hope that in the future a better mechanism to discover DoT support of authoritative servers will be drafted. If so, we certainly intend to implement this better method.
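The "just try it" approach can be sketched in a few lines of Python. This is an illustration of the idea, not the code used in PowerDNS Recursor: the function names and the two second default timeout are assumptions. Following the draft's rule that any certificate must be accepted, the TLS context disables certificate and hostname checks.

```python
import socket
import ssl


def make_probe_context() -> ssl.SSLContext:
    """TLS client context that accepts any certificate (opportunistic mode)."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    ctx.check_hostname = False       # must be disabled before verify_mode
    ctx.verify_mode = ssl.CERT_NONE  # no certificate validation at all
    return ctx


def probe_dot(ip: str, port: int = 853, timeout: float = 2.0) -> bool:
    """Return True if a TLS handshake with ip:port completes within timeout."""
    ctx = make_probe_context()
    try:
        with socket.create_connection((ip, port), timeout=timeout) as sock:
            with ctx.wrap_socket(sock) as tls:
                return tls.version() is not None
    except OSError:
        # Covers refused connections, timeouts and TLS errors alike.
        return False
```

A completed handshake alone only shows that something speaks TLS on port 853; a real probe would also send a DNS query over the connection before trusting the result.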
Implementing DoT discovery of authoritative servers
PowerDNS Recursor maintains several in-memory tables with characteristics of authoritative servers. An example of this information is the speed of each authoritative server contacted. The speed information is used to select the authoritative server expected to be fastest when a query to an authoritative server of a domain needs to be made. Following the guidelines in the draft mentioned above, we introduced a new table to record DoT probe activity and results.
When an authoritative server needs to be contacted, this table is consulted to see if we already know if this server supports DoT. There are several cases:
- If the server IP is not in the table because we did not contact it recently, we create an entry in the table, mark the DoT status Unknown, and do a Do53 query.
- If the server is known to not support DoT, we do a Do53 query.
- If the server is known to support DoT, we do a DoT query.
- If we find the status is Unknown, we do a regular Do53 query and we schedule an asynchronous task to probe for DoT support. The table entry is marked Busy in this case.
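The cases above amount to a small state machine per server IP. The sketch below models it in Python as an illustration only: the class and method names are hypothetical and do not reflect the recursor's actual data structures. A server that is already Busy is not listed as a separate case above; the sketch assumes it is served over Do53 like the others.

```python
from dataclasses import dataclass, field
from enum import Enum


class DoTStatus(Enum):
    UNKNOWN = "Unknown"
    BUSY = "Busy"
    BAD = "Bad"
    GOOD = "Good"


@dataclass
class ProbeTable:
    entries: dict = field(default_factory=dict)  # ip -> DoTStatus
    pending: list = field(default_factory=list)  # IPs waiting for a probe task

    def choose_transport(self, ip: str) -> str:
        """Return "DoT" or "Do53" for this query and update the table."""
        status = self.entries.get(ip)
        if status is None:
            # First contact: remember the server, answer over Do53.
            self.entries[ip] = DoTStatus.UNKNOWN
            return "Do53"
        if status is DoTStatus.GOOD:
            return "DoT"
        if status is DoTStatus.UNKNOWN:
            # Seen before but never probed: schedule an asynchronous probe.
            self.entries[ip] = DoTStatus.BUSY
            self.pending.append(ip)
        # Bad, Busy and the just-scheduled case all use Do53.
        return "Do53"

    def record_probe_result(self, ip: str, success: bool) -> None:
        """Called by the probe task once it knows the outcome."""
        self.entries[ip] = DoTStatus.GOOD if success else DoTStatus.BAD


table = ProbeTable()
ip = "192.0.2.1"  # documentation address, used as a placeholder
first = table.choose_transport(ip)   # creates the Unknown entry
second = table.choose_transport(ip)  # schedules the probe, entry becomes Busy
table.record_probe_result(ip, success=True)
third = table.choose_transport(ip)   # the server is now marked Good
print(first, second, third)
```

Note that client queries never wait for a probe in this model: the transport decision is made from the table alone, and probes only change what later queries will see.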
If a task is submitted, a separate task mechanism in the recursor picks up the task and tries a DoT connection to the associated IP. If that works and a DNS query is also successful over the newly established DoT connection, the IP is marked as DoT capable (Good). If the DoT connection times out or the query is not successful, the IP is marked as Bad.
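The query step of a probe can be illustrated as follows. DNS messages sent over TLS carry the same two-byte length prefix as DNS over TCP. What exactly counts as a "successful" query in the recursor is not spelled out here, so the check below (matching ID, response bit set, NoError response code) is an assumption made for the sake of the example, and the function names are hypothetical.

```python
import struct


def frame(message: bytes) -> bytes:
    """Prefix a DNS message with its 16-bit length, as used over TCP and TLS."""
    return struct.pack("!H", len(message)) + message


def unframe(data: bytes) -> bytes:
    """Strip the length prefix and return the DNS message it announces."""
    (length,) = struct.unpack("!H", data[:2])
    message = data[2:2 + length]
    if len(message) != length:
        raise ValueError("truncated DNS message")
    return message


def looks_successful(query_id: int, response: bytes) -> bool:
    """Assumed success test: matching ID, QR bit set, RCODE 0 (NoError)."""
    if len(response) < 12:
        return False  # shorter than a DNS header
    resp_id, flags = struct.unpack("!HH", response[:4])
    is_response = bool(flags & 0x8000)
    rcode = flags & 0x000F
    return resp_id == query_id and is_response and rcode == 0
```

A probe task would write frame(query) to the TLS socket, read the reply, and mark the IP Good only if looks_successful returns True for the unframed response.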
While the DoT status is not yet determined, we use regular DNS queries to contact a specific authoritative server, since we do not want to burden clients with the long (a couple of seconds) timeouts involved in unsuccessful DoT probes.
The housekeeping of the table involves a bit more work, so that old information is expired or refreshed properly and to make sure that we do not overwhelm ourselves or authoritative servers with a lot of simultaneous DoT probes. The draft has some guidelines for doing this properly. There are also more complex issues: what do we do if a DoT query fails to a server we marked as DoT capable? Do we mark the IP as not DoT capable or not? It is likely not good to downgrade a server that has been serving DoT for many queries on the failure of a single query. These open questions and the hope that a draft that provides a better method will appear are the reasons this feature is marked as experimental.
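Two of the housekeeping concerns, limiting simultaneous probes and expiring old results, can be sketched as below. This is a simplified model under stated assumptions: max_busy_probes mirrors the role of the max-busy-dot-probes setting, while ttl_seconds and all class and method names are hypothetical. The open question of when to downgrade a Good server is deliberately not addressed.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    status: str        # "Unknown", "Busy", "Bad" or "Good"
    expires_at: float  # time after which the entry is dropped


class ProbeHousekeeping:
    def __init__(self, max_busy_probes: int, ttl_seconds: float):
        self.max_busy_probes = max_busy_probes  # cap on simultaneous probes
        self.ttl_seconds = ttl_seconds          # hypothetical entry lifetime
        self.table = {}                         # ip -> Entry

    def busy_count(self) -> int:
        return sum(1 for e in self.table.values() if e.status == "Busy")

    def try_schedule(self, ip: str, now: float) -> bool:
        """Mark ip as Busy unless the cap on simultaneous probes is reached."""
        if self.busy_count() >= self.max_busy_probes:
            return False
        self.table[ip] = Entry("Busy", now + self.ttl_seconds)
        return True

    def finish(self, ip: str, success: bool, now: float) -> None:
        """Store the probe result with a fresh lifetime."""
        status = "Good" if success else "Bad"
        self.table[ip] = Entry(status, now + self.ttl_seconds)

    def prune(self, now: float) -> int:
        """Drop expired entries so stale results are re-learned; return count."""
        expired = [ip for ip, e in self.table.items() if e.expires_at <= now]
        for ip in expired:
            del self.table[ip]
        return len(expired)
```

Passing the current time in explicitly keeps the sketch deterministic; a long-running process would typically pass time.monotonic().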
The draft mentioned above also recommends that the information learned be persistent: the resolver should store the information while running and re-read it on start-up. We do not implement that part yet.
The mechanism we implemented decouples the use of DoT from the discovery process. This means that in the future, alternative discovery mechanisms (specifying how) or policies (specifying when and in what order) can be implemented without interfering with other parts of the recursor. At the moment the probing process is lazy: tasks will be executed when the recursor is processing queries. An idle recursor will execute tasks only sporadically. Because of this, probe tasks executed will have no immediate relation to the queries executed on behalf of clients. This will be improved upon in the near future.
Trying it yourself
Using PowerDNS Recursor 4.7.0, you can try this yourself by setting max-busy-dot-probes to a non-zero value. This configuration governs the maximum number of simultaneous DoT probes. After the recursor has been busy for a while, you can look at the status of the DoT probes by using
rec_control dump-dot-probe-map -
to see a list of known authoritative servers and their DoT support status. A few example lines are shown below:
126.96.36.199 akam.net. 1 Bad 2022-05-26T14:44:10
188.8.131.52 akam.net. 1 Busy 2022-05-26T14:44:21
184.108.40.206 akam.net. 1 Bad 2022-05-26T14:43:59
220.127.116.11 dscx.akamaiedge.net. 0 Unknown 2022-05-26T14:40:46
18.104.22.168 powerdns.com. 3 Good 2022-05-28T15:21:06
22.214.171.124 dscx.akamaiedge.net. 0 Unknown 2022-05-26T14:40:26
126.96.36.199 akagtm.org. 1 Bad 2022-05-26T14:41:16
A server marked Unknown has not been probed yet: only servers that are visited at least twice within an interval get probed. The status can also be Bad (no DoT support), Good (DoT support) or Busy (DoT probe scheduled or running).
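If you want a quick overview of a larger dump, a short script can tally the statuses. The sketch below assumes each line has five whitespace-separated fields in the order shown in the example lines above (IP, name, a counter, status, timestamp); the meaning of the counter column is not explained in this post, so the field name count is a guess. The sample lines are copied from the example output.

```python
from collections import Counter
from datetime import datetime
from typing import NamedTuple


class ProbeEntry(NamedTuple):
    ip: str
    name: str
    count: int  # meaning assumed; the post does not explain this column
    status: str
    timestamp: datetime


def parse_line(line: str) -> ProbeEntry:
    """Split one dump line into its five fields."""
    ip, name, count, status, stamp = line.split()
    return ProbeEntry(ip, name, int(count), status, datetime.fromisoformat(stamp))


def summarize(text: str) -> Counter:
    """Count how many servers are in each DoT status, skipping blank lines."""
    lines = [line for line in text.splitlines() if line.strip()]
    return Counter(parse_line(line).status for line in lines)


sample = """\
18.104.22.168 powerdns.com. 3 Good 2022-05-28T15:21:06
126.96.36.199 akagtm.org. 1 Bad 2022-05-26T14:41:16
220.127.116.11 dscx.akamaiedge.net. 0 Unknown 2022-05-26T14:40:46
"""
print(summarize(sample))
```

Lines that do not have exactly five fields will raise a ValueError, which is a reasonable signal that the real output format differs from what is assumed here.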
While we’ve implemented the DoT discovery mechanism as an experimental feature in PowerDNS Recursor 4.7.0, we consider it a half-measure, a waypoint if you will. We hope a future draft will provide more guidance on the how and when of the discovery process of DoT support between resolver and authoritative servers, which should lead to better security for DNS users.
Thanks for this post.
Just enabled it on my home dns resolver and was surprised to see that facebook and wikipedia do support DoT.
Great feature, I hope it is endorsed by other resolvers as well.
Is there a way to reduce the amount of logging this feature produces?
As this is an experimental feature it is quite verbose, logging at Warning (4) level. At the moment, you can only skip the DoT probing message by setting the loglevel to “Error” (3), potentially missing other relevant messages. Once this feature matures, we will change the loglevel to “Info” (6) to spam the log less.