This is the sixth part of a series of blog posts we are publishing, mostly about recent developments in PowerDNS Recursor. The previous parts were:
- Refreshing Of Almost Expired Records: Keeping The Cache Hot,
- Probing DoT Support of Authoritative Servers: Just Try It,
- Sharing data between threads in PowerDNS Recursor,
- Structured Logging in PowerDNS Recursor and
- ZONEMD, the missing validation.
This post is about how the Recursor can write to files even when its permissions to access the file system are restricted.
When PowerDNS Recursor is running it mostly does not need to access files. In many runtime environments its access to the file system is restricted to limit the impact of potential security issues. When reconfiguring the Recursor, we need to make sure the files it needs to read are accessible in this restricted runtime environment. But in some cases, we also want to be able to write files.
A common case for writing to a file is extracting status information from the Recursor, for example, a dump of the Recursor caches. In older releases, the Recursor needed write access to a directory where it could write those dumps. This is not ideal: the restricted runtime environment might not allow opening files with write permission, and even if it does, we would rather run the Recursor without that ability.
Depending on the runtime environment, the place where files are written may be surprising. For example, using chroot or systemd’s RuntimeDirectory and PrivateTmp changes the view of the file system from the Recursor’s perspective. As questions asked on IRC and other help channels reveal, this can be confusing and hard to diagnose for a system administrator.
From a general security perspective, it is good to prevent a networking daemon from creating and writing to files. So how can we solve the problem of where to write the cache dumps?
Passing information between processes
One way would be to dump the information via a socket to a client process. That would involve the Recursor writing to a socket; the client process would then read from the socket and write the information to a file or pipe.
We chose another approach: file descriptor passing. The main advantage is that this allows the information to be written immediately to a file or pipe without an intermediate networking step transferring the data between the Recursor and a client process. This simplifies the code and avoids work for both the client and the Recursor.
File descriptor passing
File descriptor passing is a technique that can be used between two processes communicating over a local (also known as UNIX) domain socket. The Recursor’s command client program rec_control already uses a local socket to communicate with the Recursor.
When a file, pipe or socket is opened, it is assigned a file descriptor by the kernel. The file descriptor is an integer and its scope is per-process. The kernel keeps track of open file descriptors per process and has a mechanism to translate the per-process file descriptor to an actual kernel object, which represents the file, pipe or socket. File descriptor passing transfers a file descriptor from one process to another while keeping the kernel file object reference the same. Only the per-process kernel data change: the receiving process gains a new file descriptor that refers to the same kernel object, while the sending process keeps its own descriptor until it closes it.
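To make the distinction between the descriptor number and the kernel object concrete, here is a minimal Python sketch. It is only an illustration, not code from the Recursor: it assumes Python 3.9 or later (for socket.send_fds and socket.recv_fds) and, to stay self-contained, it passes the descriptor across a socket pair inside a single process. The variable names are ours.

```python
import os
import socket
import tempfile

# A connected pair of UNIX domain sockets; both ends live in this one process.
left, right = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)

with tempfile.TemporaryFile() as f:
    original_fd = f.fileno()

    # send_fds() attaches the descriptor to a one-byte message.
    socket.send_fds(left, [b"x"], [original_fd])
    _msg, fds, _flags, _addr = socket.recv_fds(right, 1, 1)
    received_fd = fds[0]

    # The receiver gets a different descriptor number ...
    different_number = received_fd != original_fd

    # ... that refers to the same file (same device and inode).
    a, b = os.fstat(original_fd), os.fstat(received_fd)
    same_file = (a.st_dev, a.st_ino) == (b.st_dev, b.st_ino)

    # Both descriptors refer to one kernel file object: writing through the
    # received one moves the file offset seen through the original one.
    os.write(received_fd, b"hello")
    shared_offset = os.lseek(original_fd, 0, os.SEEK_CUR)

    os.close(received_fd)

left.close()
right.close()
print(different_number, same_file, shared_offset)
```

Between two real processes the picture is the same, except that the two descriptor numbers live in different per-process tables.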
The effect is that one process can open a file (or other object) for writing and then transfer the file descriptor, which is just a reference, to another process. The second process can then write to the file (or other object) without having the permissions to create or open files for writing itself.
The code to do the actual file descriptor passing is a bit arcane, but well documented. See for example https://www.man7.org/linux/man-pages/man3/cmsg.3.html or http://man.openbsd.org/man3/CMSG_DATA.3.
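For readers who want to see the shape of that code without reading C, the following Python sketch uses the same mechanism described in the manual pages above: a control message of type SCM_RIGHTS attached to an ordinary message. The helper names send_fd, recv_fd and demo are hypothetical and chosen for this example; this is not the Recursor's implementation.

```python
import array
import os
import socket


def send_fd(sock, fd):
    """Send one descriptor as SCM_RIGHTS ancillary data with a one-byte payload."""
    ancillary = [(socket.SOL_SOCKET, socket.SCM_RIGHTS, array.array("i", [fd]))]
    sock.sendmsg([b"\0"], ancillary)


def recv_fd(sock):
    """Receive one descriptor sent by send_fd(); raise if none arrived."""
    fds = array.array("i")
    _data, ancdata, _flags, _addr = sock.recvmsg(1, socket.CMSG_LEN(fds.itemsize))
    for level, kind, payload in ancdata:
        if level == socket.SOL_SOCKET and kind == socket.SCM_RIGHTS:
            # Ignore any trailing bytes that do not form a whole integer.
            usable = len(payload) - (len(payload) % fds.itemsize)
            fds.frombytes(payload[:usable])
    if not fds:
        raise RuntimeError("no file descriptor received")
    return fds[0]


def demo():
    """Pass the write end of a pipe over a socket pair and write through it."""
    a, b = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    read_end, write_end = os.pipe()
    send_fd(a, write_end)
    passed = recv_fd(b)
    os.write(passed, b"via passed fd")
    data = os.read(read_end, 64)
    for fd in (read_end, write_end, passed):
        os.close(fd)
    a.close()
    b.close()
    return data


if __name__ == "__main__":
    print(demo())
```

The one-byte payload is there because ancillary data travels alongside regular data; the descriptor itself is carried in the control message.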
Consequences for system administrators
The rec_control command was modified to use file descriptor passing with the release of Recursor 4.4.0. From the system administrator’s perspective, the command still looks like this:
rec_control dump-cache cache_dump.txt
The difference is that the cache_dump.txt file will now be created using the credentials and current working directory of the rec_control process and not the credentials and current working directory of the Recursor process. The rec_control command will create the file, pass the file descriptor to the Recursor and wait until the Recursor signals it is done writing. This part of the communication between rec_control and the Recursor did not change.
It is now also possible to let the Recursor write to standard output of the rec_control command:
rec_control dump-cache - | grep some_pattern
Other rec_control subcommands writing files were converted in a similar way.
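The division of labour described in this section can be sketched as follows. This is a hedged illustration in Python (3.9 or later), not the real rec_control protocol: the message format, the reply strings, the dump contents and the function names open_target, daemon and client_dump are all invented for the example. A forked child stands in for the Recursor and never opens a file itself.

```python
import os
import socket
import tempfile

STDOUT_FD = 1  # descriptor number of standard output


def open_target(path):
    """Client side: '-' means standard output, anything else is created by the client."""
    if path == "-":
        return STDOUT_FD
    return os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)


def daemon(sock):
    """Stand-in for the daemon: receive command and descriptor, write, confirm."""
    command, fds, _flags, _addr = socket.recv_fds(sock, 64, 1)
    if command == b"dump-cache" and fds:
        os.write(fds[0], b"; hypothetical cache dump\nexample.com. 300 IN A 192.0.2.1\n")
        os.close(fds[0])
        sock.sendall(b"done")
    else:
        sock.sendall(b"error")


def client_dump(path):
    """Client side: open the target, pass the descriptor, wait for the reply."""
    client_sock, daemon_sock = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    pid = os.fork()
    if pid == 0:
        # Child process plays the daemon; it only ever uses the passed descriptor.
        status = 1
        try:
            client_sock.close()
            daemon(daemon_sock)
            status = 0
        finally:
            os._exit(status)
    daemon_sock.close()
    fd = open_target(path)
    socket.send_fds(client_sock, [b"dump-cache"], [fd])
    if fd != STDOUT_FD:
        os.close(fd)  # the descriptor in flight keeps the file open for the daemon
    reply = client_sock.recv(16)
    client_sock.close()
    os.waitpid(pid, 0)
    return reply


if __name__ == "__main__":
    target = os.path.join(tempfile.mkdtemp(), "cache_dump.txt")
    print(client_dump(target))
    with open(target) as dump:
        print(dump.read())
```

Because the client opens the target, the file ends up relative to the client's working directory and with the client's ownership, which matches the behaviour described above. Passing "-" hands over the client's standard output instead, so the output can be piped onward.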
Conclusion
By using file descriptor passing, we are able to run the Recursor process with more restricted privileges and provide a flexible way to write diagnostic information.