Last Friday, Manfred Paul published a blog post about the vuln he used at Pwn2Own 2020, CVE-2020-8835, a local privilege escalation bug in the Linux kernel. It affects any Linux distro shipping kernel 5.5.0 or newer.
Why it’s cool: eBPF is the Hacker News hotness for tracing (i.e. monitoring execution of) the Linux kernel, so a vuln in it is guaranteed to gain attention. Because the bpf syscall, by design, facilitates data sharing between userspace and kernel space, exploiting this vuln means the attacker only needs to load their malicious eBPF program once to go HAM into the good night.
The underlying problem is bad math on the part of a runtime security control (the “ALU sanitizer”), which was supposed to ensure that eBPF programs could only access memory within the appropriate boundaries, but failed to do so. Unfortunately, the security of eBPF programs largely relies on the assumption that the verifiers work as intended…
Digging deeper: You can think of eBPF programs kind of like rollercoasters. The eBPF bytecode is a design for a rollercoaster. This bytecode gets turned into assembly that runs in the kernel, like a rollercoaster being installed in a park. The verifier, combined with the ALU sanitizer (a runtime check intended as a last layer of defense), makes assumptions about the safety of this eBPF bytecode and either allows it to run or rejects it, similar to the role of a rollercoaster architectural review firm.
If you’re a theme park designer, you probably (hopefully!) don’t want people to die on your rollercoasters, so you enlist an architectural review firm to verify (approve or deny) the safety of each of the arbitrary ride designs you create. Importantly, you, as the park designer, are relying on that firm to find proof of safety before you install any rides, to ensure they are rollercoasters, not yolocoasters.
If the architectural review firm makes incorrect assumptions while proving the safety of a ride’s design, it may result in a coaster being installed that goes off the rails, yeeting its passengers off the map. This vulnerability, in essence, allows the attacker to trick the architectural review firm into making these poor assumptions — thereby allowing installation and operation of a wildly dangerous coaster that makes the system scream, “I want to get off MR BONES WILD RIDE!”
In computer nerd terms, the verifier performs static analysis on the eBPF program prior to JIT compilation and loading of the program — it attempts to form proofs on the eBPF program’s range of memory accesses to avoid the overhead of bounds checking at runtime. This is akin to the architectural review firm forming proofs of a rollercoaster design’s safety to avoid having to add super expensive safety systems post-installation.
However, incorrect assumptions underpinning the verifier’s proof, combined with the faulty math by the ALU sanitizer, result in the ability to perform out-of-bounds (OOB) reads and writes, like our rollercoaster track hurtling off into the sky. Because the eBPF program’s JIT’d assembly runs in kernel space, these OOB reads and writes allow escalation from the bpf syscall straight to ring 0, giving the attacker full (i.e. root) access on the system.
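To make the “proof” idea a little more concrete, here is a toy sketch in Python. It is not kernel code, and it does not reproduce the actual arithmetic flaw behind CVE-2020-8835. It only shows the general shape of the problem: a verifier tracks a [lo, hi] range for an index, approves an unchecked memory access only if the whole range fits in the buffer, and gets fooled when one of its range rules is wrong. The names `Range`, `sound_and`, `buggy_and`, `verify_access`, and `run_unchecked` are all made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Range:
    """Inclusive bounds the toy verifier believes a non-negative value lies within."""
    lo: int
    hi: int


def sound_and(r: Range, mask: int) -> Range:
    # Sound rule for non-negative values: (x & mask) is never larger than
    # x or mask, and may be as small as 0.
    return Range(0, min(r.hi, mask))


def buggy_and(r: Range, mask: int) -> Range:
    # Deliberately wrong rule (hypothetical): masking the bounds themselves
    # says nothing about the values in between.
    return Range(r.lo & mask, r.hi & mask)


def verify_access(index: Range, buf_len: int) -> bool:
    # The "architectural review": approve only if every possible index
    # is provably inside the buffer.
    return index.lo >= 0 and index.hi < buf_len


def run_unchecked(index_value: int, mask: int, buf: list) -> int:
    # Stand-in for the JIT'd program, which relies on the verifier's proof
    # and does no bounds check of its own.
    return buf[index_value & mask]
```

With an untrusted index in [0, 8], a mask of 7, and a 4-slot buffer, the sound rule yields [0, 7] and the access is rejected. The buggy rule yields [0, 0], so the access is approved, yet an index of 7 at runtime lands well outside the buffer. Python raises an IndexError at that point; JIT'd code in the kernel has no such safety net.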
Yes, but: It’s a local vuln, not remote, so the attacker still requires another vuln to gain an initial foothold on the system. Additionally, Docker’s default seccomp profile blocks the bpf syscall, so hopefully people listened to Jessie Frazelle’s advice and started using it at some point over the last four years.
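If you’re not sure whether a given process is running under seccomp at all, the `Seccomp:` field in /proc/<pid>/status is a quick sanity check (0 = disabled, 1 = strict, 2 = filter). Below is a minimal sketch in Python; the function names are hypothetical, and mode 2 only tells you that some filter is installed, not that it blocks bpf specifically.

```python
from typing import Optional

# Meaning of the Seccomp field in /proc/<pid>/status.
SECCOMP_MODES = {0: "disabled", 1: "strict", 2: "filter"}


def parse_seccomp_mode(status_text: str) -> Optional[int]:
    """Extract the seccomp mode from the text of a /proc/<pid>/status file."""
    for line in status_text.splitlines():
        # Match "Seccomp:" exactly so newer fields such as
        # "Seccomp_filters:" are not picked up by mistake.
        if line.startswith("Seccomp:"):
            return int(line.split(":", 1)[1].strip())
    return None


def describe_seccomp(status_path: str = "/proc/self/status") -> str:
    """Return a human-readable seccomp mode for the given status file."""
    with open(status_path) as f:
        mode = parse_seccomp_mode(f.read())
    if mode is None:
        return "unknown (no Seccomp field found)"
    return SECCOMP_MODES.get(mode, f"unrecognized mode {mode}")
```

Calling `describe_seccomp()` from inside a container started with Docker’s default profile should report `filter`; a result of `disabled` is a hint that the profile was turned off somewhere along the way.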
This vulnerability had a fairly short life, and only affected a handful of distributions and releases, none of which are super popular to run in production: the non-LTS (Long Term Support) Ubuntu 19.10, Debian unstable, and Fedora (which uses a bleeding-edge kernel that, unsurprisingly, comes with bleeding-edge bugs).
There are patches out already for Ubuntu, Debian, and Fedora, and RHEL 5, 6, 7, and 8 aren’t affected anyway (because they didn’t backport the commit that originally introduced the issue). In fact, in RHEL, unprivileged users aren’t allowed to access the bpf syscall by default. This isn’t the case in Fedora, so they recommend disabling unprivileged access to the bpf syscall by setting the following sysctl variable:
# sysctl -w kernel.unprivileged_bpf_disabled=1
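To confirm the setting across a fleet, the same value can be read back from /proc/sys/kernel/unprivileged_bpf_disabled. Here is a small sketch in Python with hypothetical function names; it treats any nonzero value as “disabled”, which is an assumption that holds for the 0/1 values discussed here.

```python
from pathlib import Path

# procfs view of the kernel.unprivileged_bpf_disabled sysctl.
SYSCTL_PATH = Path("/proc/sys/kernel/unprivileged_bpf_disabled")


def parse_unprivileged_bpf_disabled(raw: str) -> bool:
    """Return True if the sysctl value says unprivileged bpf() is disabled."""
    # Assumption: 0 means unprivileged access is allowed, anything else
    # means it is disabled.
    return int(raw.strip()) != 0


def unprivileged_bpf_disabled(path: Path = SYSCTL_PATH) -> bool:
    """Read the live sysctl value from procfs and interpret it."""
    return parse_unprivileged_bpf_disabled(path.read_text())
```

Keep in mind that `sysctl -w` only changes the running kernel; to survive a reboot, the setting also needs to go into a file under /etc/sysctl.d/ (or /etc/sysctl.conf).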
The bottom line: This weapon of math destruction is available to motivated attackers only, due to the initial foothold required to take over the system via an eBPF program. While the impact footprint is somewhat limited, given the nascency of the feature being exploited, we recommend patching promptly. And it certainly wouldn’t hurt to run the default seccomp profile in your Docker containers, too (remember – seccomp is off by default in Kubernetes).
For Capsule8 customers, we detect when BPF programs are loaded and executed, so you’re already covered.
The Capsule8 Labs team conducts offensive and defensive research to understand the threat landscape for modern infrastructure and to continuously improve Capsule8’s attack coverage.