In a recent webinar, Capsule8 Research Scientist Nick Gregory discussed some of the core building blocks of server monitoring and tracing in Linux.
To get the most from your systems, it’s more important than ever to monitor and trace potential issues and to understand the key components of a successful response system. Let’s take a closer look.
perf CLI and Subsystem in Linux
The perf CLI is the de facto toolkit for measuring performance in Linux. The package name varies by distro (e.g. linux-tools-common on Ubuntu and linux-perf on Debian), but the CLI itself is most commonly named perf. Almost anywhere you have Linux installed, there will be a package for it. perf is built around events, and a lot of them are available out of the box, typically north of 2,500. They come in many different forms, including CPU-level events that you typically won’t care too much about unless you’re aiming for absolute top-tier performance, “system-level” events, and software events. Most importantly, users can define their own events on any given function, in the kernel or in userland. For example, the following sample command creates a new perf probe on the malloc function in libc:
# perf probe -x /lib/x86_64-linux-gnu/libc.so.6 malloc
If you’re running a C program, you can use this probe to quickly see how often, and how much, it is allocating.
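As a rough sketch of the "how often" part, the probe can be counted with perf stat. This assumes the probe was registered under the name perf normally gives it, probe_libc:malloc (perf probe prints the actual name when it adds the event), and that you are running as root. The helper names count_mallocs and remove_malloc_probe are made up for illustration.

```shell
# Hypothetical helper: run a command and count how many times it hits the
# malloc uprobe. Assumes the event is named probe_libc:malloc.
count_mallocs() {
    perf stat -e probe_libc:malloc -- "$@"
}

# Hypothetical helper: delete the probe once you are done with it.
remove_malloc_probe() {
    perf probe -d probe_libc:malloc
}
```

For example, count_mallocs ls -l prints a counter summary for that single run of ls.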
Within perf, there are upwards of 30 sub-commands available, but let’s take a close look at the five that provide the most sweeping insights, starting with the most summarized output and working our way up in detail.
perf stat provides quick statistics on the number of events that occur in a given time span. The measurement window can be either a specific number of milliseconds or the lifetime of a program launched by the command. Using perf stat, you can count events for:
- Any program run under perf (perf stat … -- /path/to/program)
- A specific PID that’s already running (-p 1234)
- A specific CPU or CPUs (-C 0-3)
- All CPUs (-a)
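The four targets above can be sketched as small wrappers. The event list and the function names are illustrative choices, not something from the webinar, and the --timeout option (a duration in milliseconds) assumes a reasonably recent perf.

```shell
# Events to count; any names from `perf list` work here (illustrative choice).
EVENTS="cycles,instructions,context-switches"

# Count events for one run of a program launched under perf.
stat_program() { perf stat -e "$EVENTS" -- "$@"; }

# Count events in an already-running PID for a fixed number of milliseconds.
stat_pid() { perf stat -e "$EVENTS" -p "$1" --timeout "$2"; }

# Count events on specific CPUs (e.g. 0-3) for a fixed number of milliseconds.
stat_cpus() { perf stat -e "$EVENTS" -C "$1" --timeout "$2"; }

# Count events on all CPUs for a fixed number of milliseconds.
stat_all() { perf stat -e "$EVENTS" -a --timeout "$1"; }
```

For instance, stat_pid 1234 5000 would watch PID 1234 for five seconds and then print the totals.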
If you’ve used strace before to monitor syscalls for a program, perf trace is similar. It streams the events it captures to the console, including their arguments. We’re starting to get more detailed here, going beyond raw counts and into individual events and the data associated with them.
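A minimal sketch of two common ways to point perf trace at something follows. The function names are hypothetical, and root (or relaxed perf permissions) is assumed.

```shell
# Stream the syscalls made by one run of a program, much like strace would.
trace_program() { perf trace -- "$@"; }

# Stream only selected syscalls (comma-separated names) from a running PID.
trace_pid_syscalls() { perf trace -e "$2" -p "$1"; }
```

For example, trace_pid_syscalls 1234 openat,close would show only those two syscalls for PID 1234.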
perf record writes all specified events to a file for later analysis. It can also capture a stack trace at each occurrence, which helps in working out what actually caused an event. The net result is that you can see, over the duration of the recording, things like which functions were the most “expensive” (where the CPU spent the most time).
With perf script, you can read back a perf record data file. This is where you actually interact with the data you’ve recorded, filtering by process, by CPU, and by several other factors. You can also run additional scripts on top of the data to explore it interactively.
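The record-then-inspect workflow might look like the sketch below, which uses perf script as the reader. The 99 Hz sampling rate, the data file path, and the function names are illustrative assumptions.

```shell
# Sample the call stacks of a command 99 times per second into a data file.
record_program() {
    out=$1; shift
    perf record -F 99 -g -o "$out" -- "$@"
}

# Print the recorded samples, restricted to one process name.
samples_for_comm() { perf script -i "$1" --comms "$2"; }

# Print the recorded samples, restricted to a CPU list such as 0 or 0-3.
samples_for_cpu() { perf script -i "$1" --cpu "$2"; }
```

A typical session would be record_program /tmp/perf.data ./myapp followed by samples_for_comm /tmp/perf.data myapp, where myapp stands in for whatever program you are investigating.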
Finally, there is perf top, which allows for real-time event collection and display. With this sub-command, you can get a more comprehensive overview of what the entire machine is doing at a glance. This command is similar to perf trace, but is constantly refreshing and aggregating the data, and is useful if you are experiencing unexpected slowdowns or unresponsiveness.
kprobes are a run-time programmable way to inspect the kernel. You can specify a function that you want to probe and arguments to dump when the probe is hit. This can be done for nearly any kernel function at any time (a small set of functions is blacklisted from probing), and events can be filtered by an expression that is evaluated inside the kernel, limiting the amount of data coming out so that you’re more effectively using your CPU time.
uprobes are much the same as kprobes but for userland. You can specify any program/library, a function within that program/library to trace, and the arguments to fetch when this probe is hit in the exact same format as kprobes.
There are three ways to work with kprobes and uprobes: you can manually write to files under /sys to create, enable, and read from a probe (not recommended); you can use perf; or you can use scripts in the “perf-tools” toolkit to quickly set, read from, and remove them.
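To show what the tools do under the hood, here is a sketch of the manual route for a kprobe. It assumes tracefs is mounted at /sys/kernel/tracing (older kernels expose it under /sys/kernel/debug/tracing) and that you are root. The function names are hypothetical; in practice perf or perf-tools is the safer choice.

```shell
# Where tracefs is mounted; override TRACEFS if your system differs.
TRACEFS="${TRACEFS:-/sys/kernel/tracing}"

# kprobe_add NAME FUNCTION [FETCHARG...]: define a probe in group "kprobes".
kprobe_add() {
    name=$1; shift
    printf 'p:kprobes/%s %s\n' "$name" "$*" >> "$TRACEFS/kprobe_events"
}

# Only emit events matching an expression; it is evaluated in the kernel.
kprobe_filter() { printf '%s\n' "$2" > "$TRACEFS/events/kprobes/$1/filter"; }

# Start emitting events for the probe.
kprobe_enable() { echo 1 > "$TRACEFS/events/kprobes/$1/enable"; }

# Stream events as they arrive (blocks until interrupted).
kprobe_read() { cat "$TRACEFS/trace_pipe"; }

# Disable the probe, then delete its definition.
kprobe_remove() {
    echo 0 > "$TRACEFS/events/kprobes/$1/enable"
    printf -- '-:kprobes/%s\n' "$1" >> "$TRACEFS/kprobe_events"
}
```

On x86-64, where the third function argument is passed in the dx register, kprobe_add reads vfs_read 'count=%dx:u64' followed by kprobe_filter reads 'count > 4096' and kprobe_enable reads would report only large reads. The register name is architecture-specific, so treat that fetch argument as an assumption.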
Being able to trace anywhere across the kernel and userland allows you to do a lot. The “perf-tools” toolkit includes a number of examples such as:
- execsnoop – trace exec() calls with command line arguments
- opensnoop – trace open() calls with filenames
I’ve also written (under “Using Linux Tracing for Security”) a couple other quick scripts to showcase probes, like:
- strace – Instead of using the normal command-line tool strace, you can use a kprobe under the hood. This makes the probe “invisible” to things that check if they are ptraced.
- conntrace – This allows you to show DNS requests and results of each program.
More Efficient Probes
We now have the tools needed to get data out of the kernel, but it still needs to be processed via either perf and perf scripts, or raw kprobes and shell scripts/small programs. Is there a way to do this faster? Thankfully, the answer is yes.
eBPF allows you to push your code and logic into the kernel so you can perform much faster processing, which lets you trace hot paths without slowing them down much. The interception, filtering, serialization, and processing of kprobe events can really add up, but if you shift this work inline into the kernel, you can do some interesting things without a noticeable performance impact.
Of course, eBPF has some limitations. The biggest of these is that you cannot write unbounded loops. eBPF has to be verified when it’s loaded into the kernel, and one of the constraints is that a program cannot run for an unbounded amount of time, which loops could potentially cause. Before Linux 5.3 the verifier rejected loops altogether; newer kernels accept loops that it can prove are bounded. Another thing of note is that eBPF is constantly being improved. Some things that you might think are core and necessary haven’t actually been added until recently, so older kernels may not have those features available. eBPF can also be much more complicated to set up: you’re not just writing a one-line kprobe definition, you’re writing your entire program in C or eBPF “assembly”. It allows you to do more, but there is a greater initial cost.
Toolkits such as bpftrace provide a good starting point when using eBPF, offering a simple CLI to get output from eBPF probes, while the related bcc project adds a Python library to start building more complete solutions.
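As a small illustration of doing the aggregation in the kernel rather than in userland, the bpftrace one-liner below counts openat() syscalls per process name. Only the final totals cross into userland. It assumes bpftrace is installed and run as root; the wrapper function name is made up.

```shell
# Count openat() syscalls per process name inside the kernel; bpftrace prints
# the @opens map when you stop it with Ctrl-C.
count_opens_by_process() {
    bpftrace -e 'tracepoint:syscalls:sys_enter_openat { @opens[comm] = count(); }'
}
```

Compared with streaming every open through perf trace or a raw kprobe, nothing is serialized per event here, which is the saving the section above describes.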
With the basics covered, let’s look at three sample ideas you could implement to improve monitoring of your Linux systems:
- Hook bash’s readline to track commands being run in shells. This could also send the results to a remote server for auditing.
- Send perf top data to a centralized server, and determine what code out of everything running uses the most time across your entire fleet.
- Hook into the Linux scheduler and look for task_structs not reported in /proc. This could help identify malware that is attempting to hide itself.
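The first idea can be sketched with a bpftrace uretprobe on bash's readline function. This assumes bash lives at /bin/bash and has readline compiled into the binary; on systems where bash links against a shared libreadline, the probe would have to target that library instead. The variable and function names are hypothetical.

```shell
# Path to the bash binary to hook; override TARGET_BASH if yours differs.
TARGET_BASH="${TARGET_BASH:-/bin/bash}"

# Print the PID and text of every line that bash's readline() returns.
watch_bash_commands() {
    bpftrace -e "uretprobe:$TARGET_BASH:readline { printf(\"%d %s\\n\", pid, str(retval)); }"
}
```

The output is one line per command entered, so it could be piped into whatever log shipper you already use to reach a remote audit server.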
Capsule8 uses this exact technology under the hood for our clients, ensuring a high level of performance with very low overhead and no chance of crashes since it’s not a kernel module. The core technology we utilize ensures you can start monitoring for known security-sensitive events with the flip of a switch, or you can customize to whatever degree you want, writing rules around default policies or monitoring your own events entirely with kprobes and uprobes.
Learn more about Capsule8 and how we can help protect your Linux infrastructure. Request a demo here.