The Linux Auditing System is a feature native to the Linux kernel that collects certain types of system activity to facilitate incident investigation. In this post, we will cover what it is as well as how people deploy and manage it. We will also discuss its strengths (namely, being offered for the delicious price of “free”) and its weaknesses, including excessive overhead, lack of granularity, missing container support, onerous output, and bugginess.
Our goal is to present a neutral overview of the Linux Auditing System so anyone considering implementing it in their own organization knows what to consider before embarking on their quest and what challenges may lurk ahead.
You are more likely to have heard the term “AuditD” than the rather prolix “Linux Auditing System.” So, where does AuditD fit in? The audit system’s components include kernel code to hook syscalls, plus a userland daemon that logs syscall events. This userland daemon component is auditd, and because it is the default admin interface to the kernel subsystem, people use “auditd” as a colloquial reference to the whole audit system.
The Linux Auditing subsystem is capable of monitoring three distinct things:
- System calls: See which system calls were invoked, along with contextual information like the arguments passed to them, user information, and more.
- File access: This is an alternative way to monitor file access activity, rather than directly monitoring the open system call and related calls.
- Select, pre-configured auditable events within the kernel: Red Hat maintains a list of these types of events.
Using these categories of events, you can audit activity like authentications, failed cryptographic operations, abnormal terminations, program execution, and SELinux modifications. When audit rules are triggered, the Linux Audit System outputs a record with a variety of fields. We will walk through an example output below, but see the full list for all possible fields.
type=1300 msg=audit(01/01/2020 13:37:89.567:444): arch=c000003e syscall=323 success=yes exit=42 a0=feae74 a1=80000 a2=1b6 a3=9434e28 items=1 ppid=1337 pid=1812 auid=300 uid=500 gid=500 euid=0 suid=0 fsuid=0 egid=500 sgid=500 fsgid=500 tty=pts2 ses=2 comm="sudo" exe="/usr/bin/sudo" subj=unconfined_u:unconfined_r:unconfined_t:20-s0:c0.c1023 key="test_audit"
This audit record tells us that a user with auid 300 used an ssh terminal to use the sudo command to invoke the userfaultfd syscall as root, on January 1, 2020 at 13:37 GMT. You can see how this level of detail can be useful for inspecting system behavior during incident response and building a timeline of activity.
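To make the structure of such a record concrete, here is a minimal Python sketch that splits one record into its fields. It is not part of the audit tooling: the function name `parse_audit_record` and both regular expressions are our own assumptions, and the sample is a shortened copy of the record above.

```python
import re

# Assumed patterns: the header carries the timestamp and event ID,
# and everything else is key=value with optional double quotes.
HEADER = re.compile(r"msg=audit\((?P<timestamp>.+?):(?P<event_id>\d+)\):")
FIELD = re.compile(r'(\w+)=("[^"]*"|\S+)')


def parse_audit_record(line):
    """Return a dict of the fields found in a single audit record."""
    header = HEADER.search(line)
    if header is None:
        raise ValueError("not an audit record")
    record = {
        "timestamp": header["timestamp"],
        "event_id": int(header["event_id"]),
    }
    # Drop the header so its contents are not mistaken for fields.
    body = line[:header.start()] + line[header.end():]
    for key, value in FIELD.findall(body):
        record[key] = value.strip('"')
    return record


sample = (
    'type=1300 msg=audit(01/01/2020 13:37:89.567:444): arch=c000003e '
    'syscall=323 success=yes auid=300 uid=500 euid=0 tty=pts2 '
    'comm="sudo" exe="/usr/bin/sudo" key="test_audit"'
)
record = parse_audit_record(sample)
print(record["auid"], record["euid"], record["syscall"], record["exe"])
```

The fields that drive the reading above are `auid` (the original login user), `euid` (0, meaning root), `syscall` (the syscall number), and `exe`.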
While useful for post-hoc investigation, the Linux Audit System is not intended to provide protection. It can help you figure out what’s going on, but if an attacker exploits your system, the audit system will be more like a small dog yapping at the mailman, birds, passing cars, an ice cream truck, your neighbor taking out the trash, and oh, yes, also the burglars as they leave your house.
Next, we will go through how to operationalize the Linux Auditing System by creating rules and viewing logs.
It is highly likely you will need to create your own rules when implementing the Linux Auditing System. The audit system only enables limited logging by default, focused on security-related commands like logins, logouts, sudo usage, and SELinux-related messages. However, there are pre-configured rule files in the audit package based on four certification standards, which you can always copy into your audit rules file if they meet your needs:
- Controlled Access Protection Profile (CAPP):
- Labeled Security Protection Profile (LSPP):
- National Industrial Security Program Operating Manual (NISPOM):
- Security Technical Implementation Guide (STIG):
Unfortunately, even if you use these pre-configured rules, you need quite a bit of thinky thinky to properly set up AuditD. When considering adding audit rules to a specific system, first ask yourself what you need to audit. Do you need to audit when people delete files? Which directories are most important to audit? What kind of files are most sensitive to your operations?
As you think through these questions, remember that there are really two types of audit rules you can write — file system and system call rules. Any other system activity, such as specific scripts executed, userland events, and internal kernel behaviors that can be triggered independently of syscalls (e.g. disabling SELinux) are out of scope for the Linux Auditing System. Additionally, there are control rules that are used to configure the audit system (e.g. setting event rate limits, failure modes, etc.). We will not go into how to write rules here, but we recommend Digital Ocean’s guide as a place to start.
Another consideration during rule creation is that audit rules work on a “first match wins” basis, meaning once audited activity matches a rule, none of the rules on the rest of the list will be evaluated. Thus, you must think carefully about the order in which you write your rules. By default, new rules are added to the bottom of the list (so will be evaluated last). For those of you lamentably experienced in the pleasures of writing firewall rules, this old-school drawback of rules not being recursive will be bitterly familiar.
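The ordering pitfall is easier to see in a toy model. The Python sketch below is not how the kernel evaluates rules; the `Rule` class and `evaluate` function are hypothetical and match on syscall name only, whereas real rules can filter on many more fields. It only illustrates “first match wins.”

```python
from dataclasses import dataclass


@dataclass
class Rule:
    action: str   # "always" (log the event) or "never" (suppress it)
    syscall: str  # simplified: real rules can match on many more fields
    key: str      # label attached to matching events


def evaluate(rules, syscall):
    """Return the first rule that matches; later rules are never consulted."""
    for rule in rules:
        if rule.syscall == syscall:
            return rule
    return None


rules = [
    Rule("never", "open", "quiet_open"),
    Rule("always", "open", "watch_open"),    # shadowed by the rule above
    Rule("always", "execve", "watch_exec"),
]

match = evaluate(rules, "open")
print(match.action, match.key)
```

In this model the second rule can never fire, because the broader suppression rule sits above it. Swapping the first two entries changes the outcome, which is why rule order deserves a review whenever a rule is added.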
Once you’ve figured out what rules to write, you will be writing them in the audit.rules file, which is loaded by the audit daemon whenever it’s started. Of course, you’ll probably want to edit these rules at some point. Using the auditctl utility will only make temporary changes, so you must modify the actual audit.rules file whenever you want to make a permanent change. For companies using immutable infrastructure and CI/CD pipelines, all you need to do is update the audit rules in your base image.
However, companies without the blessing of immutable infra will face a trickier time editing audit rules across a fleet. Popular configuration management tools can help. Chef offers a cookbook for installing auditd and the pre-configured rulesets we mentioned earlier, with the option to add your own rules. For Puppet, the U.K. Government Digital Service open sourced its puppet-auditd project, which handles installation of the audit daemon and manages audit configuration and rules. OpenStack’s ansible-hardening project includes auditd configuration as part of automating implementation of STIG’s security hardening guide.
As you can see, adopting immutable infrastructure and pushing out audit rules through your CI/CD pipelines saves quite a bit of a hassle — leaving you more time to think carefully about the audit rules you create. The next hurdle is viewing your audit logs.
After you create your audit rules and deploy them on your Linux systems, you probably want to see the logs generated by a triggered rule (I cannot imagine anyone creates audit rules for funsies, but you do you). The audit system’s logs are stored by default in /var/log/audit/audit.log, which can be viewed like any text file except for being so dense that you may want to blind yourself. Thus, most people query the audit logs.
Native utilities for querying audit records include ausearch and aureport. Ausearch allows you to search your audit log files for specific criteria, such as command name, event identifier, hostname, group name, key identifier, message, syscall, and more. Aureport creates summary reports from the audit log files. Aureport shows less information per event than ausearch but presents it in a more readable, tabular display.
The default deployment of the audit system is to query local files. The audisp-remote plugin can be used to send logs to a remote server rather than query them locally. It does so by taking audit events and writing them to syslog, which can then be sent on to a logging server. The data format is anchored around key=value, so it will take additional work or tooling to output your audit logs into friendlier formats like JSON or XML.
If you want centralized analysis, you must export your logs to a log management / SIEM tool (e.g. rsyslog, Splunk, ELK, Nagios, etc.) or data pipeline / warehousing tools (Apache Kafka, Google BigQuery, AWS Athena, etc.). You will likely need to employ cron jobs for scripts that periodically push the local audit logs to your central log hub. Alternatively, a fancier setup could have each host write its audit logs directly to a Kafka, PubSub, or Kinesis stream.
This is why it is essential that, before you invest in building out a data pipeline leveraging the Linux Auditing System, you determine:
- What activity you need to log on each host
- Where the logs will go
- How the logs will be stored and consumed
The Linux Auditing System’s primary value proposition is in facilitating investigations, especially historical investigations in response to an incident. As outlined earlier, both file access and system calls can be audited, including failed logins, authentications, failed syscalls, abnormal terminations, executed programs, and more. Most regular system activity falls within that purview, so your problem is certainly not going to be too low a volume of data.
The Linux Auditing System also does not cost money (not including time), and as revelatory as this may be, it turns out that people like getting things for free.
There are quite a few weaknesses of AuditD, without which maybe Linux security could be considered “solved” and we could all go on vacation. The primary weaknesses we will cover here include excessive overhead, lack of granularity, missing container support, onerous output, and bugginess.
The additional overhead incurred by enabling the Linux Auditing System is generally formidable. One of our customers found an 80% increase in userspace syscall overhead. This means that system performance is only slightly over half of what it would be with auditing disabled, as measured by how many syscalls the system can perform.
Reasonably, we would not expect that real workloads would incur such extreme overhead, but overhead statistics for the Linux Auditing System are unfortunately seldom made public. With that said, one source from 2015 cited that a system capable of performing 200,000 open/close syscalls per second can only perform 3,000 open/close syscalls per second with auditing enabled (1.5% of normal performance). Another individual reported overhead increases between 62% and 245%. Until proper statistics are published, the evidence suggests that Linux auditing performance concerns are not to be shrugged aside.
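For readers who want to check the arithmetic behind these figures, here is a small Python sketch. It assumes that an “X% overhead increase” means each syscall takes (1 + X) times as long, so throughput falls to 1 / (1 + X) of the baseline. That model is our interpretation, and the helper name `relative_throughput` is hypothetical.

```python
def relative_throughput(overhead_fraction):
    """Fraction of baseline syscalls per second left after added overhead."""
    return 1 / (1 + overhead_fraction)


# 80% more time per syscall leaves slightly over half the throughput.
print(f"{relative_throughput(0.80):.1%}")   # 55.6%

# The 2015 report: 3,000 syscalls per second against a 200,000 baseline.
print(f"{3_000 / 200_000:.1%}")             # 1.5%

# The 62% to 245% range, under the same assumed model.
print(f"{relative_throughput(0.62):.1%}")   # 61.7%
print(f"{relative_throughput(2.45):.1%}")   # 29.0%
```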
The file-related audit rules are particularly prone to generating performance degradation. Because everything on Linux is a file, implementing rules for file creation, deletion, ownership, or permission changes can slow down filesystem operations. Tuning audit.conf can improve performance by flushing records to disk (i.e. transferring the logs from the buffer to the hard disk) at a given interval (specified by freq), rather than all at once.
We have seen some companies deal with the overhead problem by auditing very little data — only auditing select system calls and applying heavy filtering — so that the overhead is much lower. However, that approach discards security-relevant data that could be useful for investigation, which defeats the purpose of using auditing. Another attempted option is growing the size of the internal buffer used by the Linux Auditing System. Unfortunately, the more CPU the auditing system consumes, the more likely it is to exacerbate performance problems since it cannot load shed (i.e. it is unable to drop data when resource demand is above system capacity).
Lack of Detection Granularity
The Linux Auditing System can audit quite a lot of system activity, but it lacks depth. Certain types of unwanted activity cannot be fully captured by the Linux Auditing System. In particular, auditing file access or modification can prove challenging, as any relative paths or symlinks will be resolved only after the audit event. Thus, you would need to collect all file related actions (like opens, forks, etc.) and treat it as one of those ten-thousand-piece-puzzles during investigation, meticulously and tediously piecing together the history of activity involving the file from countless logs.
execveat is one example of this impediment. It runs programs based on file descriptors, which means your audit log will show that execveat was invoked, but not what precisely was executed by it. Another syscall which falls prey to this limitation in visibility is fchmod, which changes file permissions. Auditing fchmod will generate a log entry that refers to the file descriptor and the new mode (permissions), but the entry will not include the target file path, which adds time and effort to investigative work.
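To give a feel for the stitching involved, here is a rough Python sketch that recovers the path behind an fchmod by remembering earlier opens from the same process. It makes several assumptions: the events are already parsed into dicts and arrive in order, each open event already has its path attached, and the `a0`/`a1` arguments are hexadecimal strings. The event layout and the function name `resolve_fchmod_targets` are hypothetical, and the sketch ignores descriptors inherited across fork or duplicated with dup.

```python
def resolve_fchmod_targets(events):
    """Return (path, mode) for each fchmod, using earlier opens by the same pid."""
    open_files = {}  # (pid, fd) -> path
    resolved = []
    for event in events:
        pid = event["pid"]
        if event["syscall"] == "open" and event["success"]:
            # The return value of a successful open is the new descriptor.
            open_files[(pid, event["exit"])] = event["path"]
        elif event["syscall"] == "close":
            open_files.pop((pid, int(event["a0"], 16)), None)
        elif event["syscall"] == "fchmod":
            fd = int(event["a0"], 16)
            mode = int(event["a1"], 16)
            path = open_files.get((pid, fd), "<unknown>")
            resolved.append((path, oct(mode)))
    return resolved


# Made-up events for illustration only.
events = [
    {"syscall": "open", "pid": 1812, "success": True, "exit": 3, "path": "/tmp/payload"},
    {"syscall": "fchmod", "pid": 1812, "a0": "3", "a1": "1ed"},
    {"syscall": "close", "pid": 1812, "a0": "3"},
    {"syscall": "fchmod", "pid": 1812, "a0": "3", "a1": "1a4"},
]
print(resolve_fchmod_targets(events))
```

The second fchmod comes back as `<unknown>` because the matching open was never seen after the close, which is exactly what happens when the open was not audited or its record was dropped.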
Additionally, organizations may want to detect when executables are created. This activity consists of multiple operations, including file creation, many writes to disk, and potentially a chmod (changing access permissions). Each of these events must be stitched back together once collected in order to glean the appropriate context around the activity. This can prove especially difficult without engendering unmanageable overhead — because collecting a record of each write event on a system might generate a black hole that consumes all data within its event horizon.
Being able to see the full context around activity supports incident response by helping you determine the impact of unwanted activity across your systems. Moreover, seeing unwanted events from all possible angles also provides invaluable feedback to refine your detection coverage, ensuring you can detect similar activity on your systems going forward.
Missing Container Support
The Linux Auditing System does not support containerized systems, as Richard Guy Briggs (one of audit’s core maintainers) outlined in a presentation last year. The primary barrier is the audit system’s namespace ID tracking being “complex and incomplete” — events are associated with different IDs for each namespaced subsystem (of which there are many in each container… and they are dynamic). Until container orchestrators can apply inheritable, immutable tags to processes that can be included in audit logs — which is still firmly in the concept phase — the inability to correctly track containers to events will persist.
The other main barrier is the Linux Auditing System’s current restriction of only permitting one audit daemon running at a time. This means that audit rules from that auditing system will apply to all of the containers running on the host, making rule customization impossible. Your options to cope with this are not ideal. You can create rules as bland as possible that could apply to all containers, but that would not accomplish much. If you are made out of money or in a position to blackmail Splunk, you can instead create a cacophony of rules and then perform analytics on the logs once collected.
Thus, if you operate in a containerized environment, you will not be able to accurately audit events, nor customize audit rules per container, if you use the Linux Auditing System. For organizations using microservices, this can present a problem, as you now require an additional monitoring system if you want to collect container-based activity.
“At the moment, auditd can be used inside a container only for aggregating logs from other systems. It cannot be used to get events relevant to the container or the host OS. Container support is still under development.”
~ Red Hat
July 19, 2018
As you perhaps gathered from the “Viewing Logs” section earlier, the Linux Auditing System’s output is far from user-friendly. The data format is anchored around key=value, and audit logs cannot be output in JSON or XML without extra effort. Events can arrive out of order and take up multiple lines. It is basically a hot mess. Therefore, if you want a nicer output, you will need to implement additional tools for cleaning and conversion — and more tools if you want to store and query audit logs in a centralized place.
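As a rough idea of what that cleaning and conversion involves, the Python sketch below groups records that share an event ID (the timestamp and serial inside `msg=audit(...)`) and emits one JSON object per event, regardless of the order in which the lines arrived. The regular expressions, the function name `group_events`, and the sample lines are our own assumptions for illustration.

```python
import json
import re
from collections import defaultdict

# Assumed patterns: epoch timestamp and serial in the header, key=value elsewhere.
EVENT_ID = re.compile(r"msg=audit\((\d+\.\d+):(\d+)\):")
FIELD = re.compile(r'(\w+)=("[^"]*"|\S+)')


def group_events(lines):
    """Group records by event ID and return the events sorted by that ID."""
    grouped = defaultdict(list)
    for line in lines:
        header = EVENT_ID.search(line)
        if header is None:
            continue  # skip anything that is not an audit record
        body = line[:header.start()] + line[header.end():]
        record = {key: value.strip('"') for key, value in FIELD.findall(body)}
        grouped[(float(header[1]), int(header[2]))].append(record)
    return [
        {"timestamp": timestamp, "serial": serial, "records": records}
        for (timestamp, serial), records in sorted(grouped.items())
    ]


# Made-up lines, deliberately interleaved and out of order.
lines = [
    'type=SYSCALL msg=audit(1577885869.567:445): syscall=59 comm="id" exe="/usr/bin/id"',
    'type=PATH msg=audit(1577885869.567:444): item=0 name="/etc/shadow"',
    'type=SYSCALL msg=audit(1577885869.567:444): syscall=2 comm="cat" exe="/usr/bin/cat"',
]
for event in group_events(lines):
    print(json.dumps(event))
```

This sketch holds every record in memory before emitting anything. A streaming version would need to decide how long to wait for late records, which is part of the extra tooling burden described above.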
“Treating the kernel part as a reliable black box seems unwise to me.”
~ Andy Lutomirski
A core maintainer for syscall infrastructure on Linux lowkey hates AuditD and recommends staying away from it due to unreliability, saying, “the syscall auditing infrastructure is awful.” Beyond the issues he cites, there is the palpable issue of no one tasked with stress testing the Linux Auditing System, which is perhaps why the considerable performance issues linger.
Until someone helps fix it upstream, Lutomirski recommends not using it in production and instead using syscall tracing infrastructure. That guidance may seem a bit extreme, but when a Linux kernel maintainer proclaims that “the code is a giant mess,” it is probably worth contemplating.
Obviously, we think Capsule8 is the best alternative for detecting unwanted system activity in Linux and goes way beyond the Linux Auditing System’s capabilities. Nevertheless, here are some tools that are considered direct replacements for the userland agent (auditd) — but keep in mind that you can only run one userland audit subsystem at a time, so migrating between agents will likely prove challenging:
- Auditbeat by Elastic collects events from the Linux Auditing System and centralizes them by shipping these logs to Elasticsearch. You can then use Kibana to analyze the data. Note that this still requires quite a bit of configuration and still relies on Linux’s audit events, but absolutely removes the “output unfriendliness” problem.
- Slack’s go-audit completely replaces auditd, designed in a way to ideally reduce the overhead, unfriendly output, and bugginess problems. It outputs JSON and can write to a variety of outputs (including syslog, local file, graylog2, or stdout). Like Auditbeat, go-audit is built specifically with centralized logging in mind.
- OSquery is another option to replace auditd, using the osqueryd monitoring daemon, as shown in Palantir’s post on auditing with osquery.
- Mozilla’s audit-go was another alternative, but given it was written as a plugin for Heka, which is now deprecated, it is probably not wise to invest in implementing it as an auditd replacement.
It is quite awesome that Linux offers a native way to collect some system events that can be queried for post-hoc investigation. Unfortunately, the Linux Auditing System presents performance, UX, and systems support challenges that are potentially intolerable for many enterprises. Each organization is different, so what may be an implementation and maintenance nightmare for you may not be for another company. However, it is important that organizations understand the strengths and weaknesses of the Linux Auditing System before investing in it and embarking on a potentially perilous quest.
Kelly Shortridge is currently VP of Product Strategy at Capsule8. In her spare time, she researches applications of behavioral economics to information security, on which she’s spoken at conferences internationally, including Black Hat, AusCERT, Hacktivity, Troopers, and ZeroNights. Most recently, Kelly was the Product Manager for Analytics at SecurityScorecard. Previously, Kelly was the Product Manager for cross-platform detection capabilities at BAE Systems Applied Intelligence as well as co-founder and COO of IperLane, which was acquired. Prior to IperLane, Kelly was an investment banking analyst at Teneo Capital covering the data security and analytics sectors.