An Essential Guide to Cloud Native Security: Part 3

Part 3 of 3: Detection Engineering For Cloud Native Systems

Traditional Detection is a human exhaustion exercise

Detection in a traditional SOC is based on investigation and analyzing alerts and event logs. When you have many security devices generating logs and alerts, a typical SOC may see thousands of alerts and Terabytes of logs per day.

Because of this, a SOC analyst may have to handle hundreds or thousands of alerts a day to keep up with the volume. Analysts are overwhelmed, and yet SOCs still lag behind in terms of alert handling and threat identification.

In addition, many SOCs do not have effective metrics to measure true efficacy or success of operations. When a SOC measures alert handling rate per analyst, it is essentially running on human fumes, and the only way to scale up is adding more humans to fuel that fire. We all know where that strategy goes.

Modern detection is as much about engineering as it is about security analysis

Modern detection is less about humans analyzing event logs. Rather, it is more about a system of tools, programs, and automated workflows that enable you to rapidly identify interesting events and threats, even amidst a massive amount of data.

This is a very different treatment to the detection problem — rather than solving the problem with analysts and more analysts, you use engineering principles to develop logics and tools that not only automate detection, but also continuously update and deploy new detection and response logics.

For instance, one of the logics for a detection program is the automatic contextualization of alerts — pulling relevant information from endpoints, network logs, Active Directory, threat intelligence feeds, etc. Executing this logic will reduce the amount of time a human analyst must expend to query different devices and interfacing with different tools just to gain context for the alerts

Another example might be a rule to automatically trigger a second-factor authentication workflow or forcing a password reset if a risky event is detected.

Applying software engineering principles to threat detection

So which software engineering principles make sense for threat detection? Below is a list that maps well known engineering practices with modern detection engineering.

Unit test the creation of new detection rules: Any time a new detection rule is created, you must practice extensive unit tests. You may write scripts or simulations to directly attack the detection rules or its assumptions.

Integration test detection workflows: Detection workflows are often complex and may impact multiple teams and infrastructure. You should employ integration testing principles to ensure the quality and reliability of detection workflows. More teams should be testing detection tasks the same way you’d test a build pipeline or an automated configuration management workflow.

Tool the most frequent scenarios: Just as software developers encapsulate well understood tasks into “subroutines,” the frequently executed detection tasks/workflows can be codified into automated tools or programs to eliminate manual tasks and improve efficiency.

Ensure continuous feedback loop and continuous improvement: Your detection infrastructure needs continuous improvement. The first step to achieving that is establishing feedback loops with infrastructure components, security devices, and even with different teams and users. This feedback loop will allow you to create an assurance process to improve the quality and precision of your detection programs.

Manage repository and accountability: Not unlike any well managed software engineering organization, you should house detection rules and codified workflows in a centrally managed repository. Efficacy metrics for rules and detectors should be tracked to allow continuous improvement.

Detection engineering in the cloud native system

In an earlier post in this series, we discussed the defining characteristics of being cloud native, specifically: microservice-centric, portable, and automatically managed.

To perform detection engineering in a cloud native system, security tools must, at the very least, provide these capabilities:

  • Ability to handle cloud native infrastructure: Detection technology must be tooled to handle cloud native components like containers, serverless, and microservices. At the same time, it must work seamlessly with detection technology designed for virtual machines, physical servers, and traditional networks.
  • Effectively reduce and normalize security alerts: Because cloud native workloads can be ephemeral, alert volumes may be higher than that of a traditional system and can easily overwhelm even the most sophisticated modern detection infrastructure. Thus, the task of reducing and normalizing security alerts into meaningful signals is essential in a cloud native system.
  • A critical scaling factor: Moving from manual handling of alerts to detectors and logics renders a “scaling factor” — a bulk of the alerts covered by the detectors are either discarded or acted upon without being specifically reviewed. There is no other place the impact of this scaling factor is more pronounced than in a cloud native environment because of the sheer scale and velocity of such systems.

Modern detection engineering requires the adoption of engineering principles to security analysis. In a cloud native system, this practice becomes existentially critical — without it, security detection will be untenable.

Dr. Chenxi Wang is vice Chair, OWASP vice chair as well as founder and general partner, Rain Capital. Dr. Wang is also on Capsule8’s Advisory Board.