
An Introduction to Container Escapes


“‘ESS-ca-pay’… that’s funny, it’s spelled just like the word ‘escape’!”
-A famous fish with ephemeral memory

Containers are more popular than well-understood. Container escapes are even less understood. This post is intended to solve the latter issue and demystify the heretofore arcane art of container escapology, even for people who feel confused by containers or uneducated about exploitation.  

What are containers and why are containers?

An initial stab at defining containers led to some interesting hair-splitting discussions around specific words like “virtualization” and “emulation”. We won’t dwell on those here; instead, here’s the long and short of what you need to know about containers.

The big picture:

  • They’re bundles of code dedicated to a specific task within an application
  • They are logically isolated and communicate with each other through well-defined channels like APIs
  • They share a kernel with their host (unlike VMs)

Why containers matter to businesses:

  • Scalability: more efficient to scale applications based on demand
  • Portability: they allow devs to test and deploy from any environment
  • Speed: because containers need fewer computing resources, they can be spun up and shut down quicker and more efficiently than VMs (this quick spin-up can also mean less prolonged downtime, and therefore less loss of revenue)

The security potential of containers (spoiler alert: these are the assumptions that get broken in container escapes):

  • Isolation: Containers are designed to be logically isolated from other services on the same host. Assuming isolation holds true, it provides some security by default (similar to sandboxing) that is seen as an improvement over retrofitting access control post-deployment.
  • Immutability: Once containers have been deployed, they’re not meant to be accessed or modified. This is useful defensively because attackers want to gain access to systems and make changes to get the information they’re after, but can’t because container architecture aims to eliminate that possibility.

The reality: “More secure” != “100% secure,” and isolation boundaries aren’t guaranteed.

What are container escapes and why do they matter?

Anytime a system is designed for isolation, the first question from any red-teamer or attacker is, “how do I de-isolate myself from this?” After all, why would you hang out in this ephemeral and isolated space when you could frolic around the much more persistent and access-rich host OS? 

The big picture: Container escapes are the attackers’ bus ticket out of Isolation Town.

  • Attackers use system weaknesses to break an isolation boundary, like finding a hole in a fence
  • For example, getting access to one container, then escaping into the host, grants the attacker access to all of the containers on that host

Why it matters: Even if the place you start doesn’t have sensitive data, the place you end up might. 

  • For example, a container running a service with penguin GIFs may share the same host as a container that handles sensitive information like social security numbers (SSNs) or passwords
  • In isolation, it may not matter if the penguin GIF container is insecure, but if it lets an attacker access the host, even strong security on the SSN-handling container won’t matter

The counter: Container escapes are a multi-step process, presenting the opportunity to detect them at each step.

The lifecycle of a container escape

Container escape techniques involve many stages, and the particulars can vary. It’s not a matter of an attacker simply grabbing the nearest container and Leeroy Jenkins-ing their way into its host. They will take steps such as:

  • Research the target looking for weaknesses as points of entry
  • Execute commands remotely to gain a foothold into the target network
  • Gather internal system information to determine next steps
  • Exploit a vulnerability in the system to gain greater permissions and defeat the isolation boundary
  • Install a method of maintaining persistent access to the network

The reality: Basically, a lot of asking “where am I?” and a lot of Googling.

The following scenario is one type of container escape among many possible escape routes.


Finding a way in

An attacker discovers that a webpage’s form field allows them to run commands on the system hosting the site.

The problem: This is not supposed to be doable.

  • Even if the web app doesn’t handle particularly juicy information, an attacker can use it to gain access to the broader production environment and find more sensitive data, like passwords or customer contact information
  • Side note: this is also an incredibly common class of vulnerability (command injection)

Execution and initial access

The attacker executes commands on the vulnerable website to gain remote access to the back-end of the application, hoping to access the underlying OS or network capabilities.

What’s happening: This is called an interactive shell.

  • The shell allows the attacker to tell the compromised computer what to do from the comfort of their own home
  • Remember, containers shouldn’t have anyone accessing them remotely once they’ve been deployed

Initial discovery

With a connection established, the attacker tries to gather more information about the system they ended up on in order to determine their next steps.

The analogy: If you’ve ever done an escape room, what’s the first thing you do when you gain access to a new area? You take stock of your environment, trying to find bits of information that will help you identify the next actions to take in your escape attempt. 

  • Attackers need to go through those same motions: they may have opened a connection, but they won’t always know exactly where they landed.

The process: The attacker wants to find answers to: what kind of system am I on, what level of access do I have, and what weaknesses can I find?

  • They figure out that they’re in a container, which is designed to be isolated from other systems, so they need to find a way to break that isolation
  • Next, they look for weaknesses in the system, and ways to use them to escape from the container and move around the network 
  • A Google search tells them that the host OS’s kernel version has publicly announced vulnerabilities that attackers can exploit (using a tool like Metasploit) to escape a container
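The “where am I?” part of this discovery step can be sketched in a few lines of Python. This is a hypothetical defender-side reproduction of common heuristics, not taken from any specific tool; the /.dockerenv and cgroup checks are hints that suggest a container, not guarantees:

```python
import os
import platform

def discovery_report():
    """First-pass "where am I?" checks an attacker (or a defender
    retracing one) might run from a newly obtained shell."""
    report = {}

    # The kernel version matters because containers share the host's
    # kernel: a vulnerable kernel is a potential escape route.
    report["kernel"] = platform.release()

    # Common container tells: Docker drops a marker file at /.dockerenv,
    # and the cgroup entry for PID 1 often names the container runtime.
    in_container = os.path.exists("/.dockerenv")
    try:
        with open("/proc/1/cgroup") as f:
            cgroup = f.read()
        in_container = in_container or any(
            hint in cgroup for hint in ("docker", "kubepods", "containerd")
        )
    except OSError:
        pass
    report["likely_container"] = in_container

    # Current privilege level: uid 0 inside a container still isn't
    # host root, but it widens the set of usable exploits.
    report["uid"] = os.getuid()
    return report
```

The same checks, run by a defender, are a cheap way to understand what an attacker would learn within seconds of landing a shell.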


The escape

This is the moment the attacker has been waiting for: the great escape.

Behind the scenes: When vulns are publicly announced, attackers start building bundles of code to try to exploit them

  • Even as a non-privileged user, an attacker can download and execute an exploit to
    • give them super-user privileges
    • disable built-in security mechanisms
    • get themselves out of the container and into the host system

PSA: This is why applying updates quickly is so important. 

Discovery on the host

Now an attacker will do some more discovery on the host to see where else they can pivot, and how they can exfiltrate data and maintain persistence.

  • They will also look for sensitive data in other containers or parts of the host


Maintaining persistence

Depending on the host’s version and vulnerabilities, the attacker has a few ways that they can try to maintain persistent access.


The bottom line: Preserving isolation and immutability is essential to realizing the full benefits of containers. 

  • These assumptions aren’t guaranteed, and attackers are clever
  • Detecting multiple stages of container escape attacks allows users to take swift remediation steps (like restarting a container) and contain an attack’s impact more efficiently


Top 3 Security Problems Caused by Rogue Developers


The Road to Fail is Paved with Good Intentions

Security vulnerabilities are often a top concern for security teams.  But when it comes to defending production systems, it’s not about bugs. There are a number of seemingly innocent developer behaviors that can wreak as much, if not more, havoc — or even worse, take an entire system down. These developers aren’t malicious, and they don’t intend to bork entire production environments. Developers are human, and it’s understandable that it can be tempting to find shortcuts or skip steps. 

Most apps aren’t just a single unit, but instead involve coordination of different components and services to execute the desired functionality. As a result, application development can also pull in a lot of different libraries and APIs. Frontends affect backends and backends affect frontends. The growth of cloud services means that more apps than ever before are being developed under DevOps principles, which can encourage heavy use of containerization. Everyone is just trying to do their jobs.

Unfortunately, the consequences of rogue or unwanted developer behavior can be disastrous inside a production environment. If there aren’t policies that describe what is acceptable, developers are likely to perform all of these behaviors at some point, and still may even with a policy forbidding them. That’s why monitoring for and recovering from unwanted activity is so important. Here are three common behaviors by rogue developers that you need to keep an eye on:

Developers debugging in production

Remote debugging features make debugging in production really tempting. And it’s easy to assume that debugging as soon as possible would prevent future headaches. Unfortunately, debugging in production can create major availability and performance issues, an absolute no-go for production. 

Grzegorz Mirek explains one reason why it’s not a good idea, “Most of our business applications handle many requests per second. There is no easy way to control breakpoints firing everywhere when your application is being remotely debugged. As you can imagine, we don’t want to block all of our users from using our application when we decided to debug it. More often than not, we also can’t just force our application to reproduce the bug which happened yesterday; sometimes the only way to do it is to wait until it happens again to one of our users. Thus, keeping a remote debug session in production without a strict control of how breakpoints fire is like putting landmines in the forest and inviting our users to run through it.”

Inviting users to run through landmines in exchange for developers finding bugs more easily is a costly tradeoff. Compounding this is the fact that debugging in production can also provide detailed information on both the system running the application and its users that can be used for future attacks. 

Deploying tracing, monitoring, and performance analysis tools for production systems offers a less destructive alternative to debugging. At the very least, applications should only be debugged in test or staging environments. For software engineers, it can be tempting to debug an application that’s in production, but it is too dangerous and should be avoided at all costs.

Surprise deployments or deployment before review

In an organization leveraging DevOps practices or any agile development environment, it can be tempting to deploy code before it passes required security reviews if software engineering teams are trying to move quickly or avoid the dreaded bottleneck of security. But as we’ve seen with countless surprise party entrances gone wrong, not everyone loves surprises — and it’s an especially bad idea when it comes to delivering software in production environments.  

A common developer behavior is to “roll forward” when some planned deployment fails in prod. This is often due to a minor mistake that is straightforward to fix, but wasn’t caught in staging or pre-production. Teams are often under pressure to ship features on time, the 90-90 rule suggests they’re likely to already be late, and it’s incredibly tempting to make the tiny hotfix and redeploy instead of issuing a rollback.

Plenty of mistakes can be made when using APIs or libraries. Inefficient memory management can adversely impact speed or, worse, lead to application crashes and costly downtime. Simple yet disastrous syntax errors can be introduced. This is why code review covering both performance and security should be part of CI/CD pipelines, and surprise deployments are a no-go in production.

Some of the most disastrous surprise deployments are not of code, but of configuration. Bucket not accessible? Make it public! Database connectivity error? Open all the ports! Aside from security issues, manual configuration changes are responsible for a surprising amount of downtime events, both at the time of manual configuration and as a landmine for later automated deployments.

But security teams, take note: implementing heavy change approval processes in the form of onerous, opaque security reviews hinders the acceleration of software delivery that DevOps practices can bring. When the real policy is too onerous, a shadow policy of accepted behavior emerges. If you aren’t willing to work within software delivery workflows, don’t be surprised when your developers surprise you with deployments.

Downloading and mishandling sensitive data

Sensitive data can be anything from authentication credentials to credit card numbers, to private keys and machine identities, to all manner of PII. If you mishandle sensitive data, you could be exposing it to attackers — or causing compliance violations for your organization. Practicing the principle of least privilege in application design can help reduce the impact of engineers improperly accessing sensitive data. No software process, machine, application, or human user should have access to any sensitive data that they don’t absolutely need.

Debugging production services, beyond the dangers discussed above, can also lead to mishandling of sensitive data. To analyze why a service is misbehaving, software engineers will put a service into debug mode — which can often result in personal information, passwords, or other sensitive data being written to application logs. In a similar vein, common application security problems can jeopardize data security in production, like putting information in error messages that can inform application compromise or passing secrets in plaintext via URLs in APIs that can facilitate account takeover. 

Data analysis involving production data can also pose serious concerns. If appropriate data pipelines aren’t in place, data analysts will open up a tunnel from production to another environment so they can perform analysis on business data. This often occurs in response to ad hoc queries from senior management, but predictably ends with someone mishandling data — or worse, accidentally modifying production data and causing outages.

Bad habits don’t mean bad developers. There are malicious developers with their own set of behaviors you need to watch for, but we’ll cover that in another post. Meanwhile, there are well-intentioned developer behaviors you need to monitor for to avoid impacting the speed and stability of your production environment.



Security Considerations for Cloud Migration


Many companies have long resisted migrating to the cloud for security reasons. An evolving technology landscape can already make a well-planned cloud migration strategy seem like a complex task, but what if you add in a global pandemic? An entire workforce operating remotely? Murder hornets? These unforeseen challenges (OK, maybe not the murder hornets) can mean a business needs to kick plans into overdrive when operational activities are already difficult. How can you be sure you’re taking the necessary precautions pre-, during and post-migration?

Let’s talk about it together. 

On July 28th at 11 am ET, Rob Harrison, Chief Product Officer at Capsule8, and guest speaker Andras Cser, Vice President and Principal Analyst at Forrester Research, will discuss on a live webcast how security considerations for cloud migration have changed over the past few months and how future trends change risk as adoption strategies accelerate. They will discuss the challenges at both a business execution level and a cybersecurity level, and how to mitigate those risks. Andras will also share some of Forrester’s predictions in cloud security, and both experts will be available to field your questions.
If you’d like to join us, you can sign up here, and be sure to keep an eye out for the on-demand recording.



Maximizing Business Impact with Machine Learning


I recently had the great fortune of presenting a lunch & learn session to the Capsule8 team. In this presentation I discussed how to effectively leverage machine learning to build intelligent products as efficiently as possible. Rather than focus on a single type of audience, I included information relevant to multiple levels including executive leadership, middle management, and individual contributors building machine learning solutions. The slides for the talk can be accessed here.

Below is a brief write up of the talk.

The Main Question

If you’re a leader at an organization seeking to leverage machine learning within your company, the main question to focus on is how to deliver value through machine learning as efficiently as possible. 

It’s easy to get caught up in the hype surrounding AI, especially if you follow popular tech media outlets. Tech juggernauts like Google, Amazon, and Facebook have realigned themselves as AI-first companies. Startups promising machine learning enabled scale are raising enormous amounts of venture capital. But from my experience building ML powered products as an individual contributor, manager, and consultant, I can confidently say that much of what’s written online about AI is overly hyped.

I’d like to share a 5-step process for maximizing business results through the use of machine learning and AI. Before we dig into the individual steps, let’s examine a well-known product powered by ML.

An Example

Smart Compose is a feature in Gmail that uses machine learning to interactively offer sentence completion suggestions as you type. Although the feature doesn’t generate revenue for Google directly, it does provide a magical user experience that no other email client offers today. And it’s the Smart Compose experience, not the machine learning model behind Smart Compose, that generates value for Google.

When Smart Compose generates text to finish your sentence, the predicted words appear in light gray font to distinguish the predictions from what you’ve written. The predictions are suggested, but you’re not forced to use them. If you decide to accept the suggestions, you can integrate the words by swiping right on mobile or hitting Tab on your laptop. This experience is like magic! It feels totally natural. Imagine how frustrating it would be if instead of suggesting words, Smart Compose automatically filled in the text with its predictions and you had to manually delete all the incorrect ones. Or if designers chose to  introduce an additional button to accept suggestions.

On the right-hand side of the slide we see the neural network architecture that powers Smart Compose by predicting the next few words given an initial phrase. This is where data scientists spend the bulk of their time. But it’s not where the value generation is realized. The cost and time of the data scientists’ effort pays for itself many times over when you’re able to deliver an experience like the Smart Compose feature.

Remember to keep this dichotomy in mind. An intelligent experience isn’t possible without intelligence, typically in the form of a trained machine learning model. But on its own that intelligence doesn’t generate user value until it’s delivered in the form of a product or feature through an experience. Always remember there’s a difference between machine learning powered products and machine learning models. 

The 5 Step Process

1. Focus on formulating the business problem

The first step in delivering value is to focus on the business problem you wish to solve. This is more product management than it is machine learning. Can you clearly state the problem? Who are the users whose lives you want to improve? Do these users want whatever you’re planning to build? Objectively answering this last question requires conducting user research through interviews, surveys, and other means.

While this first step is part of any product ideation process, it’s important to keep in mind that machine learning enables entirely new sorts of experiences that traditional software development cannot. Hence a product manager for AI does everything a traditional PM does, and much more.

It’s important to ask yourself whether you can build the experience without machine learning before committing to building the intelligence. If you don’t need a complex model, you’ll avoid significant capital investment and deliver user value much faster. Even if you do need intelligence, starting with a simple solution lets you gather customer feedback incrementally and collect more data, which leads to better models. For example, Smart Compose built off of Smart Reply, a feature that suggested replies to emails. 

2. Assemble a team

Once you’ve clearly formulated the problem and decided that intelligence is needed, you need to assemble a team. Data scientists can’t build products on their own. There is a false belief floating around that if you hire a data scientist with a PhD, this person can do everything required to build an intelligent product. Meanwhile, in reality, this PhD only knows how to write MATLAB code. I’m being facetious here, but my point is that you need to assemble a team with diverse skills in order to build a great product.

Deepak Agarwal, the VP of Artificial Intelligence at LinkedIn, made this same point in his keynote at TWIMLcon 2019:

At LinkedIn, we have a machine learning engineer sitting with product designers at the design stage when building new products. If you want to get the AI right these different roles need to work together from the planning stage all the way through implementation. 

Deepak Agarwal, VP of Artificial Intelligence at LinkedIn

And it’s not enough just to include folks like designers. Hussein Mehanna, the Head of AI at self-driving car company Cruise, says that in order to attract and keep high quality UX designers on ML projects you need to treat them as first class citizens along with your data scientists and machine learning engineers.

Deepak Agarwal summarized it perfectly: “It takes a village to get AI right.”

3. Quantify the cost of a model error

After defining the problem and assembling a team, it’s time to begin scoping the technical work. For a machine learning problem, the most important part of this preparation is planning for incorrect predictions. Machine learning is inherently non-deterministic. Data scientists work hard to estimate how models will perform on unseen data, but it’s impossible to plan for every combination of input data that a model might see. Instead, the best thing to do is quantify the cost of a model error.

For instance, a binary classifier can produce two types of errors: false positives and false negatives. The cost of each type of error is domain dependent. A song recommender that incorrectly guesses you like a particular song is a nuisance. A medical diagnostic that predicts a patient has cancer and requires chemotherapy is life altering.

When possible, teams should assign dollar values to each type of error and factor these into the model’s loss function. During the model building process, data scientists move beyond aggregate metrics and perform error analysis to assess performance on critical subpopulations in the data.
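The idea of costing errors can be made concrete in a few lines of Python. The dollar figures here are illustrative assumptions, not benchmarks from any real domain:

```python
def expected_error_cost(y_true, y_pred, cost_fp, cost_fn):
    """Dollar-denominated evaluation of a binary classifier.

    cost_fp / cost_fn are business estimates supplied by the team,
    e.g. the cost of a wasted follow-up vs. the cost of a missed case.
    """
    # Count the two error types over paired true/predicted labels.
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return fp * cost_fp + fn * cost_fn

# In a screening setting where a miss is 50x costlier than a false
# alarm (hypothetical numbers):
# expected_error_cost(y_true, y_pred, cost_fp=10.0, cost_fn=500.0)
```

Two models with identical accuracy can have very different expected costs under this metric, which is exactly why aggregate accuracy alone is not enough.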

4. Build out and automate the delivery pipeline

Data scientists are driven by their desire to build the most accurate model possible. This often translates into training increasingly complex models. Avoid this temptation and start with basic models.

In his popular Rules of Machine Learning: Best Practices for ML Engineering, Google engineer Martin Zinkevich advises to “Keep the first model simple and get the infrastructure right” (Rule #4). ML systems require pipelines for extracting, transforming, and loading data, oftentimes in real time with strict SLAs. It’s vital to ensure that these pipelines are robust before introducing other sources of uncertainty like model complexity. And like all software systems, these data pipelines need to be well tested. As Zinkevich states in his next rule “Test the infrastructure independently from the machine learning.”
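One way to follow the “test the infrastructure independently” rule is to swap a trivial stub in for the trained model, so pipeline tests exercise only the data plumbing. The names and the toy feature extraction below are invented for illustration:

```python
class ConstantModel:
    """Stand-in model that always predicts one label, used to test the
    serving pipeline independently of any trained model."""

    def __init__(self, label):
        self.label = label

    def predict(self, features):
        # Same interface shape a real model would expose.
        return [self.label for _ in features]


def run_pipeline(raw_records, model):
    """Extract/transform step followed by inference.

    In a real system the extract/transform step is the part with strict
    SLAs, so it deserves its own tests with a model whose behavior is
    fully known.
    """
    features = [[len(r)] for r in raw_records]  # toy featurization
    return model.predict(features)
```

If a pipeline test fails with `ConstantModel`, the bug is in the infrastructure, not the machine learning, which keeps the two sources of uncertainty separate.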

Beyond moving data through pipelines, ML systems need to deploy models in a continuous manner. Deployment is a multi-step process with its own set of challenges, including the need to A/B Test models to ensure predictions are driving the right product metrics.

5. Monitor, monitor, monitor

Once you’ve deployed your models to production, the real work begins. Models must be continuously monitored to detect and combat deviations in model quality such as concept drift. Early and proactive detection of these deviations enables you to take corrective actions, such as retraining models, auditing upstream systems, or fixing data quality issues without having to manually monitor models or build additional tooling.

Standard tools for monitoring software systems are not sufficient for monitoring machine learning systems. Besides standard software metrics like system uptime and latency, monitoring ML systems requires tracking input and output data as well as model accuracy metrics. What good is a model that’s operational 99.999% of the time but returns inaccurate predictions? 
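As a sketch of what model-quality monitoring can look like beyond uptime and latency, here is a minimal sliding-window accuracy check. The window size and alert threshold are illustrative knobs, not recommendations, and a production system would also track input distributions to catch drift before labels arrive:

```python
from collections import deque

class AccuracyMonitor:
    """Sliding-window accuracy check for a deployed model."""

    def __init__(self, window=500, threshold=0.90):
        # Only the most recent `window` outcomes count toward accuracy.
        self.window = deque(maxlen=window)
        self.threshold = threshold

    def record(self, prediction, actual):
        """Log one labeled outcome as it becomes available."""
        self.window.append(prediction == actual)

    def accuracy(self):
        if not self.window:
            return None
        return sum(self.window) / len(self.window)

    def drifting(self):
        """True once recent accuracy falls below the alert threshold."""
        acc = self.accuracy()
        return acc is not None and acc < self.threshold
```

When `drifting()` fires, the corrective actions are the ones named above: retrain the model, audit upstream systems, or chase down a data quality regression.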


Building machine learning products is a relatively new endeavor for many companies. The tooling landscape is changing quickly and new best practices are emerging every day. If you’re planning to build intelligent products, one piece of advice is to start small. Avoid lofty goals and work on small projects that help create momentum within your organization and build confidence amongst your team. Major impact requires major investment (millions of dollars). Starting small will help you learn iteratively along the way to big impact.

Security Delusions Part 3: Cheat Codes


Organizations are unearthing the potential of digital transformation, but security often remains a gatekeeper to this path of promised potential, largely due to its own delusions about what modern infrastructure means. As Herman Melville wrote in Moby Dick, “Ignorance is the parent of fear” – and security is too frequently hindered by its fear of the new and the agile precisely because of its ignorance about blossoming technologies.

In this blog series, drawn from my QCon talk last year, I will explore the history of infosec’s errant gatekeeping in the face of new technologies, and how we can encourage security to embrace new technologies to enable the business, rather than get in its own way. Part 1 and Part 2 are already published.

Now that we went through our history journey of infosec’s wariness towards cloud computing, and explored security’s present fears about microservices, how should we go forth in this muddled world? How can we evangelize real threat models and real solutions to security issues while prying traditional FUD-borne notions from enterprise infosec’s white-knuckled hands? In this final post of the series, I will detail the “cheat codes” for securing cloud and microservices environments and how to efficiently evangelize these best practices to security teams.

This discussion must start with how infosec perceives the engineers implementing all this newfangled, scary tech. Infosec tends to look at DevOps as reckless, overpowered frenemies rather than an ally who could teach them a thing or two about process improvement. As one security professional (who shall remain nameless) said, “DevOps is like a black hole to security teams because they have no idea what DevOps is doing and have no way of ensuring security policy is enforced.” The current conflict is ultimately about control – the fact that security is not exclusively gripping the wheel anymore. 

This means that engineers should be cautious when evangelizing cloud infrastructure, APIs, or containers to security folks. When someone is overwhelmed by fear, they will react quite poorly to being told to “calm down,” or that there is nothing to fear. Instead, engineers as well as infra-savvy security professionals must acknowledge that there are valid concerns borne from cloud or microservices environments — just not the ones commonly believed by the infosec industry. 

Cheat codes for cloud, APIs, and container security

What realistic concerns should be highlighted to replace the security delusions I covered in the first two parts of this series? Before we dig into specific best practices for clouds, APIs, and containers, there are three fundamental security tenets to remember, one for each category:

  1. Do not publicly expose your cloud storage buckets (AWS S3, Google Cloud Storage, Azure Storage).
  2. Do not use unauthenticated APIs.
  3. Do not use “god mode” in your containers – minimize access wherever possible.

The fortunate news is that there are established best practices for security for all the “super scary” technology – and these best practices should absolutely make infosec’s job easier. If anything, infosec takes on the role of evangelizing and enforcing best practices rather than implementing anything themselves.

IAM as the new perimeter

Analogizing security in cloud or microservices environments to the old, pre-Copernican ways (when the firewall was the center of the security universe) can help translate modern best practices into the language of traditional security professionals. Security groups and network isolation by CSPs are the firewall equivalent. Ingress and egress routes defined through AWS, GCP, or Azure are similar to firewall rules, letting you specify, “No, this resource can only talk to these systems.” It requires trust that the CSPs properly segregate resources, but again: it is a delusion to believe you can do so better than the CSPs.

Leverage your CSP’s tools

For cloud systems, making sure your AWS S3, Google Cloud Storage, or Azure Storage buckets are not available to the public is the most valuable step you can take to avoid data leaks like Accenture’s and Time Warner’s. AWS offers a wealth of tools to help ensure best practices, including Amazon Inspector (which looks for deviations from best practices) and AWS Trusted Advisor (which helps provision resources following AWS best practices). 

Ensure the principle of least privilege

The CSP’s IAM roles can help ensure the principle of least privilege when accessing systems. Each provider’s best practices for IAM policies are readily available, only a search away. Segmenting production and development environments by maintaining separate AWS accounts for them is an alternative strategy. Use assumed roles instead of users: admins log in as read-only users, and you can create keys with fine-grained permissions without needing a user with a password for each key or service account. 
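To make the least-privilege point concrete, here is a small Python sketch that flags the most obvious violations in an IAM-style policy document. The structure mirrors AWS’s JSON policy format, but the checks are illustrative, nowhere near a complete audit:

```python
def policy_warnings(policy):
    """Flag glaring least-privilege violations in an IAM-style policy.

    `policy` is a dict shaped like an AWS policy document:
    {"Statement": [{"Effect": ..., "Action": ..., "Resource": ...}]}
    """
    warnings = []
    for stmt in policy.get("Statement", []):
        if stmt.get("Effect") != "Allow":
            continue  # Deny statements only narrow access
        actions = stmt.get("Action", [])
        if isinstance(actions, str):
            actions = [actions]
        # "*" or "service:*" grants every action (in a service).
        if any(a == "*" or a.endswith(":*") for a in actions):
            warnings.append("wildcard action grant")
        if stmt.get("Resource") == "*":
            warnings.append("policy applies to all resources")
    return warnings
```

Checks like these belong in CI so an over-broad grant is caught at review time rather than discovered during an incident.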

API hygiene habits

Basic API hygiene will suffice for most organizations, consisting of authentication, validation, and the philosophy of not trusting external data. OWASP maintains a valuable “REST Security Cheat Sheet,” and its advice proves far simpler than the tangle of considerations for monolithic apps. For instance, sensitive data like API keys should not be exposed in the URL – instead, it should be sent in the request body or request headers (depending on the request type). Only HTTPS endpoints should be used, and there should be access control at each API endpoint. Apply allowlists of permitted HTTP methods for each endpoint.
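These hygiene rules are mechanical enough to sketch in code. The endpoint table and check order below are invented for illustration; a real service would hang this logic on its framework’s middleware rather than hand-roll it:

```python
# Hypothetical per-endpoint method allowlist; paths are illustrative.
ALLOWED_METHODS = {
    "/api/v1/orders": {"GET", "POST"},
    "/api/v1/orders/{id}": {"GET", "DELETE"},
}

def reject_reason(method, path, headers, scheme):
    """Return why a request should be rejected, or None if it passes
    these basic hygiene checks: HTTPS only, a per-endpoint method
    allowlist, and credentials carried in headers rather than the URL."""
    if scheme != "https":
        return "plaintext transport"
    if path not in ALLOWED_METHODS:
        return "unknown endpoint"
    if method not in ALLOWED_METHODS[path]:
        return "method not allowed"
    if "Authorization" not in headers:
        return "missing credentials"
    return None
```

Because every endpoint is enumerated, anything not explicitly allowed is rejected by default, which is the allowlist philosophy in miniature.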

Granular allowlisting in microservices

In the vein of API hygiene, ensure you validate input and content types. As a rule, do not trust input; add constraints based on the type of input you are expecting. You can pitch this to any traditional infoseccers as a form of granular allowlisting – previously impossible with monoliths, but now possible with microservices. Explicitly define what content types are intended and reject any requests with unintended content types in the header. This also engenders a performance benefit and is often part of API definition anyway – again, making the security team’s job much easier.
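Content-type allowlisting is a few lines in practice. A minimal sketch, assuming a JSON-only API (the allowlist contents are whatever your API definition declares):

```python
ALLOWED_TYPES = {"application/json"}  # taken from the API definition

def acceptable_content_type(headers: dict) -> bool:
    """Reject requests whose declared content type is not explicitly allowed."""
    # Strip parameters like '; charset=utf-8' before comparing
    ctype = headers.get("Content-Type", "").split(";")[0].strip().lower()
    return ctype in ALLOWED_TYPES

print(acceptable_content_type({"Content-Type": "application/json; charset=utf-8"}))  # True
print(acceptable_content_type({"Content-Type": "text/xml"}))                         # False
```

Rejecting unexpected types before parsing is also where the performance benefit mentioned above comes from: the service never wastes cycles deserializing payloads it was never meant to accept.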

God is not a mode

For containers, the most prevalent “threat” is misconfiguration – just as it is for cloud and APIs. Much of the security best practice for containers is related to access management, a common theme across modern technologies. Do not expose your management dashboards publicly. Do not let internal microservices remain unencrypted – use of a service mesh can reduce friction when implementing encryption.

Crucially, do not allow “god mode” or anonymous access in your containers – and generally make your access roles as minimal as possible. Any CISO will be very familiar with the concept of least privilege already. Do not run containers as root with access to the host. Disable your default service account token. Enforce access control on metadata. These amount to the new security “basics” in the modern era.
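These "basics" are checkable before deployment. The sketch below lints a pod spec represented as a plain dict; the field names mirror the Kubernetes pod spec (`automountServiceAccountToken`, `securityContext.runAsNonRoot`, `privileged`), but the linter itself is a hypothetical illustration, not a real admission controller.

```python
def pod_findings(spec: dict) -> list:
    """Flag the 'god mode' misconfigurations described above."""
    findings = []
    # Kubernetes mounts the default service account token unless told not to
    if spec.get("automountServiceAccountToken", True):
        findings.append("default service account token is mounted")
    for c in spec.get("containers", []):
        sc = c.get("securityContext", {})
        if not sc.get("runAsNonRoot", False):
            findings.append(f"{c['name']}: may run as root")
        if sc.get("privileged", False):
            findings.append(f"{c['name']}: privileged (god mode)")
    return findings

risky = {"containers": [{"name": "web", "securityContext": {"privileged": True}}]}
print(pod_findings(risky))
```

In practice the same policy is usually enforced with an admission controller or policy engine, but the logic reduces to checks this simple.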

Your CI/CD is your new patch manager

Patching becomes palpably easier with containers – arguably an antidote to the “Equifax problem,” in which procrastination, born of the friction of taking systems out of production to patch them, contributes to an incident. Continuous releasing means versions will be upgraded and patched more frequently – and container patching can be baked into CI/CD pipelines themselves. Any infosec team should be delighted to hear that containers let you patch continuously and automatically, removing them from the awkward position of requesting downtime for necessary security fixes.

Leverage containers for resilience and visibility

The fact that containers are managed through images in a registry removes work for security, too. The container image can be rolled out or rolled back, which should add a feeling of control for infosec teams. Further, visibility into which containers are affected by emerging vulnerabilities is much easier – container registries can be scanned to see which containers are vulnerable, instead of scanning production resources directly. And, live migration becomes possible through slowly moving traffic to new, healthy workloads from existing, vulnerable workloads, without any impact on the end user.
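The registry-scanning idea is simple to picture: images are content-addressed by layer digests, so finding every image built from a vulnerable base is a set intersection. The digests, image names, and advisory feed below are all invented for illustration.

```python
# Layer digests flagged as vulnerable by an advisory feed (hypothetical values)
VULNERABLE_LAYERS = {"sha256:deadbeef"}

def affected_images(registry_index: dict) -> list:
    """registry_index maps 'image:tag' -> list of layer digests it was built from."""
    return sorted(image for image, layers in registry_index.items()
                  if VULNERABLE_LAYERS & set(layers))

index = {"shop/api:1.2": ["sha256:deadbeef", "sha256:aaaa"],
         "shop/web:3.1": ["sha256:bbbb"]}
print(affected_images(index))  # ['shop/api:1.2']
```

This is why a registry scan can answer "which of our workloads are exposed?" in seconds, where scanning production hosts directly used to take days.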

It will be hit or miss whether your organization’s security team really understands containers. You can try using the example of updating a Windows laptop to provide an analogy to live migrations. Usually, you have to shut down Word or PowerPoint and disrupt your work. Instead, imagine the Word document migrates to an updated OS in the background, followed by the PowerPoint presentation, until all the work is moved to the patched OS. Now, the unpatched OS can be safely restarted without interrupting work.

Codify secure configurations

It is critical for enterprise infosec teams to help codify secure configurations and enforce all of these best practices. This is the modern equivalent of crafting security policy templates[2] (but less painful). Infosec teams can lead the charge in documenting threat models for standardized APIs, containers, and other resources. They should start with the scenarios that would be most damaging to the business – customer data leakage, data loss, disruption of service – then work backwards to the most likely avenues for attackers to accomplish those feats.

Prioritize protecting prized pets

Infosec teams should put additional effort into securing prized “pets” (vs. cattle), which are enticing to attackers and less standardized. As shown through the surveys mentioned in the prior post, visibility is one of the most coveted capabilities among enterprise infosec teams, and is crucial for protecting prized pets. However, the types of tools that could provide the right visibility for infosec teams are often already used by operations teams seeking to optimize performance. This is a propitious opportunity for security and DevOps to collaborate, with the added benefit of sparing the budget and integration work that duplicate functionality would require.

Build your audit use cases

Hitting the right compliance boxes can encourage adoption of modern tech as well, since compliance is a consistent budget item. File integrity and access monitoring (known as “FIM” and “FAM”) is an underpinning of nearly every compliance standard, from PCI to HIPAA to SOX. FIM/FAM requires monitoring and logging of file events for a few different purposes, but primarily to catch unauthorized modification of sensitive data (a violation of data integrity) and to create audit trails of which users accessed sensitive data (to preserve data confidentiality).

Because of the improved inspectability of containers, FIM/FAM becomes easier – even without a tool like Capsule8, which does it for you. Because microservices are distilled into simpler components than in a monolithic application, it is easier to pinpoint where sensitive data is being handled, helping target monitoring efforts. Demonstrating the ease with which visibility is obtained can help assuage concerns about control. Note, however, that infosec professionals are less familiar with the term “observability,” so translation is required when collaborating.
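The integrity-monitoring half of FIM reduces to hashing: record a baseline digest for each sensitive file, then flag any drift. A minimal stdlib-only sketch (the "sensitive" file here is a throwaway temp file, and real FIM tools watch file events rather than re-hashing on demand):

```python
import hashlib
import os
import tempfile

def sha256_of(path: str) -> str:
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()

def baseline(paths):
    """Record a digest baseline for each monitored file."""
    return {p: sha256_of(p) for p in paths}

def modified_files(base: dict) -> list:
    """Return files whose current digest no longer matches the baseline."""
    return [p for p, digest in base.items() if sha256_of(p) != digest]

# Demo: a throwaway stand-in for a sensitive config file
with tempfile.NamedTemporaryFile("w", suffix=".cfg", delete=False) as f:
    f.write("password_policy = strict\n")
    path = f.name
base = baseline([path])
with open(path, "a") as f:  # simulate an unauthorized modification
    f.write("password_policy = none\n")
tampered = modified_files(base)
print(tampered == [path])  # True: the tampered file is detected
os.remove(path)
```

With immutable container images, the baseline is simply the image itself — which is exactly why FIM gets easier in containerized environments.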

Caveats and cautions

Each CISO and infosec team maintains different priorities and possesses different skills, so not every tactic here will necessarily be effective for every team. Some teams prioritize compliance work, others seek to rigorously define policy, and yet others are only familiar with maintaining network security equipment and SIEMs. Many enterprise infosec practitioners will be more proficient with Windows than Unix, think in a network-centric model, and rarely develop anything themselves. Therefore, patience, analogies, and proof that not all control is lost will be critical in gaining buy-in.


It is hard to let go of long-held beliefs, and the firewall-centric model in a well-understood world of monoliths is tricky to dislodge from the heart of enterprise information security. Many of infosec’s fears over modern technology can be distilled into fears over losing control. For those in DevOps functions looking to help infosec evolve — or security professionals wanting to help their teams enter the modern era — assuaging those fears by redirecting control from grasps at threat phantasms towards tangible, meaningful threat mitigation is an essential step forward.

Work together to build secure standards for APIs and containers, to document appropriate cloud configurations, and to create threat models that can help continuously refine design towards more secure outcomes. Enterprise infosec teams, freed of many maintenance burdens through native controls and standards, can now focus on securing the “pets” in this modern world. Security will no longer herd cats and cattle, but instead be an evangelizer and enforcer of best practices.

Everyone maintains delusions in one fashion or another, but I sincerely believe we are not bound to them like Andromeda chained to a rock in the stormy sea. Information security can survive this Copernican revolution of cloud and microservices, but they could use their Perseus to save them from their Cetus – the devouring fear fueled by the siren song from infosec vendors urging them to succumb to dread. My hope is my guidance throughout this series can help us unchain infosec, allowing them to go forth into a new dawn of secure and resilient software delivery performance.


[1]:  If you don’t feel like Googling for them, here are the links to each: Security Best Practices in AWS IAM; Using Google Cloud IAM securely; Azure identity & access security best practices

[2]:  These are often required by compliance, and most CISOs should have familiarity with them.

Security Delusions Part 2: Modern Monsters

Posted by

Organizations are unearthing the potential of digital transformation, but security often remains a gatekeeper to this path of promised potential, largely due to its own delusions about what modern infrastructure means. As Herman Melville wrote in Moby Dick, “Ignorance is the parent of fear” – and security is too frequently hindered by its fear of the new and the agile precisely because of its ignorance about blossoming technologies.

In this blog series, drawn from my QCon talk last year, I will explore the history of infosec’s errant gatekeeping in the face of new technologies, and how we can encourage security to embrace new technologies to enable the business, rather than get in its own way. You can read part one here.

Now that we have explored infosec’s history of cloud compunction, we can turn to the new looming beast for security teams to face: microservices.

This darkling terror security harbors in its heart is that microservices creates a titanic, labyrinthian attack surface. It is as if they believe that each microservice adds the same attack surface as a traditional monolithic application – and thus with thousands of microservices, the attack surface of the monolith days is multiplied by a thousand as well. Through this lens, it is understandable why microservices would be absolutely terrifying – but this mental model is, of course, wildly mistaken.

In this infosec Copernican Revolution, it is exceedingly difficult for security to let go of the perimeter model. Although proven false countless times, the pervading belief was and still often is that if the perimeter is secure, then the enterprise will be safe. This is an illusory history. Lateral movement was so pernicious because once attackers bypassed perimeter defenses, the only defense they encountered was #yolosec, giving them free rein over internal networks.

While security is lamenting the dissolution of the perimeter and the daunting monster that is microservices, they completely miss that microservices realize the purported dream security held for so long – that security would be baked-in rather than bolted-on. Because microservices are typically considered publicly-facing by default, no one can rest on the assumption that perimeter defenses can save them – thus turning native security controls into the necessary default rather than a nice-to-have.[1]

Let us now turn to two essential components of microservices environments to explore the security delusions about each individually: APIs and containers.

APIs: Infosec’s Anathema

In a November 2018 survey by Ping Identity on API security concerns[2], 51% of respondents noted that they are not certain their security team knows about all the APIs in their enterprise’s network. Certainly, developers are now opening many API endpoints – but that does not differ from the prior mode of developers opening particular ports on an internal network. 30% of respondents said they do not know if their organization has experienced a security-related incident involving their APIs – and I suspect those 30% would not know whether they had been compromised at all, APIs involved or not.

CISOs are particularly fraught over the idea of public APIs – that they add attack surface, that they are so close to the grasp of attackers, that it is impossible for security to have control over all of them. As one security professional publicly opined, “Formerly, local networks had only a few connections to the outside world, and securing those endpoints was sufficient.”[3] That, in fact, was never truly sufficient. This strategy resulted in local networks that were astonishingly brittle because of the assumption that network security controls would prevent anyone from gaining access. 

Infosec practitioners will cite related fears that APIs can provide a “roadmap” for underlying functionality of the application, and that this roadmap can aid attackers. These fears are, quite frankly, ridiculous. Any legitimate security expert will caution that the “security through obscurity” approach is a terrible idea. Hiding functionality does not make your app inherently secure or insecure. However, if infosec teams are concerned about this, there is a high degree of certainty that the app is not designed to be resilient – which is a failure of the infosec program.

As I advocated in my previous research on resilience in infosec, the only way to ensure resilient systems from a security perspective is to assume that your added security controls will fail. Specifically, I recommended treating any internal or private resources as public – because otherwise you will bask in a false sense of security when your controls are inevitably bypassed. It is eye-opening how few enterprise security teams traditionally treat their internal or private resources in this way, as if there was not extensive documentation of attackers bypassing network security tools.

Further, what security practitioners often do not realize is that standard OWASP-based attack tools (such as Burp or Nessus) do not work nearly as well on API endpoints, because there are no links to follow, no attack surface to map, unknown responses, and potentially no stack traces. What is more, for RESTful JSON APIs, whole classes of vulnerabilities around cross-site scripting (XSS), session management vulnerabilities, compromised cookies, or protecting tokens are removed through the use of digest authentication and JSON Web Tokens (JWTs). If anything, API-centric apps abate application security (appsec) concerns rather than aggravate them.
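To make the JWT point concrete: a token's integrity reduces to an HMAC check, so tampering is detectable without any session state. Below is a stdlib-only sketch of HS256 signing and verification — the key and claims are made up, and a real service should use a vetted library (e.g. PyJWT) and also validate claims like expiry, which this sketch omits.

```python
import base64
import hashlib
import hmac
import json

def b64url(data: bytes) -> bytes:
    """Base64url without padding, per the JWT compact serialization."""
    return base64.urlsafe_b64encode(data).rstrip(b"=")

def sign(payload: dict, key: bytes) -> str:
    header = b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    body = b64url(json.dumps(payload).encode())
    sig = b64url(hmac.new(key, header + b"." + body, hashlib.sha256).digest())
    return b".".join([header, body, sig]).decode()

def verify(token: str, key: bytes) -> bool:
    header, body, sig = token.encode().split(b".")
    expected = b64url(hmac.new(key, header + b"." + body, hashlib.sha256).digest())
    return hmac.compare_digest(sig, expected)  # constant-time comparison

token = sign({"sub": "svc-billing"}, b"secret")
print(verify(token, b"secret"), verify(token + "x", b"secret"))  # True False
```

Because the signature covers the whole header and body, any modification in transit fails verification — which is precisely why cookie- and session-theft classes of attack lose their bite.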

One of the performance benefits of a microservices approach is borne out of standardization – and standardization also begets security benefits. However, standardization is not a common, nor commonly understood, topic among enterprise infosec professionals. They still live in the tailored and monolithic universe, not grasping that there can be a singular, well-developed API deployment that can be replicated – thus reducing their work down to rigorously testing the single API deployment until they are comfortable with its security posture. Standardization is a prevalent factor in the world of containers, as well – and is one no less fraught with security concerns.

The Curse of Containers

This new world of public-facing API connections is not the only aspect of modern technology receiving condemnation and trepidation by enterprise information security – containers themselves are seen as quite a grave bouquet of threats. 

Not every infosec professional realizes that containers are not, in fact, featherweight virtual machines (VMs). Frequently asked questions, as noted by Mike Coleman, may include “How do I implement patch management for containers running in production?” or “How do I back up a container running in production?” – questions that evince a lack of understanding of the nature of containers. They do not know that data lives in a separate volume that is backed up, and that you patch the container image rather than the actively running container. 

A recent survey by Tripwire[4] incidentally exposes this confusion among information security professionals. 94% of respondents have concerns regarding container security – and this “lack of faith” has led 42% to delay or limit container adoption within their organization due to security concerns. The winning reason (54%) among respondents for their security concerns is inadequate container security knowledge among teams – and we should be grateful they are at least acknowledging that their lack of understanding is a contributing factor. 

Source: Tripwire

The remaining concerns include visibility into container security (52%), inability to assess risk in container images prior to deployment (43%), lack of tools to effectively secure containers (42%), and the most nebulous one: insufficient process to handle fundamental differences in securing containers (40%). I, for one, am deeply curious to know what they perceive these fundamental differences to be, given prior erroneous beliefs about cloud security.

To crystallize the confusion and anxiety, the survey results around infosec professionals’ desired security capabilities for containers are worth exploring, too. 52% quite reasonably desire incident detection and response – something we (Capsule8) provide. Another reasonable request, by 49% of respondents, is for isolation of containers behaving abnormally. Regrettably, 40% also want “AI security analytics” for containers, and 22% want blockchain to secure containers, so we can presume somewhere between 9% and 12% are sane, and at least 22% have absolutely no idea what they are doing. 

Source: Tripwire

Beyond survey data, a straw man frequently suggested by infosec is that each container requires its own monitoring, management, and securing, leading to time and effort requirements that spiral out of control. The whole point of containers is for them to be standardized, so such claims directly ignore the purpose of the technology. Yes, they need to be monitored – but were you not monitoring your existing technology?

A cited fear of standardization itself is that vulnerabilities can be replicated many times as source code is used repeatedly. This ignores the status quo. Testing containers is still monumentally better than having developers write random queries every time in different parts of the application stack. At least in a container, you can find the vulnerabilities easily and orchestrate a patch to all relevant containers. Good luck finding the vulnerability in a custom-built Java app with intricate functionality.

It is as if infosec forgot the trials and tribulations of dealing with monolithic applications, as they now will cite that “you know exactly where the bad guys are going to try to get in” because there was one service and a couple of ports. They apparently have not heard that “complexity is the enemy of security,” or have conveniently forgotten the mantra.

In a monolithic application, workflows are enormously complex, making it extremely difficult to understand every workflow within it – meaning it is nearly impossible to understand how workflows can be taken advantage of by attackers. Because microservices represent one workflow each and are standardized, they can be mapped out in an automated fashion, making threat models considerably easier. For instance, JSON mapping and Swagger are designed to describe exactly how APIs interact, and modern web appsec tools will ingest these maps to understand an app’s API endpoints.
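For instance, enumerating an app's attack surface from a Swagger/OpenAPI-style description is a few lines of traversal. The spec fragment below is invented, but the `paths` shape follows the OpenAPI convention:

```python
HTTP_METHODS = {"get", "post", "put", "patch", "delete"}

def endpoints(spec: dict) -> list:
    """List (method, path) pairs from an OpenAPI-style 'paths' object."""
    return [(method.upper(), path)
            for path, ops in spec.get("paths", {}).items()
            for method in ops if method in HTTP_METHODS]

spec = {"paths": {"/users": {"get": {}, "post": {}},
                  "/users/{id}": {"get": {}}}}
print(sorted(endpoints(spec)))
```

That enumerated list is the complete set of workflows to threat model — a far cry from reverse-engineering every code path through a monolith.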

Another vital, but overlooked, benefit of containers for security teams is immutability and ephemerality (as discussed in my Black Hat talk last year). An immutable container is one that cannot be changed after it is deployed — so attackers cannot modify it as it is running. An ephemeral container is one that dies after completing a specific task — leaving only a short window of opportunity for attackers to do their thing. Both characteristics embed security by design at the infrastructure level, and are far easier to implement with containers than with traditional monolithic applications. 

If you segregate identity and access management (IAM) roles in Amazon, containers can only talk to each other based on what you specify, removing network services from your systems. Any infosec professional pretending authentication between microservices is not easy is either lying or has not actually attempted to learn how to do it. The shared environment of containers, much like concerns infosec held over the shared environment of cloud, are a frequent fear as well. This, too, forgets history.

Before, your systems would talk over FTP, telnet, SSH, random UDP ports, port 80 talking to other things – but now, all that network mapping is removed because you are using TCP, authenticated APIs, and HTTP standards. Using containers, someone needs to pop a shell in (a.k.a. compromise) your web server infrastructure, whereas before, they could get in just through an FTP service running.

The update process for containers also concerns infosec practitioners – specifically, that it is still too easy for developers to use vulnerable versions of software. I ask: this is in contrast to what paradigm? When people were still using versions of Windows Server 2008 that were built with Metasploit backdoors ready to go? Software versioning is, was, and probably will always be an issue – containers or otherwise. Pretending this is a new issue is disingenuous. And containers present an opportunity in this regard — that you can ensure your software is compliant, secured, and patched before the workload even spins up.

In this modern world, you do have multiple services of which you must keep track, but you are also separating out complex functionality into separate services. With big, complicated applications, one of the key issues previously when moving from staging to production was needing to track every single place where you needed to remove, for instance, stack traces. If you deploy in a container-based environment, you have a build for stage, a build for production, and you can track exactly what the systems are, building the API on top of it. 


During this exploration of these “modern monsters,” we saw that the security industry’s present fear of microservices (both APIs and containers) does not match their realistic threat model. Unlike the concerns over cloud computing, it seems security teams are less reluctant to acknowledge that part of their hesitation is driven by a lack of understanding — and acknowledging the problem is a necessary first step on the path to recovery.

Unfortunately, security’s apprehension of microservices is also withholding opportunities for security teams to leverage microservices to improve organizational security. Promoting standardized APIs should reduce a whole host of security headaches — moving away from manual security reviews across knotted monoliths towards automated security that checks whether an API endpoint adheres to the defined standard. While containers are certainly not secure by default, they present an opportunity to scale security workflows — as well as raise the cost of attack through their ephemeral nature.

It is all well and good to document the anxieties of infosec teams, but what can we do to handle these concerns? In the final part of this series, I will dive into the cheat codes for dealing with all of this — including recommendations on best practices for securing modern infrastructure.

Read part 3 of this series: Cheat Codes


[1]: A caveat here is that typical internal microservices often will not use encryption because of certificate challenges that create friction for engineers. Yet, this is undesirable, and will certainly panic your security team if done.

[2]: Canner, B. Solutions Review. (2018, November 19). Ping Identity Releases Survey on the Perils of Enterprise APIs. Retrieved from

[3]: Because of their apparent predilection for espousing FUD, I am not naming them so as to not give them more attention.

[4]: Tripwire. (2019). Tripwire State of Container Security Report. Retrieved from

Security Delusions Part 1: A History of Cloud Compunction

Posted by

Organizations are unearthing the potential of digital transformation, but security often remains a gatekeeper to this path of promised potential, largely due to its own delusions about what modern infrastructure means. As Herman Melville wrote in Moby Dick, “Ignorance is the parent of fear” – and security is too frequently hindered by its fear of the new and the agile precisely because of its ignorance about blossoming technologies.

In this blog series, drawn from my QCon talk last year, I will explore the history of infosec’s errant gatekeeping in the face of new technologies, and how we can encourage security to embrace new technologies to enable the business, rather than get in its own way.

Let us take a trip down memory lane, back to the early 2010s when “cloud transformation” reached sufficient significance to warrant concern by security professionals. These concerns presented as simplistically as fears of “storing data online,” extending to fears of shared resources, data loss, insider threat, denial of service attacks, inadequate systems security by cloud service providers (CSPs), and supply chain attacks.

However, the crux of the matter was rooted in a loss of control. No longer would security teams maintain the security of infrastructure themselves. No longer would their programs be anchored to firewalls. While the general IT question of the moment was usually, “What happens if our connectivity is interrupted?”, the question for IT security was, “How can we keep things secure if they aren’t directly under our control?”

For those of you who joined infosec more recently or who are interested observers from other disciplines, you may wonder why the prior model fostered such a sense of control. Traditional information security programs centered around the firewall – the first line of defense for the organization’s internal network perimeter, the anchor of the perimeter-based strategy, the key producer of netflow data that populated dashboards and provided signal for correlation across products.

Figure 1 – Fittingly in Comic Sans
Figure 2 – The Next-gen Firewall (NGFW) did not change things much

The Defense in Depth model became quite popular, one that advised a “multi-layered approach” to security (which is not wrong in the abstract). The first line of defense was always network security controls, starting with the firewall and its rules to block or allow network traffic. Intrusion prevention systems (IPS) worked just behind the firewall, ingesting data from it to analyze network traffic for potential threats. Fancier enterprise infosec programs segmented the network using multiple firewalls – what a SANS Institute paper called the “holy grail of network security.”[1]

But the transition to cloud erodes the traditional enterprise perimeter, and thus erodes the firewall’s position as center of the security universe. Thus, one can view the cloud transition as a Copernican Revolution for enterprise information security. And with such a shift, it is perhaps natural for enterprise infosec teams to reject it, wary of their relevance in this new world.

Survey data throughout the years covering infosec’s skepticism towards cloud helps fill in this picture. In 2012, Intel performed a survey on “What’s Holding Back the Cloud?”, discovering that the top three security concerns regarding private cloud were all related to control[2]. 57% of respondents cited concern over their inability to measure the security measures implemented by CSPs, 55% cited lack of control over data, and 49% cited lack of confidence in the provider’s security capabilities.

Source: Intel

After the loss of control followed concerns of “lack of visibility into abstracted resources,” and “uneasiness about adequate firewalling.” Hypervisor vulnerabilities were widespread concerns for both public and private cloud – though with the benefit of hindsight, they never materialized for the typical threat model (even today). And concerns about adequate firewalling more than anything reveal the stickiness of the network perimeter security model.

By 2014, 66% of security professionals surveyed by Ponemon[3] said their organization’s use of cloud resources diminished their ability to protect confidential or sensitive data. 64% said cloud makes it difficult to secure business-critical applications, and 51% said the likelihood of a data breach increases due to the cloud. In 2015, a survey by the Cloud Security Alliance (CSA) highlighted that 71% of respondents view the security of cloud data as a big red flag. 38% said their fear over loss of control kept them from moving data into cloud-based apps – thankfully fewer than believed the same in Intel’s 2012 poll.

Source: Ponemon

Distilling these fears, it is absurd in hindsight that enterprise defenders could believe that a few people maintaining a firewall could outmatch the security efforts and measures of Amazon, Google, or Microsoft. Only the endowment effect – people overvaluing things they already possess – combined with a bit of sunk cost fallacy could lead to such a hubristic conclusion.

Looking back, seldom were the major CSPs hit by publicly-disclosed data breaches. Salesforce has no known major data breaches, outside of disclosure that attackers were using fake websites to phish customers in 2014. Heroku, a Salesforce subsidiary, disclosed a vulnerability in early 2013 that could lead to the potential to access customer accounts – but they did not appear to possess proof of an actual breach. AWS, GCP, and Azure have no known breaches outside of customer misconfiguration.

The most notable CSP breaches include Dropbox in 2012 (68 million usernames + passwords), Evernote in 2013 (50 million usernames), and Slack in 2015 (500 thousand usernames). Yet, these were all breaches of user account databases, rather than evidence of customer accounts or storage repositories being breached themselves.

Despite these sentiments, cloud adoption inexorably marched onwards, and security teams mostly had to shut up and deal with it. The notion that security teams could secure infrastructure better than Amazon, Microsoft, or Google finally became fringe – but truthfully it only did so within the past two years or so, well after operations teams realized that the CSPs could provide more performant infrastructure than most could manage on their own.

The reality of cloud systems is that misconfigurations present the biggest concerns, such as an S3 bucket that is accidentally publicly exposed. Gartner indeed suggests that “through 2020, 80% of cloud breaches will be due to customer misconfiguration, mismanaged credentials or insider theft, not cloud provider vulnerabilities.”[4] Luckily, there are considerable resources to assist with the misconfiguration problem – more on that in the third part of this series.

Another reality is that security operating expenses can decrease when using the CSP’s native security controls, according to McKinsey research[5]. This research suggests that an enterprise with an annual budget of $200 million would spend just under $12 million per year on security, which is $5 million less than they would spend if they did not use the CSP’s native security controls.

This is not surprising. One large security vendor’s web malware protection system sells for over $100,000, as does its email protection system, both of which are deployed as blinky boxes on the network. A larger security vendor’s next-gen firewall (NGFW) starts at $50,000, though higher throughput models quickly reach $150,000 or more. An even larger security vendor’s firewall (with five year support) is priced near the $200,000 mark.

One might assume that a transformation resulting in less hardware to manage and less expense would be a welcome one – but this was not the case for enterprise infosec teams and cloud transformation. Of course, some CISOs readily embraced the potency and efficiency of cloud adoption, but even today, you can still find CISOs reticent to acknowledge cloud’s security benefits. 


During this history lesson, we saw that the security industry’s palpable fears of cloud computing did not match the eventual reality. Infosec was not only late to the cloud party, but often unnecessarily stalled organizational transformation, too. 

Much of security’s reluctance was driven by status quo bias — the stickiness of the defense-in-depth perimeter security model that gave security considerable control, and thus a sense of comfort. Importantly, that sticky bias led (and still leads!) many security teams to attempt recreating the same old-school model in new environments. Such an approach defies resilience, and serves as a warning for how security will handle the adoption of other tech today and in the future.

Now that we are faced with the rapid adoption of APIs and containers in the enterprise, will history repeat itself? In the next part of the series, I will explore how infosec is currently responding to microservices (APIs and containers) and what delusions are being conjured…

Read post 2 in this series: Modern Monsters


[1]: Bridge, S. (2001). Achieving Defense-in-Depth with Internal Firewalls. Retrieved May 2019.

[2]: Intel, Intel IT Center Peer Research. (2012). What’s Holding Back the Cloud?

[3]: Ponemon Institute LLC. (2014). Data Breach: The Cloud Multiplier Effect. Retrieved May 2019.

[4]: A bunch of vendors cite this quote, but I cannot find it directly via Gartner. I am assuming it is behind Gartner’s paywall.

[5]: Elumalai, A., et al., McKinsey Digital. (2018). Making a Secure Transition to the Public Cloud. (Note that these statistics are only true if the apps are rearchitected for the cloud in parallel.)

No More Tiers: Reimagining the Structure of SecOps


Why not both?

I’m not sure who thought that arbitrary hierarchical silos among a team of individual contributors was good for team morale and load-balancing, but here we are.

During a recent guest appearance on the Purple Squad Security podcast, I described my last role working on a security operations team that handled incident response as well as the usual monitoring and detection (and, being a small and early-stage team, a handful of other duties). John, the host, described this multifaceted team structure as the 2-in-1 shampoo and conditioner of secops. (I quipped, of course, that unlike the maligned hair product, I felt that we were effective at both functions!)

I would have actually compared us to the promises typically made by baby shampoo: “No more tiers.” I’m here all week, folks.

Older models of security operations often involve a tiered approach, with specialized individuals performing disparate functions among the chain of escalation. Typically, the initial detection is at the “level 1” tier, with incident response and threat hunting higher up. Underlying this divide is the assumption that the latter is naturally a more senior role than the former. Some orgs also implement this strict role divide as a separation-of-duties security measure (which to me signals outdated notions of trust in both the security and the human sense of the word). 

While massive companies with sizable security teams and vast infrastructure may warrant more specialized roles, the tiered model as the best practice across-the-board never made much sense to me. Particularly on a smaller, burgeoning team, separating the monitoring/detection and IR functions would have contributed to faster burnout, more single points of failure, and less efficient incident handling. 

The reality is that these disparate security operations functions can successfully be performed by the same people, regardless of their level of seniority. I’d like to explore how this cross-functional model can set teams up for success by enabling recovery from the drain of IR, removing breakdowns in consistency and empathy, and positioning teams for a smoother and more complete detection and response process.

Recovery via redundancy

Many of us who have worked in incident response have written about the burnout and anxiety that can accompany it. Dropping everything to investigate and resolve suspected security harm to our employer can be mentally draining even in the best of circumstances.

My old team spaced this out via a weekly on-call rotation. During business hours, one person would be the primary point of intake of ticket queues, alerting, and potential incidents, with another person as secondary on-call in the event that the primary analyst was handling an incident or otherwise indisposed. (We were not expected to work after hours unless there was an active incident.) When not on call, there was still plenty of work to be done: security education, building out documentation, quarterly projects, and other important-but-not-urgent tasks. More importantly, the time in between on-call weeks served as a recovery period.

The times when I felt the most sustained levels of burnout occurred when my recovery window between incidents was shortened. I had a stretch last summer where I noticed that I had been catching a disproportionate number of IRs during my secondary on-call weeks, as well as a couple outside of business hours, which was rare for our team. It led to good discussions about what needed to be escalated to incident-level priority, and raised the urgency of working on process improvements that would make certain things non-incidents — but not before my stress spiked for lack of getting the brain rest that I badly needed.

Now imagine how much more pronounced that burnout would have been if there was no rotation, or a much shorter one — which would have happened had we been structured in a traditional tiered model. During my time on that team, we usually had three or four security analysts (with a couple short streaks of two). Given the same team size, what if one or two of us were responsible for all of the IR and the others responsible for none, based purely on how long we’d been in the role or how many years of experience we had? That siloing would have left us with prime burnout conditions and single points of failure. With cross-functionality comes breathing room.

The next right thing

Incident response is often treated as a senior-level position. Yet most of us have done it before.

Taking a page from the Frozen II song, IR is largely a matter of, in the face of a daunting and incomplete puzzle, doing “the next right thing”. This is a fairly universal experience, even if not applied to security. At some point or another, we find ourselves in a dire situation where our horizon shrinks to one step. We can’t process the long-term — all that matters is putting one foot in front of the other until we’re out of the woods. 

I’m not saying that fight-or-flight survival mode is the best or most sustainable way to handle IR, but the IR-like process of moving forward in the face of intense uncertainty is one that most of us will face in our lives. It’s not a highly specialized skill set — it’s a defining cornerstone of humanity. 

Learning to handle these conditions gracefully takes time, but the way to gain comfort is through repetition — and, again, space to recover in between. This comfort is facilitated by measures that most IR teams already take: maintaining runbooks to provide at least some consistency in an inherently chaotic process, running tabletops and functional drills to iron out process improvements, and defining roles and responsibilities so that no analyst is ever expected to go it alone. 

Keeping security operations professionals siloed off from IR until they reach some superficial level of seniority where we decide they’re worthy of responding to what’s detected just delays the process of building comfort with uncertainty in a security setting.

All I’m asking is for a little detect

What of the folks handling the monitoring? Where there are tiers, this tends to live at the bottom tier not only of security operations, but of security itself. Many security professionals devalue this work, viewing it as immutably entry-level right down to the pejorative “SOC monkey” label. Indeed, when I was starting out in security operations in 2016, one of the warnings I heard from longtime security professionals was “avoid ending up on a team where you’re just farming tickets all day with no room for growth”.

Look at the on-ramp, though: joining a security operations team comes with the expectation of quickly gaining situational awareness of an entire company. Learning a new environment — from the systems, data flows, and existing detections, to the organizational structures, processes, and escalation paths — is one of the biggest undertakings of any new role. That can certainly be done successfully at entry-level — many of us who’ve worked in secops entered the security field through such roles without prior tech jobs — but it doesn’t need to be limited to that.

Once we’ve wrapped our heads around what we have in our environment, what “normal” looks like, and how we detect deviations from this baseline, shouldn’t our goal be to always consider ways to improve those processes? How can we automate away an alert or its triage? What else are we not logging or detecting that we could be? If we roll out a new piece of the detection pipeline, how do we weigh its visibility benefits against its impact on performance? 
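To make the “automate away an alert or its triage” idea concrete, here is a minimal sketch — every rule name, account, and the alert shape itself are hypothetical, not from any real tool — of an enrichment step that auto-closes alerts matching documented benign patterns before a human ever sees them:

```python
# Hypothetical auto-triage sketch: enrich an alert with context and
# auto-close it if it matches a documented benign pattern.
from dataclasses import dataclass, field

@dataclass
class Alert:
    rule: str
    user: str
    host: str
    status: str = "open"
    notes: list = field(default_factory=list)

# Benign patterns a team might document over time (illustrative only)
BENIGN_PATTERNS = {
    ("nmap_scan_detected", "vuln-scanner"),   # our own scheduled scanner
    ("new_admin_login", "break-glass-bot"),   # known automation account
}

def triage(alert: Alert) -> Alert:
    if (alert.rule, alert.user) in BENIGN_PATTERNS:
        alert.status = "auto-closed"
        alert.notes.append("matched documented benign pattern")
    else:
        alert.notes.append("needs analyst review")
    return alert

benign = triage(Alert("nmap_scan_detected", "vuln-scanner", "web-01"))
suspect = triage(Alert("new_admin_login", "jdoe", "db-03"))
print(benign.status)   # auto-closed
print(suspect.status)  # open
```

The toy logic isn’t the point — the point is that the people doing initial triage are the ones best placed to know which patterns belong on that benign list.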

Detection is ripe for ingenuity if those handling that initial detection are given the space to grow and explore, rather than having a significant portion of security operations abstracted away from them. The days of the stereotypical “SOC monkey” staring at dashboards as a consumer but not an innovator are long behind us. The detection side of security operations is only as entry-level as you make it.

Consistency and efficiency

When those who work on the initial triage of tickets and alerts are afforded the ability to also take on more thorough investigation and incident response, we are in a better position to build detection tooling and processes in a way that is most salient for IR. If an alert is always escalated away from its initial point of intake, that person may never know if the alert output is missing key information that the incident responder has to manually dig up every time. Feedback loops and IR retrospectives are helpful but often incomplete and no replacement for firsthand experience.

Separating out the detection and response can lead to not only less effective detection and triage, but also less efficient IR. Whenever ownership is transferred, we risk the possibility of loss of context or critical information — and time spent on the knowledge-transfer increases the time to resolution. In the rare cases where something was truly out of my depth, or at a need-to-know level of sensitivity that only a manager should be involved in, then yes, handoff was an option.

For the majority of incidents, however, wouldn’t the process be a smoother ride if the person detecting something could see the entire incident through to completion? They can hand off the non-urgent work, not the incident that they discovered. Health care professionals have the concept of continuity of care, which recognizes the importance of the historical context and trust relationships that can form when a doctor is able to stay assigned to a patient for the duration of their treatment. They may call upon specialists to handle certain aspects of the care plan, but they remain the ones driving treatment forward. Continuity of incident response promotes a similar consistency and protects against key information being dropped during handoffs.

Remembering why we’re here

In the recent “SOC Puppets” episode of the Detections podcast (discussion of tiered vs tierless models starts around 34:00), the hosts raised the fact that in tiered teams, level 1 analysts are often granted very limited access to audit logs and other tooling needed to do a proper base level of analysis. This lockdown is treated as a security control — rather than monitor and alert on the risky behavior they’re concerned about enabling, or putting other safeguards in place to limit the impact and scope of said risky behavior, they opt to simply own that whole “department of no” stereotype and apply it inward.

The result, naturally, is inefficient and inaccurate triage. Certain alerts may be wrongly closed as false positives because an analyst lacks access to relevant logs that would have given them vital context — they may not even know that such logs exist. Other tickets may be wrongly escalated when they could have been easy open-and-shut cases.

Security exists to enable the business; when a security control inhibits a person’s ability to do their job effectively, it’s time to reevaluate the benefits and risks of that control. The security team’s own ability to work is not exempt.

This brings us back to maintaining morale. Burnout doesn’t only stem from emotionally taxing incident response work, but from the lack of a sense of purpose. The people in “level 1” may enter the role as the most enthusiastic newcomers who have worked their butts off to land a security job. When we shove them into an isolated corner of the team, treat them as untrustworthy, and cut off access to crucial resources, they start to question why they’re even here.

A stratified, isolated, demoralized team does not need to be the norm. We thrive when we are able to access the resources we need to successfully do our jobs — including access to our colleagues and visibility into the broader impact of our work.

Tier Drop

Removing the unnecessary hierarchy and promoting cross-functionality in security operations sets teams up for success. Both sets of functions can be performed with great impact and influence at any level. By creating more space for recovery and growth via role redundancy, allowing secops teams to experience the entire detection and response lifecycle, and baking consistency into IR, the only tears will be those of an attacker shut down by a robust secops team.

What is Container Security?


Container Security – Nobody Knows What It Means But It’s Provocative

The current understanding of “container security” as a term and market is muddled, especially given that containers are used by different teams in different contexts. It could mean scanning image repositories for vulnerabilities or exposed secrets, managing credentials for container deployment, or monitoring running containers for unwanted activity.

This confusion isn’t particularly helpful to anyone — developers and operations teams are increasingly being asked about security for their deployments, while security practitioners are looking to either secure their container-based workloads themselves or partner with their DevOps colleagues to do so. Thus, my goal in this post is to help provide some clarity around the market for all involved.

To frame the container security market by use case, I’ll be using the three phases of a typical software delivery lifecycle as outlined by the Accelerate: State of DevOps report: software development, software deployment, and service operation. Others may refer to these phases as “build,” “ship,” and “run.” I’ll highlight the primary security features of each phase, the benefits and downsides, and some of the representative vendors in the space (including open source software). Naturally, not every vendor will offer all features outlined, but hopefully it can help you navigate the vendor landscape a bit more easily.

Container Development

This phase (a.k.a. the “build” phase) is about securing container development efforts, including container image repositories and new container images being created. The main goal is to spot vulnerabilities in the container’s code early so they can be fixed during development, rather than right before a release deadline or while the container is already running in production (where it could be exploited by an attacker). However, vendors in this category often look for vulnerabilities in running containers as well, extending into the security of operating containers.

Primary Features:

  • Scanning for vulnerabilities, malware, or exposed sensitive information (like API keys)
  • Blocking insecure builds and images from untrusted sources
  • Validating adherence to compliance standards or custom policies 
  • Reviewing third party dependencies and matching base images to allowlists
  • Integration into CI/CD pipelines, including source code repositories (GitHub, GitLab) and build servers (Jenkins)
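As a rough sketch of the build-gating feature above — the findings format and threshold scheme here are invented for illustration, not any particular scanner’s actual output — a CI step might parse scan results and fail the build when anything at or above a severity threshold is found:

```python
# Hypothetical build-gate sketch: given vulnerability findings from an
# image scanner (format illustrative), decide whether to fail the build.
SEVERITY_RANK = {"low": 1, "medium": 2, "high": 3, "critical": 4}

def should_block(findings, threshold="high"):
    """Return True if any finding meets or exceeds the threshold."""
    limit = SEVERITY_RANK[threshold]
    return any(SEVERITY_RANK[f["severity"]] >= limit for f in findings)

findings = [
    {"id": "CVE-2019-0001", "severity": "medium"},
    {"id": "CVE-2019-0002", "severity": "critical"},
]
print(should_block(findings))                          # True -> fail the build
print(should_block([{"id": "x", "severity": "low"}]))  # False -> ship it
```

In practice this is where the “blocking mode adds friction” downside shows up: the threshold becomes a negotiation between security and the dev teams whose pipelines it can halt.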

Benefits:
  • Integrates security assessments earlier in the CI/CD pipeline (that “shift left” thing)
  • Surfaces obvious security issues (like known CVEs) before deployment
  • Helps enforce adherence to compliance standards (like HIPAA, PCI, or CIS benchmarks) within the context of that build
  • Tracks vulnerabilities in running containers that weren’t previously discovered or fixed (overlapping with container operation security)

Downsides:
  • Constrained to known vulnerabilities that can also be spotted with relative ease
  • Misses container security risks beyond application vulnerabilities
  • Any sort of blocking mode adds some friction to developer workflows and the dev pipeline
  • On the flip side, flagged vulnerabilities can be ignored (especially in high volumes) and remain unfixed before deployment
  • Leaves gaps if there isn’t support for all languages and frameworks used by an organization
  • Vulnerability feeds can contain incomplete or faulty information
  • Doesn’t usually take into account context under which container images will be used 

Representative Vendors:

Note: not all companies listed offer all primary features listed.

  • Startups: Aqua Security, NeuVector, StackRox, Sysdig, Snyk
  • Large Security: Qualys, Palo Alto Networks (Twistlock acquisition), Tenable, Trend Micro
  • Ops / Platforms: Amazon ECR, Azure Security Center, Docker Enterprise, GitHub, GitLab, Google Cloud Platform, JFrog, Synopsys
  • OSS: Anchore, Clair, OpenSCAP Workbench

Container Deployment

This phase (a.k.a. the “ship” phase) secures container deployment, including built containers that have been pushed to image repositories but are not yet running (either in test or prod). The main goal is to audit and ensure only the right people can access and manage container repositories or orchestration layers — because otherwise attackers or accidentally-rogue developers could tamper with images, jeopardizing security and stability once those images are running.

Primary Features:

  • Define and enforce access control policies across different clusters and workloads
  • Managing credentials across different clusters and workloads
  • Performing drift detection (spotting deviations from expected configurations) and validating the integrity of clusters and image repositories 
  • Integration into orchestration tools (AKS, Kubernetes, etc.)
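The drift-detection feature above can be sketched in a few lines — assuming, hypothetically, that expected image digests are recorded at deploy time and compared against what is actually observed running:

```python
# Hypothetical drift-detection sketch: compare the image digests a
# deployment is expected to run against what is actually observed,
# and report any deviation from the expected configuration.
import hashlib

def digest(image_bytes: bytes) -> str:
    return "sha256:" + hashlib.sha256(image_bytes).hexdigest()

# Recorded at deploy time (illustrative stand-ins for real image layers)
expected = {
    "api": digest(b"api-image-v1"),
    "worker": digest(b"worker-image-v1"),
}

def detect_drift(expected: dict, observed: dict) -> list:
    """Return workloads whose running digest differs from the expected one."""
    return [name for name, d in observed.items() if expected.get(name) != d]

observed = {
    "api": digest(b"api-image-v1"),        # matches
    "worker": digest(b"worker-image-v2"),  # tampered or out-of-date
}
print(detect_drift(expected, observed))  # ['worker']
```

Pinning by digest rather than by tag is what makes this check meaningful: a tag like `:latest` can silently move, but a digest cannot.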

Benefits:
  • Minimizes misconfigured resources that could lead to security or performance issues
  • Enforces the principle of least privilege 
  • Spots vulnerabilities in image repositories that may have been publicly disclosed since the images were originally pushed to the repo
  • Facilitates adherence to compliance standards, which often require auditing of file access and modification
  • IAM capabilities are largely available natively through cloud service providers
  • Maintaining parity between prod, QA, and dev/test environments can be simplified by properly managing configuration parameters

Downsides:
  • Kubernetes-only use case addresses a limited part of the microservices threat model
  • Doesn’t apply to serverless container instances (though that’s admittedly a minor slice of the market)
  • Lack of namespace support limits per-container policy creation

Representative Vendors:

Note: not all companies listed offer all primary features listed.

  • Startups: Aqua, NeuVector, Alcide, Octarine, StackRox, Styra, Tigera
  • Large Security: HPE (Scytale acquisition), Qualys, Trend Micro, Palo Alto Networks
  • Ops / Platforms: AWS, Azure, Docker Enterprise, GCP, RedHat OpenShift
  • OSS: kube-bench, kube-hunter, kubediff, OpenShift (RedHat), SPIRE (SPIFFE) 

Container Operation

This phase (a.k.a. the “run” phase) secures the actual instantiation and operation of containers, once they are deployed and running in enterprise environments (especially production, but let’s not forget test or QA). The main goal is to protect against unwanted activity within containers in operation, ideally detecting and responding to an incident before it results in downtime or service degradation. 

This category also covers the collection of system telemetry to facilitate post-hoc investigation of incidents. As mentioned previously, many of the container development security vendors also track vulnerabilities in running containers, but we consider that an extension of those aforementioned capabilities.

Primary Features:

  • Detection of unwanted activity (both attacker and developer), usually either deterministic or behavioral / ML-based
  • Automatically responding to unwanted activity (like shutting it down)
  • Collecting and querying system telemetry for incident investigation and auditing
  • FIM, AV, and policy enforcement for compliance requirements
  • Integration into SIEM / log management, incident response tools, and container runtimes (Docker, CRI-O, containerd) 
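A minimal sketch of the deterministic detection-and-response features above — the event shape, rule, and response stub are all invented for illustration: flag an interactive shell spawned inside a container (a classic precursor to worse) and hand the event to an automated response:

```python
# Hypothetical runtime-detection sketch: a deterministic rule over
# process-exec telemetry that flags an interactive shell spawned inside
# a container, plus a response stub that would stop the workload.
SHELLS = {"/bin/sh", "/bin/bash", "/bin/dash"}

def detect(event: dict) -> bool:
    return bool(event.get("in_container")) and event.get("exe") in SHELLS

def respond(event: dict) -> str:
    # In a real system this might signal the runtime to kill the container.
    return f"kill container {event['container_id']}"

events = [
    {"exe": "/usr/bin/python3", "in_container": True, "container_id": "c1"},
    {"exe": "/bin/bash", "in_container": True, "container_id": "c2"},
    {"exe": "/bin/bash", "in_container": False, "container_id": None},
]
alerts = [respond(e) for e in events if detect(e)]
print(alerts)  # ['kill container c2']
```

Deterministic rules like this trade coverage for explainability — exactly the trade-off against the ML-based approaches listed in the downsides below.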

Benefits:
  • Preserves uptime and reduces impact by stopping unwanted activity or detecting it quickly
  • Monitors erosion of isolation boundaries between containers
  • Reduces effort required during incident response, speeding up recovery time 
  • Upholds system resilience by enforcing immutability and ephemerality
  • Exposes information about container operations that is useful to both security and operations teams
  • Creates a feedback loop to continually improve infrastructure architecture and design
  • Creates an audit log of file-related activity to meet compliance requirements

Downsides:
  • Kernel module-based agents can create reliability and stability risks in production
  • Cloud-based analysis can create network bottlenecks and increase instance costs
  • Lack of coverage for serverless instances with host-based approaches
  • Blocklist or allowlist enforcement can create performance issues when improperly defined
  • Machine learning-based tools may generate large volumes of alerts and false positives without upfront tuning
  • Network-centric tools may interfere with orchestration-based security enforcement or service meshes

Representative Vendors:

Note: not all companies listed offer all primary features listed.

  • Startups: Capsule8, CMD, Lacework, StackRox, Sysdig, ThreatStack, Uptycs
  • Larger Security Vendors: Alert Logic, Trend Micro, VMware (Carbon Black)
  • Ops / Platforms: Amazon GuardDuty, Azure Security Center
  • OSS: Falco, osquery

Parting Thoughts

I’d love to see people referring to “container development security,” “container deployment security,” and “container operation security,” but that’s a lot of syllables — probably “build,” “ship,” and “run” will ultimately reign supreme. Nevertheless, hopefully this post helps delineate the different areas of container security and why they’re each important. As a community — not just security, but also embracing our friends in DevOps — we must work to secure the containers lest we become contained by our own failures.

Yet, it’s important to remember that we don’t live in a digital shipyard filled to the brim with containers; other types of workloads are still predominant in most enterprises. Container security is generally only one component of enterprise infrastructure protection (what of your VMs, VPCs, VPSs, V*…), which is something to consider when evaluating container security tools.

If you’d like to learn more about how Capsule8 protects your containerized systems from runtime threats to security and performance, check out our Solution Brief: How Capsule8 Protects Containerized Environments.

EDR for Linux: Detection and Response in Linux Environments


The 3 pillars every solution needs to protect critical Linux production environments

Despite the steady ascent of Linux to the top of the production stack, security has often been an afterthought. That’s right—the OS that runs 54% of public cloud applications and 68% of servers has been getting short shrift when it comes to security. 

There are options out there, but they’re mainly traditional endpoint detection and response (EDR) and endpoint protection platform (EPP) systems. On paper, the notion of deploying traditional EDR and EPP tools to production infrastructure sounds appealing. After all, the companies that market these tools—including next-gen products—herald them as detecting and responding to attacks in real-time. But, what they don’t share is that the requirements for protecting production environments are vastly different than those of securing end-user devices. 

And, of course, they were originally engineered for Windows desktops.

Linux is all about performance, and security tools like those used for legacy Windows EDR usually don’t care about performance. But in a production environment that requires near 100% uptime under the stress of production loads, those old-style approaches just don’t work. 

So, what’s the right solution? What should you focus on to evaluate your options?

To start, it makes sense to think about the different security considerations inherent in protecting VMs, containers, and bare-metal servers compared to end-user endpoint protection. In short, companies must be able to detect and respond to unwanted activity, including developers debugging in production, cryptominers, or attacks leveraging zero-day vulnerabilities, within production environments. Traditional EDR and EPP systems can’t deliver as promised across production environments and, when deployed, seriously impede system performance. So, as companies move forward with more advanced cybersecurity strategies, taking a requirements-first approach will help ensure you make the right decisions and put the right protections in place. Regardless of whether a production environment leans toward on-premises or cloud-based systems, or relies on a mix of both, there are a few pillars every business must consider:

Linux support: 

Because Linux is the technology of choice for production infrastructure, endpoint protection must be built specifically with Linux in mind—from what kernel-level data is most important to collect to how to architect a solution for minimal performance disruption. A resource limiter that enforces hard caps on CPU, disk, and memory usage (such as no more than 5% of CPU), with an intelligent load-shedding strategy, is important. Whether sitting in a traditional data center or the cloud, Linux support should be a defining consideration for cybersecurity tools. With scant Linux support, traditional EDR and EPP tools fail to deliver on this basic requirement.
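As a hedged sketch of the load-shedding idea — not how any particular product implements it — an agent can measure its own CPU fraction against wall-clock time and start dropping work once it crosses a hard budget:

```python
# Hypothetical load-shedding sketch: an event processor measures its own
# CPU usage and sheds work once it exceeds a hard budget, so the agent
# never starves the production workload it is supposed to protect.
import time

class CpuBudget:
    def __init__(self, limit=0.05):  # e.g. no more than 5% of one CPU
        self.limit = limit
        self.cpu0 = time.process_time()
        self.wall0 = time.monotonic()

    def over_budget(self) -> bool:
        wall = time.monotonic() - self.wall0
        if wall <= 0:
            return False
        cpu = time.process_time() - self.cpu0
        return (cpu / wall) > self.limit

budget = CpuBudget(limit=0.05)
processed, shed = 0, 0
for _ in range(1000):
    if budget.over_budget():
        shed += 1    # a real agent would drop lowest-value telemetry first
    else:
        processed += 1
print(processed + shed)  # 1000 -- every event accounted for, none blocks the host
```

The key design point is that shedding is explicit and accounted for: the agent knows what it dropped, rather than silently falling behind and consuming the host’s resources.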

Architectural scalability: 

Production infrastructures are complex, hybrid environments that skew heavily toward Linux-based systems. Threat detection and response is different in this environment and traditional EDR or EPP solutions, with their centralized analysis approach, may spike network traffic dramatically when deployed. If existing tools can’t scale to production levels without putting stability at risk, they won’t meet the cybersecurity needs of the organization.

Cloud-native expertise: 

Securing Linux production systems must include protecting all components within them, not just offering high-level detection that doesn’t consider system context. In particular, companies need container-aware detection not only to catch unwanted activity, but to prevent excess false positives from firing due to the differences in how container hosts and bare metal servers operate. Traditional EDR solutions tend to port their Windows-based detections to Linux with insufficient modification, or rely on noisy machine learning-based detection that is easily confused by legitimate activity (like a configuration management tool modifying a file). 

Of course after you take a look at these three pillars, you’re still faced with evaluating the tools themselves. If it’s helpful, we recently created a Quick Read, “EDR for Linux Production Systems” to help you evaluate Linux host security tools that includes: 

  • What you need to protect critical Linux production environments
  • The drawbacks of existing Linux security tools, including lack of detection for Linux environments and containers, inability to scale with the cloud, lack of attack context, and lack of resources
  • How to evaluate Linux security tools using these categories: broad support, Linux support, scale, functionality, and response
  • Why Capsule8 Protect, with production visibility, cloud-native detection, efficient response, and DevOps-friendly performance, is a better way to secure Linux infrastructure