An Infosec Lens on the 2019 State of DevOps Report: What It Means for Us

Understanding DevOps trends is essential for infosec professionals. Before you angrily close the tab because you are tired of lectures about the need for infosec to work with DevOps, consider whether the idea of a job focused on strategic, innovative work rather than firefighting and gatekeeping is appealing. If so, then these trends matter for you.

In this post, I will highlight the insights from this year’s Accelerate State of DevOps Report that are especially pertinent to infosec, divided into the following five themes:

  1. Speed & stability (including security) are BFFs
  2. Elite DevOps = faster IR
  3. Feedback loops work
  4. Stricter reviews don’t work
  5. Security’s role is shifting, not dissolving

Note: Based on survey results and rigorous statistical methods, the report defines benchmarks for elite, high, medium, and low performers (see page 18 of the report). While elite performers are identified by these benchmarks, I will also highlight other characteristics and drivers of elite performance throughout this post.

Speed & Stability are BFFs

In the modern enterprise, information security and stability have become inextricably linked. When we see public instances of instability, we often assume a security incident is the cause. Is that service down because of an attack or due to a developer’s error? Systems sometimes must be taken offline to patch vulnerabilities or otherwise remediate security issues. Compromised databases containing customer information need to be investigated internally and by regulatory bodies, which impedes performance.

Despite this, infosec professionals commonly believe that speed and stability are like Paris Hilton and Lindsay Lohan — more frenemies than friends. This year’s Accelerate State of DevOps report, like all the ones before it, demonstrates that speed and stability are more like Leslie Knope and Ann Perkins — working together to enable and elevate each other.

As the report outlines, stability is measured through time to restore service and change failure rate, while speed is measured by deployment frequency and lead time for changes. Time to restore is a fitting metric for security programs, as I will discuss more in the next section. In fact, this metric is codified in Article 32 of the GDPR, which states that an organization must be able to restore access to personal data in a timely manner in the event of an incident.

Change failure rate has a natural security analogue: the percentage of changes to user-facing resources that result in an insecure service and subsequently require remediation. These metrics improve as our ability to deliver software quickly improves, because the faster we can deliver a software change under normal conditions, the faster we can deliver security-related changes under emergency conditions. Therefore, your security program must be designed to work with speed, not against it.
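
To make these metrics concrete, here is a minimal sketch of how a team might compute change failure rate and mean time to restore from its own deployment records; the Change record structure is a hypothetical stand-in for whatever your deployment pipeline actually emits.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional

@dataclass
class Change:
    deployed_at: datetime
    caused_failure: bool              # incident, rollback, or insecure configuration
    restored_at: Optional[datetime]   # when service was restored, if it failed

def change_failure_rate(changes: List[Change]) -> float:
    """Percentage of changes that degraded the service (or its security)
    and subsequently required remediation."""
    if not changes:
        return 0.0
    failures = sum(1 for c in changes if c.caused_failure)
    return 100.0 * failures / len(changes)

def mean_time_to_restore(changes: List[Change]) -> timedelta:
    """Average time from a failed change to restored service."""
    restores = [c.restored_at - c.deployed_at
                for c in changes if c.caused_failure and c.restored_at]
    return sum(restores, timedelta()) / len(restores) if restores else timedelta()
```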

Some organizations that were high performers in last year’s report slipped back into being medium performers. Many seemed to struggle with increased complexity, some of which may be a result of complacency — the assumption that their high performance would persist even as their systems burgeoned. Likewise, being secure one day does not mean you stay secure. The world changes, and you are not owed a participation trophy. Your context will change as your technological capabilities advance and frequently become more complex. Security must be measured by progress, not acute health checks. You cannot become complacent, or you will be left behind, relegated to work that feels tedious and pointless.

Elite DevOps = Faster IR

This statistic perhaps says it all: elite DevOps teams recover 2,604x faster from incidents — taking less than one hour to restore services rather than one week to one month like low performers. Security practitioners face enormous pressure to respond quickly to incidents, so responding over two thousand times faster might feel like being blessed with the winged sandals of Hermes.

According to IBM X-Force, the average time to identify a breach is 206 days, with another 73 days to contain it. However, companies that detected and contained breaches in fewer than 200 days spent $1.23 million less on cleanup costs. Imagine being able to reduce recovery time to a day rather than 73 — a fantasy made far more feasible through distributed, immutable, and ephemeral infrastructure (aka the D.I.E. model coined by Sounil Yu).

While far from the sole driver of elite software delivery performance, the use of cloud resources does propel better performance. Elite performers are 24x more likely than low performers to meet all five of NIST’s essential cloud characteristics (on-demand self-service, broad network access, resource pooling, rapid elasticity, and measured service), and they are savvier at estimating cloud costs. In shadowy corners of infosec meetups, you can still find professionals raging against cloud infrastructure, though when you start asking probing questions, it becomes immediately apparent that their cloud implementation is sorely deficient. Only 29% of respondents using cloud infrastructure agreed that they meet all five characteristics, so statistically speaking, you are probably doing the cloud incorrectly.

Cloud infrastructure facilitates the use of distributed, immutable, and ephemeral infrastructure, all of which can significantly improve response times (as Dr. Forsgren and I outlined in our Black Hat talk). Shuffling IP blocks makes lateral movement trickier for attackers, even if they gain an initial foothold. If a container is compromised, you can kill it and replace it with a fresh image. And resources with lifespans in the seconds-to-minutes range provide inherent containment, too.
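
As a minimal sketch of that kill-and-replace pattern, assuming Kubernetes with a Deployment managing the workload (the pod and namespace names here are hypothetical):

```python
import subprocess

def replace_compromised_pod(pod_name: str, namespace: str = "prod") -> None:
    """Delete a compromised pod; because a Deployment manages it, the
    controller spins up a fresh replacement from the immutable image."""
    # Capture forensics first: snapshot the pod's logs before killing it.
    logs = subprocess.run(
        ["kubectl", "logs", pod_name, "-n", namespace],
        capture_output=True, text=True, check=True,
    ).stdout
    with open(f"{pod_name}-forensics.log", "w") as f:
        f.write(logs)
    # Kill the pod; the Deployment's controller replaces it automatically.
    subprocess.run(
        ["kubectl", "delete", "pod", pod_name, "-n", namespace],
        check=True,
    )

# Hypothetical usage: replace_compromised_pod("payments-api-7d4b9c-x2x1z")
```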

Improved containment is not the only way being an elite DevOps performer engenders better incident response — detection and remediation get a boost as well. One of the essential characteristics of cloud computing is “measured service,” which requires monitoring capabilities. The research shows that monitoring and observability fuel performance, because they help teams understand their systems and improve them. The same telemetry used for performance monitoring can absolutely be leveraged for security detection. Capabilities needed for elite DevOps, such as continuous integration, deployment automation, and even code maintainability, benefit remediation efforts as well by making it much easier to patch systems.
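
For instance, here is a toy sketch of reusing per-host egress telemetry, already collected for performance monitoring, as a crude exfiltration signal; the data source and threshold are assumptions for illustration, not a prescribed detection method.

```python
import statistics
from typing import Dict, List

def flag_anomalous_egress(egress_bytes: Dict[str, List[float]],
                          z_threshold: float = 3.0) -> List[str]:
    """Flag hosts whose latest outbound traffic sample is a statistical
    outlier against that host's own baseline."""
    flagged = []
    for host, samples in egress_bytes.items():
        if len(samples) < 10:
            continue  # not enough history to establish a baseline
        baseline, latest = samples[:-1], samples[-1]
        mean = statistics.fmean(baseline)
        stdev = statistics.stdev(baseline)
        if stdev > 0 and (latest - mean) / stdev > z_threshold:
            flagged.append(host)
    return flagged
```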

Think of the benefits unlocked through DevOps for security as the “Three Rs of Modern Security — Reduce, Reuse, Recycle” (inspired by the three Rs of the environment). Reduce your persistent attack surface through immutable and ephemeral infrastructure. Reuse monitoring telemetry. Recycle scripts to automate and enable remediation. By collaborating with DevOps efforts and embracing the practices that elevate them to “leetness,” you can lift the security program as well.

Reducing response time also helps reduce burnout, which, empathy aside, is noted in the report as being vampiric to productivity. Infosec can be an insanely negative sphere in which to work, a feeling only magnified if none of your work seems to meaningfully improve your organization’s security long-term — if you are scrambling to rectify ad-hoc events purported to be as devastating as the burning of the Library of Alexandria. Burnout does not have to be a fact of life in infosec, and we should embrace a combination of work recovery and strong software delivery processes to reduce it. As the research shows, elite performers are half as likely to report feeling burned out. How lovely would that be for infosec?

Feedback Loops Work

Information security is decent at collecting data (though generally with more noise than signal), but poor at using it to continuously enhance situational awareness, which leads to a somewhat stagnant strategy. Teams may collect data on the number of vulnerabilities in a particular component at a single point in time, but not consider that component’s evolving relationships with other components, or complexity at the system level. Further, few organizations employ strong red team (offense) programs to create realistic exercises requiring the blue team (defense) to practice their ability to detect and recover from incidents.

The data from this year’s Accelerate State of DevOps report shows this neglect is a mistake. Organizations are 1.4x more likely to be elite if they create and implement action items based on the insights gleaned from disaster recovery exercises. Only 40% of respondents perform disaster recovery tests even once annually, but those who do enjoy higher levels of service availability. Preparing for events that could impact your organization’s operations means your organization is less disrupted when they happen. This sounds obvious, but security is much more bark than bite in these practices.

Amassing a heap of processes offers dubious efficacy, so security teams should instead attempt to correct errors sooner. As the report explains, “techniques such as continuous testing, continuous integration, and comprehensive monitoring and observability provide early and automated detection, visibility, and fast feedback.” This is pertinent for infosec. Infosec teams vary in their detection and visibility capabilities, but usually maintain at least a baseline. However, “fast feedback,” continuous testing, and continuous integration are rarely seen in practice — and remain somewhat obscure concepts for many infosec professionals.

Investment in performance improvements is difficult when you are drowning in a data swamp befouled by false positive alerts, tickets always marked as the highest priority, and last-minute requests for security reviews. However, to reach the goal of continuous and automated security — and thus a less frenetic workload for the security team — automated test suites must be built, just as they are required for continuous integration. From my observations, security teams sometimes implement automated vulnerability scanning, but not much else. 

However, testing goes beyond vulnerability scanning and into testing how systems respond to failure, as Dr. Forsgren and I discussed in our Black Hat talk. Just as disaster recovery exercises test how systems and teams respond to failure, security teams should implement their own failure testing, borrowing from chaos engineering, to see how systems and teams respond to security failures. For instance, inject expired API tokens to test your authentication mechanisms, or modify encrypted data to test your detection of unauthorized changes. Such tests do not simply evaluate the health of a system; they evaluate the system’s ability to respond to unwanted events, and whether its resilience is improving over time.
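
As a minimal, pytest-style sketch of the expired-token experiment; the endpoint, token, and telemetry helper are all hypothetical stand-ins for your own services:

```python
import requests

API_URL = "https://staging.example.com/api/v1/orders"  # hypothetical test endpoint
EXPIRED_TOKEN = "eyJhbGciOi...expired"                 # a deliberately expired JWT

def auth_failure_logged(token: str) -> bool:
    """Stub: in practice, query your SIEM or log pipeline for the
    auth-failure event tied to this token (hypothetical integration)."""
    raise NotImplementedError

def test_expired_token_is_rejected_and_detected():
    """Security chaos experiment: inject a known-expired API token and verify
    that (1) the service fails closed and (2) the failure is observable."""
    resp = requests.get(
        API_URL, headers={"Authorization": f"Bearer {EXPIRED_TOKEN}"}, timeout=10
    )
    # Hypothesis 1: authentication rejects the expired token.
    assert resp.status_code == 401
    # Hypothesis 2: the rejection shows up in monitoring telemetry.
    assert auth_failure_logged(EXPIRED_TOKEN)
```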

Stricter Reviews Don’t Work

Organizations that experience a breach or other scary security incident are wont to panic and begin implementing strict security processes in the hope of preventing future events. In this regard, the security team ends up operating much like a change advisory board: an external authority whose approval is required, through a set of formal processes, before software is delivered to users. It is not uncommon to see security teams that serve as figurative gatekeepers to releases, performing stringent security reviews before imparting their blessing and allowing the release to proceed.

This year’s report digs into why this strategy is folly, and not just because it erodes performance. Spoiler alert: gatekeeping is negatively correlated with resilience because it corrodes adaptability. The data bears this out: organizations are 2.6x more likely to be low performers if they maintain a formal approval process — an unsurprising statistic. There was also no evidence that formal approval processes were associated with lower change failure rates.

In fact, gatekeeping contributed to more delays and more instability. It is difficult to promptly mitigate a risk upon discovery if each change involves a protracted process. Additionally, waiting for approvals leads to an unwieldy backlog of code, which increases the blast radius of changes once they hit production. Bigger changes are more likely to engender failure, and once they do, debugging such a massive change to your codebase becomes difficult. From a security standpoint, the end result is exposing your organization to higher failure rates and a nice, juicy window of opportunity for attackers.

Instead, the security team must collectively become a strong and transparent communicator. If you speak to engineers, they often feel that infosec scries a crystalline orb containing occult shimmers that reveal the divine requirements for determining whether a new feature or service is sufficiently secure. This uncertainty over precisely what steps are required to gain infosec’s stamp of approval, and how long the approval process will take, deeply shackles performance.

Teams are 1.8x more likely to be elite performers if they possess a clear understanding of approval processes and what steps are required to go from “submitted” to “accepted” each time. These elite teams also express confidence that they can work through the approval process in a timely fashion. Fundamentally, an organization that has confidence in its processes is an effective organization. Infosec teams, therefore, need to maintain clear, documented, and standardized processes for their security reviews — your obfuscatory vibe isn’t cute.

Security’s Role is Shifting, not Dissolving

The above trends do not mean security’s role is dissipating before our very eyes. However, infosec’s mandate is decisively shifting. Despite propaganda claiming that DevOps does not care about security, it cares very much; what it lacks is an understanding of how to efficiently implement security within software delivery processes. DevOps’ unwillingness to add excessive friction to software delivery is perceived as apathy, despite the fact that software delivery performance is the engine of the organization. If anything, it is security that is too often apathetic towards ongoing business concerns.

Security must adopt the mantle of infused SMEs, rather than attempting to reinvent the wheel in a vacuum. Knowledge must be shared and simplified. Although infosec tends to take an all-or-nothing approach to knowledge, there exists a level of “good enough” that will help engineering teams more securely design and architect their products and services. My belief is that security professionals of the future will be embedded on engineering teams, offering their knowledge and guidance throughout the software delivery lifecycle and helping their team level up its understanding and implementation of security.

For instance, helping engineering teams understand the concept of “attacker math,” the fact that attackers will take the easiest and least expensive path towards their goals, can improve threat modeling during the design phase. Engineers do not need details about the latest side-channel zero day, just an abstract understanding of why an attacker will phish for admin credentials rather than exploit a zero-day vulnerability whenever they can. Understanding that the easiest paths must be eliminated first to raise the cost of attack is a simple enough concept, but it has an outsized impact on how engineers think about improving security.
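
A toy illustration of attacker math, with entirely made-up numbers, might look like this:

```python
# Toy "attacker math": expected cost of an attack path is roughly
# effort divided by probability of success. All figures are illustrative.
attack_paths = {
    "phish an admin":    {"cost_usd": 500,     "p_success": 0.30},
    "exploit known CVE": {"cost_usd": 5_000,   "p_success": 0.10},
    "develop zero-day":  {"cost_usd": 250_000, "p_success": 0.05},
}

for name, path in attack_paths.items():
    expected_cost = path["cost_usd"] / path["p_success"]
    print(f"{name}: expected cost ~ ${expected_cost:,.0f}")

# A rational attacker starts with the cheapest expected path (phishing),
# so defenders raise the overall cost of attack most by eliminating it first.
```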

One of the report’s findings is that training centers are not the most effective way for organizations to improve performance, because they simply introduce more silos and isolated expertise. Infosec is regrettably enthralled by training modules and awareness programs (or uses them for checkbox compliance), which backfire by becoming bottlenecks of expertise and fostering exclusivity. Further, infosec experts are removed from the actual work of taking these generic lessons and implementing them in real workflows. Far too often, infosec is not learning and growing with the rest of the organization, but pushing its propaganda and hoping for indoctrination to take root.

Instead, “Community Builders” is the strategy chosen by most elite performers, in which teams learn and grow together. Teams evangelize their tooling and methods to other teams to share their knowledge and expertise. Or, a team may figure out how to transform themselves into elite performers and share this wisdom with other teams. This strategy is resilient to product and organizational changes, something essential for teams like infosec that are already spread too thin. For infosec to adopt this strategy, it must embrace the idea of cross-pollination. Infosec must learn about DevOps workflows to bridge security best practice to pragmatic implementation; otherwise, best practice is just a set of untested hypotheses.

Relatedly, this year’s report notes that engineers on the highest performing teams are 1.5x more likely to report having useful, user-friendly tools that support their work. This is a problem for infosec, as even experts find many security tools cumbersome. If security tooling is not intuitive and easy to use, it will not become a core part of the software delivery lifecycle. Less painful tools will be adopted, even if they are considered less rigorous. 

It is natural that a team which selects and mandates complicated tools is not one with a promising future. Other teams will figure out how to get their work done without those tools, and likely without that team, too. Worse yet, they may land on workarounds that do not even accomplish the original task — something every infosec professional has seen one too many times.

As a result, I would argue that experts in security UX are more relevant for the future than the legions of analysts who tune firewalls, SIEMs, and other blinky boxes. We need people who understand how to integrate the right security tools into software delivery workflows without adding friction. As the past two decades have demonstrated, security cannot be solved by throwing more tools and data at security analysts. Lamentably, the skills required to achieve this paradigm of usable security, a blend of threat modeling and UX knowledge, are sorely missing on typical enterprise security teams.

Conclusion

Security scoffed at DevOps and missed its opportunity to integrate security into DevOps practices at the dawn of the movement. Now, infosec vainly attempts to insert itself into the conversation through the term “DevSecOps,” creating its own sphere of siloed influence and practices that ignore the lessons DevOps learned on its path from low performance to elite performance. Automating security scans is not enough without investment in changing our priorities, processes, and culture. Luckily, by being late to the party, we have a guidebook at our fingertips for how to ascend to eliteness.

Everyone in security wants to be leet. The Accelerate State of DevOps report outlines the path to elite performance for us, if only we open ourselves up to the idea that there is much to learn outside of our infosec bubble. If we embrace the characteristics and practices of these elites, we can reduce the amount of painful gatekeeping, firefighting, and burnout we experience. We can support secure software delivery while also preserving its performance, which is the only path for security teams to follow if they want to stay relevant in the modern organization.