A Cloudy Forecast for ICS: Recap of S4x20

Photo credit: @montaelkins – Kelly Shortridge Keynote at S4x20

Last week, I keynoted S4x20, the biggest industrial control systems (ICS) security conference in the world, and was able to catch quite a few talks, too. While it took place in sunny Miami Beach, my highlights from the conference suggest a far cloudier outlook. Specifically, there seems to be a growing rumble about the adoption of cloud-based infrastructure, including the DevOps mindset it entails, and what it means for ICS security.

Three talks I saw at S4x20 really stuck out to me, covering critical infrastructure as code, applying chaos engineering to field-based critical systems, and rethinking how we measure security. In this post, I’ll highlight what I found most interesting from each of them to whet your appetite for when the videos of the talks are posted.

Critical Infrastructure as Code

Configuration as code is booming in popularity in the DevOps world, but is less popular in the security world, let alone the ICS security world, which tends to be even more conservative given the importance of critical infrastructure. However, Matthew Backes from Lincoln Labs outlined how to bring the decidedly modern practice of config as code to ICS, in what he calls “Infrastructure as Code” (infrastructure in the ICS sense), to help reassert control over industrial control systems.

One thing I learned from his talk is that configuration of ICS devices is a hot mess. It tends to be highly manual, ad-hoc, and vendor-specific, with a high probability of disrupting systems or bricking devices when it goes wrong. Even pulling the current configuration of devices is a nasty affair — it usually resides in the sysadmin’s head or in scattered, undocumented locations. Otherwise, the device must be directly accessed (or “touched” as is apparently common ICS parlance) to gain the ground truth of the config. Backups are a fantasy. As you can tell, the pain is real.

As Matthew described in his talk, the goal of infrastructure as code for ICS is to cleanly define configurations and push them down to the device, with no touching or mind reading required. There must also be version control (something like git) and accessibility (something like an API). Lincoln Labs’ approach was to use Ansible with a JSON config file and then a standardized API written in Python. They experimented with this approach on protection relays, remote terminal units, genset controllers, and serial-to-ethernet converters (about which I knew nothing, so I recommend looking out for the recording of the talk to learn more).

The workflow will look familiar to those of you who know how config as code works — ICS is now getting in on the YAML party. There are standardized templates to define device / service functionality, a host inventory (list of devices in the environment), and variables at the global, vendor, and device level (like IP addresses, protection settings, or specific files). These components generate a config file, which can be used to push settings across devices using a tool like Ansible, or even as a compliance report for auditors.
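To make the layering concrete, here is a minimal sketch in Python of how global, vendor, and device variables might combine into one config per device. This is not Lincoln Labs’ actual tooling: the vendor names, settings, IP addresses, and function names are all hypothetical, and I’m assuming the most specific level wins when the same setting appears more than once.

```python
import json

# Hypothetical variables; every name and value here is illustrative only.
GLOBAL_VARS = {"ntp_server": "10.0.0.1", "syslog_server": "10.0.0.2"}
VENDOR_VARS = {
    "acme": {"mgmt_protocol": "ssh"},
    "globex": {"mgmt_protocol": "telnet", "ntp_server": "10.0.0.9"},
}

# Host inventory: one entry per device in the environment.
INVENTORY = [
    {"name": "relay-01", "vendor": "acme", "type": "protection_relay",
     "vars": {"ip": "192.168.10.11", "overcurrent_pickup_amps": 400}},
    {"name": "rtu-01", "vendor": "globex", "type": "rtu",
     "vars": {"ip": "192.168.10.21"}},
]


def resolve_vars(host):
    """Merge variables so that device overrides vendor, and vendor overrides global."""
    merged = dict(GLOBAL_VARS)
    merged.update(VENDOR_VARS.get(host["vendor"], {}))
    merged.update(host["vars"])
    return merged


def render_all(inventory):
    """Build the desired config for every device in the inventory."""
    return {
        host["name"]: {"type": host["type"], "settings": resolve_vars(host)}
        for host in inventory
    }


def drift(desired, actual):
    """Return {setting: (desired, actual)} for every setting that differs."""
    return {
        key: (value, actual.get(key))
        for key, value in desired.items()
        if actual.get(key) != value
    }


if __name__ == "__main__":
    # The rendered output is what a tool like Ansible would push to devices.
    print(json.dumps(render_all(INVENTORY), indent=2, sort_keys=True))
```

The `drift` helper hints at the compliance angle: comparing the desired settings against whatever a device actually reports gives you a list of deviations to hand to an auditor. Keeping the inputs as plain data also means they can live in version control alongside everything else.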

The lessons learned were probably the most interesting to me, given they’re based on real experiments Matthew’s team conducted among operators in field environments. First, there’s a tradeoff between manageability and security, depending on the device type. For instance, protection relays expose their config files through Telnet or FTP rather than SSH, and exposed management interfaces often lack password protection. Second, there isn’t a lot of interoperability, which can lead to bricked devices or other issues when attempting to parse config files. Thus, quite a bit of modification is required to get config as code to work for your ICS.

It’s clear that simplicity and manageability are sorely lacking in ICS, making it difficult to control those systems. There’s appetite for central config storage and standardized workflows, but the tooling hasn’t caught up. While Matthew made a call to ICS vendors to improve their config functionality for the benefit of the community (which is absolutely needed!), it struck me that the Ops community could provide a valuable helping hand here, too. 

The ICS security and DevOps communities are far from close, but my hypothesis is that Ops engineers from Silly Valley and elsewhere can pattern match their way through some of the roadblocks ICS defenders, being new to the concept, will face in attempting to implement config as code. It’s a way to harness your hours of using Terraform, Puppet, Ansible, Chef, CloudFormation, and more for the benefit of us all, as even helping set up the basics will help ICS security practitioners seriously level up their abilities!

Chaos Engineering to Avoid National Chaos

Virginia Wright, a Program Manager in Idaho National Laboratory’s Cybercore division, discussed how to apply DARPA’s RADICS program, which stands for “Rapid Attack Detection, Isolation, and Characterization Systems”, to chaos security engineering. She acknowledged it involves a mindset shift from contemplating how to keep attackers out of our systems towards figuring out how to cope once they’re inside. This closely aligns with my own advice of moving away from the ideal of perfect prevention and instead striving towards resilience — ensuring you can recover gracefully in the face of inevitable failure.

Applying chaos engineering to critical systems in the field is a daunting task. But operators need to feel confident in their ability to recover from attacks, and cannot do so without practicing within real, or at least realistic, environments. Virginia walked through an exercise her team conducted consisting of a “black start recovery of a crank path amidst a cyber attack on the power infrastructure to enable grid restart operations.” In non-ICS speak, I believe it means “restoring operations of an isolated part of the energy grid to seed power back into the overall grid.”

Importantly, this exercise ran on real equipment, not simulated systems, and played with failure modes, like an extended regional power outage. Thus, while the scenarios were a bit scripted to create some structure to the exercise, it still felt pretty random to the operators themselves. 

The campaigns themselves use “consequence-based engineering,” beginning with defining the highest-consequence outcomes and figuring out how the attacker would achieve that outcome. This is essentially the same as the advice in my keynote to start with the most important assets to your organization (like customer data in an S3 bucket, uptime of a particular service, etc.) and work back to how the attacker would most easily compromise that asset. Either way, repeatedly running campaigns with these scenarios helps you to determine the potential blast radius of a real incident and to continuously refine your processes and tools.

I was especially delighted that Virginia espoused the philosophy of confronting your worst fears, bringing to mind one of my favorite lines by Chuck Palahniuk, from his novel Invisible Monsters: “Find out what you’re afraid of and go live there.” We need to develop the “muscle tone,” as Virginia put it, to deal with that fear and move forward from it — otherwise we will feel lost, uncertain, and afraid when our fears (like a real attack on a power plant) materialize. This confrontation via experimentation can (and should!) start small, letting us build confidence and iterate over time.

My own talk recommended starting with less critical systems that are still a part of ICS, like predictive maintenance systems, Office 365, cloud storage, or grid edge systems. Relatedly, I recommend Bryan Owen’s insightful talk on the ICS shift to cloud from last year’s S4 to learn more about the current transition underway. I still think that beginning with those less critical systems is a fantastic way to become more comfortable with the practice of chaos security engineering, but Virginia’s advice can help you safely navigate practicing it on in-the-field critical systems.

Rethinking Security Measurement 

It turns out that figuring out how to rank or score the security of companies is hard. Derek Vadala, Global Head of Cyber Risk at Moody’s, discussed their approach to rating the security risk of companies. While this arena is incredibly complex and full of slippery variables, I liked how Derek offered a straightforward division of companies into two buckets: those who can defend against commodity attacks, and those who can additionally defend against “sophisticated” attacks. Digestible mental models are underrated in security, in my opinion.

I found his recognition that technology shifts — like the shift to cloud-based systems — impact measurements to be particularly insightful. One example that Derek highlighted is that uptime is no longer a desirable metric. The longer systems are online, the higher the chance attackers will compromise and persist in them. This notion neatly fits within my own talk, in which I advised a goal of ephemerality and using a metric of reverse uptime to keep it in check (minimizing the amount of time a host is online to reduce persistence opportunity).
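As a rough illustration of what tracking reverse uptime could look like, here is a minimal Python sketch that flags hosts that have been online longer than a chosen maximum lifetime. The host names, the launch times, and the 24-hour threshold are all assumptions for the example, not numbers from Derek’s talk or mine.

```python
from datetime import datetime, timedelta, timezone


def hosts_over_lifetime(launch_times, now, max_lifetime):
    """Return {host: age} for hosts online longer than max_lifetime."""
    return {
        host: now - launched
        for host, launched in launch_times.items()
        if now - launched > max_lifetime
    }


if __name__ == "__main__":
    # Hypothetical fleet: host name mapped to the time it was launched.
    now = datetime(2020, 1, 30, 12, 0, tzinfo=timezone.utc)
    launch_times = {
        "historian-a": now - timedelta(hours=3),
        "historian-b": now - timedelta(days=45),
    }
    stale = hosts_over_lifetime(launch_times, now, timedelta(hours=24))
    for host, age in sorted(stale.items()):
        print(f"{host} has been up {age.days} days; consider replacing it")
```

The point of the metric is the inversion: instead of celebrating the host with the longest uptime, the report treats it as the first candidate to be torn down and rebuilt.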

The type and quality of data provided for security ratings is also highly influential. As Derek noted, there’s a tradeoff between data fidelity (how valuable it is as a signal) and effort involved to retrieve it. Validated data, such as that collected from a live exercise, is invaluable in cultivating confidence in the security of deployed systems. This is another overlapping point on the benefits of chaos security engineering from my keynote. Repeated experimentation via the injection of security failure generates feedback about the resilience of your systems, giving you greater fidelity with minimal ongoing effort (albeit with non-trivial upfront effort).

The general inaccessibility of valuable signals will assuredly present a stumbling block for cyber insurance companies as well, who want the highest quality signal possible for the potential risk a company poses, but are likely to receive self-provided data at best and external data only at worst. Nevertheless, these sorts of security ratings pass muster as a “good enough” health check to give insurers a high-level sense of a company’s security health. Given that quote speed is a key driver of insurance purchases, fast but “good enough” will win over slow but “highly accurate” within the industry.

Final Thoughts

In the quintessential hallway track, many of my discussions involved the stickiness of the existing ICS security mindset. Specifically, one of the barriers in ICS to adopting modern infrastructure and security practices isn’t technical in nature at all: it’s the emotional cocktail of skepticism and inertia. Part of the goal of my keynote was to demonstrate the security benefits of all this cloudiness and containerity, but it will take many members of the community continuing the drumbeat before the collective perspective shifts.

Luckily, S4x20 had a palpably strong community vibe, which is essential for changing hearts and minds, not just technical approaches (and free lunch from a taco truck doesn’t hurt in fostering a positive spirit). The range of backgrounds, from ICS practitioners and national laboratory researchers to risk quantifiers and vendor-resident thinkers like me, helped provide a healthy variety of topics and perspectives.

As always in infosec, the conference was predominantly white and male. I noticed that the sponsor stage was especially homogeneous, which presents an opportunity for S4’s organizers to offer incentives to vendors who put forward speakers from underrepresented backgrounds (such as a small discount on the speaking slot price). S4 is far from alone in this (looking at you, RSAC), but I was impressed by how professional and welcoming it felt, giving me greater confidence in its ability to lead the charge on this front.

Overall, I can attest the conference was impeccably organized, so I would highly recommend speaking there if you have the opportunity. I learned a lot about the ICS niche and can empathize better with the challenges ICS security practitioners face, and hopefully those who attended my talk feel more comfortable with DevOps, modern infrastructure, and chaos security engineering. We all know ICS security is vital for our economy and society (even if just to avoid “cyberpocalypse” headlines in the event of a major compromise of critical infrastructure), so it’s heartening to see an openness to fresh ideas and an eagerness to work together to level up the community.