No More Tiers: Reimagining the Structure of SecOps

March 11, 2020

Why not both?

I’m not sure who thought that arbitrary hierarchical silos among a team of individual contributors were good for team morale and load-balancing, but here we are.

During a recent guest appearance on the Purple Squad Security podcast, I described my last role working on a security operations team that handled incident response as well as the usual monitoring and detection (and, being a small and early-stage team, a handful of other duties). John, the host, described this multifaceted team structure as the 2-in-1 shampoo and conditioner of secops. (I quipped, of course, that unlike the maligned hair product, I felt that we were effective at both functions!)

I would have actually compared us to the promises typically made by baby shampoo: “No more tiers.” I’m here all week, folks.

Older models of security operations often involve a tiered approach, with specialized individuals performing disparate functions along the chain of escalation. Typically, the initial detection is at the “level 1” tier, with incident response and threat hunting higher up. Underlying this divide is the assumption that the latter functions are naturally more senior than the former. Some orgs also implement this strict role divide as a separation-of-duties security measure (which to me signals outdated notions of trust in both the security and the human sense of the word).

While massive companies with sizable security teams and vast infrastructure may warrant more specialized roles, the tiered model as an across-the-board best practice never made much sense to me. Particularly on a smaller, burgeoning team, separating the monitoring/detection and IR functions would have contributed to faster burnout, more single points of failure, and less efficient incident handling.

The reality is that these disparate security operations functions can successfully be performed by the same people, regardless of their level of seniority. I’d like to explore how this cross-functional model can set teams up for success by enabling recovery from the drain of IR, removing breakdowns in consistency and empathy, and positioning teams for a smoother and more complete detection and response process.

Recovery via redundancy

Many of us who have worked in incident response have written about the burnout and anxiety that can accompany it. Dropping everything to investigate and resolve suspected security harm to our employer can be mentally draining even in the best of circumstances.

My old team spaced this out via a weekly on-call rotation. During business hours, one person would be the primary point of intake of ticket queues, alerting, and potential incidents, with another person as secondary on-call in the event that the primary analyst was handling an incident or otherwise indisposed. (We were not expected to work after hours unless there was an active incident.) When not on call, there was still plenty of work to be done: security education, building out documentation, quarterly projects, and other important-but-not-urgent tasks. More importantly, the time in between on-call weeks served as a recovery period.
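A rotation like this can be sketched in a few lines. This is a minimal illustration, not the tooling my team used: the function names, the analyst names, and the choice to make each week's secondary the following week's primary are all assumptions made for the example.

```python
from datetime import date, timedelta


def weekly_rotation(analysts, start, weeks):
    """Yield (week_start, primary, secondary) for a round-robin rotation.

    Assumption for this sketch: the secondary on-call for a given week
    is the analyst who will be primary the following week.
    """
    n = len(analysts)
    if n < 2:
        raise ValueError("need at least two analysts for primary and secondary")
    for i in range(weeks):
        primary = analysts[i % n]
        secondary = analysts[(i + 1) % n]
        yield start + timedelta(weeks=i), primary, secondary


def off_call_weeks(analysts):
    """Weeks per full cycle in which an analyst is neither primary nor secondary."""
    return max(len(analysts) - 2, 0)


if __name__ == "__main__":
    team = ["analyst_a", "analyst_b", "analyst_c", "analyst_d"]  # hypothetical names
    for week_start, primary, secondary in weekly_rotation(team, date(2020, 3, 9), 4):
        print(week_start.isoformat(), "primary:", primary, "secondary:", secondary)
    print("off-call weeks per cycle:", off_call_weeks(team))
```

Under these assumptions, a team of four gets two fully off-call weeks per four-week cycle, while a team of two gets none, which is one way to see how quickly the recovery window shrinks as the pool of people sharing the rotation gets smaller.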

The times when I felt the most sustained levels of burnout occurred when my recovery window between incidents was shortened. I had a stretch last summer where I noticed that I had been catching a disproportionate number of IRs during my secondary on-call weeks, as well as a couple outside of business hours, which was rare for our team. It led to good discussions about what needed to be escalated to incident-level priority, and raised the urgency of working on process improvements that would make certain things non-incidents — but not before my stress spiked for lack of getting the brain rest that I badly needed.

Now imagine how much more pronounced that burnout would have been if there had been no rotation, or a much shorter one, which is what would have happened had we been structured in a traditional tiered model. During my time on that team, we usually had three or four security analysts (with a couple of short streaks of two). Given the same team size, what if one or two of us were responsible for all of the IR and the others responsible for none, based purely on how long we’d been in the role or how many years of experience we had? That siloing would have left us with prime burnout conditions and single points of failure. With cross-functionality comes breathing room.

The next right thing

Incident response is often treated as a senior-level position. Yet most of us have done it before.

To borrow from the Frozen II song, IR is largely a matter of doing “the next right thing” in the face of a daunting and incomplete puzzle. This is a fairly universal experience, even outside of security. At some point or another, we find ourselves in a dire situation where our horizon shrinks to one step. We can’t process the long term; all that matters is putting one foot in front of the other until we’re out of the woods.

I’m not saying that fight-or-flight survival mode is the best or most sustainable way to handle IR, but the IR-like process of moving forward in the face of intense uncertainty is one that most of us will face in our lives. It’s not a highly specialized skill set — it’s a defining cornerstone of humanity. 

Learning to handle these conditions gracefully takes time, but the way to gain comfort is through repetition — and, again, space to recover in between. This comfort is facilitated by measures that most IR teams already take: maintaining runbooks to provide at least some consistency in an inherently chaotic process, running tabletops and functional drills to iron out process improvements, and defining roles and responsibilities so that no analyst is ever expected to go it alone. 

Keeping security operations professionals siloed off from IR until they reach some superficial level of seniority where we decide they’re worthy of responding to what’s detected just delays the process of building comfort with uncertainty in a security setting.

All I’m asking is for a little detect

What of the folks handling the monitoring? Where there are tiers, this tends to live at the bottom tier not only of security operations, but of security itself. Many security professionals devalue this work, viewing it as immutably entry-level right down to the pejorative “SOC monkey” label. Indeed, when I was starting out in security operations in 2016, one of the warnings I heard from longtime security professionals was “avoid ending up on a team where you’re just farming tickets all day with no room for growth”.

Look at the on-ramp, though: joining a security operations team comes with the expectation of quickly gaining situational awareness of an entire company. Learning a new environment — from the systems, data flows, and existing detections, to the organizational structures, processes, and escalation paths — is one of the biggest undertakings of any new role. That can certainly be done successfully at entry-level — many of us who’ve worked in secops entered the security field through such roles without prior tech jobs — but it doesn’t need to be limited to that.

Once we’ve wrapped our heads around what we have in our environment, what “normal” looks like, and how we detect deviations from this baseline, shouldn’t our goal be to always consider ways to improve those processes? How can we automate away an alert or its triage? What else are we not logging or detecting that we could be? If we roll out a new piece of the detection pipeline, how do we weigh its visibility benefits against its impact on performance? 

Detection is ripe for ingenuity if those handling that initial detection are given the space to grow and explore, rather than having a significant portion of security operations abstracted away from them. The days of the stereotypical “SOC monkey” staring at dashboards as a consumer but not an innovator are long behind us. The detection side of security operations is only as entry-level as you make it.

Consistency and efficiency

When those who work on the initial triage of tickets and alerts are afforded the ability to also take on more thorough investigation and incident response, we are in a better position to build detection tooling and processes in a way that is most salient for IR. If an alert is always escalated away from its initial point of intake, that person may never know if the alert output is missing key information that the incident responder has to manually dig up every time. Feedback loops and IR retrospectives are helpful but often incomplete and no replacement for firsthand experience.
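One lightweight way to surface that missing information is to have whoever works an alert note which fields they had to look up by hand, then tally those notes per alert type. The sketch below is illustrative only: the record layout, the field names, and the `missing_context_report` function are assumptions for this example, not any particular tool's schema.

```python
from collections import Counter


def missing_context_report(investigations):
    """Count, per alert type, which fields had to be looked up manually.

    Each record is assumed to be a dict with an "alert_type" string and a
    "manually_looked_up" list of field names (a hypothetical layout).
    """
    report = {}
    for record in investigations:
        counts = report.setdefault(record["alert_type"], Counter())
        counts.update(record["manually_looked_up"])
    return report


if __name__ == "__main__":
    # Hypothetical investigation notes, for illustration only.
    notes = [
        {"alert_type": "suspicious_login",
         "manually_looked_up": ["source_ip_owner", "device_id"]},
        {"alert_type": "suspicious_login",
         "manually_looked_up": ["device_id"]},
    ]
    for alert_type, counts in missing_context_report(notes).items():
        print(alert_type, counts.most_common())
```

Fields that keep showing up at the top of the tally are candidates to add to the alert output itself. When the same person handles both triage and response, they tend to notice these gaps without needing a report at all.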

Separating out the detection and response can lead to not only less effective detection and triage, but also less efficient IR. Whenever ownership is transferred, we risk the possibility of loss of context or critical information — and time spent on the knowledge-transfer increases the time to resolution. In the rare cases where something was truly out of my depth, or at a need-to-know level of sensitivity that only a manager should be involved in, then yes, handoff was an option.

For the majority of incidents, however, wouldn’t the process be a smoother ride if the person detecting something could see the entire incident through to completion? They can hand off the non-urgent work, not the incident that they discovered. Health care professionals have the concept of continuity of care, which recognizes the importance of the historical context and trust relationships that can form when a doctor is able to stay assigned to a patient for the duration of their treatment. They may call upon specialists to handle certain aspects of the care plan, but they remain the ones driving treatment forward. Continuity of incident response promotes a similar consistency and protects against key information being dropped during handoffs.

Remembering why we’re here

In the recent “SOC Puppets” episode of the Detections podcast (discussion of tiered vs tierless models starts around 34:00), the hosts pointed out that in tiered teams, level 1 analysts are often granted very limited access to audit logs and other tooling needed to do a proper base level of analysis. This lockdown is treated as a security control: rather than monitoring and alerting on the risky behavior they’re concerned about enabling, or putting other safeguards in place to limit the impact and scope of said risky behavior, these teams opt to simply own that whole “department of no” stereotype and apply it inward.

The result, naturally, is inefficient and inaccurate triage. Certain alerts may be wrongly closed as false positives because an analyst lacks access to relevant logs that would have given them vital context — they may not even know that such logs exist. Other tickets may be wrongly escalated when they could have been an easy open-and-shut case.

Security exists to enable the business; when a security control inhibits a person’s ability to do their job effectively, it’s time to reevaluate the benefits vs risks that that control poses. The security team’s own ability to work is not exempt.

This brings us back to maintaining morale. Burnout stems not only from emotionally taxing incident response work, but also from lacking a sense of purpose. The people in “level 1” may enter the role as the most enthusiastic newcomers who have worked their butts off to land a security job. When we shove them into an isolated corner of the team, treat them as untrustworthy, and cut off access to crucial resources, they start to question why they’re even here.

A stratified, isolated, demoralized team does not need to be the norm. We thrive when we are able to access the resources we need to successfully do our jobs — including access to our colleagues and visibility into the broader impact of our work.

Tier Drop

Removing the unnecessary hierarchy and promoting cross-functionality in security operations sets teams up for success. Both sets of functions can be performed with great impact and influence at any level. By creating more space for recovery and growth via role redundancy, allowing secops teams to experience the entire detection and response lifecycle, and baking consistency into IR, the only tears will be those of an attacker shut down by a robust secops team.