Posts Tagged ‘alert fatigue’

CISOs: Understanding Three Consequences of Security Alert Fatigue

One of the biggest challenges faced by any SOC or security organization today is alert fatigue. Your team has only so many people, and they can respond to and investigate only so many alerts before they start missing true attacks. It’s like the boy who cried wolf, only you have to imagine him yelling down to those villagers 10,000 times per day instead of once or twice. That’s a feasible scenario according to a survey of IT security professionals, in which 37 percent of respondents faced more than 10,000 alerts on a daily basis. With more than half of those alerts turning out to be false positives, it won’t take long for the villagers, and your team, to face serious alert burnout that affects your organization in more ways than one.

Ignoring Alerts Doesn’t Make Them Go Away

With so many alerts flooding in on a daily basis, it’s no surprise that some, or most, get ignored. A recent study from ESG Research revealed that 54 percent of organizations admit to ignoring security alerts that may have warranted investigation because they don’t have the time or resources to tackle them all. Ignoring alerts doesn’t make them go away; it passes the issue down to the next person to deal with, and the buck can be passed so many times that it takes months (as with Equifax) or years (as with Yahoo) before a breach comes to light. No responsible security professional wants to operate this way, but it becomes ingrained in the culture of your team because there appears to be no alternative.

Low-Quality Alerts Lead to Alert Fatigue

Low-quality alerts aren’t just a cause of alert fatigue, they can also be an ongoing result of it. Without the resources dedicated to continually updating systems to better identify new types of threats, or pull in better quality data, the team is still bogged down with low-quality or low-signal alerts that contribute to the mounting pile of potential attacks to sift through. A saying we repeat often here at Capsule8, especially when it comes to data quality and meaningful telemetry, is “The answer to more efficiently finding the needle in the haystack isn’t collecting more hay.” It becomes a cycle: you’re fighting today’s attacks with yesterday’s information, and no one is free to bring in new technology because everyone is too busy fighting today’s attacks with yesterday’s information, and so on.

Moving Toward a SOC-less Enterprise

Even though most alarms are false positives, there are real attacks you need to worry about, and eventually, a real wolf is going to eat the sheep (or the boy, depending on which version of Aesop’s fable you read). Missing these real attacks is why a new breach hits the headlines almost every day, and why there is such a constant need to solve the alert fatigue problem.

There are, of course, plenty of technologies that can help with this issue as well, but it can’t be a matter of throwing money or resources at the problem, especially with such a limited quantity of both already. One movement that is starting to take hold is the drive toward a SOC-less enterprise. Already adopted by huge companies such as Netflix, it’s a strategy that requires a fundamental shift in how an organization tackles security.

While full conversion to this approach will take a long time, there are a few changes we’ve discussed before that can help combat alert fatigue, such as focusing on telemetry data that can provide meaningful signals and not just noise. This can provide better quality alerts for your team to prioritize, instead of sifting through piles of low-quality, low-confidence alerts. An automated response will also help reduce alert fatigue, triggering alerts that need manual investigation or intervention only when necessary, so your team can focus on what could be most impactful.
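The prioritization idea above can be sketched in a few lines. This is a hypothetical illustration, not the logic of any particular product; the alert fields and thresholds are assumptions chosen for the example:

```python
# Hypothetical triage sketch: route only high-signal alerts to analysts,
# and handle the rest automatically. All names and thresholds are assumed.
from dataclasses import dataclass

@dataclass
class Alert:
    source: str
    severity: int      # 1 (low) .. 10 (critical)
    confidence: float  # 0.0 .. 1.0, the detector's own confidence

def triage(alerts, min_confidence=0.8, min_severity=7):
    """Split alerts into those needing human review and those handled automatically."""
    escalate, auto_handle = [], []
    for a in alerts:
        if a.confidence >= min_confidence and a.severity >= min_severity:
            escalate.append(a)     # high-signal: route to an analyst
        else:
            auto_handle.append(a)  # low-signal: log, aggregate, or auto-close
    return escalate, auto_handle

alerts = [
    Alert("port-scan", severity=2, confidence=0.9),
    Alert("privilege-escalation", severity=9, confidence=0.95),
    Alert("heuristic-match", severity=8, confidence=0.3),
]
needs_human, handled = triage(alerts)
print(len(needs_human))  # only the privilege-escalation alert is escalated
```

The point is not the specific cutoffs but the shape of the pipeline: the queue an analyst sees should already have the low-confidence, low-severity noise filtered out.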

Alert fatigue has a long-lasting impact on your team and your organization. The SOC model is broken, and without a significant shift in how organizations think about and approach security, it will always be a losing battle.

For more information on the current state of the SOC and how to fix it, read our article about moving toward a SOC-less enterprise.

Time to Blow Up the SOC?

Your Security Operations Center (SOC) is barraged with so many alerts that your team may be shell-shocked into believing they are under a constant and unmanageable assault. Indeed, they are under siege – from data. Alert fatigue is not just an industry buzz phrase – it’s a very real phenomenon that even the most well-resourced SOC teams find themselves facing.

A recent report found that 10 percent of SOC teams are inundated with more than 15,000 security alerts each and every day. And according to a Ponemon survey of IT security professionals, 37 percent of respondents faced more than 10,000 alerts per day and more than half of those were false positives, which can easily cost organizations thousands of wasted hours and millions of wasted dollars every year. Realistically, many “true positives” are for security events with incredibly low value, such as reconnaissance scans. Most scans don’t turn into an issue, and the ones that do often don’t correlate with any information that can be used to defend against the attack.
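To see how quickly those figures compound, here is a back-of-envelope calculation. Only the alert volume and false-positive rate come from the surveys above; the triage time and analyst rate are assumptions for illustration:

```python
# Back-of-envelope estimate of false-positive cost. The triage time and
# hourly rate below are assumed, not taken from the surveys.
alerts_per_day = 10_000
false_positive_rate = 0.5    # "more than half" per the Ponemon survey
minutes_per_triage = 2       # assumed average time to dismiss one alert
analyst_cost_per_hour = 75   # assumed fully-loaded hourly rate, USD

wasted_hours_per_year = (alerts_per_day * false_positive_rate
                         * minutes_per_triage / 60 * 365)
wasted_dollars_per_year = wasted_hours_per_year * analyst_cost_per_hour

print(f"{wasted_hours_per_year:,.0f} hours ≈ ${wasted_dollars_per_year:,.0f} per year")
```

Even with these conservative assumptions, the waste lands in the tens of thousands of hours and millions of dollars annually.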

And, it’s pretty common to miss the signal through the noise—spending too much time on the low value stuff, and missing the actual attacks.

The model of gathering as many logs as possible and sending them off to be centrally analyzed is like trying to find needles in haystacks by gathering all the hay you can find in a 10-mile radius. You could make a case for it around completeness and the ability to apply analytics, but in reality it turns out to be a horrible approach. And when it does identify an attack, it tends to be hours or even days after the attack has taken hold.

The problem of false positives is much bigger than wasted resources. Anyone who remembers the tale of The Boy Who Cried Wolf can tell you that being desensitized to alarm bells can have devastating consequences. Consider that nearly one third of IT professionals admit to ignoring security alerts altogether because they are so inundated. In a nutshell, the landscape is changing at light speed, the current model is largely broken and a drastically new approach is in order.

So, if your SOC team is spending their day sorting through thousands of false and near-valueless alerts, while missing real attacks, it’s critical to ask a very basic-but-important question: do you really need a security operations center, or are you just wasting time and money? There are certainly practical arguments to be made in favor of the SOC and many organizations require one – or at the very least might consider their MSP options – given the current landscape.

Conventional wisdom is that if the threat of a breach is keeping your C-suite awake at night, a security operations center is probably a good idea. The intentions of those who have relied on the SOC model are certainly on target, as are the goals of the teams themselves. However, the fears keeping your C-suite awake at night are as likely caused by false alarms as they are by the media’s fever-pitched coverage of overhyped threats and breaches.

It’s certainly unrealistic to forego security operations, but it’s important to look at the root causes for failure, and ask if we can transform the model.

The primary problem is the quality of data. Today, even after all the raw data coming from around the network goes through a best-of-breed correlation and analysis engine, SOC teams will still find themselves drowning in a sea of alerts. The signal-to-noise ratio in security appliances is the main culprit, and as the number of alerts increases, the problem is only going to get worse. Companies will need to find a way to move toward detection approaches with much lower noise levels – meaning they can no longer afford to rely on appliances.

A secondary problem is improper staffing. Every SOC in the world is chronically understaffed, and would be even if the alert volume were halved. When the industry talks about there being over a million unfilled cybersecurity jobs, with that number burgeoning to 6 million by the end of the decade, most experts expect the bulk of those jobs to be in a SOC. It should be clear that, with our current approach to SOC operations, there will never be enough people for the job.

This is why much of your evaluation process should be automated. Burden your technology with the task of vetting alarm bells, so that your most seasoned analysts can spend their valuable time evaluating the most likely and interesting threats, while monitoring the truly critical events in real time. Investing heavily in automation will ultimately allow SOCs to run with far fewer, much more highly skilled resources.
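A first pass at automated vetting might look like the sketch below. The enrichment sources and rules here are invented for illustration – in practice they would be fed by your own asset inventory and suppression lists:

```python
# Hypothetical automated-vetting sketch: machines do the first-pass
# investigation, and only alerts that survive enrichment reach an analyst.
# The signature names, host names, and rules are all invented.
KNOWN_BENIGN = {"scheduled-vuln-scan", "backup-agent-login"}
CRITICAL_ASSETS = {"db-prod-01", "payments-gw"}

def vet(alert):
    """Return 'suppress', 'monitor', or 'escalate' for a raw alert dict."""
    if alert["signature"] in KNOWN_BENIGN:
        return "suppress"       # auto-close, but keep for audit
    if alert["host"] in CRITICAL_ASSETS:
        return "escalate"       # always worth human eyes
    if alert["confidence"] < 0.5:
        return "monitor"        # aggregate quietly; escalate only on repeats
    return "escalate"

print(vet({"signature": "odd-outbound-conn", "host": "db-prod-01", "confidence": 0.4}))
```

Even rules this crude remove entire classes of alerts from the human queue, which is the point: analysts should only ever see what automation could not confidently resolve.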

If you were to manage your detection at the machine-level, the problem of data overload and false alerts would largely disappear. On a machine, you have visibility into what’s happening on the file system, what’s happening in memory, what’s happening in the OS, and even what’s happening in the application (for common applications). That’s far more telemetry data to pick out signals and ignore the noise – as long as the data is used wisely.

While large enterprises may not be ready to shutter the windows of their SOCs quite yet, it’s important to take the most proactive approach to security alerts possible to maximize whatever resources are available to those teams and your organization. That means that neither your IT team nor your SOC can afford to waste time and effort poring over alerts to determine which are real and which are not. In short, if something triggers an alarm, shoot first and ask questions later. This approach will help eliminate the countless hours wasted investigating false alarms.

Threat Protection Appliances Are as Valuable to Security as Your Toaster

Nothing in the IT security community is as widely deployed and universally reviled as Anti-Virus. But, threat detection appliances, including intrusion prevention appliances, application firewalls and advanced threat protection appliances should be almost as reviled. These appliances are nearly as useless as they are toxic. They do a horrible job finding problems and ultimately create more costs for security organizations than they save by crippling organizations with false alerts.

Protection is Impractical

In terms of security, detection appliances have never been effective, since the network is a lousy location for any kind of security hub in the first place. Few network protection devices can even prevent attacks, much less respond to them – and most appliances, even those with rudimentary prevention capabilities, are used simply for detection. In theory, an Intrusion Prevention device can drop a connection before letting the bad stuff through, but, in practice, that seldom happens.

In most enterprises, security detection appliances are sitting off to the side, only capable of looking at a copy of network traffic as opposed to the actual network traffic itself. Organizations choose this less-than-ideal setup to avoid the inevitable network latency issues, which otherwise occur when a single point of failure such as an appliance gets flooded or has a bug, for example.

That’s not to say that a network appliance can’t respond to an attack in progress in any way. If an appliance determines that a particular network connection is malicious, and there is complete confidence in the appliance’s judgement, then an organization can have the device signal something that is in-line, such as a router or firewall. That device can then take action by killing the connection or blacklisting the offending IP address.
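That signaling pattern can be sketched simply. In this sketch the in-line device is the Linux iptables firewall on a host; a real deployment would more likely drive a router or perimeter firewall through its own API. The confidence threshold is the key design point, because acting on anything less than a near-certain verdict risks killing legitimate connections:

```python
# Sketch of the "signal an in-line device" pattern described above.
# Using iptables as the in-line enforcement point is an assumption for
# illustration; the function name and threshold are also invented.
import subprocess

def block_source(ip: str, confidence: float, threshold: float = 0.99) -> bool:
    """Blacklist an offending IP, but only on near-certain verdicts."""
    if confidence < threshold:
        # Not confident enough to risk breaking legitimate traffic.
        return False
    # Ask the in-line device (here, the host firewall) to drop the source.
    subprocess.run(
        ["iptables", "-I", "INPUT", "-s", ip, "-j", "DROP"],
        check=True,
    )
    return True
```

Note how conservative the default is: with a 0.99 threshold, almost every verdict falls through to manual handling, which mirrors the article’s warning that automated blocking is only safe when the appliance’s judgement is beyond doubt.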

The problem with that scenario is that there often isn’t enough context at the network layer. If a network appliance tries to make fast decisions, it’s going to produce an extraordinarily high number of false positives, making its judgement suspect. Whenever there’s any doubt about an appliance’s judgement, it shouldn’t be trusted to decide whether to take action. Acting in such circumstances could affect legitimate connections, which could have catastrophic consequences for the entire business.

Appliances: What’s Really Happening

While old-school devices (e.g., traditional Intrusion Prevention Devices) sacrificed accuracy for speed, today’s more sophisticated appliances do a better job of processing available data, which gives vastly superior results with far fewer false positives.

The typical “next gen” security detection appliance (such as an “advanced threat detection” appliance) tries to give as good an answer as possible about whether the connection represents an attack by simulating the end destination of the traffic to predict whether anything malicious will happen. This is done by running a copy of the traffic into a specialized virtual machine, and then monitoring the simulated outcome. If something malicious is found at the destination in the virtual machine, then there’s reasonably high confidence that the connection represents an attack. The goal of this approach is to limit false positives to the point where the organization can respond to every event. Some organizations may even be willing to automate that response, by killing the connection if a suspected attack is detected in time, for example.

While this class of device tends to reduce noise, it doesn’t eliminate “false negatives” – attacks that sail through without being noticed. In fact, some believe these devices make the false-negative problem even worse.

If an attack works against an actual end user, yet not in the virtual environment, it can easily be overlooked by an appliance. Imagine a company buys an appliance and configures it to test network traffic in a virtual machine running Windows 10, but the attack in question only triggers on Windows 7. If the traffic’s destination is a user’s desktop running Windows 7, the attack can easily succeed, since the appliance (which is running Windows 10) won’t find the problem.

This class of problem was first discovered for Network Intrusion Prevention by Ptacek and Newsham in 1998. While “next gen” technologies are capable of more accurately modeling the systems they’re trying to protect, this problem will never go away when protecting from the network. Even when companies attempt to force all systems onto a single “gold” image, exceptions tend to quickly come out of the woodwork. For example, users within development and technical organizations may often find themselves in need of software that isn’t on the gold image, or they may need to run older software for testing purposes. Also, the gold image approach tends to only work for desktops.

Making things worse, the “black hat” community can get their hands on the same appliances, and reverse engineer the virtualization technology and detection techniques it uses. Then, they figure out how to detect whether they’re running on such a device. Once they’ve done that, their real-life attacks will go unnoticed on the appliance, while still reaching the intended target.

In the “advanced threat detection” appliance space, this problem is prevalent. A dirty secret in the appliance sector is that the majority of detections from these devices are discovered on machines that have already been infected, which are then beaconing out to external command-and-control infrastructure. If, instead, these appliances had detected the actual inbound threat, the attack might have been prevented.

Alert Fatigue: Poor Detection Quality

Drowning in alerts is perhaps the single biggest problem IT security organizations face. Even after all the raw data coming from around the network goes through a best-of-breed correlation and analysis engine, there is still an overflowing sea of information. These besieged engines generate so many alerts that organizations can’t possibly review every priority alert they would like to investigate, let alone do so cost-effectively. As it turns out, the vast majority of alerts are likely to be false positives anyway. Even the best talent can easily miss real incidents when poring over so much superfluous data.

Imagine the proverbial needle in the haystack, but multiply the number of haystacks so that the entire state of Kansas is covered in hay. Now you can begin to imagine the scope of the problem. Sure, you might find an occasional needle or two – especially with some ingenuity. But there’s virtually no chance you could tractably look for them all even if you wanted to. That aside, your vigilance is likely to wane for at least a haystack or two, which is all it takes for one devastating needle to puncture your organization.

Security appliances – particularly more traditional devices – represent the single biggest contributor to alert fatigue by far. When you’re limiting your investigation to network traffic and trying to get work done as quickly as possible, you don’t have enough context to make good decisions. So, if you want to err on the side of caution, keep in mind that appliances will give you plenty of reason for caution, while uncovering few actual problems.

If we are serious about eliminating the problem over alert-fatigue, we must move beyond appliances and begin managing all of our detection at the machine-level. On a machine we can see what’s happening on the file system, what’s happening in memory, what’s happening in the OS, and even what’s happening in the application (for common applications). That’s far more telemetric data to isolate signals and ignore noise (assuming we use all that data wisely).
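The advantage of machine-level data is that individually weak signals can be combined. The sketch below is purely illustrative (the subsystem names, signal names, and threshold are invented): instead of emitting one alert per low-confidence event, it raises a single high-confidence alert when a process trips signals from multiple subsystems at once:

```python
# Illustrative correlation sketch: combine weak per-subsystem host signals
# (file system, memory, OS, application) into one high-confidence detection.
# All field names and thresholds here are assumptions, not a real product API.
from collections import defaultdict

def correlate(events, min_subsystems=2):
    """Flag processes that trip signals from at least `min_subsystems` subsystems."""
    by_process = defaultdict(set)
    for e in events:
        by_process[e["pid"]].add(e["subsystem"])
    return [pid for pid, subsystems in by_process.items()
            if len(subsystems) >= min_subsystems]

events = [
    {"pid": 4242, "subsystem": "memory", "signal": "exec-from-writable-page"},
    {"pid": 4242, "subsystem": "filesystem", "signal": "write-to-etc"},
    {"pid": 971,  "subsystem": "os", "signal": "new-outbound-listener"},
]
print(correlate(events))  # only pid 4242 trips multiple subsystems
```

Each event on its own might be benign noise, but a single process both executing from writable memory and modifying system configuration is a far stronger signal than either alone – which is exactly the context a network appliance never sees.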

Many of the “next-gen” approaches live in a poor middle ground; they try to simulate the machine, but generally can’t afford to create an accurate or effective simulation. If an organization is trying to protect a thousand servers, how many appliances would be needed to thoroughly simulate every inch of network and each bit of traffic that crosses it? The answer is about a thousand appliances, which is prohibitively expensive. Most organizations prefer one appliance to protect hundreds or even thousands of machines, which is clearly insufficient.

The Future Will Be Worse

The standards community is finally embracing the less-is-more philosophy as it prepares to launch TLS 1.3. This new protocol will ensure there are no straightforward back doors to subvert end-to-end encryption. Right now, most security appliances rely on those back doors to provide detection (they need to be able to see the decrypted traffic). The new standard will require IT organizations to man-in-the-middle every hop if they want their security appliances to keep working. That will add massive latency and expense, so most people won’t do it, and appliances will become even less reliable as a result.

Additionally, the rise of containerization and micro-services will exacerbate the situation further, since inspecting container-to-container traffic on the same machine is well beyond the reach of any appliance.

Security appliances may seem better than anti-virus, because they occasionally provide some value and don’t add as much attack surface. On the other hand, Anti-Virus software doesn’t drown organizations in an ocean of spurious reports. Besides, at least AV now has credible alternatives, unlike an appliance. Security appliances are not worth the silicon they’re printed on and should be cursed in the same breath as Anti-Virus.