Virtualization Technology News and Information
Optimizing SOC Team Performance with MTTD and MTTR

Great Britain's cycling team went from one of the worst-performing teams in the world to dominating the gold medal count at the 2008 Beijing Olympics. Their breakthrough theory? Do all the little things 1% better.

As we look to make wholesale improvements, maybe the key is to home in on the seemingly subtle things. When measuring a Security Operations Center's (SOC's) effectiveness, two primary metrics come to mind: Mean Time to Detect (MTTD) and Mean Time to Respond (MTTR).

Here's the question: Can making small improvements to these two foundational metrics really have a significant impact on SOC team performance? If the history of British Cycling is to be believed, then yes.

How to Measure SOC Team Performance

Every practitioner has a general sense of how their SOC is performing, but to look at things analytically, you can use a simple formula: calculate "SOC Capacity," calculate "Expected Work," and then look for any gap between the two.

As Grant Oviatt, Head of Security Operations at Prophet Security, notes, "SOC Capacity measures how much total available time your team has to disposition security alerts," while expected work is "the total amount of alert management work you expect in a given month." Subtract Expected Work from SOC Capacity, and you can see how well your SOC is doing, or isn't.

Calculate SOC Capacity

To figure out how much time your SOC has for alert work, you can generally assume that about 70% of each analyst's working hours will actually go toward security tasks; the rest goes to breaks, meetings, and the occasional distraction. Multiply each analyst's working hours for the month by 0.7, then multiply that by the number of analysts you have, and voilà! You've got the total number of hours your team can "do work."
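The capacity calculation above can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed tool: the function name `soc_capacity_hours`, the assumed 160 working hours per analyst per month, and the five-analyst team are hypothetical example values; only the 70% productive share comes from the rule of thumb above.

```python
# Rule of thumb from the article: ~70% of working time goes to security tasks.
PRODUCTIVE_SHARE = 0.70


def soc_capacity_hours(analysts: int, hours_per_analyst_per_month: float,
                       productive_share: float = PRODUCTIVE_SHARE) -> float:
    """Total monthly hours the team can spend dispositioning alerts."""
    if analysts < 0 or hours_per_analyst_per_month < 0:
        raise ValueError("analysts and hours must be non-negative")
    if not 0.0 <= productive_share <= 1.0:
        raise ValueError("productive_share must be between 0 and 1")
    return analysts * hours_per_analyst_per_month * productive_share


# Hypothetical team: 5 analysts, each working an assumed 160 hours per month.
print(soc_capacity_hours(5, 160))  # 560.0
```

If your analysts' real productive share differs from 70%, pass it explicitly as `productive_share` rather than editing the constant.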

Calculate Expected Work

Now you have to figure out just how much work there is to do. Multiply your Mean Time to Respond (the average time spent on each alert) by the average number of alerts you receive in a month, and you have that number, too.
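Expected Work follows the same pattern. The sketch below assumes MTTR is tracked in minutes per alert (the article does not fix a unit), so the hypothetical `expected_work_hours` converts the result to hours for a direct comparison with SOC Capacity. The 30-minute MTTR and 1,000 alerts per month are made-up example figures.

```python
def expected_work_hours(mttr_minutes_per_alert: float, alerts_per_month: float) -> float:
    """Monthly alert-handling workload in hours (assumes MTTR is in minutes)."""
    if mttr_minutes_per_alert < 0 or alerts_per_month < 0:
        raise ValueError("inputs must be non-negative")
    return mttr_minutes_per_alert * alerts_per_month / 60.0


# Hypothetical SOC: 30-minute MTTR, 1,000 alerts per month.
print(expected_work_hours(30, 1000))  # 500.0
```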

Are You on Track or at a Deficit?

At this point, you're ready to see how your SOC's capacity stacks up against the work it has to do. Subtract Expected Work from SOC Capacity (or just compare the two) and see what you're left with.

In a perfect world, your SOC would have 15% more time than its Expected Work, a buffer that lets your team absorb alert spikes. Unfortunately, many teams will find themselves in a deficit, with Expected Work far exceeding their team's capacity. For times like these, turn to managed security providers, automation, and AI-driven assistance.
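Putting the two numbers together, a rough monthly check might look like the following. It reads "15% more time than Expected Work" as capacity of at least 1.15 times Expected Work; the function name `capacity_gap`, the status labels, and the example hours are all hypothetical.

```python
# Target from the article: capacity should exceed Expected Work by about 15%.
TARGET_BUFFER = 0.15


def capacity_gap(capacity_hours: float, expected_hours: float,
                 target_buffer: float = TARGET_BUFFER) -> tuple[float, str]:
    """Return (surplus hours, status) for a month of SOC work."""
    surplus = capacity_hours - expected_hours
    if surplus < 0:
        status = "deficit"  # more work than time: consider MSSPs, automation, AI
    elif capacity_hours >= expected_hours * (1 + target_buffer):
        status = "healthy"  # at least the 15% buffer for alert spikes
    else:
        status = "thin buffer"  # work is covered, but little room for spikes
    return surplus, status


# Made-up example figures, in hours per month.
print(capacity_gap(560.0, 500.0))  # (60.0, 'thin buffer')
print(capacity_gap(700.0, 500.0))  # (200.0, 'healthy')
print(capacity_gap(400.0, 500.0))  # (-100.0, 'deficit')
```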

Take a good, hard look at your metrics and see where you can cut back, improve, and tighten.

Analyzing MTTD (Mean Time to Detect)

Mean Time to Detect is "the average time it takes to discover a security threat or incident." To calculate yours, subtract each incident's "Activity Started At" time from the "Alerted At" time of its earliest detection, then average the results across incidents. This should obviously be as close to "instantaneous" as possible, but we all know that's a tough climb. A good rule of thumb is to keep MTTD between 0 and 4 hours.
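Here is one way to compute MTTD from incident timestamps, as a sketch. It assumes you can export, for each incident, the "Activity Started At" time and the "Alerted At" time of its earliest detection as Python `datetime` values; the function name and sample timestamps are hypothetical. Note the direction of the subtraction: Alerted At minus Activity Started At, so faster detection yields a smaller number.

```python
from datetime import datetime, timedelta

# Article's rule of thumb: keep MTTD between 0 and 4 hours.
MTTD_TARGET = timedelta(hours=4)


def mean_time_to_detect(incidents: list[tuple[datetime, datetime]]) -> timedelta:
    """Average of (Alerted At - Activity Started At) across incidents.

    Each pair is (activity_started_at, alerted_at), where alerted_at is the
    alert time of the incident's earliest detection.
    """
    if not incidents:
        raise ValueError("need at least one incident")
    total = timedelta(0)
    for started_at, alerted_at in incidents:
        if alerted_at < started_at:
            raise ValueError("alert precedes activity start; check timestamps")
        total += alerted_at - started_at
    return total / len(incidents)


# Hypothetical incidents detected 1, 2, and 6 hours after activity began.
t0 = datetime(2024, 10, 1, 9, 0)
sample = [(t0, t0 + timedelta(hours=h)) for h in (1, 2, 6)]
mttd = mean_time_to_detect(sample)
print(mttd)                 # 3:00:00
print(mttd <= MTTD_TARGET)  # True
```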

While this seems reasonable, when we're dealing with a cyber talent shortage, barrages of new AI-generated malware, swamps of alerts, and complex technology that often ends up as shelfware, these numbers can seem like unreachable unicorns.

Keep reaching. There are ways to bring these numbers down, and the first step is information. Find out what is making your MTTD lag; the answer isn't always "more resources." Sometimes, you can clean up and do more with less.

First, make sure what you're doing is working. Are your detection investments catching a lot of incidents? If so, keep investing there. If not, make some changes. For a more granular look, spend some time reviewing a sample of incidents after you have successfully chased down the detected threats. What was the first indication of each threat? Is there anything you can do to double down on detecting that behavior in the future?
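To see which detection investments are pulling their weight, you could tally incidents by the source that detected them first, alongside how long each source took. The sketch below assumes a hypothetical export of `(first_detection_source, hours_to_detect)` pairs; the source names are illustrative placeholders, not specific products.

```python
from collections import defaultdict
from statistics import mean


def detection_scorecard(incidents):
    """Group incidents by first-detection source.

    incidents: iterable of (first_detection_source, hours_to_detect) pairs.
    Returns (source, incident_count, mean_hours_to_detect) rows, sorted so the
    sources that catch the most incidents come first.
    """
    by_source = defaultdict(list)
    for source, hours in incidents:
        by_source[source].append(hours)
    rows = [(src, len(hours), mean(hours)) for src, hours in by_source.items()]
    return sorted(rows, key=lambda row: row[1], reverse=True)


# Hypothetical export: which source fired first, and how long it took.
sample = [("EDR", 0.5), ("EDR", 1.5), ("SIEM rule", 3.0), ("user report", 20.0)]
for row in detection_scorecard(sample):
    print(row)
# ('EDR', 2, 1.0)
# ('SIEM rule', 1, 3.0)
# ('user report', 1, 20.0)
```

Slow or rarely firing sources in such a table are natural candidates for the sample reviews described above.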

Enhancing MTTR (Mean Time to Respond)

Mean Time to Respond is the average time it takes to remediate a threat and "get things under control." It includes:

  • The time it takes to receive security intelligence (telemetry)
  • The time it takes to see and act on the alert
  • The time it takes to triage and investigate
  • The time it takes to contain the threat (at least initial containment)

Combined, these times determine how quickly your SOC responds to threats, on average. Aim for an MTTR between 1 and 8 hours: shorter for critical incidents, longer (but still acceptable) for less-severe ones. However, no incident, not even a low-severity one, should take longer than a full workday.
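The response stages listed above can be summed per incident and averaged by severity. This Python sketch is illustrative only: the `Incident` fields mirror the four listed stages, while the per-severity targets (1, 4, and 8 hours) and the 8-hour workday ceiling are assumptions that map the article's 1-to-8-hour guidance onto severity levels.

```python
from dataclasses import dataclass
from statistics import mean

# Assumed mapping of the article's 1-8 hour guidance onto severities (hours).
TARGET_HOURS = {"critical": 1.0, "high": 4.0, "medium": 8.0, "low": 8.0}
WORKDAY_HOURS = 8.0  # assumed length of "a full work day"


@dataclass
class Incident:
    severity: str
    telemetry_min: float    # receive security telemetry
    acknowledge_min: float  # see and act on the alert
    investigate_min: float  # triage and investigate
    contain_min: float      # initial containment

    @property
    def response_hours(self) -> float:
        total = (self.telemetry_min + self.acknowledge_min
                 + self.investigate_min + self.contain_min)
        return total / 60.0


def mttr_by_severity(incidents):
    """Mean response time in hours, per severity level."""
    buckets = {}
    for inc in incidents:
        buckets.setdefault(inc.severity, []).append(inc.response_hours)
    return {sev: mean(hours) for sev, hours in buckets.items()}


def over_target(incidents):
    """Incidents slower than their severity target or than a full workday."""
    return [inc for inc in incidents
            if inc.response_hours > min(TARGET_HOURS.get(inc.severity, WORKDAY_HOURS),
                                        WORKDAY_HOURS)]


# Hypothetical incidents (minutes per stage).
sample = [
    Incident("critical", 5, 10, 30, 15),  # 1.0 h
    Incident("critical", 5, 20, 40, 25),  # 1.5 h
    Incident("low", 30, 60, 180, 90),     # 6.0 h
]
print(mttr_by_severity(sample))  # {'critical': 1.25, 'low': 6.0}
print([(i.severity, i.response_hours) for i in over_target(sample)])  # [('critical', 1.5)]
```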

Again, the time should be as close to zero as possible, and, again, that can seem like an impossible dream.

But it doesn't have to be. First, look for inefficiencies in the way you're doing things; you may be surprised by what you find. Are certain alert types getting faster responses? Have your SOC lean into those (and find out why). Then look at the individual steps outlined above and see whether any of them take an inordinate amount of time. What looks from the outside like "one big number" (and one you may not like) is actually made up of many smaller metrics that can be identified, isolated, and improved.
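Breaking the "one big number" apart can be as simple as averaging each stage and each alert type separately. The record layout and field names below (`alert_type`, `telemetry`, `acknowledge`, `investigate`, `contain`, all in minutes) are hypothetical; adapt them to whatever your ticketing or SOAR platform actually exports.

```python
from statistics import mean

# Hypothetical stage fields, each recorded in minutes.
PHASES = ("telemetry", "acknowledge", "investigate", "contain")


def phase_breakdown(incidents):
    """Average minutes spent in each stage, slowest stage first."""
    averages = {p: mean(inc[p] for inc in incidents) for p in PHASES}
    return sorted(averages.items(), key=lambda kv: kv[1], reverse=True)


def response_by_alert_type(incidents):
    """Average total response minutes per alert type, fastest first."""
    totals = {}
    for inc in incidents:
        totals.setdefault(inc["alert_type"], []).append(sum(inc[p] for p in PHASES))
    return sorted(((t, mean(v)) for t, v in totals.items()), key=lambda kv: kv[1])


sample = [
    {"alert_type": "phishing", "telemetry": 2, "acknowledge": 10, "investigate": 25, "contain": 5},
    {"alert_type": "phishing", "telemetry": 4, "acknowledge": 20, "investigate": 35, "contain": 15},
    {"alert_type": "malware", "telemetry": 6, "acknowledge": 30, "investigate": 120, "contain": 40},
]
print(phase_breakdown(sample))
print(response_by_alert_type(sample))
```

With this made-up data, investigation is the slowest stage (60 minutes on average), and phishing alerts are closed faster than malware alerts, which points to where to dig first.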

Conclusion

Combined with AI-based automation and technology, an audit of your Mean Time to Detect and Mean Time to Respond will give you insight into your security strategy that you likely didn't have before. Even if the eventual plan is to implement these additional technologies (which would be wise, given the threat landscape), knowing your MTTD and MTTR, including their weak spots, strengths, and capacities, can help you apply those technologies as efficiently as possible.

As cybercriminals double down on offense, teams can find success by taking a fine-toothed comb to defense. With an eye towards getting just 1% better, they can find hidden opportunities to improve, notice slack that can be pulled in, and uncover ways they can lower MTTD and MTTR times to optimize SOC team performance.

##

ABOUT THE AUTHOR

Katrina Thompson 

An ardent believer in personal data privacy and the technology behind it, Katrina Thompson is a freelance writer leaning into encryption, data privacy legislation, and the intersection of information technology and human rights. She has written for Bora, Venafi, Tripwire, and many other sites.

Published Friday, October 18, 2024 7:29 AM by David Marshall