When a network or security incident hits an organization, every second counts, as each minute of downtime can translate into thousands of dollars lost. According to Gartner, a minute of IT downtime costs $5,600 on average, but this can go up to $14,000, depending on the organization’s size (Michael Copeland – The Cost of IT Downtime, Jun 18 2020 – full article at https://www.the20.com/blog/the-cost-of-it-downtime/).
Artificial intelligence (AI) is considered a viable way to improve threat detection and reduce the alert fatigue that would otherwise paralyze NOC and SOC teams. Early adopters of machine-learning algorithms are seeing a reduction of more than 90% in the number of alerts and a sharp increase in efficiency.
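Potential uses of this technology include endpoint protection platforms (EPP), security orchestration, automation, and response (SOAR), and network traffic analysis (NTA).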
Data analysis is not new to cybersecurity. It is already used in SIEM tools, but these require an entire team working around the clock to classify alerts, create tickets, and manage reports manually. Unfortunately, most organizations can’t afford a unit large enough to analyze all the signals generated by the automatic systems they use.
Monitor and review security operations
When a breach or issue occurs, putting together a team to fight the fire takes time, so getting help from an automated system in the meantime is crucial.
An AI-based threat detection system offers real-time alerts that empower the NOC team. Such a system escalates only high-priority alerts. The role of AI is to distinguish clearly between real, consistent threats and false positives. The end goal is to refer to the security specialists only those alerts that require their attention and knowledge.
A ticket-based service for alert management can be enhanced with keywords that fire context-based warnings depending on what they identify in tickets. Moreover, systems like these create tickets and automatically delegate them to the right specialist, while keeping the distribution of tasks between team members equitable.
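The keyword warnings and balanced delegation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the keyword rules, the skill names, and the `Ticket` and `Dispatcher` classes are hypothetical, and "equitable" is interpreted here as "assign to the qualified specialist with the fewest tickets so far".

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical keyword -> (warning text, required skill) rules.
# A real deployment would define its own rules.
KEYWORD_RULES = {
    "ransomware": ("possible ransomware activity", "malware"),
    "phishing": ("possible phishing campaign", "email"),
    "ddos": ("possible denial-of-service attack", "network"),
}


@dataclass
class Ticket:
    ticket_id: int
    text: str
    warnings: list = field(default_factory=list)
    assignee: Optional[str] = None


class Dispatcher:
    def __init__(self, specialists):
        # specialists: mapping of specialist name -> set of skills
        self.specialists = specialists
        self.assigned = {name: 0 for name in specialists}

    def process(self, ticket):
        text = ticket.text.lower()
        skills = set()
        # Fire a context-based warning for every keyword found in the ticket.
        for keyword, (warning, skill) in KEYWORD_RULES.items():
            if keyword in text:
                ticket.warnings.append(warning)
                skills.add(skill)
        # Prefer specialists with a matching skill; fall back to everyone.
        candidates = [
            name for name, owned in self.specialists.items() if owned & skills
        ] or list(self.specialists)
        # Pick the candidate with the fewest tickets; ties are broken by name.
        ticket.assignee = min(candidates, key=lambda n: (self.assigned[n], n))
        self.assigned[ticket.assignee] += 1
        return ticket


dispatcher = Dispatcher({"ana": {"malware"}, "bob": {"malware", "network"}})
first = dispatcher.process(Ticket(1, "Ransomware note found on host"))
print(first.assignee, first.warnings)
```

Counting assigned tickets is the simplest fairness measure. A production system would more likely weigh open tickets by severity or estimated effort.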
Security operation center metrics
Without proper metrics and KPIs, an organization can’t evaluate how well it is doing or how successful it is at handling issues.
However, SOC teams struggle to define the best metrics for their organization, since there are no one-size-fits-all best practices.
Each enterprise has specific needs and goals. It is also limited by the tools it uses, the architecture of its IT infrastructure, and its daily processes.
Here are a few examples of possible SOC metrics, but these need customization and context to reflect the actual health of the system:
- The number of tickets. If this number is in the range of 10K-100K or more per day, the system can use some improvements.
- The severity of the alerts generating these tickets. In an ideal setting, tickets would only be created for high-severity incidents, while simple ones would be automatically solved and classified.
- Mean time to discovery/containment/resolution/recovery. Since every minute of attack, security breach, or downtime can cost an enterprise thousands of dollars, every second counts. The goal is to have a predictive system or a real-time response. Once the problem is identified, it matters how long it takes to contain it, minimize damage, solve it, and restore the system to its original state.
- The number of threats detected/solved. This evaluates the system’s overall efficiency and could pinpoint ways to improve it.
- Percentage of false positives. This is one of the main KPIs to reduce, and it can be used to evaluate the efficiency of an AI-based alert triage system.
These are just a few general examples of metrics for the support teams. An organization will define its own after it maps its processes.
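As a rough illustration of how some of the metrics listed above could be computed from ticket data, here is a minimal Python sketch. The `Incident` record and its fields are assumptions made for the example. Real SIEM or ticketing exports will have different schemas, and each organization would adapt the definitions to its own processes.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Incident:
    occurred: datetime            # when the problem actually started
    detected: datetime            # when the SOC became aware of it
    resolved: Optional[datetime]  # None while the incident is still open
    false_positive: bool = False


def mean_delta(deltas):
    """Average a sequence of timedeltas; None if the sequence is empty."""
    deltas = list(deltas)
    if not deltas:
        return None
    return sum(deltas, timedelta()) / len(deltas)


def soc_metrics(incidents):
    real = [i for i in incidents if not i.false_positive]
    closed = [i for i in real if i.resolved is not None]
    false_positives = len(incidents) - len(real)
    return {
        "tickets": len(incidents),
        "false_positive_pct": (
            100 * false_positives / len(incidents) if incidents else 0.0
        ),
        # Timing metrics are computed on real incidents only.
        "mean_time_to_discovery": mean_delta(
            i.detected - i.occurred for i in real
        ),
        "mean_time_to_resolution": mean_delta(
            i.resolved - i.detected for i in closed
        ),
    }


start = datetime(2024, 1, 1, 12, 0)
sample = [
    Incident(start, start + timedelta(minutes=10), start + timedelta(minutes=40)),
    Incident(start, start + timedelta(minutes=5), None, false_positive=True),
]
print(soc_metrics(sample))
```

Containment and recovery times follow the same pattern once the corresponding timestamps are recorded.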
The KPIs for monitoring tools should follow the entire process, from firewall alerts to closing the feedback loop after an attack or an attempt.
It makes sense to build metrics around the number of steps needed to fix an incident to see if there is any way to simplify the process. Cost-related metrics could also be important.
Some organizations choose to attach metrics to individual team members as a way to evaluate their professional performance, although this practice can be controversial.
Enabling performant alert triage with AI
Enterprise-level security means filtering out risks while maintaining an appropriate level of vigilance. Top organizations face an alert overload and need to reduce alert noise to avoid system paralysis.
They need precise alerts that are escalated only after they have been filtered by several automated systems, found threatening, and found to have no standard procedure covering them.
AI tools like Arcanna.ai alert triage use big data to replicate the complex triage process performed by security specialists. The machine learning algorithm looks at millions of data points to understand the context of each alert, which is then classified based both on NLP (Natural Language Processing) applied to the alert text and on what the system has learned from analysts’ feedback. With such a system, under 1% of events should remain uncategorized because their pattern doesn’t match the pre-existing database. These are the truly interesting cases.
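To make the idea of feedback-driven triage concrete, here is a deliberately tiny Python sketch. It is not how Arcanna.ai or any other product works. It swaps in a toy nearest-neighbour match on word overlap (Jaccard similarity) for real NLP and machine learning, and the class name, labels, and threshold are all hypothetical. What it does show is the overall loop: analysts label alerts, new alerts are classified by similarity to those labels, and anything that matches no known pattern is left uncategorized for a human.

```python
import re


def tokens(text):
    """Lowercased set of alphanumeric words in the alert text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """Word-overlap similarity between two token sets, from 0.0 to 1.0."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


class FeedbackTriage:
    def __init__(self, threshold=0.5):
        self.threshold = threshold  # assumed cut-off; tune on real data
        self.examples = []          # (token set, analyst label) pairs

    def add_feedback(self, alert_text, label):
        """Store an analyst's verdict so that similar alerts reuse it."""
        self.examples.append((tokens(alert_text), label))

    def classify(self, alert_text):
        """Return (label, score); label is None when no pattern matches."""
        query = tokens(alert_text)
        best_label, best_score = None, 0.0
        for example, label in self.examples:
            score = jaccard(query, example)
            if score > best_score:
                best_label, best_score = label, score
        if best_score < self.threshold:
            return None, best_score  # uncategorized: route to an analyst
        return best_label, best_score


triage = FeedbackTriage()
triage.add_feedback("failed ssh login from internal scanner", "false_positive")
triage.add_feedback("outbound traffic to known c2 server", "escalate")
print(triage.classify("failed ssh login from internal host"))
print(triage.classify("usb device inserted on kiosk"))
```

Each uncategorized alert that an analyst then labels through `add_feedback` widens the set of patterns the system recognizes, which is how the share of unmatched events shrinks over time.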
One of the primary benefits of using an automated system is replacing reactiveness with a proactive approach. An AI-powered alert triage system lets teams hunt for new threats instead of waiting for them to cause damage before they are added to a red list.
Organizations looking to take some of the load off their NOC and SOC teams should always look for solutions that automate incident response, especially for simple, mundane tasks.
Organizations waste about 25% of their time responding to false positives, according to a study by the Ponemon Institute, which also estimates the resulting losses at about $1.27 million per year, on average.
The solution is to select a tool that makes the distinction between malware and actual attacks. The goals include reducing false positives drastically, adding context to each event, and consolidating different alerts related to the same event.
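Consolidating different alerts related to the same event can be illustrated with a short Python sketch. The grouping rule used here is an assumption chosen for simplicity: alerts belong to the same event when they share a host and signature and arrive within a five-minute window of each other. The `Alert` and `Event` records are likewise hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class Alert:
    source: str      # tool that raised the alert, e.g. "firewall"
    host: str
    signature: str
    timestamp: datetime


@dataclass
class Event:
    host: str
    signature: str
    first_seen: datetime
    last_seen: datetime
    sources: set = field(default_factory=set)
    count: int = 0


def consolidate(alerts, window=timedelta(minutes=5)):
    """Merge alerts with the same host and signature that fall within
    `window` of the previous matching alert into a single event."""
    events = []
    latest = {}  # (host, signature) -> most recent Event for that key
    for alert in sorted(alerts, key=lambda a: a.timestamp):
        key = (alert.host, alert.signature)
        event = latest.get(key)
        if event is None or alert.timestamp - event.last_seen > window:
            # No open event for this key, or the gap is too long: start anew.
            event = Event(alert.host, alert.signature,
                          alert.timestamp, alert.timestamp)
            events.append(event)
            latest[key] = event
        event.last_seen = alert.timestamp
        event.sources.add(alert.source)  # context: which tools saw it
        event.count += 1
    return events


start = datetime(2024, 1, 1, 9, 0)
raw = [
    Alert("firewall", "web-1", "port-scan", start),
    Alert("ids", "web-1", "port-scan", start + timedelta(minutes=2)),
    Alert("ids", "web-1", "port-scan", start + timedelta(minutes=30)),
]
for event in consolidate(raw):
    print(event.host, event.signature, event.count, sorted(event.sources))
```

The resulting events also carry some of the context mentioned above: how many raw alerts were folded in, over what time span, and which tools reported them.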
With the rise of IoT, adopting a solution powered by machine learning will become the only viable option, since alerts will be characterized more and more by the three Vs of big data: volume, velocity, and variety.