August 7, 2024

How We Combine AI, ML, LLMs & Human Expertise to Prioritize CNAPP Alerts

Michael St.Onge

Principal Security Architect

Cloud-Native Application Protection Platforms (CNAPPs) have become an essential part of the modern cloud security toolkit due to their ability to detect misconfigurations, compliance issues, and threats across cloud environments. However, this comprehensive coverage – spanning code, infrastructure, runtime, and more – results in a high volume of alerts.

Security teams need to decide which incidents to investigate first. They then face a second decision: which issues justify spending scarce developer time on remediation. This is the prioritization problem, and it has become one of the main focal points of cloud security.

In this article, we’ll explore why traditional alert prioritization methods fall short and how machine learning techniques (with expert oversight) can provide a more effective solution for managing the growing volume of CNAPP alerts.

Why Manual and Rules-Based Prioritization Approaches Don’t Scale

Organizations often start with manual triage or simple rule-based systems to manage CNAPP alerts. However, these methods quickly become ineffective as cloud environments grow.

Too many alerts, not enough hours in a day

The sheer volume of things that could go wrong in a cloud estate is hard to grasp. Hundreds of resources are deployed by different teams, often through a mix of manual and automated methods, and some are managed by third-party providers. This complexity can lead to inconsistent security practices and an increased risk of misconfigurations slipping through the cracks.

For a mid-size organization, thousands of alerts per day would not be unusual. These range from minor configuration issues, such as failing to encrypt data that is already publicly available, to urgent problems such as exposed API keys or misconfigurations that could expose customer data. Reviewing each alert manually is time-consuming and impractical. Even with basic filtering, the number of alerts often exceeds what a team can reasonably handle.

Cloud environments expand faster than security teams

Companies continually add new services, applications, and resources to their cloud infrastructure. This growth multiplies the potential security issues and subsequent alerts. Security teams, often working with limited budgets, struggle to keep pace.

But while cloud spend is seen as an unavoidable cost of modern R&D, organizations are reluctant to grow security headcount even at the best of times. Financial pressures in recent years have led to further cost-cutting, forcing security teams to do more with less. This makes manual review and prioritization of each alert even less feasible.

Complex relationships between assets, vulnerabilities, and impact

Cloud environments involve intricate relationships between various components. A single vulnerability might have different levels of risk depending on the affected asset and its connections to other resources. Rule-based systems struggle to account for these complex relationships. They often lack the flexibility to consider:

- The criticality of the affected asset
- The potential downstream impact of a vulnerability
- The changing context of the cloud environment

Static rules can’t easily adapt to these dynamic factors. As a result, they may miss critical issues or generate false positives, further burdening the security team. For instance, a rule might flag all instances of unencrypted data storage as high priority but miss the nuance that unencrypted data in a tightly controlled internal development environment poses less immediate risk than in a public-facing production system.
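To make the contrast concrete, here is a minimal Python sketch of the unencrypted-storage example. The `StorageFinding` fields and the three-level priority scale are hypothetical, chosen only for illustration; real CNAPP findings carry far more context.

```python
from dataclasses import dataclass


@dataclass
class StorageFinding:
    # Hypothetical finding shape for illustration only.
    resource_id: str
    encrypted: bool
    environment: str            # e.g. "production" or "development"
    publicly_accessible: bool


def static_rule(finding: StorageFinding) -> str:
    # A fixed rule: every unencrypted store is "high", regardless of context.
    return "none" if finding.encrypted else "high"


def context_aware_priority(finding: StorageFinding) -> str:
    # Same check, but priority depends on where the data lives and who can reach it.
    if finding.encrypted:
        return "none"
    if finding.environment == "production" and finding.publicly_accessible:
        return "high"
    if finding.environment == "production":
        return "medium"
    return "low"  # e.g. a tightly controlled internal development environment


findings = [
    StorageFinding("bucket-prod-web", False, "production", True),
    StorageFinding("bucket-dev-scratch", False, "development", False),
]
for f in findings:
    print(f.resource_id, static_rule(f), context_aware_priority(f))
```

Even this hand-written, context-aware version needs a branch for each combination of environment and exposure. Adding asset criticality, data sensitivity, and network paths multiplies those branches, which is exactly where static rules stop scaling.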

Machine Learning: The Key to Intelligent CNAPP Prioritization

Prioritization is essentially a big data problem. As such, it invites solutions based on artificial intelligence (AI) and machine learning (ML), which can simplify the process of sifting through incidents and finding the most important ones to tackle.

The principle: co-pilot, not autopilot

Our customers hire Tamnoon to help them manage their CNAPP, which includes streamlining how they prioritize alerts and remediation. As part of our solution, we’ve introduced several ML-based techniques. However, our general approach is not to replace human expertise but to enhance it. We believe AI should serve as a co-pilot, making security processes more efficient and allowing human experts to focus their attention where it matters most.

Our AI-driven systems are mostly designed to process vast amounts of data from CNAPPs and other sources, using sophisticated algorithms to surface the most critical issues. However, algorithms alone are not enough: some of the most important inputs for prioritization relate to business impact, both of the vulnerability itself and of its remediation. Much of this information lives in people’s heads, in emails, or in various domain-specific knowledge bases (e.g., “What will be the impact of this dashboard not refreshing when we block the database it’s reading from?”). The security industry has historically been bad at taking these signals into account, and doing so is only possible by combining humans and machines.

Additionally, on a practical level, even the most advanced models have biases and failure modes that can lead to outcomes most organizations would not accept (e.g., a production environment crashing because of an AI model’s mistake).

Hence, while we do not shy away from using ML models, our service delivery team will always provide feedback, verify recommendations, and make final decisions when necessary. This hybrid approach ensures that we maintain high accuracy while benefiting from the speed and scalability of AI.
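A minimal sketch of such a review gate, assuming a simple in-memory queue: the model's score only orders the work, a person makes the final call, and each verdict is kept as a label for later retraining. The `Recommendation` and `ReviewQueue` names and the 0..1 score are hypothetical and do not describe Tamnoon's actual implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Recommendation:
    # Hypothetical names; not Tamnoon's implementation.
    alert_id: str
    action: str
    model_score: float  # hypothetical 0..1 priority score produced by a model


@dataclass
class ReviewQueue:
    pending: list = field(default_factory=list)
    labels: list = field(default_factory=list)  # (alert_id, model_score, approved, note)

    def submit(self, rec: Recommendation) -> None:
        # Model output is only a suggestion: nothing is applied automatically.
        self.pending.append(rec)

    def next_for_review(self) -> Recommendation:
        # Reviewers see the highest-scoring suggestion first.
        self.pending.sort(key=lambda r: r.model_score, reverse=True)
        return self.pending.pop(0)

    def record_verdict(self, rec: Recommendation, approved: bool, note: str = "") -> None:
        # The human decision is final and is stored as a label for later retraining.
        self.labels.append((rec.alert_id, rec.model_score, approved, note))


queue = ReviewQueue()
queue.submit(Recommendation("a-1", "block public access on bucket", 0.91))
queue.submit(Recommendation("a-2", "rotate unused access key", 0.40))

rec = queue.next_for_review()
# A reviewer who knows a dashboard still reads from this bucket rejects the change.
queue.record_verdict(rec, approved=False, note="dashboard depends on bucket")
print(queue.labels)
```

The rejected high-scoring suggestion is the interesting case: the model was not wrong about the risk, but only a person knew the business cost of the fix, and that knowledge is now captured as a label.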

How it Works in Practice – a Simplified Reference Architecture

AI and ML are used to analyze, categorize, and prioritize CNAPP alerts, significantly reducing the workload on both security and development teams while improving the accuracy and relevance of prioritized issues. Our approach combines several ML techniques to process security data and provide actionable remediation steps.

AI-powered triage: We employ various AI and ML models to analyze hundreds of thousands of alerts, distilling them down to tens of recommended tasks. This process involves:
- Ranking algorithms to assess alert importance
- Large language models (LLMs) for contextual understanding
- Clustering techniques to group related alerts
- Classification approaches to categorize issues
Contextual enrichment: Our engines enrich alert data with crucial context, including:
- Asset attributes and criticality
- Historical incident data
- Threat intelligence feeds
- Organization-specific classifications (e.g., “Crown Jewel” assets)
Cluster analysis: After initial prioritization, our AI algorithms group related alerts into clusters. This approach allows security teams to address interconnected issues holistically rather than treating each alert in isolation.
Operational impact analysis: We use AI models to streamline the analysis of the potential impact of security issues and their remediation. This gives our service team a fast, efficient way to investigate and craft actionable remediation tasks.
Dynamic scoring and ranking: Once data has been enriched, our models assign scores to both individual alerts and clusters of related alerts. This scoring mechanism considers the outputs of the previous models, alert severity and relevance, asset criticality within the organizational context, and potential impact on the broader cloud environment.
Continuous improvement: Our service delivery team provides ongoing feedback to the AI system, helping to refine and improve its recommendations over time. This feedback loop ensures that the prioritization process becomes increasingly accurate and tailored to each organization’s specific needs.
Cross-CNAPP contextual learning: Tamnoon’s AI learns from various CNAPP contexts, enabling it to generate configurations automatically based on text and context similarity. This capability allows for quick integration of new CNAPP coverage and adaptability to changes in the cloud security landscape.
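The enrichment, clustering, and scoring steps above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Tamnoon's model: the lookup tables stand in for real enrichment sources, the weights and crown-jewel bonus are made-up numbers, and grouping alerts by resource ID is a plain stand-in for the learned clustering described above.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Alert:
    alert_id: str
    resource_id: str
    rule: str
    severity: float  # CNAPP severity normalized to 0..1


# Hypothetical enrichment sources standing in for asset inventories,
# threat-intel feeds and customer-provided classifications.
ASSET_CRITICALITY = {"db-payments": 1.0, "vm-build-01": 0.3}
CROWN_JEWELS = {"db-payments"}
EXPLOITED_RULES = {"public-db-snapshot"}

# Hypothetical weights; a real system would tune or learn these.
WEIGHTS = {"severity": 0.4, "criticality": 0.4, "threat": 0.2}
CROWN_JEWEL_BONUS = 0.2


def enrich(alert: Alert) -> dict:
    """Attach context that the raw CNAPP alert does not carry."""
    return {
        "criticality": ASSET_CRITICALITY.get(alert.resource_id, 0.5),
        "crown_jewel": alert.resource_id in CROWN_JEWELS,
        "threat": 1.0 if alert.rule in EXPLOITED_RULES else 0.0,
    }


def score(alert: Alert) -> float:
    """Weighted sum of severity and enrichment signals, capped at 1.0."""
    ctx = enrich(alert)
    s = (WEIGHTS["severity"] * alert.severity
         + WEIGHTS["criticality"] * ctx["criticality"]
         + WEIGHTS["threat"] * ctx["threat"])
    if ctx["crown_jewel"]:
        s += CROWN_JEWEL_BONUS
    return min(s, 1.0)


def rank_clusters(alerts: list[Alert]) -> list[tuple[str, float, list[str]]]:
    """Group alerts by resource and rank each group by its worst member."""
    clusters = defaultdict(list)
    for a in alerts:
        clusters[a.resource_id].append(a)
    ranked = [
        (resource, max(score(a) for a in members), [a.alert_id for a in members])
        for resource, members in clusters.items()
    ]
    return sorted(ranked, key=lambda item: item[1], reverse=True)


alerts = [
    Alert("A1", "db-payments", "public-db-snapshot", 0.8),
    Alert("A2", "db-payments", "no-deletion-protection", 0.3),
    Alert("A3", "vm-build-01", "outdated-os-package", 0.9),
    Alert("A4", "vm-build-01", "missing-tags", 0.1),
]
for resource, top_score, ids in rank_clusters(alerts):
    print(resource, round(top_score, 2), ids)
```

Ranking a cluster by its highest-scoring member keeps one critical finding from being diluted by low-severity noise on the same asset. Note that A3 has the highest raw CNAPP severity yet lands in the lower-ranked cluster once asset context is applied. In a learned system, the fixed weights would typically be fit to past reviewer decisions instead.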

By combining these advanced ML techniques with human expertise, Tamnoon’s approach to alert prioritization addresses the key challenges faced by modern cloud security teams. It allows organizations to effectively manage the high volume of alerts generated in complex, rapidly evolving cloud environments, ensuring that critical issues are addressed promptly while reducing the burden on security personnel.
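The cross-CNAPP contextual learning step described above maps alerts from a newly integrated CNAPP onto configurations that already exist, based on text similarity. The sketch below is a deliberately simple stand-in, assuming bag-of-words cosine similarity where a production system would more likely use learned embeddings or an LLM; the alert titles, the `KNOWN_CONFIGS` table, and the 0.3 threshold are all hypothetical.

```python
import math
import re
from collections import Counter


def tokens(text: str) -> Counter:
    """Lowercased word counts; a crude stand-in for learned text embeddings."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


# Hypothetical configurations already curated for alert types from one CNAPP.
KNOWN_CONFIGS = {
    "S3 bucket is publicly accessible": {"category": "data-exposure", "base_priority": "high"},
    "IAM user has unused access keys": {"category": "identity-hygiene", "base_priority": "medium"},
    "Security group allows SSH from 0.0.0.0/0": {"category": "network-exposure", "base_priority": "high"},
}


def suggest_config(new_title: str, threshold: float = 0.3) -> tuple:
    """Reuse the config of the most similar known alert, or return None for human review."""
    new_vec = tokens(new_title)
    best_title, best_sim = max(
        ((title, cosine(new_vec, tokens(title))) for title in KNOWN_CONFIGS),
        key=lambda pair: pair[1],
    )
    if best_sim < threshold:
        return None, best_title, best_sim  # no confident match: route to a person
    return KNOWN_CONFIGS[best_title], best_title, best_sim


# An alert title as a different (hypothetical) CNAPP might phrase it.
config, matched, sim = suggest_config("Storage bucket publicly accessible to anyone")
print(matched, round(sim, 2), config)
```

Returning `None` below the threshold keeps the co-pilot principle intact: an unfamiliar alert type is handed to a person rather than forced onto the nearest existing configuration.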

Advantages of the ML + Human Approach

Scalability: ML models can process enormous volumes of alerts far faster than manual methods, keeping pace with rapidly expanding cloud environments. This is not a “set it and forget it” solution, and human oversight is still required, but that oversight takes only a small fraction of the headcount growth that other prioritization methods would demand.
Adaptability: Our system continuously learns and improves; every implementation we work on makes our models smarter and helps us surface more relevant alerts.
Improved accuracy: The combination of ML-driven analysis and human expertise leads to more accurate and relevant prioritization, reducing false positives and negatives.
Efficiency: Automating the initial triage and prioritization process allows security teams to spend more time on high-value tasks like threat hunting and strategic planning. It also ensures that developers are only tasked with remediating truly critical incidents, which streamlines working relationships between security and dev teams.
Business context awareness: A regular feedback loop with the people on the ground keeps security aligned with the business and ensures that high-stakes decisions are made with the full relevant business context.
