LLMs != Security Products

Cybersecurity stocks took a dive after Anthropic released a blog post titled “Making frontier cybersecurity capabilities available to defenders.” What stood out was not the post itself, but the market reaction: companies tied to endpoint protection, cloud security, and other traditional cybersecurity products sold off, even though the post had little direct relevance to their businesses.

That reaction highlights a disconnect between the perceived capabilities of “AI” and its actual impact on cybersecurity products, a disconnect that likely extends beyond the market. To make sense of that gap, it helps to start with what “AI” actually means in this context. Usage of the term AI (short for artificial intelligence) has increased sharply since the release of ChatGPT in November 2022. In practice, much of what is labeled “AI” today is better described as large language models (LLMs). For readers unfamiliar with LLMs, a common definition is:

“A large language model (LLM) is a type of artificial intelligence that can understand and create human language. These models learn by studying huge amounts of text from books, websites, and other sources.”

What makes LLMs fascinating and applicable to modern life is how they solved, at least on a surface level, a field of AI called Natural Language Processing (NLP). For readers not familiar with NLP: autocomplete, email spam filters, and auto-correct are all everyday examples of NLP. Here is a definition of NLP:

“A field in Artificial Intelligence, and also related to linguistics, focused on enabling computers to understand and generate human language.”

Long-time readers of this blog may recall that I previously used a sub-field of NLP, Natural Language Generation (NLG), to automatically create descriptions of disassembled functions via API calls. On their own, LLMs require text for both training and inference. They are not autonomous systems; without prompts, they do not function. This distinction is important when discussing AI and cybersecurity, because evaluating or classifying security events requires context that does not naturally exist as text in a prompt. That context has to be generated by additional software.

Generating that context requires an understanding of, and access to, the complete lifecycle of the security event in question. Walking through this lifecycle matters because it highlights how much logic exists before an event ever becomes text.

A classic example of a security event is a process initiating an outbound network connection directly to an IP address. How that event is handled varies widely depending on the type of security product and where it operates in the OSI model. For this example, assume the product operates at Layer 7, the application layer. The event pipeline in this case includes several distinct steps. A kernel-mode driver or user-mode component monitors process creation and relevant networking APIs. The destination IP address is evaluated to ensure it is not local, then serialized into text and logged. That log data is subsequently forwarded to a file-based or cloud-based centralized logging system. Even this simplified path omits important actions such as blocking the connection or terminating the process.

Writing code is not the same as building a security product, and LLMs do not possess the authority or signal access required to determine whether an IP address is benign or malicious. An LLM can describe an alert very well; it cannot, on its own, determine whether that alert represents malicious behavior without pre-existing detection logic, telemetry, or intelligence-derived indicators of compromise.

In practice, an agent is an LLM placed inside a loop, where it can inspect the current state of a system, run tools or commands, observe the results, and decide what to do next until it reaches some stopping point. Without the output of those tools and commands, the LLM provides no value; it has nothing to reason over. The surrounding software is what produces the text that gives the model context.
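The loop described above can be sketched in a few lines of Python. The model call and the tools here are hypothetical stand-ins (any real agent framework differs in its details); the point is that the LLM only ever reasons over text produced by the surrounding software.

```python
def run_agent(model, tools, task: str, max_steps: int = 5) -> str:
    """Minimal agent loop: model proposes an action, tooling supplies the text."""
    context = [f"Task: {task}"]
    for _ in range(max_steps):
        # The model reasons only over the accumulated text context.
        action = model("\n".join(context))
        if action["type"] == "finish":
            return action["answer"]
        # The surrounding software runs the tool and turns its output into text.
        output = tools[action["tool"]](action["args"])
        context.append(f"Observation: {output}")
    return "step limit reached"


def scripted_model(prompt: str) -> dict:
    # Stand-in for an LLM: asks for a lookup, then finishes once it sees output.
    if "Observation:" in prompt:
        return {"type": "finish", "answer": "connection looks benign"}
    return {"type": "tool", "tool": "lookup_ip", "args": "203.0.113.7"}


tools = {"lookup_ip": lambda ip: f"{ip}: no known indicators"}
print(run_agent(scripted_model, tools, "triage outbound connection"))
```

Strip out the `tools` dictionary and the loop has nothing to observe: the model would be deciding in a vacuum, which is exactly the dependency on surrounding software the paragraph describes.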

As of this publication date, LLMs are not going to replace cybersecurity products. These systems are large, long-lived codebases, and their value is not defined by code generation alone. What matters is the telemetry collected and the logic built on top of that telemetry to determine whether the text describing an event represents something benign or something malicious. Large language models can help explain security events, but they don’t replace the systems that detect them, and confusing the two is how markets end up reacting to the wrong things.
