Learn from our AI Security Experts
Discover every model. Secure every workflow. Prevent AI attacks - without slowing innovation.

All Resources
Reflections on RSAC 2026: Moving Beyond Messaging and Sponsored Lists to Measurable AI Security
It was evident at RSAC Conference 2026 that AI security has firmly arrived as a top priority across the cybersecurity industry.
Nearly every vendor now positions themselves as an “AI security” provider. Many announced new capabilities, expanded messaging, or rebranded existing offerings to align with this shift. On the surface, this reflects positive momentum, recognizing that securing AI systems is critical as companies increasingly deploy AI and agents into production. However, a closer look reveals a more nuanced reality.
This rapid expansion has also driven a growing need for structure and shared understanding across the industry. Industry groups and communities have continued to grow, playing an important and necessary role by working to harness community expertise and provide CISOs with clearer frameworks, guidance, and shared understanding in a rapidly evolving space. This kind of industry coordination is critical as organizations seek common standards and practical ways to manage new risk categories. While well-intentioned, the vendor landscapes they publish can add to the confusion when the lists are created from self-assessment forms or sponsorships. This can make it more difficult for security leaders to distinguish between self-assessed capabilities vs. production-ready platforms, ultimately adding to the noise at a time when clarity and validation are most needed.
A Familiar Pattern: Strong Messaging, Limited Maturity
A consistent theme across RSAC was that many vendors are still early in their AI security journey. In many cases, solutions announced over the past year were presented again, often with updated language, broader claims, or expanded positioning. While this is typical of emerging markets, it highlights an important gap between market awareness and operational maturity.
Organizations evaluating AI security solutions should look beyond messaging and focus on things like evidence of real-world deployment, demonstrated effectiveness against adversarial techniques, and integration into production AI workflows. AI security is not a conceptual problem but an operational one.
The Expansion of “AI Security” as a Category
Another clear trend is the rapid expansion of vendors entering the space. Many traditional cybersecurity providers are extending existing capabilities, such as API security, identity, data loss prevention, or monitoring, into AI use cases. This is a natural evolution, and these controls can provide value at certain layers. However, AI systems introduce fundamentally new risk categories that extend beyond traditional security domains.
AI systems introduce a distinct set of challenges, including unpredictable model behavior and non-deterministic outputs, adversarial inputs such as prompt manipulation, risks within the model supply chain, including embedded threats, and the growing complexity of autonomous agent actions and decision-making. Together, these factors create a fundamentally different security landscape; one that cannot be adequately addressed by extending traditional tools, but instead requires specialized, purpose-built approaches designed specifically for how AI systems operate.
The Risk of Over-Simplification
One of the most common narratives at RSAC was that AI security can be addressed through relatively narrow control points, most often through guardrails, filtering, or policy enforcement. These controls are important. These controls are important, they help reduce risk and establish a baseline, but they are not sufficient on their own.
AI systems operate across a complex lifecycle, with risk present from training and data ingestion through model development and the supply chain, into deployment, runtime behavior, and integration with applications and agents. Focusing on just one of these layers can create gaps in coverage, especially as adversarial techniques continue to evolve.
In practice, effective AI security requires depth across multiple domains. This includes understanding how models behave, anticipating and testing against adversarial techniques, detecting and responding to threats in real time, and integrating security into the broader application and infrastructure stack.
As a result, many organizations are finding that AI security cannot simply be absorbed into existing tools or teams. It requires dedicated focus and specialized capability. Industry frameworks increasingly reflect this reality, recognizing that AI risk spans environmental, algorithmic, and output layers, each requiring its own controls and ongoing monitoring.
From Concept to Capability: What to Look For
As the market evolves, organizations should prioritize solutions that demonstrate purpose-built AI security capabilities rather than repurposed controls, along with coverage across the full AI lifecycle. Strong solutions also show continuous validation through red teaming and testing, the ability to detect and respond to adversarial activity in real time, and proven deployment in complex enterprise environments.
This becomes especially important as AI systems are embedded into high-impact workflows where failures can directly affect business outcomes. Protecting these systems requires consistent security across both development pipelines and runtime environments, ensuring coverage at scale as AI adoption grows.
The Path Forward: From Awareness to Execution
The growth of AI security as a category is a positive signal. It reflects both the importance of the challenge and the urgency felt across the industry. At the same time, the market is still early, and messaging often moves faster than real capability.
The next phase will be shaped by a shift toward measurable outcomes, demonstrated resilience against real adversaries, and security that is integrated into how systems operate, not added as an afterthought. RSAC 2026 highlighted both the opportunity and the work ahead. While there is clear alignment that AI systems must be secured, there is still progress to be made in turning that awareness into effective, production-ready solutions.
For organizations, this means evaluating AI security with the same rigor as any other critical domain, grounded in evidence, validated in real environments, and designed for how systems actually function. In practice, confidence comes from what works, not just how it’s described. We welcome and encourage that rigor, as those who spent time with us at RSAC can attest.

Securing AI Agents: The Questions That Actually Matter
At RSA this year, a familiar theme kept surfacing in conversations around AI:
Organizations are moving fast. Faster than their security strategies.
AI agents are no longer experimental. They’re being deployed into real environments, connected to tools, data, and infrastructure, and trusted to take action on behalf of users. And as that autonomy increases, so does the risk.
Because, unlike traditional systems, these agents don’t just follow predefined logic. They interpret, decide, and act. And that means they can be manipulated, misled, or simply make the wrong call.
So the question isn’t whether something will go wrong, but rather if you’ve accounted for it when it does.
Joshua Saxe recently outlined a framework for evaluating security-for-AI vendors, centered around three areas: deterministic controls, probabilistic guardrails, and monitoring and response. It’s a useful way to structure the conversation, but the real value lies in the questions beneath it, questions that get at whether a solution is designed for how AI systems actually behave.
Start With What Must Never Happen
The first and most important question is also the simplest:
What outcomes are unacceptable, no matter what the model does?
This is where many approaches to AI security break down. They assume the model will behave correctly, or that alignment and prompting will be enough to keep it on track. In practice, that assumption doesn’t hold. Models can be influenced. They can be attacked. And in some cases, they can fail in ways that are hard to predict.
That’s why security has to operate independently of the model’s reasoning.
At HiddenLayer, this is enforced through a policy engine that allows teams to define deterministic controls, rules that make certain actions impossible regardless of the model’s intent. That could mean blocking destructive operations, such as deleting infrastructure, preventing sensitive data from being accessed or exfiltrated, or stopping risky sequences of tool usage before they complete. These controls exist outside the agent itself, so even if the model is compromised, the boundaries still hold.
The goal isn’t to make the model perfect. It’s to ensure that certain failures can’t happen at all.
Then Ask: Who Has Tried to Break It?
Defining controls is one thing. Validating them is another.
A common pattern in this space is to rely on internal testing or controlled benchmarks. But AI systems don’t operate in controlled environments, and neither do attackers.
A more useful question is: who has actually tried to break these controls?
HiddenLayer’s approach has been to test under real pressure, running capture-the-flag challenges at events like Black Hat and DEF CON, where thousands of security researchers actively attempt to bypass protections. At the same time, an internal research team is continuously developing new attack techniques and using those findings to improve detection and enforcement.
That combination matters. It ensures the system is tested not just against known threats, but also against novel approaches that emerge as the space evolves.
Because in AI security, yesterday’s defenses don’t hold up for long.
Security Has to Adapt as Fast as the System
Even with strong controls, another challenge quickly emerges: flexibility.
AI systems don’t stay static. Teams iterate, expand capabilities, and push for more autonomy over time. If security controls can’t evolve alongside them, they either become bottlenecks or are bypassed entirely.
That’s why it’s important to understand how easily controls can be adjusted.
Rather than requiring rebuilds or engineering changes, controls should be configurable. Teams should be able to start in an observe-only mode, understand how agents behave, and then gradually enforce stricter policies as confidence grows. At the same time, different layers of control, organization-wide, project-specific, or even per-request, should allow for precision without sacrificing consistency.
This kind of flexibility ensures that security keeps pace with development rather than slowing it down.
Not Every Risk Can Be Eliminated
Even with deterministic controls in place, not everything can be prevented.
There will always be scenarios where risk has to be accepted, whether for usability, performance, or business reasons. The question then becomes how to manage that risk.
This is where probabilistic guardrails come in.
Rather than trying to block every possible attack, the goal shifts to making attacks visible, detectable, and ultimately containable. HiddenLayer approaches this by using multiple detection models that operate across different dimensions, rather than relying on a single classifier. If one model is bypassed, others still have the opportunity to identify the behavior.
These systems are continuously tested and retrained against new attack techniques, both from internal research and external validation efforts. The objective isn’t perfection, but resilience.
Because in practice, security isn’t about eliminating risk entirely. It’s about ensuring that when something goes wrong, it doesn’t go unnoticed.
Detection Only Works If It Happens Before Execution
One of the most critical examples of this is prompt injection.
Many solutions attempt to address prompt injection within the model itself, but this approach inherits the model's weaknesses. A more effective strategy is to detect malicious input before it ever reaches the agent.
HiddenLayer uses a purpose-built detection model that classifies inputs prior to execution, operating outside the agent’s reasoning process. This allows it to identify injection attempts without being susceptible to them and to stop them before any action is taken.
That distinction is important.
Once an agent executes a malicious instruction, the opportunity to prevent damage has already passed.
Visibility Isn’t Enough Without Enforcement
As AI systems scale, another reality becomes clear: they move faster than human response times.
This raises a practical question: can your team actually monitor and intervene in real time?
The answer, increasingly, is no. Not without automation.
That’s why enforcement needs to happen in line. Every prompt, tool call, and response should be inspected before execution, with policies applied immediately. Risky actions can be blocked, and high-risk workflows can automatically trigger checkpoints.
At the same time, visibility still matters. Security teams need full session-level context, integrations with existing tools like SIEMs, and the ability to trace behavior after the fact.
But visibility alone isn’t sufficient. Without real-time enforcement, detection becomes hindsight.
Coverage Is Where Most Strategies Break Down
Even strong controls and detection models can fail if they don’t apply everywhere.
AI environments are inherently fragmented. Agents can exist across frameworks, cloud platforms, and custom implementations. If security only covers part of that surface area, gaps emerge, and those gaps become the path of least resistance.
That’s why enforcement has to be layered.
Gateway-level controls can automatically discover and protect agents as they are deployed. SDK integrations extend coverage into specific frameworks. Cloud discovery ensures that assets across environments like AWS, Azure, and Databricks are continuously identified and brought under policy.
No single control point is sufficient on its own. The goal is comprehensive coverage, not partial visibility.
The Question Most People Avoid
Finally, there’s the question that tends to get overlooked:
What happens if something gets through?
Because eventually, something will.
When that happens, the priority is understanding and containment. Every interaction should be logged with full context, allowing teams to trace what occurred and identify similar behavior across the environment. From there, new protections should be deployable quickly, closing gaps before they can be exploited again.
What security solutions can’t do, however, is undo the impact entirely.
They can’t restore deleted data or reverse external actions. That’s why the focus has to be on limiting the blast radius, ensuring that failures are small enough to recover from.
Prevention and containment are what make recovery possible.
A Different Way to Think About Security
AI agents introduce a fundamentally different security challenge.
They aren’t static systems or predictable workflows. They are dynamic, adaptive, and capable of acting in ways that are difficult to anticipate.
Securing them requires a shift in mindset. It means defining what must never happen, managing the remaining risks, enforcing controls in real time, and assuming failures will occur.
Because they will.
The organizations that succeed with AI won’t be the ones that assume everything works as expected.
They’ll be the ones prepared for when it doesn’t.

The Hidden Risk of Agentic AI: What Happens Beyond the Prompt
As organizations adopt AI agents that can plan, reason, call tools, and execute multi-step tasks, the nature of AI security is changing.
AI is no longer confined to generating text or answering prompts. It is becoming operational actors inside the business, interacting with applications, accessing sensitive data, and taking action across workflows without human intervention. Each execution expands the potential blast radius. A single prompt can redirect an agent, trigger unsafe tool use, expose sensitive data, and cascade across systems in an execution chain — before security teams have visibility.
This shift introduces a new class of security risk. Attacks are no longer limited to manipulating model outputs. They can influence how an agent behaves during execution, leading to unintended tool usage, data exposure, or persistent compromise across sessions. In agentic systems, a single injected instruction can cascade through multiple steps, compounding impact as the agent continues to act.
According to HiddenLayer’s 2026 AI Threat Landscape Report, 1 in 8 AI breaches are now linked to agentic systems. Yet 31% of organizations cannot determine whether they’ve experienced one.
The root of the problem is a visibility gap.
Most AI security controls were designed for static interactions, and they remain essential. They inspect prompts and responses, enforce policies at the boundaries, and govern access to models.
But once an agent begins executing, those controls no longer provide visibility into what happens next. Security teams cannot see which tools are being called, what data is being accessed, or how a sequence of actions evolves over time.
In agentic environments, risk doesn’t replace the prompt layer. It extends beyond it. It emerges during execution, where decisions turn into actions across systems and workflows. Without visibility into runtime behavior, security teams are left blind to how autonomous systems operate and where they may be compromised.
To address this gap, HiddenLayer is extending its AI Runtime Protection module to cover agentic execution. These capabilities extend runtime protection beyond prompts and policies to secure what agents actually do — providing visibility, hunting and investigation, and detection and enforcement as autonomous systems operate.
Why Runtime Security Matters for Agentic AI
Agentic AI systems operate differently from traditional AI applications. Instead of producing a single response, they execute multi-step workflows that may involve:
- Calling external tools or APIs
- Accessing internal data sources
- Interacting with other agents or services
- Triggering downstream actions across systems
This means security teams must understand what agents are doing in real time, not just the prompt that initiated the interaction.
Bringing Visibility to Autonomous Execution
The next generation of AI runtime security enables security teams to observe and control how AI agents operate across complex workflows.
With these new capabilities, organizations can:
- Understand what actually happened
Reconstruct multi-step agent sessions to see how agents interact with tools, data, and other systems.
- Investigate and hunt across agent activity
Search and analyze agent workflows across sessions, execution paths, and tools to identify anomalous behavior and uncover emerging threats.
- Detect and stop agentic attack chains
Identify prompt injection, malicious tool sequences, and data exfiltration across multi-step execution and agent activity before they propagate across systems.
- Enforce runtime controls
Automatically block, redact, or detect unsafe agent actions based on real-time behavior and policies.
Together, these capabilities help organizations move from limited prompt-level inspection to full runtime visibility and control over autonomous execution.
Supporting the Next Phase of AI Adoption
HiddenLayer’s expanded runtime security capabilities integrate with agent gateways and frameworks, enabling organizations to deploy protections without rewriting applications or disrupting existing AI workflows.
Delivered as part of the HiddenLayer AI Security Platform, allowing organizations to gain immediate visibility into agent behavior and expand protections as their AI programs evolve.
As enterprises move toward autonomous AI systems, securing execution becomes a critical requirement.
What This Means for You
As organizations begin deploying AI agents that can call tools, access data, and execute multi-step workflows, security teams need visibility beyond the prompt. Traditional AI protections were designed for static interactions, not autonomous systems operating across enterprise environments.
Extending runtime protection to agent behavior enables organizations to observe how AI systems actually operate, detect risk as it emerges, and enforce controls in real time. As agentic AI adoption grows, securing the runtime layer will be essential to deploying these systems safely and confidently.

Why Autonomous AI Is the Next Great Attack Surface
Large language models (LLMs) excel at automating mundane tasks, but they have significant limitations. They struggle with accuracy, producing factual errors, reflecting biases from their training process, and generating hallucinations. They also have trouble with specialized knowledge, recent events, and contextual nuance, often delivering generic responses that miss the mark. Their lack of autonomy and need for constant guidance to complete tasks has given them a reputation of little more than sophisticated autocomplete tools.
The path toward true AI agency addresses these shortcomings in stages. Retrieval-Augmented Generation (RAG) systems pull in external, up-to-date information to improve accuracy and reduce hallucinations. Modern agentic systems go further, combining LLMs with frameworks for autonomous planning, reasoning, and execution.
The promise of AI agents is compelling: systems that can autonomously navigate complex tasks, make decisions, and deliver results with minimal human oversight. We are, by most reasonable measures, at the beginning of a new industrial revolution. Where previous waves of automation transformed manual and repetitive labor, this one is reshaping intelligent work itself, the kind that requires reasoning, judgment, and coordination across systems. AI agents sit at the heart of that shift.
But their autonomy cuts both ways. The very capabilities that make agents useful, their ability to access tools, retain memory, and act independently, are the same capabilities that introduce new and often unpredictable risks. An agent that can query your database and take action on the results is powerful when it works as intended, and potentially dangerous when it doesn't. As organizations race to deploy agentic systems, the central challenge isn't just building agents that can do more; it's ensuring they do so safely, reliably, and within boundaries we can trust.
What Makes an AI Agent?

At its core, an agent is a large language model augmented with capabilities that enable it to do things in the world, not just generate text. As the diagram shows, the key ingredients include: memory to remember past interactions, access to external tools such as APIs and search engines, the ability to read and write to databases and file systems, and the ability to execute multi-step sequences toward a goal. Stack these together, and you turn a passive text predictor into something that can plan, act, and learn.
The critical distinguishing feature of an agent is autonomy. Rather than simply responding to a single prompt, an agent can make decisions, take actions in its environment, observe the results, and adapt based on feedback, all in service of completing a broader objective. For example, an agent asked to "book the cheapest flight to Tokyo next week" might search for flights, compare options across multiple sites, check your calendar for conflicts, and proceed to book, executing a whole chain of reasoning and tool use without needing step-by-step human instruction. This loop of planning, acting, and adapting is what separates agents from standard chatbot interactions.
In the enterprise, agents are quickly moving from novelty to necessity. Companies are deploying them to handle complex workflows that previously required significant human coordination, things like processing invoices end-to-end, triaging customer support tickets across multiple systems, or orchestrating data pipelines. The real value comes when agents are connected to a company's internal tools and data sources, allowing them to operate within existing infrastructure rather than alongside it. As these systems mature, the focus is shifting from "can an agent do this task?" to "how do we reliably govern, monitor, and scale agents across the organization?"
The Evolution of Prompt Injection
When prompt injection first emerged, it was treated as a curiosity. Researchers tricked chatbots into ignoring their system prompts, producing funny or embarrassing outputs that made for good social media posts. That era is over. Prompt injection has matured into a legitimate delivery mechanism for real attacks, and the reason is simple: the targets have changed. Adversaries are no longer injecting prompts into chatbots that can only generate text. We're injecting them into agents that can execute code, call APIs, access databases, browse the web, and deploy tools. A successful prompt injection against a browsing agent can lead to data exfiltration. Against an enterprise agent with access to internal systems, it functions as an insider threat. Against a coding agent, it can result in malware being written and deployed without a human ever reviewing it. Prompt injection is no longer about making an AI say something it shouldn't. It's about making an AI take an action that it shouldn't, and the blast radius grows with every new capability we hand these systems.
Et Tu, Jarvis?
Nowhere is this more visible than in the rise of personal agents. Tony Stark's Jarvis in the Marvel Cinematic Universe set the bar for a personal AI assistant that manages your life, automates complex tasks, monitors your systems, and never sleeps. But what if Jarvis wasn't always on his side? OpenClaw brought that vision closer to reality than anything before it. Formerly known as Moltbot and ClawdBot, this open-source autonomous AI assistant exploded onto the scene in late 2025, amassing over 100,000 GitHub stars and becoming one of the fastest-growing open-source projects in history. It offered a "24/7 personal assistant" that could manage calendars, automate browsing, run system commands, and integrate with WhatsApp, Telegram, and Discord, all from your local machine. Around it, an entire ecosystem materialized almost overnight: Moltbook, a Reddit-style social network exclusively for AI agents with over 1.5 million registered bots, and ClawHub, a repository of skills and plugins.
The problem? The security story was almost nonexistent. Our research demonstrated that a simple indirect prompt injection, hidden in a webpage, could achieve full remote code execution, install a persistent backdoor via OpenClaw's heartbeat system, and establish an attacker-controlled command-and-control server. Tools ran without user approval, secrets were stored in plaintext, and the agent's own system prompt was modifiable by the agent itself. ClawHub lacked any mechanisms to distinguish legitimate skills from malicious ones, and sure enough, malicious skill files distributing macOS and Windows infostealers soon appeared. Moltbook's own backing database was found wide open with no access controls, meaning anyone could spoof any agent on the platform. What was designed as an ecosystem for autonomous AI assistants had inadvertently become near-perfect infrastructure for a distributed botnet.
The Agentic Supply Chain: A New Attack Surface
OpenClaw's ecosystem problems aren't unique to OpenClaw. The way agents discover, install, and depend on third-party skills and tools is creating the same supply chain risks that have plagued software package managers for years, just with higher stakes. New protocols like MCP (Model Context Protocol) are enabling agents to plug into external tools and data sources in a standardized way, and around them, entire ecosystems are emerging. Skills marketplaces, agent directories, and even social media-style platforms like Smithery are popping up as hubs for sharing and discovering agent capabilities. It's exciting, but it's also a story we've seen before.
Think npm, PyPI, or Docker Hub. These platforms revolutionized software development while simultaneously creating sprawling supply chains in which a single compromised package could ripple across thousands of applications. Agentic ecosystems are heading down the same path, arguably with higher stakes. When your agent connects to a third-party MCP server or installs a community-built skill, you're not only importing code, but also granting access to systems that can take autonomous action. Every external data source an agent touches, whether browsing the web, calling an API, or pulling from a third-party tool, is potentially untrusted input. And unlike a traditional application where bad data might cause a display error, in an agentic system, it can influence decisions, trigger actions, and cascade through workflows. We're building new dependency chains, and with them, new vectors for attack that the industry is only beginning to understand.
Shadow Agents, Shadow Employees
External attackers are one part of the equation. Sometimes the threat comes from within. We've already seen the rise of shadow IT and shadow AI, where employees adopt tools and models outside of approved channels. Agents take this a step further. It's no longer just an unauthorized chatbot answering questions; it's an unauthorized agent with access to company systems, making decisions and taking actions autonomously. At a certain point, these shadow tools become more like shadow employees, operating with real agency within your organization but without the oversight, onboarding, or governance you'd apply to an actual hire. They're harder to detect, harder to govern, and carry far more risk than a rogue spreadsheet or an unsanctioned SaaS subscription ever did. The threat model here is different from a compromised account or a disgruntled employee. Even when these agents are on IT's radar, the risk of an autonomous system quietly operating in an unforeseen manner across company infrastructure is easy to underestimate, as the BodySnatcher vulnerability demonstrated.
An Agent Will Do What It's Told
Suppose an attacker sits halfway across the globe with no credentials, no prior access, and no insider knowledge. Just a target's email address. They connect to a Virtual Agent API using a hardcoded credential identical across every customer environment. They impersonate an administrator, bypassing MFA and SSO entirely. They engage a prebuilt AI agent and instruct it to create a new account with full admin privileges. Persistent, privileged access to one of the most sensitive platforms in enterprise IT, achieved with nothing more than an email. This is BodySnatcher, a vulnerability discovered by AppOmni in January 2026 and described as one of the most severe AI-driven security flaws uncovered to date. Hardcoded credentials and weak identity logic made the initial access possible, but it was the agentic capabilities that turned a misconfiguration into a full platform takeover. It's a clear example of how agentic AI can amplify traditional exploitation techniques into something far more damaging.
Conclusions
Agents represent a fundamental shift in how individuals and organizations interact with AI. Autonomous systems with access to sensitive data, critical infrastructure, and the ability to act on both - how long before autonomous systems subsume critical infrastructure itself? As we've explored in this blog, that shift introduces risk at every level: from the supply chains that power agent ecosystems, to the prompt injection techniques that have evolved to exploit them, to the shadow agents operating inside organizations without any security oversight.
The challenge for security teams is that existing frameworks and controls were not designed with autonomous, tool-using AI systems in mind. The questions that matter now are ones many organizations haven't yet had to ask. How do you govern a non-human actor? How do you monitor a chain of autonomous decisions across multiple systems? How do you secure a supply chain built on community-contributed skills and open protocols?
This blog has focused on framing the problem. In part two, we'll go deeper into the technical details. We'll examine specific attack techniques targeting agentic systems, walk through real exploit chains, and discuss the defensive strategies and architectural decisions that can help organizations deploy agents without inheriting unacceptable risk.

Model Intelligence
Bringing Transparency to Third-Party AI Models
From Blind Model Adoption to Informed AI Deployment
As organizations accelerate AI adoption, they increasingly rely on third-party and open-source models to drive new capabilities across their business. Frequently, these models arrive with limited or nonexistent metadata around licensing, geographic exposure, and risk posture. The result is blind deployment decisions that introduce legal, financial, and reputational risk. HiddenLayer’s Model Intelligence eliminates that uncertainty by delivering structured insight and risk transparency into the models your organization depends on.
Three Core Attributes of Model Intelligence
HiddenLayer’s Model Intelligence focuses on three core attributes that enable risk aware deployment decisions:
License
Licenses define how a model can be used, modified, and shared. Some, such as MIT Open Source or Apache 2.0, are permissive. Others impose commercial, attribution, or use-case restrictions.
Identifying license terms early ensures models are used within approved boundaries and aligned with internal governance policies and regulatory requirements.
For example, a development team integrates a high-performing open-source model into a revenue-generating product, only to later discover the license restricts commercial use or imposes field-of-use limitations. What initially accelerated development quickly turns into a legal review, customer disruption, and a costly product delay.
Geographic Footprint
A model’s geographic footprint reflects the countries where it has been discovered across global repositories. This provides visibility into where the model is circulating, hosted, or redistributed.
Understanding this footprint helps organizations assess geopolitical, intellectual property, and security risks tied to jurisdiction and potential exposure before deployment.
For example, a model widely mirrored across repositories in sanctioned or high-risk jurisdictions may introduce export control considerations, sanctions exposure, or heightened compliance scrutiny, particularly for organizations operating in regulated industries such as financial services or defense.
Trust Level
Trust Level provides a measurable indicator of how established and credible a model’s publisher is on the hosting platform.
For example, two models may offer comparable performance. One is published by an established organization with a history of maintained releases, version control, and transparent documentation. The other is released by a little-known publisher with limited history or observable track record. Without visibility into publisher credibility, teams may unknowingly introduce unnecessary supply chain risk.
This enables teams to prioritize review efforts: applying deeper scrutiny to lower-trust sources while reducing friction for higher-trust ones. When combined with license and geographic context, trust becomes a powerful input for supply chain governance and compliance decisions.

Turning Intelligence into Operational Action
Model Intelligence operationalizes these data points across the model lifecycle through the following capabilities:
- Automated Metadata Detection – Identifies license and geographic footprint during scanning.
- Trust Level Scoring – Assesses publisher credibility to inform risk prioritization.
- AIBOM Integration – Embeds metadata into a structured inventory of model components, datasets, and dependencies to support licensing reviews and compliance workflows.
This transforms fragmented metadata into structured, actionable intelligence across the model lifecycle.
What This Means for Your Organization
Model Intelligence enables organizations to vet models quickly and confidently, eliminating manual guesswork and fragmented research. It provides clear visibility into licensing terms and geographic exposure, helping teams understand usage rights before deployment. By embedding this insight into governance workflows, it strengthens alignment with internal policies and regulatory requirements while reducing the risk of deploying improperly licensed or high-risk models. The result is faster, responsible AI adoption without increasing organizational risk.

Introducing Workflow-Aligned Modules in the HiddenLayer AI Security Platform
Modern AI environments don’t fail because of a single vulnerability. They fail when security can’t keep pace with how AI is actually built, deployed, and operated. That’s why our latest platform update represents more than a UI refresh. It’s a structural evolution of how AI security is delivered.
Modern AI environments don’t fail because of a single vulnerability. They fail when security can’t keep pace with how AI is actually built, deployed, and operated. That’s why our latest platform update represents more than a UI refresh. It’s a structural evolution of how AI security is delivered.
With the release of HiddenLayer AI Security Platform Console v25.12, we’ve introduced workflow-aligned modules, a unified Security Dashboard, and an expanded Learning Center, all designed to give security and AI teams clearer visibility, faster action, and better alignment with real-world AI risk.
From Products to Platform Modules
As AI adoption accelerates, security teams need clarity, not fragmented tools. In this release, we’ve transitioned from standalone product names to platform modules that map directly to how AI systems move from discovery to production.
Here’s how the modules align:
| Previous Name | New Module Name |
|---|---|
| Model Scanner | AI Supply Chain Security |
| Automated Red Teaming for AI | AI Attack Simulation |
| AI Detection & Response (AIDR) | AI Runtime Security |
This change reflects a broader platform philosophy: one system, multiple tightly integrated modules, each addressing a critical stage of the AI lifecycle.
What’s New in the Console

Workflow-Driven Navigation & Updated UI
The Console now features a redesigned sidebar and improved navigation, making it easier to move between modules, policies, detections, and insights. The updated UX reduces friction and keeps teams focused on what matters most, understanding and mitigating AI risk.
Unified Security Dashboard
Formerly delivered through reports, the new Security Dashboard offers a high-level view of AI security posture, presented in charts and visual summaries. It’s designed for quick situational awareness, whether you’re a practitioner monitoring activity or a leader tracking risk trends.
Exportable Data Across Modules
Every module now includes exportable data tables, enabling teams to analyze findings, integrate with internal workflows, and support governance or compliance initiatives.
Learning Center
AI security is evolving fast, and so should enablement. The new Learning Center centralizes tutorials and documentation, enabling teams to onboard quicker and derive more value from the platform.
Incremental Enhancements That Improve Daily Operations
Alongside the foundational platform changes, recent updates also include quality-of-life improvements that make day-to-day use smoother:
- Default date ranges for detections and interactions
- Severity-based filtering for Model Scanner and AIDR
- Improved pagination and table behavior
- Updated detection badges for clearer signal
- Optional support for custom logout redirect URLs (via SSO)
These enhancements reflect ongoing investment in usability, performance, and enterprise readiness.
Why This Matters
The new Console experience aligns directly with the broader HiddenLayer AI Security Platform vision: securing AI systems end-to-end, from discovery and testing to runtime defense and continuous validation.
By organizing capabilities into workflow-aligned modules, teams gain:
- Clear ownership across AI security responsibilities
- Faster time to insight and response
- A unified view of AI risk across models, pipelines, and environments
This update reinforces HiddenLayer’s focus on real-world AI security, purpose-built for modern AI systems, model-agnostic by design, and deployable without exposing sensitive data or IP
Looking Ahead
These Console updates are a foundational step. As AI systems become more autonomous and interconnected, platform-level security, not point solutions, will define how organizations safely innovate.
We’re excited to continue building alongside our customers and partners as the AI threat landscape evolves.

Inside HiddenLayer’s Research Team: The Experts Securing the Future of AI
Every new AI model expands what’s possible and what’s vulnerable. Protecting these systems requires more than traditional cybersecurity. It demands expertise in how AI itself can be manipulated, misled, or attacked. Adversarial manipulation, data poisoning, and model theft represent new attack surfaces that traditional cybersecurity isn’t equipped to defend.
Every new AI model expands what’s possible and what’s vulnerable. Protecting these systems requires more than traditional cybersecurity. It demands expertise in how AI itself can be manipulated, misled, or attacked. Adversarial manipulation, data poisoning, and model theft represent new attack surfaces that traditional cybersecurity isn’t equipped to defend.
At HiddenLayer, our AI Security Research Team is at the forefront of understanding and mitigating these emerging threats from generative and predictive AI to the next wave of agentic systems capable of autonomous decision-making. Their mission is to ensure organizations can innovate with AI securely and responsibly.
The Industry’s Largest and Most Experienced AI Security Research Team
HiddenLayer has established the largest dedicated AI security research organization in the industry, and with it, a depth of expertise unmatched by any security vendor.
Collectively, our researchers represent more than 150 years of combined experience in AI security, data science, and cybersecurity. What sets this team apart is the diversity, as well as the scale, of skills and perspectives driving their work:
- Adversarial prompt engineers who have captured flags (CTFs) at the world’s most competitive security events.
- Data scientists and machine learning engineers responsible for curating threat data and training models to defend AI
- Cybersecurity veterans specializing in reverse engineering, exploit analysis, and helping to secure AI supply chains.
- Threat intelligence researchers who connect AI attacks to broader trends in cyber operations.
Together, they form a multidisciplinary force capable of uncovering and defending every layer of the AI attack surface.
Establishing the First Adversarial Prompt Engineering (APE) Taxonomy
Prompt-based attacks have become one of the most pressing challenges in securing large language models (LLMs). To help the industry respond, HiddenLayer’s research team developed the first comprehensive Adversarial Prompt Engineering (APE) Taxonomy, a structured framework for identifying, classifying, and defending against prompt injection techniques.
By defining the tactics, techniques, and prompts used to exploit LLMs, the APE Taxonomy provides security teams with a shared and holistic language and methodology for mitigating this new class of threats. It represents a significant step forward in securing generative AI and reinforces HiddenLayer’s commitment to advancing the science of AI defense.
Strengthening the Global AI Security Community
HiddenLayer’s researchers focus on discovery and impact. Our team actively contributes to the global AI security community through:
- Participation in AI security working groups developing shared standards and frameworks, such as model signing with OpenSFF.
- Collaboration with government and industry partners to improve threat visibility and resilience, such as the JCDC, CISA, MITRE, NIST, and OWASP.
- Ongoing contributions to the CVE Program, helping ensure AI-related vulnerabilities are responsibly disclosed and mitigated with over 48 CVEs.
These partnerships extend HiddenLayer’s impact beyond our platform, shaping the broader ecosystem of secure AI development.
Innovation with Proven Impact
HiddenLayer’s research has directly influenced how leading organizations protect their AI systems. Our researchers hold 25 granted patents and 56 pending patents in adversarial detection, model protection, and AI threat analysis, translating academic insights into practical defense.
Their work has uncovered vulnerabilities in popular AI platforms, improved red teaming methodologies, and informed global discussions on AI governance and safety. Beyond generative models, the team’s research now explores the unique risks of agentic AI, autonomous systems capable of independent reasoning and execution, ensuring security evolves in step with capability.
This innovation and leadership have been recognized across the industry. HiddenLayer has been named a Gartner Cool Vendor, a SINET16 Innovator, and a featured authority in Forbes, SC Magazine, and Dark Reading.
Building the Foundation for Secure AI
From research and disclosure to education and product innovation, HiddenLayer’s SAI Research Team drives our mission to make AI secure for everyone.
“Every discovery moves the industry closer to a future where AI innovation and security advance together. That’s what makes pioneering the foundation of AI security so exciting.”
— HiddenLayer AI Security Research Team
Through their expertise, collaboration, and relentless curiosity, HiddenLayer continues to set the standard for Security for AI.
About HiddenLayer
HiddenLayer, a Gartner-recognized Cool Vendor for AI Security, is the leading provider of Security for AI. Its AI Security Platform unifies supply chain security, runtime defense, posture management, and automated red teaming to protect agentic, generative, and predictive AI applications. The platform enables organizations across the private and public sectors to reduce risk, ensure compliance, and adopt AI with confidence.
Founded by a team of cybersecurity and machine learning veterans, HiddenLayer combines patented technology with industry-leading research to defend against prompt injection, adversarial manipulation, model theft, and supply chain compromise.

Why Traditional Cybersecurity Won’t “Fix” AI
When an AI system misbehaves, from leaking sensitive data to producing manipulated outputs, the instinct across the industry is to reach for familiar tools: patch the issue, run another red team, test more edge cases.
When an AI system misbehaves, from leaking sensitive data to producing manipulated outputs, the instinct across the industry is to reach for familiar tools: patch the issue, run another red team, test more edge cases.
But AI doesn’t fail like traditional software.
It doesn’t crash, it adapts. It doesn’t contain bugs, it develops behaviors.
That difference changes everything.
AI introduces an entirely new class of risk that cannot be mitigated with the same frameworks, controls, or assumptions that have defined cybersecurity for decades. To secure AI, we need more than traditional defenses. We need a shift in mindset.
The Illusion of the Patch
In software security, vulnerabilities are discrete: a misconfigured API, an exploitable buffer, an unvalidated input. You can identify the flaw, patch it, and verify the fix.
AI systems are different. A vulnerability isn’t a line of code, it’s a learned behavior distributed across billions of parameters. You can’t simply patch a pattern of reasoning or retrain away an emergent capability.
As a result, many organizations end up chasing symptoms, filtering prompts or retraining on “safer” data, without addressing the fundamental exposure: the model itself can be manipulated.
Traditional controls such as access management, sandboxing, and code scanning remain essential. However, they were never designed to constrain a system that fuses code and data into one inseparable process. AI models interpret every input as a potential instruction, making prompt injection a persistent, systemic risk rather than a single bug to patch.
Testing for the Unknowable
Quality assurance and penetration testing work because traditional systems are deterministic: the same input produces the same output.
AI doesn’t play by those rules. Each response depends on context, prior inputs, and how the user frames a request. Modern models also inject intentional randomness, or temperature, to promote creativity and variation in their outputs. This built-in entropy means that even identical prompts can yield different responses, which is a feature that enhances flexibility but complicates reproducibility and validation. Combined with the inherent nondeterminism found in large-scale inference systems, as highlighted by the Thinking Machines Lab, this variability ensures that no static test suite can fully map an AI system’s behavior.
That’s why AI red teaming remains critical. Traditional testing alone can’t capture a system designed to behave probabilistically. Still, adaptive red teaming, built to probe across contexts, temperature settings, and evolving model states, helps reveal vulnerabilities that deterministic methods miss. When combined with continuous monitoring and behavioral analytics, it becomes a dynamic feedback loop that strengthens defenses over time.
Saxe and others argue that the path forward isn’t abandoning traditional security but fusing it with AI-native concepts. Deterministic controls, such as policy enforcement and provenance checks, should coexist with behavioral guardrails that monitor model reasoning in real time.
You can’t test your way to safety. Instead, AI demands continuous, adaptive defense that evolves alongside the systems it protects.
A New Attack Surface
In AI, the perimeter no longer ends at the network boundary. It extends into the data, the model, and even the prompts themselves. Every phase of the AI lifecycle, from data collection to deployment, introduces new opportunities for exploitation:
- Data poisoning: Malicious inputs during training implant hidden backdoors that trigger under specific conditions.
- Prompt injection: Natural language becomes a weapon, overriding instructions through subtle context.
Some industry experts argue that prompt injections can be solved with traditional controls such as input sanitization, access management, or content filtering. Those measures are important, but they only address the symptoms of the problem, not its root cause. Prompt injection is not just malformed input, but a by-product of how large language models merge data and instructions into a single channel. Preventing it requires more than static defenses. It demands runtime awareness, provenance tracking, and behavioral guardrails that understand why a model is acting, not just what it produces. The future of AI security depends on integrating these AI-native capabilities with proven cybersecurity controls to create layered, adaptive protection.
- Data exposure: Models often have access to proprietary or sensitive data through retrieval-augmented generation (RAG) pipelines or Model Context Protocols (MCPs). Weak access controls, misconfigurations, or prompt injections can cause that information to be inadvertently exposed to unprivileged users.
- Malicious realignment: Attackers or downstream users fine-tune existing models to remove guardrails, reintroduce restricted behaviors, or add new harmful capabilities. This type of manipulation doesn’t require stealing the model. Rather, it exploits the openness and flexibility of the model ecosystem itself.
- Inference attacks: Sensitive data is extracted from model outputs, even without direct system access.
These are not coding errors. They are consequences of how machine learning generalizes.
Traditional security techniques, such as static analysis and taint tracking, can strengthen defenses but must evolve to analyze AI-specific artifacts, both supply chain artifacts like datasets, model files, and configurations; as well as runtime artifacts like context windows, RAG or memory stores, and tools or MCP servers.
Securing AI means addressing the unique attack surface that emerges when data, models, and logic converge.
Red Teaming Isn’t the Finish Line
Adversarial testing is essential, but it’s only one layer of defense. In many cases, “fixes” simply teach the model to avoid certain phrases, rather than eliminating the underlying risk.
Attackers adapt faster than defenders can retrain, and every model update reshapes the threat landscape. Each retraining cycle also introduces functional change, such as new behaviors, decision boundaries, and emergent properties that can affect reliability or safety. Recent industry examples, such as OpenAI’s temporary rollback of GPT-4o and the controversy surrounding behavioral shifts in early GPT-5 models, illustrate how even well-intentioned updates can create new vulnerabilities or regressions. This reality forces defenders to treat security not as a destination, but as a continuous relationship with a learning system that evolves with every iteration.
Borrowing from Saxe’s framework, effective AI defense should integrate four key layers: security-aware models, risk-reduction guardrails, deterministic controls, and continuous detection and response mechanisms. Together, they form a lifecycle approach rather than a perimeter defense.
Defending AI isn’t about eliminating every flaw, just as it isn’t in any other domain of security. The difference is velocity: AI systems change faster than any software we’ve secured before, so our defenses must be equally adaptive. Capable of detecting, containing, and recovering in real time.
Securing AI Requires a Different Mindset
Securing AI requires a different mindset because the systems we’re protecting are not static. They learn, generalize, and evolve. Traditional controls were built for deterministic code; AI introduces nondeterminism, semantic behavior, and a constant feedback loop between data, model, and environment.
At HiddenLayer, we operate on a core belief: you can’t defend what you don’t understand.
AI Security requires context awareness, not just of the model, but of how it interacts with data, users, and adversaries.
A modern AI security posture should reflect those realities. It combines familiar principles with new capabilities designed specifically for the AI lifecycle. HiddenLayer’s approach centers on four foundational pillars:
- AI Discovery: Identify and inventory every model in use across the organization, whether developed internally or integrated through third-party services. You can’t protect what you don’t know exists.
- AI Supply Chain Security: Protect the data, dependencies, and components that feed model development and deployment, ensuring integrity from training through inference.
- AI Security Testing: Continuously test models through adaptive red teaming and adversarial evaluation, identifying vulnerabilities that arise from learned behavior and model drift.
- AI Runtime Security: Monitor deployed models for signs of compromise, malicious prompting, or manipulation, and detect adversarial patterns in real time.
These capabilities build on proven cybersecurity principles, discovery, testing, integrity, and monitoring, but extend them into an environment defined by semantic reasoning and constant change.
This is how AI security must evolve. From protecting code to protecting capability, with defenses designed for systems that think and adapt.
The Path Forward
AI represents both extraordinary innovation and unprecedented risk. Yet too many organizations still attempt to secure it as if it were software with slightly more math.
The truth is sharper.
AI doesn’t break like code, and it won’t be fixed like code.
Securing AI means balancing the proven strengths of traditional controls with the adaptive awareness required for systems that learn.
Traditional cybersecurity built the foundation. Now, AI Security must build what comes next.
Learn More
To stay ahead of the evolving AI threat landscape, explore HiddenLayer’s Innovation Hub, your source for research, frameworks, and practical guidance on securing machine learning systems.
Or connect with our team to see how the HiddenLayer AI Security Platform protects models, data, and infrastructure across the entire AI lifecycle.

Securing AI Through Patented Innovation
As AI systems power critical decisions and customer experiences, the risks they introduce must be addressed. From prompt injection attacks to adversarial manipulation and supply chain threats, AI applications face vulnerabilities that traditional cybersecurity can’t defend against. HiddenLayer was built to solve this problem, and today, we hold one of the world’s strongest intellectual property portfolios in AI security.
As AI systems power critical decisions and customer experiences, the risks they introduce must be addressed. From prompt injection attacks to adversarial manipulation and supply chain threats, AI applications face vulnerabilities that traditional cybersecurity can’t defend against. HiddenLayer was built to solve this problem, and today, we hold one of the world’s strongest intellectual property portfolios in AI security.
A Patent Portfolio Built for the Entire AI Lifecycle
Our innovations protect AI models from development through deployment. With 25 granted patents, 56 pending and planned U.S. applications, and 31 international filings, HiddenLayer has established a global foundation for AI security leadership.
This portfolio is the foundation of our strategic product lines:
- AIDR: Provides runtime protection for generative, predictive, and Agentic applications against privacy leaks, and output manipulation.
- Model Scanner: Delivering supply chain security and integrity verification for machine learning models.
- Automated Red Teaming: Continuously stress-tests AI systems with techniques that simulate real-world adversarial attacks, uncovering hidden vulnerabilities before attackers can exploit them.
Patented Innovation in Action
Each granted patent reinforces our core capabilities:
- LLM Protection (14 patents): Multi-layered defenses against prompt injection, data leakage, and PII exposure.
- Model Integrity (5 patents): Cryptographic provenance tracking and hidden backdoor detection for supply chain safety.
- Runtime Monitoring (2 patents): Detecting and disrupting adversarial attacks in real time.
- Encryption (4 patents): Advanced ML-driven multi-layer encryption with hidden compartments for maximum data protection.
Why It Matters
In AI security, patents are proof of solving problems no one else has. With one of the industry's largest portfolios, HiddenLayer demonstrates technical depth and the foresight to anticipate emerging threats. Our breadth of granted patents signals to customers and partners that they can rely on tested, defensible innovations, not unproven claims.
- Stay compliant with global regulations:
With patents covering advanced privacy protections and policy-driven PII redaction, organizations can meet strict data protection standards like GDPR, CCPA, and upcoming AI regulatory frameworks. Built-in audit trails and configurable privacy budgets ensure that compliance is a natural part of AI governance, not a roadblock. - Defend against sophisticated AI threats before they cause damage:
Our patented methods for detecting prompt injections, model inversion, hidden backdoors, and adversarial attacks provide multi-layered defense across the entire AI lifecycle. These capabilities give organizations early warning and automated response mechanisms that neutralize threats before they can be exploited. - Accelerate adoption of AI technologies without compromising security:
By embedding patented security innovations directly into model deployment and runtime environments, HiddenLayer eliminates trade-offs between innovation and safety. Customers can confidently adopt cutting-edge GenAI, multimodal models, and third-party ML assets knowing that the integrity, confidentiality, and resilience of their AI systems are safeguarded.
Together, these protections transform AI from a potential liability into a secure growth driver, enabling enterprises, governments, and innovators to harness the full value of artificial intelligence.
The Future of AI Security
HiddenLayer’s patent portfolio is only possible because of the ingenuity of our research team, the minds who anticipate tomorrow’s threats and design the defenses to stop them. Their work has already produced industry-defining protections, and they continue to push boundaries with innovations in multimodal attack detection, agentic AI security, and automated vulnerability discovery.
By investing in this research talent, HiddenLayer ensures we’re not just keeping pace with AI’s evolution but shaping the future of how it can be deployed safely, responsibly, and at scale.
HiddenLayer — Protecting AI at every layer.

Understand AI Security, Clearly Defined
Explore our glossary to get clear, practical definitions of the terms shaping AI security, governance, and risk management.
