
Blog / April 15, 2026

Why Your Plugin Returns the Right Data But the Agent Gets It Wrong

Kevin Mok, Staff Developer Advocate

Your plugin returns the right data. The API call succeeds. But the agent gives the user a garbled summary, or worse, says it can't find information that's sitting right there in the response. You've checked the logs. The data is there. So what went wrong?

The answer is usually the response format.

What happens after your tool returns data

When a plugin runs and returns data, that response passes through a processing step before the reasoning engine sees it. What the engine can do with that data depends entirely on the format of the response.

When a tool response comes back as structured JSON, the engine can parse it. It extracts the schema, trims irrelevant fields, and routes pieces to downstream tools like code execution or summarization. Only the relevant subset reaches the reasoner, keeping context clean and focused.

When a tool response comes back as flat text, the engine can't parse it. It's an opaque blob. The entire string gets forwarded to the reasoner as-is. No filtering. No routing. No intelligence applied.

Think of it like a mail room. Structured JSON is a labeled package with a packing slip: the mail room can read the label, sort it, and route it to the right desk. A plaintext string is an unmarked envelope. The mail room can't open it, can't sort it, can't do anything except pass it along and hope the recipient figures it out.

That's why structured data wins. The engine can work with JSON intelligently, but plaintext bypasses that intelligence entirely.
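The fork can be sketched in a few lines of plain Python. This is a toy model of the idea, not the actual engine: the function names and the `debug_trace` field are invented for illustration.

```python
import json

def process_tool_response(raw: str) -> dict:
    """Toy model of the processing step: structured responses can be
    inspected and trimmed; plaintext is forwarded untouched."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        data = None
    if not isinstance(data, dict):
        # Plaintext path: opaque blob, nothing to filter or route
        return {"passthrough": raw}
    # Structured path: drop fields irrelevant to the current step
    relevant = {k: v for k, v in data.items() if k != "debug_trace"}
    return {"parsed": relevant}

# A JSON response gets filtered; a plain string is forwarded whole
print(process_tool_response('{"total": 42, "debug_trace": "..."}'))
print(process_tool_response("Here are your submitted expenses: ..."))
```

The same data takes a completely different path depending only on whether `json.loads` succeeds on it, which is the crux of everything that follows.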

If you've ever had a plugin that works perfectly in testing but gives weird results with real user queries, the response format is the first thing to check.

[Diagram: a structured JSON output mapper result flows through the processing step (schema extraction, field filtering) into a clean context for the reasoning engine; a plaintext/RENDER string can't be parsed and the entire blob bloats the context. Structured data lets the platform work for you.]
Structured data flows through parsing, filtering, and routing before reaching the reasoner. Plaintext skips all of that.
 
 

Got questions about output mappers? Bring them to Office Hours and we'll unpack them live.

 

The loop that runs everything

Plaintext responses can never trigger code execution. They can't become variables. They can't be handed to the code interpreter for sorting or filtering. To understand why, you need to see how the reasoning engine actually works.

The engine isn't a single LLM call. It's an iterative loop that runs up to 10 times per user message, blending two paradigms: ReAct (reasoning-and-acting) and CodeAct. Each iteration, the engine plans what to do next, executes an action (calling a plugin, running code, responding to the user), then processes the result before looping back. The processing step is where it parses structured data, extracts schemas, creates variables, and prepares instructions for the next iteration. For the full breakdown, see the docs on how the reasoning engine works.

Code execution is just another tool the planner can invoke, same as calling a plugin or querying a knowledge base. When the engine processes a large structured response, it stores the full data as a variable and tells the planner: "use code execution if the preview doesn't have what you need." On the next iteration, the planner can invoke the code interpreter, which receives the stored variable, runs Python against it, and returns the result. That result gets processed again, and the loop continues. More detail on that flow is in the processing tool responses docs.

The processing step between "execute" and "repeat" is the key. When the engine can parse your data, it can create the variables and instructions that make code execution possible. When it can't (plaintext), the code execution path is never triggered.
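The loop described above can be sketched schematically. Everything here is illustrative: `plan`, `tools`, and `process` stand in for the LLM planner, the plugin registry, and the processing step, and none of these names come from the platform's actual API.

```python
from dataclasses import dataclass, field

MAX_ITERATIONS = 10  # per user message, as described above

@dataclass
class Action:
    kind: str                 # "tool" or "respond"
    tool: str = ""
    args: dict = field(default_factory=dict)
    text: str = ""

def reasoning_loop(user_message, plan, tools, process):
    """Schematic plan -> execute -> process loop (illustrative names only)."""
    history = [user_message]
    for _ in range(MAX_ITERATIONS):
        action = plan(history)                  # plan: review history, pick an action
        if action.kind == "respond":
            return action.text                  # respond: the loop ends here
        raw = tools[action.tool](action.args)   # execute: call a plugin / run code
        history.append(process(raw))            # process: structure the result, repeat
    return "(iteration budget exhausted)"
```

Note where `process` sits: every tool result passes through it before the planner sees anything, which is why the format of `raw` decides what the next iteration can do.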

[Diagram: Plan → Execute → Process → Repeat, up to 10 iterations per message. The LLM reviews history and picks the next action; a plugin runs and returns data; the processing step parses the result into schema, preview, and variable; the planner then sees the new data and can invoke the code interpreter. The processing step bridges each iteration, transforming raw results into structured context.]

The Structured Path: JSON + MAP()

Here's what the structured path looks like in practice. Say your plugin returns a list of expenses from an API. In Agent Studio, the output mapper is where you control the format of the response. A structured output mapper using $MAP() produces clean JSON:

output_mapper:
  expenses:
    MAP():
      items: data.expenses
      converter:
        description: item.description
        amount: item.amount
        category: item.category
  total_amount: data.total_amount
  expense_count: data.expenses.$LENGTH()

This produces a JSON object with typed fields. The engine receives it and can do real work:

  1. Schema extraction. The engine reads the JSON structure and builds a schema. It knows expenses is an array of objects, each with description (string), amount (number), and category (string). It knows total_amount is a number.

  2. Field-level filtering. If the response is large, the engine can trim fields that aren't relevant to the current reasoning step. Only what matters gets forwarded.

  3. Tool routing. For large datasets, the engine can store the full response as a variable and route the reasoner to use code execution for sorting, counting, or aggregating. The reasoner gets a compact preview plus instructions, not the entire payload.

  4. Developer instructions. If your response includes a display_instructions_for_model key, the engine extracts it and injects it as a planning instruction for the reasoner. This is a system-level directive, not data. It tells the reasoner how to present the results.

The result: the reasoner gets a clean, typed representation of your data. It can reference specific fields. It can hand off heavy processing to code execution. Context stays lean.
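Step 1, schema extraction, is easy to picture with a naive sketch. The engine's real schema format isn't public; this is just recursive type inference in plain Python over the expense payload from the example above.

```python
def infer_schema(value):
    """Naive recursive type inference, in the spirit of step 1 above."""
    if isinstance(value, dict):
        return {k: infer_schema(v) for k, v in value.items()}
    if isinstance(value, list):
        return [infer_schema(value[0])] if value else []
    return type(value).__name__   # "str", "float", "int", ...

response = {
    "expenses": [
        {"description": "Team lunch", "amount": 42.5, "category": "meals"},
    ],
    "total_amount": 42.5,
    "expense_count": 1,
}
print(infer_schema(response))
# → {'expenses': [{'description': 'str', 'amount': 'float', 'category': 'str'}],
#    'total_amount': 'float', 'expense_count': 'int'}
```

A schema like this is tiny compared to the payload, which is exactly why the reasoner can hold it in context even when the full data can't fit.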

The Plaintext Path: RENDER Strings

Now the same data, different response format. This time the output mapper uses RENDER with Mustache loops:

output_mapper:
  expense_summary:
    RENDER():
      template: |
        Here are your submitted expenses:
        {{#each expenses}}
        - {{this.description}}: ${{this.amount}} ({{this.category}})
        {{/each}}
        Total: ${{total_amount}}
      args:
        expenses: data.expenses
        total_amount: data.total_amount

The RENDER directive triggers a text interpolation pass. The output is a flat string. The engine receives it and... can't do much.

No schema extraction. No field-level filtering. No tool routing. The entire rendered string goes straight into the reasoner's context window. If the list has 200 expenses, that's 200 lines of text dumped into context. The reasoner has to parse it visually, like reading a wall of text instead of querying a table.

And the problems stack. Without structured data, the reasoner can't hand off aggregation to code execution. It can't sort or filter programmatically. Every operation on that data has to happen through text manipulation in the LLM's context, which is slower, less reliable, and eats tokens.

What happens with large responses

When a structured plugin response is large enough (the threshold is around 7K tokens), the engine triggers Structured Data Analysis (SDA).

Here's what happens step by step:

  1. Schema extraction. The engine parses the JSON and builds a schema describing the structure and types.

  2. Truncated preview. Long text fields get trimmed (the first portion and last portion of the text, with the middle replaced by an ellipsis). The reasoner gets enough to understand the shape of the data without the full payload.

  3. Variable storage. The full response is stored as a named variable. The reasoner receives the schema, the truncated preview, and an instruction: "Use code execution if the preview doesn't contain the exact records you need."

The reasoner now has a choice. If the preview contains the answer, it responds directly. If it needs to sort, count, filter, or aggregate, it writes code against the stored variable. The code interpreter has access to the full dataset. The reasoner's context stays compact.
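The three SDA steps can be sketched in a few lines. The real thresholds and window sizes aren't documented, so the `keep` parameter and every name here are made up for illustration.

```python
import json

def truncate_preview(text: str, keep: int = 200) -> str:
    """Step 2 sketch: keep the head and tail, elide the middle."""
    if len(text) <= 2 * keep:
        return text
    return text[:keep] + " … " + text[-keep:]

def analyze_large_response(name: str, data: dict, store: dict) -> dict:
    """Steps 1-3 in one pass: schema, truncated preview, stored variable."""
    store[name] = data   # full payload stays out of the reasoner's context
    return {
        "schema": {k: type(v).__name__ for k, v in data.items()},
        "preview": truncate_preview(json.dumps(data), keep=120),
        "instruction": f"Use code execution against ${name} if the "
                       "preview doesn't contain the exact records you need.",
    }
```

What the reasoner receives is the small dict on the right-hand side; the full payload only ever exists inside `store`, where the code interpreter can reach it.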

 

[Diagram: a 50-record plugin response passes through the processing step (extract schema, create preview, store variable); the reasoning engine then invokes the code interpreter, which reads the stored variable and returns a precise result. Schema + preview + code interpreter = accurate answers.]

There's also a display_instructions_for_model mechanism. Plugin developers can include this key in their output mapper response, and the engine extracts it and injects it as a planning instruction. It's not data the reasoner displays. It's a directive that shapes how the reasoner presents the results. Think of it as a system-level hint from the plugin developer to the reasoning engine.
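The key name `display_instructions_for_model` comes from the platform; the payload shape below is illustrative. The point is the split: the directive is pulled out of the response before the reasoner sees either piece.

```python
response = {
    "expenses": [{"description": "Team lunch", "amount": 42.5}],
    "display_instructions_for_model": (
        "Summarize the total first, then list individual expenses."
    ),
}

# Sketch of the separation: the directive becomes a planning instruction,
# never a value the reasoner is asked to display.
instruction = response.pop("display_instructions_for_model", None)
data = response   # forwarded as data, now without the directive
```

This is why the docs describe it as a system-level hint: by the time the reasoner plans its answer, the instruction sits alongside its other directives, not inside the data.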

Compare this to the plaintext path. A text string that's too long just gets truncated with a blunt "TOO LONG OF A RESPONSE TO DISPLAY" message. No variable. No schema. No code interpreter routing. The data is effectively lost to the reasoning pipeline.

The rough edges

There are real pain points here that you should know about before you spend hours debugging.

The 7K token threshold for SDA is a black box. You can't see it, can't configure it, and there's no log entry when it triggers. You're designing your output mapper with no way to know whether the engine will treat your response as "small enough to inline" or "large enough for variable storage." You're guessing.

Plaintext strings silently drop missing data. If you're building a text string with headers and a list of items, and some fields are null, those entries just output empty. The header is there, the list formatting is there, but the values are blank. No error, no fallback. It looks fine in your test data and falls apart in production when real responses have gaps. You won't know until a user complains.
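The failure mode is easy to reproduce with any string template. This uses plain Python f-strings rather than the RENDER directive itself, but the behavior is the same: null values render as nothing.

```python
expenses = [
    {"description": "Team lunch", "amount": 42.5, "category": "meals"},
    {"description": "Taxi", "amount": None, "category": None},  # real-world gap
]
lines = [
    f"- {e['description']}: ${e['amount'] or ''} ({e['category'] or ''})"
    for e in expenses
]
print("\n".join(lines))
# The second line comes out as "- Taxi: $ ()": punctuation and formatting
# intact, values silently blank, no error raised anywhere.
```

With structured JSON the reasoner would at least see an explicit `null` it can reason about; in rendered text the gap is invisible.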

Stringified JSON inside a string field won't trigger the structured path. If your API returns JSON but your output mapper wraps it in a string, or the upstream service encodes JSON as an escaped string inside another field, the engine sees a string, not an object. It takes the plaintext path. This trips people up constantly, because the data is JSON, it's just not being treated as JSON by the time the engine sees it.
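A quick sanity check catches this class of bug. The sketch below is plain Python, but the rule it encodes is the one that matters: only an actual object or array takes the structured path, never a string that happens to contain JSON.

```python
import json

raw_object = {"expenses": [{"amount": 42.5}]}   # an actual object
raw_string = json.dumps(raw_object)             # same data, now a string
wrapped = {"result": raw_string}                # escaped JSON inside a field

def looks_structured(value) -> bool:
    """Only real objects/arrays count; JSON-in-a-string does not."""
    return isinstance(value, (dict, list))

assert looks_structured(raw_object)
assert not looks_structured(raw_string)            # string, despite the JSON inside
assert not looks_structured(wrapped["result"])     # same trap, one level down
```

If your output mapper ever passes a value that fails this check, the engine will take the plaintext path no matter how JSON-shaped the contents look in your logs.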

Truncation means silent data loss. When a text response is too long, the engine cuts it off and appends a notice. No error surfaced to the builder. No indication in your logs that data was dropped. The user just gets an incomplete answer, and you have no signal that anything went wrong unless you go looking.

 

The one thing you control

You can't configure how the engine filters or truncates tool responses. That behavior is automatic, based on the response format and size. Your lever is the output mapper. Structured JSON opts you into the full processing pipeline. Plaintext opts you out.

→ Use MAP() for lists, direct field mapping for single objects, and RENDER only for cases where you genuinely need a static text string and nothing else.

What to do about it

The way the reasoning engine processes tool responses is easy to miss because it works silently. You never interact with it directly. But every plugin response flows through it, and the format of that response determines whether the engine can work with your data intelligently or has to pass it through untouched.

Structured JSON: the engine parses, filters, routes, stores, and instructs. Plaintext: the engine passes it through and hopes for the best.

If you've hit weird behavior with large plugin responses, I want to hear about it. What happened? Drop it in the community, I'm collecting patterns.

The content of this blog post is for informational purposes only.
