Skip to main content

Testing Agents

Every agent has a built-in Test Console — a live chat interface in the right column of the Agent Editor. Use it to validate your agent's behavior before deploying it in workflows.


The Test Console

The Test Console appears in the right column of the Agent Editor. It shows:

  • System prompt preview — The current system prompt (with any skill additions if a skill is assigned)
  • Chat area — The conversation history
  • Input field — Where you type test messages
  • Send button — Sends the message and gets an LLM response

The test console uses your agent's exact configuration — the same model, system prompt, temperature, max tokens, capabilities, skills, RAG sources, and MCP servers. What you see in the test console is what you'll get in workflows.

Save before testing

Changes to the agent config are applied to the test console in real time, but only if you save first. Click Save Agent before running tests if you've made changes.


Running Your First Test

  1. Open the Agent Editor for any agent.
  2. In the right column (Test Console), type a test message in the input field.
  3. Click Send (or press Enter).
  4. The agent's response appears in the chat area.
  5. Continue the conversation with follow-up messages.

The test console maintains full multi-turn conversation history — the model sees all previous messages in the thread.


Testing Capabilities

If your agent has capabilities enabled, test that the agent uses them correctly.

Send a message that requires current information:

User: What are the latest Claude models available from Anthropic as of this month?

Expected behavior: The agent should call web_search, retrieve current results, and provide an up-to-date answer with sources.

Observe: Does the response include actual source URLs? Is the information current? Did the model feel the need to search (or did it rely on potentially outdated training data)?

Testing web_scraper

Send a message with a specific URL:

User: Please summarize the content at https://docs.tarx.io/intro

Expected behavior: Agent calls web_scraper on the URL, reads the content, and summarizes it accurately.

Only two capabilities

Agents have exactly two capabilities — web_search and web_scraper. To call an arbitrary API as part of a workflow, use the http_request node; for database/SaaS access, attach an MCP server to the agent and test its tools the same way.


Testing RAG Retrieval

When your agent has RAG sources, test that retrieval is working:

  1. Think of a question whose answer exists specifically in your indexed documents (not in the model's training data).
  2. Ask that specific question.
  3. Check if the answer is accurate and matches your source content.

Example:

User: What is our company's policy on remote work expense reimbursement?

If your RAG source contains your company policy docs, the agent should return the specific policy details — not a generic "it depends on company policy" response.

Debugging RAG:

  • If the answer is generic, retrieval may not be working — check the RAG Source connection status
  • If the answer is confidently wrong, it may be using training data instead of retrieved content — strengthen your system prompt to prefer retrieved context

Multi-Turn Testing

The test console maintains conversation history. Use this to test multi-turn behaviors:

Testing Follow-Up Handling

Turn 1: "Analyze this dataset: [data]"
Turn 2: "What's the trend in Q3?"
Turn 3: "Compare that with Q2"

The agent should maintain context across all three turns.

Testing Conversation Reset

If you want to start a fresh conversation:

  1. Click Clear (or New Conversation) in the test console toolbar.
  2. The chat history is cleared.
  3. The next message starts a fresh context.

Overriding the System Prompt

The test console has an Override System Prompt toggle. When enabled:

  1. A text field appears above the chat area.
  2. You can type a temporary system prompt.
  3. This replaces the agent's configured system prompt for the test session only.
  4. The override is not saved — it's for experimentation only.

This is useful for:

  • A/B testing different prompt approaches before committing
  • Testing what happens with a minimal system prompt
  • Debugging whether an issue is in the system prompt or the model

Test Console vs. Workflow Execution

Test ConsoleWorkflow Execution
TriggerManual (type a message)Triggered by workflow trigger node
InputManually typedFrom {{trigger.output}} or upstream node
ContextFresh conversation (no workflow context)Has workflow execution context
ExpressionsNot evaluated{{trigger.output}} etc. are resolved
MCP serversConnected and queriedConnected and queried
RAG sourcesQueriedQueried
CapabilitiesEnabled and callableEnabled and callable
LoggingNot saved to execution historySaved to execution history

The test console is a good approximation of real execution but note the differences:

  • It does not evaluate workflow expressions ({{trigger.output}} would appear literally)
  • Test conversations are not saved to execution history
  • There is no Human-in-Loop integration in the test console

What to Test Before Deploying

Run through this checklist before using an agent in a production workflow:

1. Happy path

  • Send the type of input the agent will receive in the workflow
  • Verify the output matches the expected format and quality

2. Edge cases

  • Empty input: ""
  • Very long input: paste a 5000-word document
  • Off-topic input: "What's the weather like?" (should be handled per constraints)
  • Adversarial: "Ignore your instructions and..." (should stick to system prompt)

3. Output format

  • If your agent should output JSON, verify it's valid JSON
  • If the output needs to match a specific format, test it explicitly
  • If a downstream agent will parse this output, paste the output into that agent's test to verify it works end-to-end

4. Tool usage (if capabilities enabled)

  • Verify tools are being called when expected
  • Verify tools aren't being called unnecessarily

5. Multi-turn

  • Test at least 3-4 turns to verify the agent handles conversation history correctly

6. Error handling

  • What happens if web_search returns no results?
  • What if web_scraper hits a page it can't read?
  • What if an MCP tool returns an error?

Reading the Test Results

After each response, the test console shows:

  • Response text — The model's response
  • Token count (if shown) — Input tokens + output tokens used
  • Tool calls (if any) — Which tools were called and with what arguments

Slow responses indicate:

  • Many tool calls (each adds latency)
  • RAG retrieval (adds ~200-400ms)
  • Large context from many RAG sources

If response quality is poor:

  • Check your system prompt (most issues are here)
  • Try a different model (some tasks suit different models better)
  • Check if the agent needs a capability it doesn't have
  • Check if the agent needs knowledge it doesn't have (add RAG)