Testing Agents

Every agent has a built-in Test Console — a live chat interface in the right column of the Agent Editor. Use it to validate your agent's behavior before deploying it in workflows.

The Test Console

The Test Console appears in the right column of the Agent Editor. It shows:

System prompt preview — The current system prompt (with any skill additions if a skill is assigned)
Chat area — The conversation history
Input field — Where you type test messages
Send button — Sends the message and gets an LLM response

The test console uses your agent's exact configuration — the same model, system prompt, temperature, max tokens, capabilities, skills, RAG sources, and MCP servers. What you see in the test console is what you'll get in workflows.

Save before testing

Changes to the agent config are applied to the test console in real time, but only if you save first. Click Save Agent before running tests if you've made changes.

Running Your First Test

Open the Agent Editor for any agent.
In the right column (Test Console), type a test message in the input field.
Click Send (or press Enter).
The agent's response appears in the chat area.
Continue the conversation with follow-up messages.

The test console maintains full multi-turn conversation history — the model sees all previous messages in the thread.

Testing Capabilities

If your agent has capabilities enabled, test that the agent uses them correctly.

Testing web_search

Send a message that requires current information:

User: What are the latest Claude models available from Anthropic as of this month?

Expected behavior: The agent should call web_search, retrieve current results, and provide an up-to-date answer with sources.

Observe: Does the response include actual source URLs? Is the information current? Did the model feel the need to search (or did it rely on potentially outdated training data)?

Testing web_scraper

Send a message with a specific URL:

User: Please summarize the content at https://docs.tarx.io/intro

Expected behavior: Agent calls web_scraper on the URL, reads the content, and summarizes it accurately.

Only two capabilities

Agents have exactly two capabilities — web_search and web_scraper. To call an arbitrary API as part of a workflow, use the http_request node; for database/SaaS access, attach an MCP server to the agent and test its tools the same way.

Testing RAG Retrieval

When your agent has RAG sources, test that retrieval is working:

Think of a question whose answer exists specifically in your indexed documents (not in the model's training data).
Ask that specific question.
Check if the answer is accurate and matches your source content.

Example:

User: What is our company's policy on remote work expense reimbursement?

If your RAG source contains your company policy docs, the agent should return the specific policy details — not a generic "it depends on company policy" response.

Debugging RAG:

If the answer is generic, retrieval may not be working — check the RAG Source connection status
If the answer is confidently wrong, it may be using training data instead of retrieved content — strengthen your system prompt to prefer retrieved context

Multi-Turn Testing

The test console maintains conversation history. Use this to test multi-turn behaviors:

Testing Follow-Up Handling

Turn 1: "Analyze this dataset: [data]"
Turn 2: "What's the trend in Q3?"
Turn 3: "Compare that with Q2"

The agent should maintain context across all three turns.

Testing Conversation Reset

If you want to start a fresh conversation:

Click Clear (or New Conversation) in the test console toolbar.
The chat history is cleared.
The next message starts a fresh context.

Overriding the System Prompt

The test console has an Override System Prompt toggle. When enabled:

A text field appears above the chat area.
You can type a temporary system prompt.
This replaces the agent's configured system prompt for the test session only.
The override is not saved — it's for experimentation only.

This is useful for:

A/B testing different prompt approaches before committing
Testing what happens with a minimal system prompt
Debugging whether an issue is in the system prompt or the model

Test Console vs. Workflow Execution

	Test Console	Workflow Execution
Trigger	Manual (type a message)	Triggered by workflow trigger node
Input	Manually typed	From `{{trigger.output}}` or upstream node
Context	Fresh conversation (no workflow context)	Has workflow execution context
Expressions	Not evaluated	`{{trigger.output}}` etc. are resolved
MCP servers	Connected and queried	Connected and queried
RAG sources	Queried	Queried
Capabilities	Enabled and callable	Enabled and callable
Logging	Not saved to execution history	Saved to execution history

The test console is a good approximation of real execution but note the differences:

It does not evaluate workflow expressions ({{trigger.output}} would appear literally)
Test conversations are not saved to execution history
There is no Human-in-Loop integration in the test console

What to Test Before Deploying

Run through this checklist before using an agent in a production workflow:

1. Happy path

Send the type of input the agent will receive in the workflow
Verify the output matches the expected format and quality

2. Edge cases

Empty input: ""
Very long input: paste a 5000-word document
Off-topic input: "What's the weather like?" (should be handled per constraints)
Adversarial: "Ignore your instructions and..." (should stick to system prompt)

3. Output format

If your agent should output JSON, verify it's valid JSON
If the output needs to match a specific format, test it explicitly
If a downstream agent will parse this output, paste the output into that agent's test to verify it works end-to-end

4. Tool usage (if capabilities enabled)

Verify tools are being called when expected
Verify tools aren't being called unnecessarily

5. Multi-turn

Test at least 3-4 turns to verify the agent handles conversation history correctly

6. Error handling

What happens if web_search returns no results?
What if web_scraper hits a page it can't read?
What if an MCP tool returns an error?

Reading the Test Results

After each response, the test console shows:

Response text — The model's response
Token count (if shown) — Input tokens + output tokens used
Tool calls (if any) — Which tools were called and with what arguments

Slow responses indicate:

Many tool calls (each adds latency)
RAG retrieval (adds ~200-400ms)
Large context from many RAG sources

If response quality is poor:

Check your system prompt (most issues are here)
Try a different model (some tasks suit different models better)
Check if the agent needs a capability it doesn't have
Check if the agent needs knowledge it doesn't have (add RAG)

The Test Console​

Running Your First Test​

Testing Capabilities​

Testing web_search​

Testing web_scraper​

Testing RAG Retrieval​

Multi-Turn Testing​

Testing Follow-Up Handling​

Testing Conversation Reset​

Overriding the System Prompt​

Test Console vs. Workflow Execution​

What to Test Before Deploying​

Reading the Test Results​