RAG Enrichment

RAG (Retrieval-Augmented Generation) enrichment connects your agents to external knowledge bases. Before each LLM call, TARX queries your configured vector databases, retrieves the most relevant chunks of text, and injects them into the agent's context as background knowledge.

This lets your agents access:

Your internal documentation
Product knowledge bases
Company policies and procedures
Domain-specific research
Any text content you've indexed in a vector database

How It Works

On each agent call, TARX performs these steps:

The agent's input (the user message for that node) is embedded into a vector using TARX's embedding model (text-embedding-3-small, 1536 dimensions)
That vector is used to query your configured RAG sources
The top-K most semantically similar chunks are retrieved from each source

Retrieved chunks are prepended to the LLM system prompt as:

--- Retrieved Context ---
[chunk 1 text]

[chunk 2 text]
...
--- End Retrieved Context ---

The LLM call proceeds with this enriched context
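
The injection step can be sketched in a few lines. This is a minimal illustration of the delimited format shown above, not TARX's actual implementation: the function name `build_enriched_prompt`, the blank line between the context block and the original prompt, and the pass-through behavior for empty results are all assumptions.

```python
from typing import List

HEADER = "--- Retrieved Context ---"
FOOTER = "--- End Retrieved Context ---"


def build_enriched_prompt(system_prompt: str, chunks: List[str]) -> str:
    """Prepend retrieved chunks to a system prompt using the delimiters above."""
    if not chunks:
        # Assumption: with nothing retrieved, the prompt is passed through unchanged.
        return system_prompt
    # Chunks are separated by a blank line, as in the format shown above.
    context_body = "\n\n".join(chunk.strip() for chunk in chunks)
    return f"{HEADER}\n{context_body}\n{FOOTER}\n\n{system_prompt}"


# Placeholder chunk texts, for illustration only.
enriched = build_enriched_prompt(
    "You are a support assistant.",
    ["[chunk 1 text]", "[chunk 2 text]"],
)
print(enriched)
```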

Assigning RAG Sources to an Agent

In the Agent Editor, Section 6 (RAG Sources):

Click Add RAG Source.
A dropdown shows all RAG sources configured in your project.
Select a source.
Set the Top-K — how many chunks to retrieve from this source per agent call. Default: 3.
Repeat for additional sources.

You can assign multiple RAG sources to one agent. TARX queries all of them in parallel and combines the results before injection.
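
To make the parallel fan-out concrete, here is a rough sketch using a thread pool. The `SearchFn` signature, the `query_all_sources` helper, and the choice to concatenate results in source order are hypothetical; this page does not specify how TARX orders or merges results internally.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Tuple

# Hypothetical signature: a search function takes (query_vector, top_k)
# and returns the matching chunk texts.
SearchFn = Callable[[List[float], int], List[str]]


def query_all_sources(
    query_vector: List[float],
    sources: Dict[str, Tuple[SearchFn, int]],
) -> List[str]:
    """Query every source concurrently, then concatenate results in source order."""
    with ThreadPoolExecutor(max_workers=max(1, len(sources))) as pool:
        futures = [
            pool.submit(search, query_vector, top_k)
            for search, top_k in sources.values()
        ]
        combined: List[str] = []
        for future in futures:
            combined.extend(future.result())
    return combined


def fake_source(name: str) -> SearchFn:
    """Stand-in for a real vector DB client, used only for this example."""
    def search(vector: List[float], top_k: int) -> List[str]:
        return [f"{name} chunk {i + 1}" for i in range(top_k)]
    return search


chunks = query_all_sources(
    [0.1, 0.2],
    {"docs": (fake_source("docs"), 3), "policies": (fake_source("policies"), 2)},
)
print(chunks)
```

Each source keeps its own Top-K, so the combined list here holds 3 + 2 = 5 chunks.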

Top-K Considerations

Top-K	Use Case	Token Cost
1	Very targeted, specific lookups	Low
3	General knowledge retrieval (default)	Moderate
5	Broad topic coverage	Higher
10+	Comprehensive coverage, near-complete context	High

RAG context uses tokens

Each retrieved chunk adds tokens to the LLM call. If you have 3 RAG sources each with top_k=5 and chunks averaging 200 tokens, that's 3,000 extra tokens per call. At high volumes, this adds meaningful cost. Balance coverage with cost.
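
The arithmetic above is easy to reproduce for your own configuration. The helper below is only a back-of-the-envelope estimate; the function name and the daily call volume are illustrative assumptions, and real chunk sizes will vary.

```python
from typing import List


def rag_token_overhead(top_k_per_source: List[int], avg_chunk_tokens: int) -> int:
    """Estimate extra prompt tokens per call: total chunks times average chunk size."""
    return sum(top_k_per_source) * avg_chunk_tokens


# The example from the text: 3 sources, top_k=5 each, ~200 tokens per chunk.
per_call = rag_token_overhead([5, 5, 5], 200)
print(per_call)  # 3000

# Hypothetical volume of 10,000 agent calls per day.
calls_per_day = 10_000
print(per_call * calls_per_day)  # 30000000 extra tokens per day
```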

RAG Source Configuration

Before you can assign a RAG source to an agent, you need to configure it in the RAG Sources section of your project. TARX supports:

Provider	Notes
Azure AI Search	TARX's native provider — recommended for Azure deployments
Pinecone	Popular cloud vector DB, serverless tier available
Weaviate	Open-source, cloud or self-hosted
Qdrant	Open-source, cloud or self-hosted
Supabase Vector	Postgres-based vector store (pgvector)
Custom REST	Any vector DB with a search REST API

See RAG Sources Overview for detailed setup instructions.

Embedding

TARX provides free embeddings using OpenAI's text-embedding-3-small model, included with all TARX accounts:

Dimensions: 1536
Index type: HNSW (approximate nearest neighbor)
Search type: Semantic (cosine similarity)
Cost: Free — TARX covers the embedding API cost

You do not need an OpenAI API key to use RAG.
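
For intuition about what "semantic (cosine similarity)" search means, the sketch below ranks a handful of toy vectors against a query. It is an exact, brute-force search over 3-dimensional vectors, used here only for readability; a real index holds 1536-dimensional embeddings and uses HNSW to approximate this ranking. The function names are illustrative.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(
    query: Sequence[float],
    indexed: List[Tuple[str, Sequence[float]]],
    k: int,
) -> List[str]:
    """Return the texts of the k indexed vectors most similar to the query."""
    ranked = sorted(
        indexed,
        key=lambda item: cosine_similarity(query, item[1]),
        reverse=True,
    )
    return [text for text, _ in ranked[:k]]


# Toy index: labels stand in for chunk texts.
index = [("a", [1.0, 0.0, 0.0]), ("b", [0.0, 1.0, 0.0]), ("c", [0.9, 0.1, 0.0])]
print(top_k([1.0, 0.0, 0.0], index, k=2))  # ['a', 'c']
```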

What Makes Good RAG Content

Not all content retrieves well. For best results:

Chunk Size

Your vector database should index chunks of 200-500 tokens (roughly 150-400 words). Chunks that are too short carry too little context to be useful on their own; chunks that are too long mix several topics, which dilutes the relevance score.

Most vector DB ingestion pipelines let you configure chunk size when indexing.

Chunk Overlap

Use 20-50 token overlap between chunks to prevent relevant content from being split across chunk boundaries.
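
A sliding window is one common way to produce overlapping chunks. The sketch below assumes the text has already been tokenized; splitting on whitespace is used as a stand-in, and real tokenizers will produce different counts. The function name and the defaults (300-token chunks, 30-token overlap) are illustrative choices within the ranges suggested above.

```python
from typing import List


def chunk_tokens(
    tokens: List[str], chunk_size: int = 300, overlap: int = 30
) -> List[List[str]]:
    """Split tokens into windows of chunk_size, each sharing `overlap` tokens with the previous one."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("require chunk_size > 0 and 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    chunks: List[List[str]] = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # the last window already reaches the end of the text
    return chunks


# Small numbers so the overlap is easy to see.
tokens = [str(i) for i in range(10)]
for chunk in chunk_tokens(tokens, chunk_size=4, overlap=1):
    print(chunk)
```

With these small values, each chunk begins with the last token of the previous chunk, so a sentence that straddles a boundary still appears whole in at least one chunk when it is shorter than the overlap.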

Document Metadata

Include metadata like source URL, document title, section name, and creation date with each chunk. This lets your agents cite sources accurately.
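
As an illustration, a chunk record carrying these fields might look like the following. The `ChunkRecord` class, its field names, and the bracketed source header are hypothetical; how metadata is stored and whether it reaches the agent's context depends on your vector database and ingestion pipeline.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class ChunkRecord:
    """One indexed chunk plus the metadata an agent needs to cite it."""
    text: str
    source_url: str
    title: str
    section: str
    created: date

    def with_source_header(self) -> str:
        """Render the chunk with a one-line source header (format is an assumption)."""
        header = (
            f"[Source: {self.title} / {self.section} "
            f"({self.source_url}, {self.created.isoformat()})]"
        )
        return f"{header}\n{self.text}"


# Placeholder values, for illustration only.
record = ChunkRecord(
    text="[chunk text]",
    source_url="https://example.com/docs/sso",
    title="SSO Guide",
    section="Setup",
    created=date(2024, 1, 15),
)
print(record.with_source_header())
```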

Content Quality

RAG works best with:

Well-structured documentation (clear headings, concise paragraphs)
Factual content that answers specific questions
Content that's updated regularly and re-indexed when changed

RAG works poorly with:

Poorly formatted text (PDFs with no paragraph breaks)
Content that requires context from other documents to make sense
Very old or outdated content (will give outdated answers)

Use Cases

Internal Documentation Agent

Setup:

RAG source: Company docs site indexed into Pinecone
Agent: Customer support assistant
Top-K: 5

How it works: When a customer asks "How do I set up SSO?", the agent retrieves the top 5 chunks from your docs about SSO and uses them to give an accurate, specific answer grounded in your current documentation, which reduces the risk of hallucination or reliance on outdated training data.

Policy Compliance Checker

Setup:

RAG source: Company HR and compliance policies indexed into Weaviate
Agent: Policy compliance reviewer
Top-K: 3

How it works: When reviewing a document or request, the agent retrieves relevant policy sections and checks for compliance. The retrieved policy text grounds the agent's reasoning in the policy as currently indexed, rather than in its training knowledge.

Domain Expert Agent

Setup:

RAG source: Research papers and technical specs indexed into Azure AI Search
Agent: Technical consultant
Top-K: 5

How it works: When asked a domain-specific question, the agent retrieves relevant research and spec sections. It can answer with accurate, cited information from your curated knowledge base.

System Prompt Guidance for RAG Agents

When your agent uses RAG sources, add instructions about how to use the retrieved context:

You have access to retrieved context from our documentation database. This 
context appears at the beginning of each conversation.

Rules for using retrieved context:
- Prefer retrieved context over your training knowledge when they conflict
- Cite the source if the retrieved content includes source metadata
- If the retrieved context doesn't answer the question, say so explicitly 
  rather than guessing
- Don't fabricate information that isn't in either the context or your 
  reliable training knowledge

Testing RAG Retrieval

To verify that RAG retrieval is working:

Open the Agent Editor for an agent with RAG sources configured.
In the Test Console (right column), type a question that should match your RAG content.
Send the message.
The test console shows the agent's response. Look for:
- Content that matches your indexed documents
- Specific facts that would not be in the underlying model's training data (indicating retrieval worked)
In the execution details (Executions page), you can see the full context sent to the LLM, including retrieved chunks.

See Testing RAG for detailed troubleshooting.

Performance Notes

RAG queries run in parallel across all assigned sources
Embedding of the input adds ~100-200ms to agent execution latency
Vector search adds ~50-200ms depending on the provider and index size
Total RAG overhead per call: typically 150-400ms
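
The totals above follow from the fact that the input is embedded once and the vector searches then run in parallel, so the slowest source sets the search time. A rough model, assuming the searches overlap completely (the function name and the per-source timings are illustrative):

```python
from typing import List


def rag_overhead_ms(embedding_ms: float, search_ms_per_source: List[float]) -> float:
    """Embedding time plus the slowest parallel search (not the sum of all searches)."""
    return embedding_ms + max(search_ms_per_source, default=0)


print(rag_overhead_ms(100, [50]))            # 150: best case from the ranges above
print(rag_overhead_ms(200, [200, 120, 80]))  # 400: the slowest source dominates
```

Under this model, adding a source increases latency only when it is slower than the sources you already have, although it always adds tokens to the context.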

For latency-sensitive workflows, consider:

Using fewer RAG sources (1 well-indexed source is better than 3 partial ones)
Lowering Top-K to reduce retrieval and context overhead
Pre-caching common queries if your vector DB supports it
