
RAG Enrichment

RAG (Retrieval-Augmented Generation) enrichment connects your agents to external knowledge bases. Before each LLM call, TARX queries your configured vector databases, retrieves the most relevant chunks of text, and injects them into the agent's context as background knowledge.

This lets your agents access:

  • Your internal documentation
  • Product knowledge bases
  • Company policies and procedures
  • Domain-specific research
  • Any text content you've indexed in a vector database

How It Works

Key points:

  1. The agent's input (the user message for that node) is embedded into a vector using TARX's embedding model (text-embedding-3-small, 1536 dimensions)
  2. That vector is used to query your configured RAG sources
  3. The top-K most semantically similar chunks are retrieved from each source
  4. Retrieved chunks are prepended to the LLM system prompt as:
    --- Retrieved Context ---
    [chunk 1 text]

    [chunk 2 text]
    ...
    --- End Retrieved Context ---
  5. The LLM call proceeds with this enriched context
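The five steps above can be sketched in Python as follows. This is a minimal illustration, not TARX's implementation. The `enrich_system_prompt` function, the `Embedder` and `Source` callable types, and the choice to leave the prompt unchanged when nothing is retrieved are all assumptions made for the example. For simplicity the sketch queries sources one after another, whereas TARX queries them in parallel.

```python
from typing import Callable, Sequence

# Hypothetical callable types (not a TARX API):
# an embedder turns text into a vector; a source returns ranked chunk texts.
Embedder = Callable[[str], list[float]]
Source = Callable[[list[float], int], list[str]]


def enrich_system_prompt(
    user_message: str,
    system_prompt: str,
    embed: Embedder,
    sources: Sequence[tuple[Source, int]],  # (source, top_k) pairs
) -> str:
    # Step 1: embed the agent's input.
    query_vector = embed(user_message)

    # Steps 2-3: ask each source for its top-K most similar chunks.
    chunks: list[str] = []
    for search, top_k in sources:
        chunks.extend(search(query_vector, top_k))

    # Assumption: with no chunks, the prompt is left unchanged.
    if not chunks:
        return system_prompt

    # Step 4: prepend the chunks, separated by blank lines, inside the markers.
    context = "\n\n".join(chunks)
    return (
        "--- Retrieved Context ---\n"
        f"{context}\n"
        "--- End Retrieved Context ---\n\n"
        f"{system_prompt}"
    )


# Usage with stand-ins for a real embedding model and vector database.
def fake_embed(text: str) -> list[float]:
    return [float(len(text))]


def fake_docs_source(query_vector: list[float], top_k: int) -> list[str]:
    return ["[chunk 1 text]", "[chunk 2 text]", "[chunk 3 text]"][:top_k]


prompt = enrich_system_prompt(
    "How do I set up SSO?",
    "You are a support assistant.",
    fake_embed,
    [(fake_docs_source, 2)],
)
print(prompt)
# Step 5 would send `prompt` as the system prompt of the LLM call.
```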

Assigning RAG Sources to an Agent

In the Agent Editor, Section 6 (RAG Sources):

  1. Click Add RAG Source.
  2. A dropdown shows all RAG sources configured in your project.
  3. Select a source.
  4. Set the Top-K — how many chunks to retrieve from this source per agent call. Default: 3.
  5. Repeat for additional sources.

You can assign multiple RAG sources to one agent. TARX queries all of them in parallel and combines the results before injection.
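Querying several sources in parallel and merging their results can be sketched with `asyncio.gather`. The `query_all_sources` helper, the `AsyncSource` signature, and the merge strategy (plain concatenation in assignment order) are assumptions for illustration; this page does not specify how TARX orders or deduplicates the combined results.

```python
import asyncio
from typing import Awaitable, Callable

# Hypothetical async search signature (not a TARX API):
# (query_vector, top_k) -> chunk texts ranked by similarity.
AsyncSource = Callable[[list[float], int], Awaitable[list[str]]]


async def query_all_sources(
    query_vector: list[float],
    sources: list[tuple[AsyncSource, int]],
) -> list[str]:
    # Start every search at once; gather returns results in the order of
    # `sources`, no matter which search finishes first.
    results = await asyncio.gather(
        *(search(query_vector, top_k) for search, top_k in sources)
    )
    # Assumption: combine by concatenating each source's ranked list.
    combined: list[str] = []
    for ranked_chunks in results:
        combined.extend(ranked_chunks)
    return combined


def make_fake_source(name: str, delay_s: float) -> AsyncSource:
    async def search(query_vector: list[float], top_k: int) -> list[str]:
        await asyncio.sleep(delay_s)  # simulate network and search latency
        return [f"{name}-chunk-{i}" for i in range(top_k)]
    return search


chunks = asyncio.run(
    query_all_sources(
        [0.1, 0.2],
        [(make_fake_source("docs", 0.2), 2), (make_fake_source("policies", 0.1), 1)],
    )
)
print(chunks)  # ['docs-chunk-0', 'docs-chunk-1', 'policies-chunk-0']
```

Because the searches overlap, the total wait is roughly that of the slowest source (about 0.2 s here), not the sum of all of them.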

Top-K Considerations

Top-K | Use Case | Token Cost
1 | Very targeted, specific lookups | Low
3 | General knowledge retrieval (default) | Moderate
5 | Broad topic coverage | Higher
10+ | Comprehensive coverage, near-complete context | High
RAG context uses tokens

Each retrieved chunk adds tokens to the LLM call. If you assign 3 RAG sources with a Top-K of 5 each and chunks average 200 tokens, that is up to 3,000 extra tokens per call (3 × 5 × 200). At high volumes this adds meaningful cost, so balance coverage against cost.
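The arithmetic above generalizes to sources with different Top-K values. The helper below is a hypothetical sketch that gives an upper bound: it assumes every source returns its full Top-K and ignores the few tokens used by the marker lines. The daily call volume is an illustrative number, not a benchmark.

```python
def extra_rag_tokens(top_k_per_source: list[int], avg_chunk_tokens: int) -> int:
    """Upper bound on retrieved-context tokens added to one LLM call.

    Assumes every source returns its full Top-K and ignores the few tokens
    used by the "--- Retrieved Context ---" marker lines.
    """
    return sum(top_k_per_source) * avg_chunk_tokens


# The example from the text: 3 sources, Top-K 5 each, ~200-token chunks.
per_call = extra_rag_tokens([5, 5, 5], avg_chunk_tokens=200)
print(per_call)  # 3000

# Hypothetical volume: 10,000 agent calls per day.
calls_per_day = 10_000
print(per_call * calls_per_day)  # 30000000
```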


RAG Source Configuration

Before you can assign a RAG source to an agent, you need to configure it in the RAG Sources section of your project. TARX supports:

Provider | Notes
Azure AI Search | TARX's native provider; recommended for Azure deployments
Pinecone | Popular cloud vector DB, serverless tier available
Weaviate | Open-source, cloud or self-hosted
Qdrant | Open-source, cloud or self-hosted
Supabase Vector | Postgres-based vector store (pgvector)
Custom REST | Any vector DB with a search REST API

See RAG Sources Overview for detailed setup instructions.


Embedding

TARX provides free embeddings using OpenAI's text-embedding-3-small model, included with all TARX accounts:

  • Dimensions: 1536
  • Index type: HNSW (approximate nearest neighbor)
  • Search type: Semantic (cosine similarity)
  • Cost: Free (TARX covers the embedding API cost)

You do not need an OpenAI API key to use RAG.
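Cosine similarity, the score used to rank chunks, can be written out directly. This brute-force version scores one pair of vectors exactly; an HNSW index instead finds approximate nearest neighbors without scoring every stored vector. The dimension check reflects a general constraint of vector search rather than a documented TARX check: the vectors in your index should come from the same embedding model (text-embedding-3-small, 1536 dimensions) as the query vector, or the similarity scores are meaningless. The function names are hypothetical, and the toy 3-dimensional vectors stand in for real embeddings.

```python
import math

EMBEDDING_DIM = 1536  # output size of text-embedding-3-small (default)


def check_dimension(vector: list[float]) -> None:
    """Reject vectors that could not have come from the same model."""
    if len(vector) != EMBEDDING_DIM:
        raise ValueError(f"expected {EMBEDDING_DIM} dims, got {len(vector)}")


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 = same direction."""
    if len(a) != len(b):
        raise ValueError(f"dimension mismatch: {len(a)} vs {len(b)}")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Toy 3-dimensional vectors; real embeddings have EMBEDDING_DIM entries.
print(cosine_similarity([1.0, 0.0, 0.0], [1.0, 0.0, 0.0]))  # 1.0
print(cosine_similarity([1.0, 0.0, 0.0], [0.0, 1.0, 0.0]))  # 0.0
print(cosine_similarity([1.0, 1.0, 0.0], [1.0, 0.0, 0.0]))  # ~0.7071
```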


What Makes Good RAG Content

Not all content retrieves well. For best results:

Chunk Size

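Your vector database should index chunks of 200-500 tokens (roughly 150-400 words). Chunks that are too short lack enough context to answer a question; chunks that are too long dilute the relevance score, because a single vector has to represent several topics at once.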

Most vector DB ingestion pipelines let you configure chunk size when indexing.

Chunk Overlap

Use a 20-50 token overlap between consecutive chunks to reduce the chance that relevant content is split across a chunk boundary.
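The chunk-size and overlap guidance amounts to a sliding window over the token sequence. The sketch below is a generic illustration, not a feature of TARX or of any particular ingestion pipeline; `chunk_tokens` is hypothetical, and splitting on whitespace only approximates tokens (a real pipeline would count tokens with a tokenizer).

```python
def chunk_tokens(
    tokens: list[str], chunk_size: int = 300, overlap: int = 30
) -> list[list[str]]:
    """Split tokens into windows of chunk_size that share `overlap` tokens."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - overlap
    chunks: list[list[str]] = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # this window already reached the end of the text
    return chunks


# Defaults sit inside the recommended 200-500 size and 20-50 overlap ranges.
# Small example: 10 whitespace-split "tokens", windows of 4, overlap of 1.
words = "w0 w1 w2 w3 w4 w5 w6 w7 w8 w9".split()
for chunk in chunk_tokens(words, chunk_size=4, overlap=1):
    print(" ".join(chunk))
# w0 w1 w2 w3
# w3 w4 w5 w6
# w6 w7 w8 w9
```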

Document Metadata

Include metadata like source URL, document title, section name, and creation date with each chunk. This lets your agents cite sources accurately.
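One way to keep this metadata with each chunk is sketched below. The `Chunk` class, its field names, and the `[Source: ...]` header format are hypothetical; each provider (Pinecone, Weaviate, Azure AI Search, and so on) stores metadata in its own schema. Putting a short source header into the chunk text itself is one way to make sure the citation is visible to the agent once the chunk is injected.

```python
from dataclasses import asdict, dataclass
from datetime import date


@dataclass(frozen=True)
class Chunk:
    text: str
    source_url: str
    title: str
    section: str
    created: date

    def citation(self) -> str:
        return (
            f'{self.title}, section "{self.section}" '
            f"({self.source_url}, {self.created.isoformat()})"
        )

    def as_context(self) -> str:
        # Prefix the text with its source so the citation survives injection.
        return f"[Source: {self.citation()}]\n{self.text}"

    def metadata(self) -> dict[str, str]:
        # Flat string values; adapt to your provider's metadata schema.
        fields = asdict(self)
        fields.pop("text")
        fields["created"] = self.created.isoformat()
        return fields


chunk = Chunk(
    text="[chunk text about SSO setup]",
    source_url="https://docs.example.com/admin/sso",
    title="Admin Guide",
    section="Single Sign-On",
    created=date(2024, 1, 15),
)
print(chunk.as_context())
# [Source: Admin Guide, section "Single Sign-On" (https://docs.example.com/admin/sso, 2024-01-15)]
# [chunk text about SSO setup]
```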

Content Quality

RAG works best with:

  • Well-structured documentation (clear headings, concise paragraphs)
  • Factual content that answers specific questions
  • Content that's updated regularly and re-indexed when changed

RAG works poorly with:

  • Poorly formatted text (for example, PDFs extracted without paragraph breaks)
  • Content that requires context from other documents to make sense
  • Very old or outdated content (the agent will give outdated answers)

Use Cases

Internal Documentation Agent

Setup:

  • RAG source: Company docs site indexed into Pinecone
  • Agent: Customer support assistant
  • Top-K: 5

How it works: When a customer asks "How do I set up SSO?", the agent retrieves the top 5 chunks about SSO from your docs and bases its answer on them. This makes hallucinated or outdated answers much less likely than when the agent relies on its training data alone.

Policy Compliance Checker

Setup:

  • RAG source: Company HR and compliance policies indexed into Weaviate
  • Agent: Policy compliance reviewer
  • Top-K: 3

How it works: When reviewing a document or request, the agent retrieves relevant policy sections and checks for compliance. The retrieved policy text ensures the agent is reasoning from the actual current policy, not its training knowledge.

Domain Expert Agent

Setup:

  • RAG source: Research papers and technical specs indexed into Azure AI Search
  • Agent: Technical consultant
  • Top-K: 5

How it works: When asked a domain-specific question, the agent retrieves relevant research and spec sections. It can answer with accurate, cited information from your curated knowledge base.


System Prompt Guidance for RAG Agents

When your agent uses RAG sources, add instructions about how to use the retrieved context. For example:

You have access to retrieved context from our documentation database. It
appears above these instructions, between the "--- Retrieved Context ---" and
"--- End Retrieved Context ---" markers.

Rules for using retrieved context:
- Prefer retrieved context over your training knowledge when they conflict
- Cite the source if the retrieved content includes source metadata
- If the retrieved context doesn't answer the question, say so explicitly
rather than guessing
- Don't fabricate information that isn't in either the context or your
reliable training knowledge

Testing RAG Retrieval

To verify that RAG retrieval is working:

  1. Open the Agent Editor for an agent with RAG sources configured.
  2. In the Test Console (right column), type a question that should match your RAG content.
  3. Send the message.
  4. The test console shows the agent's response. Look for:
    • Content that matches your indexed documents
    • Specific facts the model could not know from its training data, such as internal product details (these indicate retrieval worked)
  5. In the execution details (Executions page), you can see the full context sent to the LLM, including retrieved chunks.
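If you copy the full context from an execution's details, a small helper can confirm which chunks were injected. This sketch assumes the logged context uses the marker format shown under How It Works, with chunks separated by blank lines. `extract_retrieved_chunks` and `retrieval_hit` are hypothetical helpers, not part of TARX, and a chunk that itself contains blank lines would be split into several pieces.

```python
START_MARKER = "--- Retrieved Context ---"
END_MARKER = "--- End Retrieved Context ---"


def extract_retrieved_chunks(full_context: str) -> list[str]:
    """Return the chunks between the markers, or [] if no block is present."""
    start = full_context.find(START_MARKER)
    if start == -1:
        return []
    end = full_context.find(END_MARKER, start)
    if end == -1:
        return []
    body = full_context[start + len(START_MARKER):end]
    # Chunks are separated by a blank line.
    return [c.strip() for c in body.split("\n\n") if c.strip()]


def retrieval_hit(full_context: str, expected_phrase: str) -> bool:
    """True if any retrieved chunk contains the phrase (case-insensitive)."""
    phrase = expected_phrase.lower()
    return any(phrase in c.lower() for c in extract_retrieved_chunks(full_context))


logged = (
    "--- Retrieved Context ---\n"
    "[chunk 1 text about SSO]\n\n"
    "[chunk 2 text about SAML]\n"
    "--- End Retrieved Context ---\n\n"
    "You are a support assistant."
)
print(extract_retrieved_chunks(logged))
print(retrieval_hit(logged, "saml"))  # True
```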

See Testing RAG for detailed troubleshooting.


Performance Notes

  • RAG queries run in parallel across all assigned sources
  • Embedding of the input adds ~100-200ms to agent execution latency
  • Vector search adds ~50-200ms depending on the provider and index size
  • Total RAG overhead per call: typically 150-400ms
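Because sources are queried in parallel, the search portion of the overhead is bounded by the slowest source rather than the sum of all of them. The simplified model below ignores merge and scheduling overhead; `rag_overhead_ms` is hypothetical, and the per-source timings in the example are illustrative values chosen from the ranges above, not measurements.

```python
def rag_overhead_ms(
    embed_ms: float, search_ms_per_source: list[float], parallel: bool = True
) -> float:
    """Rough RAG latency: one embedding call plus the vector searches."""
    if parallel:
        # Parallel searches finish when the slowest one does.
        search_ms = max(search_ms_per_source, default=0.0)
    else:
        # Sequential searches add up.
        search_ms = sum(search_ms_per_source)
    return embed_ms + search_ms


searches = [80.0, 120.0, 200.0]  # illustrative per-source search times
print(rag_overhead_ms(150.0, searches))                  # 350.0
print(rag_overhead_ms(150.0, searches, parallel=False))  # 550.0
```

Under this model, the slowest source sets the search latency, so removing or speeding up that source shortens the call the most, while every source still adds tokens.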

For latency-sensitive workflows, consider:

  • Using fewer RAG sources (1 well-indexed source is better than 3 partial ones)
  • Lowering Top-K to reduce retrieval and context overhead
  • Pre-caching common queries if your vector DB supports it