Embedding Strategy
TARX uses OpenAI's text-embedding-3-small model for all RAG query embeddings. This page explains the strategy, why it matters, and how to align your indexing pipeline with it.
What TARX Uses
| Property | Value |
|---|---|
| Model | text-embedding-3-small (OpenAI) |
| Dimensions | 1536 |
| Distance metric | Cosine similarity |
| Index type | HNSW (Hierarchical Navigable Small World) |
| Search type | Approximate Nearest Neighbor (ANN) |
| Cost to users | Free — TARX covers embedding API costs |
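To make the distance metric concrete, here is a minimal pure-Python sketch of cosine similarity. The function name `cosine_similarity` and the toy 3-dimensional vectors are illustrative assumptions; real embeddings have 1536 components, and your vector database computes this for you.

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: dot(a, b) / (|a| * |b|)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)

# Toy vectors (illustrative only, not real embeddings)
same_direction = cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])  # close to 1
orthogonal = cosine_similarity([1.0, 0.0, 0.0], [0.0, 1.0, 0.0])      # 0
```

Because the score depends only on direction and not on magnitude, two vectors that point the same way score close to 1 regardless of their length.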
Why text-embedding-3-small
OpenAI's text-embedding-3-small offers an excellent balance of:
- Performance — Strong semantic understanding across domains
- Cost — One of the cheapest embedding APIs available
- Speed — Fast inference (< 100ms per embedding)
- Dimensions — 1536 dimensions is a sweet spot between quality and storage
It's the same model that TARX uses for Bridge's own RAG (indexing the TARX documentation).
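As a rough illustration of the storage side of that trade-off, the sketch below estimates raw vector storage. It assumes each component is stored as a 32-bit float and ignores index and metadata overhead, which vary by vector database; `raw_vector_bytes` is a hypothetical helper name.

```python
BYTES_PER_FLOAT32 = 4  # assumption: vectors are stored as 32-bit floats

def raw_vector_bytes(num_vectors: int, dimensions: int = 1536) -> int:
    """Raw storage for vector values only (no index or metadata overhead)."""
    return num_vectors * dimensions * BYTES_PER_FLOAT32

per_vector = raw_vector_bytes(1)           # 6144 bytes
one_million = raw_vector_bytes(1_000_000)  # 6_144_000_000 bytes, about 6.1 GB
```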
Aligning Your Indexing Pipeline
Critical: Your indexing pipeline must use the same embedding model as TARX's query embeddings. Vectors from different models are not comparable — mixing them produces meaningless results.
Use for indexing: text-embedding-3-small, 1536 dimensions
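One inexpensive guard, sketched here under the assumption that you control the indexing script, is to check the length of every vector before it is upserted. `check_vector` and `EXPECTED_DIMENSIONS` are hypothetical names, not part of TARX or any SDK.

```python
EXPECTED_DIMENSIONS = 1536  # text-embedding-3-small, per the table above

def check_vector(vector: list[float], expected: int = EXPECTED_DIMENSIONS) -> list[float]:
    """Return the vector unchanged, or raise if its length is wrong."""
    if len(vector) != expected:
        raise ValueError(f"expected {expected} dimensions, got {len(vector)}")
    return vector
```

A length check only catches models with a different output size. Another model that also produces 1536-dimensional vectors (text-embedding-ada-002, for example) would pass it, so it may also be worth recording the model name alongside each vector.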
Example indexing code (Python):
from openai import OpenAI

client = OpenAI(api_key="your-openai-key")

def embed_chunk(text: str) -> list[float]:
    response = client.embeddings.create(
        model="text-embedding-3-small",
        input=text
    )
    return response.data[0].embedding  # 1536 floats

# Upsert to Pinecone:
import pinecone

pc = pinecone.Pinecone(api_key="your-pinecone-key")
index = pc.Index("product-docs")

chunks = [
    {"id": "doc1-chunk1", "text": "SSO setup requires...", "source": "https://docs.example.com/sso"},
    {"id": "doc1-chunk2", "text": "SAML configuration needs...", "source": "https://docs.example.com/sso"}
]

vectors = [
    {
        "id": chunk["id"],
        "values": embed_chunk(chunk["text"]),
        "metadata": {"text": chunk["text"], "source": chunk["source"]}
    }
    for chunk in chunks
]

index.upsert(vectors=vectors)
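The example above makes one API call per chunk. The OpenAI embeddings endpoint also accepts a list of strings as `input`, so larger corpora can be embedded in batches. The following is a sketch only: `batched`, `embed_all`, and the batch size of 100 are illustrative assumptions, and the embedding call is passed in as a function so the batching logic stays independent of any SDK.

```python
from typing import Callable, Iterator

def batched(items: list[str], batch_size: int) -> Iterator[list[str]]:
    """Yield consecutive slices of at most batch_size items."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]

def embed_all(
    texts: list[str],
    embed_batch: Callable[[list[str]], list[list[float]]],
    batch_size: int = 100,  # assumption: tune to your rate limits
) -> list[list[float]]:
    """Embed texts batch by batch, preserving input order."""
    vectors: list[list[float]] = []
    for batch in batched(texts, batch_size):
        vectors.extend(embed_batch(batch))
    return vectors

# With the OpenAI client from the example above, embed_batch could look like:
#
# def openai_embed_batch(batch: list[str]) -> list[list[float]]:
#     response = client.embeddings.create(model="text-embedding-3-small", input=batch)
#     return [item.embedding for item in response.data]
```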
Chunking Strategy
How you chunk your documents significantly affects retrieval quality.
Recommended Chunk Settings
| Parameter | Recommended Value | Notes |
|---|---|---|
| Chunk size | 300-500 tokens | ~250-400 words |
| Chunk overlap | 50-100 tokens | Prevents cutting off context at boundaries |
| Min chunk size | 100 tokens | Discard very short chunks (headings, footnotes) |
| Max chunk size | 800 tokens | Larger chunks dilute relevance scores |
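To show how chunk size, overlap, and minimum size interact, here is a minimal sliding-window sketch. It operates on a pre-tokenized list; how you tokenize is left to you, and `window_chunks` is a hypothetical helper rather than a TARX or library function.

```python
def window_chunks(
    tokens: list[str],
    chunk_size: int = 400,
    overlap: int = 50,
    min_size: int = 100,
) -> list[list[str]]:
    """Fixed-size windows; each window repeats the last `overlap` tokens of the previous one."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - overlap
    chunks: list[list[str]] = []
    for start in range(0, len(tokens), step):
        window = tokens[start:start + chunk_size]
        if len(window) >= min_size:  # discard very short windows
            chunks.append(window)
        if start + chunk_size >= len(tokens):
            break  # this window already reached the end of the document
    return chunks

tokens = [str(i) for i in range(1000)]
chunks = window_chunks(tokens)
sizes = [len(c) for c in chunks]  # [400, 400, 300]
```

Note that a final window shorter than `min_size` is dropped in this sketch, so a few trailing tokens can be lost. Merging a short tail into the previous chunk is a reasonable alternative.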
Chunking with LangChain (Python)
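from langchain.text_splitter import RecursiveCharacterTextSplitter

# from_tiktoken_encoder measures chunk_size and chunk_overlap in tokens
# (requires the tiktoken package). With length_function=len the splitter
# would count characters, producing chunks far smaller than 400 tokens.
splitter = RecursiveCharacterTextSplitter.from_tiktoken_encoder(
    encoding_name="cl100k_base",  # tokenizer used by text-embedding-3-small
    chunk_size=400,    # tokens
    chunk_overlap=50,  # tokens
    separators=["\n\n", "\n", ". ", " ", ""]
)

chunks = splitter.split_text(document_text)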
Semantic Chunking
For better quality, use semantic chunking instead of fixed-size chunks:
- Split on paragraph boundaries (double newlines)
- Keep code blocks together
- Keep list items with their list headers
def semantic_chunks(text: str, max_tokens: int = 400) -> list[str]:
    # Split on double newlines (paragraph boundaries)
    paragraphs = [p.strip() for p in text.split('\n\n') if p.strip()]

    chunks = []
    current_chunk = []
    current_len = 0

    for para in paragraphs:
        para_len = len(para.split())  # rough token estimate (word count)
        if current_len + para_len > max_tokens and current_chunk:
            # Adding this paragraph would exceed the budget: close the current chunk
            chunks.append('\n\n'.join(current_chunk))
            current_chunk = [para]
            current_len = para_len
        else:
            current_chunk.append(para)
            current_len += para_len

    if current_chunk:
        chunks.append('\n\n'.join(current_chunk))

    return chunks
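The function above splits purely on blank lines, so a code block that contains a blank line would be cut in two. The sketch below shows one way to keep code blocks together, assuming the source text is Markdown with fenced code blocks. `split_blocks` is a hypothetical helper; its output could replace the `text.split('\n\n')` step.

```python
FENCE = "`" * 3  # Markdown code fence marker

def split_blocks(text: str) -> list[str]:
    """Split on blank lines, except inside fenced code blocks."""
    blocks: list[str] = []
    current: list[str] = []
    in_code = False
    for line in text.splitlines():
        if line.strip().startswith(FENCE):
            in_code = not in_code  # entering or leaving a code block
        if line.strip() == "" and not in_code:
            # Blank line outside code: close the current block
            if current:
                blocks.append("\n".join(current))
                current = []
        else:
            current.append(line)
    if current:
        blocks.append("\n".join(current))
    return blocks

sample = (
    "Intro paragraph.\n\n"
    + FENCE + "python\nx = 1\n\ny = 2\n" + FENCE
    + "\n\nClosing paragraph."
)
blocks = split_blocks(sample)  # three blocks; the code block stays whole
```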
Metadata to Include
Each chunk should include metadata that agents can reference in their responses:
| Metadata Field | Example | Use |
|---|---|---|
| source_url | https://docs.example.com/sso | Agents can cite the source |
| title | "SSO Setup Guide" | Human-readable source name |
| section | "Prerequisites" | Which section of the doc |
| created_at | "2024-01-15" | Helps filter for recency |
| category | "configuration" | Domain filtering |
Configure which metadata fields to include in the RAG Source's Metadata Fields setting. Those fields are returned with each chunk and injected into the agent's context alongside the text.
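As an illustration, the fields in the table can be attached to each vector at indexing time. This sketch assumes the record shape used in the Pinecone example earlier on this page (`id`, `values`, `metadata`); `ChunkMetadata` and `to_vector_record` are hypothetical names.

```python
from dataclasses import asdict, dataclass

@dataclass
class ChunkMetadata:
    source_url: str
    title: str
    section: str
    created_at: str  # ISO 8601 date string, e.g. "2024-01-15"
    category: str

def to_vector_record(chunk_id: str, text: str, values: list[float], metadata: ChunkMetadata) -> dict:
    """Build one upsert record with the chunk text and metadata fields side by side."""
    return {
        "id": chunk_id,
        "values": values,
        "metadata": {"text": text, **asdict(metadata)},
    }

record = to_vector_record(
    "doc1-chunk1",
    "SSO setup requires...",
    [0.0] * 1536,  # placeholder vector; use a real embedding in practice
    ChunkMetadata(
        source_url="https://docs.example.com/sso",
        title="SSO Setup Guide",
        section="Prerequisites",
        created_at="2024-01-15",
        category="configuration",
    ),
)
```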
Re-Indexing
When your source documents change, you need to re-index:
- Delete old vectors for changed documents (by doc ID or source URL)
- Re-chunk the updated documents
- Re-embed with text-embedding-3-small
- Upsert new vectors
TARX doesn't manage this process — it's your responsibility to keep the vector DB in sync with your source documents.
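The re-indexing steps can be sketched as follows. This is an illustration under stated assumptions, not a TARX feature: the vector store is a plain in-memory dict standing in for your vector DB, the `chunk` and `embed` functions are passed in, and the content-hash check that skips unchanged documents is an optional addition.

```python
import hashlib
from typing import Callable

def content_hash(text: str) -> str:
    """Stable fingerprint of a document's text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def reindex_document(
    doc_id: str,
    text: str,
    store: dict[str, dict],   # hypothetical stand-in for a vector DB
    hashes: dict[str, str],   # doc_id -> hash of the last indexed version
    chunk: Callable[[str], list[str]],
    embed: Callable[[str], list[float]],
) -> bool:
    """Re-index one document if its content changed. Returns True if re-indexed."""
    new_hash = content_hash(text)
    if hashes.get(doc_id) == new_hash:
        return False  # unchanged since the last run, nothing to do

    # 1. Delete old vectors for this document
    stale_ids = [vid for vid, rec in store.items() if rec["doc_id"] == doc_id]
    for vector_id in stale_ids:
        del store[vector_id]

    # 2-4. Re-chunk, re-embed, and upsert the new vectors
    for i, piece in enumerate(chunk(text)):
        store[f"{doc_id}-chunk{i}"] = {"doc_id": doc_id, "values": embed(piece), "text": piece}

    hashes[doc_id] = new_hash
    return True
```

Deleting before upserting matters when a document shrinks: without it, chunks that no longer exist in the source would remain searchable.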
Automation tip: Create a TARX workflow with a webhook trigger that fires when your documentation system updates a page. Use an http_request node to call your embedding/index endpoint (or your vector DB's API) so the changed document is re-indexed automatically.
HNSW Index Configuration
For best performance with text-embedding-3-small at 1536 dimensions:
Pinecone
Pinecone manages index configuration automatically and exposes no HNSW parameters to tune. Specify the cosine metric and 1536 dimensions when creating the index.
Weaviate
{
  "vectorIndexConfig": {
    "distance": "cosine",
    "ef": 100,
    "efConstruction": 128,
    "maxConnections": 64
  }
}
Qdrant
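from qdrant_client import QdrantClient
from qdrant_client.models import Distance, HnswConfigDiff, VectorParams

client = QdrantClient(url="http://localhost:6333")  # example URL; point this at your deployment

client.create_collection(
    collection_name="product_docs",
    vectors_config=VectorParams(size=1536, distance=Distance.COSINE),
    hnsw_config=HnswConfigDiff(m=16, ef_construct=100)
)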
Azure AI Search
{
  "vectorSearch": {
    "algorithms": [{
      "name": "hnsw-config",
      "kind": "hnsw",
      "hnswParameters": {
        "m": 4,
        "efConstruction": 400,
        "efSearch": 500,
        "metric": "cosine"
      }
    }]
  }
}
Quality vs. Cost Trade-offs
| Configuration | Retrieval Quality | Token Cost | Query Latency |
|---|---|---|---|
| Top-K=1, threshold=0.85 | Very targeted | Very low | Low |
| Top-K=3, threshold=0.7 (default) | Good balance | Low | Low |
| Top-K=5, threshold=0.6 | Higher recall | Moderate | Moderate |
| Top-K=10, threshold=0.5 | Maximum recall | High | Moderate |
For most use cases, the default (top-K=3, threshold=0.7) is the right starting point. Adjust based on your specific needs and the size/quality of your index.
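To see how the two settings interact, the sketch below applies them to a list of scored matches. It assumes the threshold is a minimum similarity score and that top-K is applied to the ranked results; how TARX combines them internally is not specified here, and `select_matches` is a hypothetical helper.

```python
def select_matches(
    matches: list[tuple[str, float]],
    top_k: int = 3,
    threshold: float = 0.7,
) -> list[tuple[str, float]]:
    """Keep at most top_k of the highest-scoring matches that meet the threshold."""
    ranked = sorted(matches, key=lambda m: m[1], reverse=True)
    return [m for m in ranked[:top_k] if m[1] >= threshold]

# Illustrative scores, not real query results
matches = [("chunk-a", 0.91), ("chunk-b", 0.74), ("chunk-c", 0.66), ("chunk-d", 0.52)]

default = select_matches(matches)                               # chunk-a, chunk-b
targeted = select_matches(matches, top_k=1, threshold=0.85)     # chunk-a only
max_recall = select_matches(matches, top_k=10, threshold=0.5)   # all four chunks
```

With 400-token chunks, the default selection here adds roughly 800 tokens of context to the agent's prompt, while the maximum-recall setting adds roughly 1600, which is where the token cost column comes from.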