
Embedding Strategy

TARX uses OpenAI's text-embedding-3-small model for all RAG query embeddings. This page explains the strategy, why it matters, and how to align your indexing pipeline with it.


What TARX Uses

| Property | Value |
|---|---|
| Model | text-embedding-3-small (OpenAI) |
| Dimensions | 1536 |
| Distance metric | Cosine similarity |
| Index type | HNSW (Hierarchical Navigable Small World) |
| Search type | Approximate Nearest Neighbor (ANN) |
| Cost to users | Free (TARX covers embedding API costs) |

Why text-embedding-3-small

OpenAI's text-embedding-3-small offers an excellent balance of:

  • Performance — Strong semantic understanding across domains
  • Cost — One of the cheapest embedding APIs available
  • Speed — Fast inference (< 100ms per embedding)
  • Dimensions — 1536 dimensions is a sweet spot between quality and storage

It's the same model that TARX uses for Bridge's own RAG (indexing the TARX documentation).


Aligning Your Indexing Pipeline

Critical: Your indexing pipeline must use the same embedding model as TARX's query embeddings. Vectors from different models are not comparable — mixing them produces meaningless results.

Use for indexing: text-embedding-3-small, 1536 dimensions

Example indexing code (Python):

from openai import OpenAI
from pinecone import Pinecone

client = OpenAI(api_key="your-openai-key")

def embed_chunk(text: str) -> list[float]:
    """Embed one chunk with the same model TARX uses for queries."""
    response = client.embeddings.create(
        model="text-embedding-3-small",
        input=text,
    )
    return response.data[0].embedding  # 1536 floats

# Upsert to Pinecone:
pc = Pinecone(api_key="your-pinecone-key")
index = pc.Index("product-docs")

chunks = [
    {"id": "doc1-chunk1", "text": "SSO setup requires...", "source": "https://docs.example.com/sso"},
    {"id": "doc1-chunk2", "text": "SAML configuration needs...", "source": "https://docs.example.com/sso"},
]

# One record per chunk: ID, embedding, and the metadata returned at query time
vectors = [
    {
        "id": chunk["id"],
        "values": embed_chunk(chunk["text"]),
        "metadata": {"text": chunk["text"], "source": chunk["source"]},
    }
    for chunk in chunks
]

index.upsert(vectors=vectors)
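Because mixed-model vectors fail silently, a cheap guard before each upsert can catch the most obvious mismatch. The sketch below assumes vectors arrive as plain Python lists; `EXPECTED_DIM` and `check_vector` are illustrative names, not part of TARX or the OpenAI SDK. A length check only catches models with a different output size. It cannot detect a different model that also emits 1536 dimensions.

```python
import math

EXPECTED_DIM = 1536  # output size of text-embedding-3-small (see "What TARX Uses")


def check_vector(values: list[float], expected_dim: int = EXPECTED_DIM) -> list[float]:
    """Return the vector unchanged if it looks safe to upsert, else raise ValueError."""
    if len(values) != expected_dim:
        raise ValueError(f"expected {expected_dim} dimensions, got {len(values)}")
    if not all(math.isfinite(v) for v in values):
        raise ValueError("vector contains NaN or infinite values")
    return values
```

One way to use it is to wrap the embedding call, for example `"values": check_vector(embed_chunk(chunk["text"]))`, so a misconfigured model name fails loudly at indexing time rather than at query time.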

Chunking Strategy

How you chunk your documents significantly impacts retrieval quality:

| Parameter | Recommended Value | Notes |
|---|---|---|
| Chunk size | 300-500 tokens | ~250-400 words |
| Chunk overlap | 50-100 tokens | Prevents cutting off context at boundaries |
| Min chunk size | 100 tokens | Discard very short chunks (headings, footnotes) |
| Max chunk size | 800 tokens | Larger chunks dilute relevance scores |
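To make the interaction between chunk size, overlap, and minimum size concrete, here is a minimal sliding-window sketch. It assumes the text has already been tokenized into a list; `window_chunks` is a hypothetical helper, and splitting on whitespace is only a rough stand-in for a real tokenizer such as tiktoken.

```python
def window_chunks(tokens: list[str], size: int = 400, overlap: int = 50,
                  min_size: int = 100) -> list[list[str]]:
    """Split a token list into overlapping windows, dropping windows under min_size."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")
    step = size - overlap  # each window starts this many tokens after the previous one
    chunks = []
    for start in range(0, len(tokens), step):
        window = tokens[start:start + size]
        if len(window) >= min_size:
            chunks.append(window)
        if start + size >= len(tokens):
            break  # this window already reached the end of the document
    return chunks
```

With the defaults, a 1,000-token document yields windows starting at tokens 0, 350, and 700, and each window repeats the last 50 tokens of the one before it. A trailing window shorter than `min_size` is dropped, which can lose a few tokens at the end of a document; merging it into the previous chunk is a reasonable alternative if that matters.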

Chunking with LangChain (Python)

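from langchain.text_splitter import RecursiveCharacterTextSplitter

# chunk_size and chunk_overlap are measured in tokens here because the
# splitter counts length with tiktoken (requires the tiktoken package).
# With length_function=len they would be measured in characters instead.
splitter = RecursiveCharacterTextSplitter.from_tiktoken_encoder(
    encoding_name="cl100k_base",  # encoding used by text-embedding-3-small
    chunk_size=400,
    chunk_overlap=50,
    separators=["\n\n", "\n", ". ", " ", ""],
)

chunks = splitter.split_text(document_text)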

Semantic Chunking

For better quality, use semantic chunking instead of fixed-size chunks:

  • Split on paragraph boundaries (double newlines)
  • Keep code blocks together
  • Keep list items with their list headers

def semantic_chunks(text: str, max_tokens: int = 400) -> list[str]:
    """Pack whole paragraphs into chunks of roughly max_tokens or fewer.

    Only paragraph boundaries are handled here; code blocks and lists are
    not special-cased. A single paragraph longer than max_tokens is kept whole.
    """
    # Split on double newlines (paragraph boundaries)
    paragraphs = [p.strip() for p in text.split('\n\n') if p.strip()]

    chunks = []
    current_chunk = []
    current_len = 0

    for para in paragraphs:
        para_len = len(para.split())  # rough token estimate (word count)
        if current_len + para_len > max_tokens and current_chunk:
            # Adding this paragraph would overflow, so close the current chunk
            chunks.append('\n\n'.join(current_chunk))
            current_chunk = [para]
            current_len = para_len
        else:
            current_chunk.append(para)
            current_len += para_len

    if current_chunk:
        chunks.append('\n\n'.join(current_chunk))

    return chunks
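The "keep code blocks together" rule can be sketched as a pre-splitting step. The example below assumes source documents are Markdown with triple-backtick fences; `split_keeping_code` and `FENCE` are hypothetical names. It splits on blank lines the same way the paragraph split in `semantic_chunks` does, except that blank lines inside a fenced block do not end the block.

```python
FENCE = "`" * 3  # Markdown code-fence marker (assumed source format)


def split_keeping_code(text: str) -> list[str]:
    """Split text into paragraph blocks without breaking inside a fenced code block."""
    blocks: list[str] = []
    current: list[str] = []
    in_fence = False
    for line in text.splitlines():
        if line.strip().startswith(FENCE):
            in_fence = not in_fence  # entering or leaving a code block
        if line.strip() == "" and not in_fence:
            # Blank line outside a fence: the current block is complete
            if current:
                blocks.append("\n".join(current))
                current = []
        else:
            current.append(line)
    if current:
        blocks.append("\n".join(current))
    return blocks
```

The returned blocks could replace the `text.split('\n\n')` step in `semantic_chunks`, so that each code block is packed into a chunk as a single unit.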

Metadata to Include

Each chunk should include metadata that agents can reference in their responses:

| Metadata Field | Example | Use |
|---|---|---|
| source_url | https://docs.example.com/sso | Agents can cite the source |
| title | "SSO Setup Guide" | Human-readable source name |
| section | "Prerequisites" | Which section of the doc |
| created_at | "2024-01-15" | Helps filter for recency |
| category | "configuration" | Domain filtering |

Configure which metadata fields to include in the RAG Source's Metadata Fields setting. Those fields are returned with each chunk and injected into the agent's context alongside the text.
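As an illustration of attaching these fields at indexing time, the sketch below builds a metadata dictionary from a document record. `build_metadata` and `METADATA_FIELDS` are hypothetical names, and the shape of the `doc` record is an assumption. Fields with no value are omitted rather than stored as null, since some vector databases reject null metadata values.

```python
# Field names taken from the table above; adjust to match your RAG Source settings.
METADATA_FIELDS = ["source_url", "title", "section", "created_at", "category"]


def build_metadata(doc: dict, text: str,
                   fields: list[str] = METADATA_FIELDS) -> dict:
    """Return chunk metadata containing the text plus any configured fields present."""
    metadata = {"text": text}
    for name in fields:
        value = doc.get(name)
        if value is not None:  # skip missing fields instead of storing null
            metadata[name] = value
    return metadata
```

Keeping the field list in one place makes it easier to keep the indexing pipeline and the Metadata Fields setting in agreement.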


Re-Indexing

When your source documents change, you need to re-index:

  1. Delete old vectors for changed documents (by doc ID or source URL)
  2. Re-chunk the updated documents
  3. Re-embed with text-embedding-3-small
  4. Upsert new vectors

TARX doesn't manage this process — it's your responsibility to keep the vector DB in sync with your source documents.
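The four steps can be sketched as a single function. Everything here is an assumption for illustration: `VectorIndex` is a hypothetical minimal interface (real clients expose deletion differently, for example by ID list or metadata filter), and `chunker` and `embed` stand in for your own chunking and embedding functions.

```python
from typing import Callable, Protocol


class VectorIndex(Protocol):
    """Hypothetical minimal interface; adapt to your vector DB's client."""

    def delete_by_doc(self, doc_id: str) -> None: ...
    def upsert(self, vectors: list[dict]) -> None: ...


def reindex_document(
    doc_id: str,
    text: str,
    index: VectorIndex,
    chunker: Callable[[str], list[str]],
    embed: Callable[[str], list[float]],
) -> int:
    """Replace all vectors for one document; return the number of chunks written."""
    chunks = chunker(text)  # step 2: re-chunk
    vectors = [
        {
            "id": f"{doc_id}-chunk{i}",
            "values": embed(chunk),  # step 3: re-embed
            "metadata": {"text": chunk, "doc_id": doc_id},
        }
        for i, chunk in enumerate(chunks, start=1)
    ]
    index.delete_by_doc(doc_id)  # step 1: delete old vectors
    index.upsert(vectors)        # step 4: upsert new vectors
    return len(vectors)
```

The sketch embeds before deleting, so a failed embedding call leaves the old vectors in place instead of removing the document from the index. Deleting before upserting still matters: if the updated document produces fewer chunks than before, upserting alone would leave stale trailing chunks behind.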

Automation tip: Create a TARX workflow with a webhook trigger that fires when your documentation system updates a page. Use an http_request node to call your embedding/index endpoint (or your vector DB's API) so the changed document is re-indexed automatically.


HNSW Index Configuration

For best performance with text-embedding-3-small at 1536 dimensions:

Pinecone

Pinecone manages its index internals automatically, so there are no HNSW parameters to tune. Specify the cosine metric and 1536 dimensions when creating the index.

Weaviate

{
  "vectorIndexConfig": {
    "distance": "cosine",
    "ef": 100,
    "efConstruction": 128,
    "maxConnections": 64
  }
}

Qdrant

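from qdrant_client import QdrantClient
from qdrant_client.models import Distance, HnswConfigDiff, VectorParams

# Assumes a local Qdrant instance; replace the URL with your deployment's.
client = QdrantClient(url="http://localhost:6333")

client.create_collection(
    collection_name="product_docs",
    vectors_config=VectorParams(size=1536, distance=Distance.COSINE),
    hnsw_config=HnswConfigDiff(m=16, ef_construct=100),
)

Azure AI Search

{
  "vectorSearch": {
    "algorithms": [{
      "name": "hnsw-config",
      "kind": "hnsw",
      "hnswParameters": {
        "m": 4,
        "efConstruction": 400,
        "efSearch": 500,
        "metric": "cosine"
      }
    }]
  }
}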

Quality vs. Cost Trade-offs

| Configuration | Retrieval Quality | Token Cost | Query Latency |
|---|---|---|---|
| Top-K=1, threshold=0.85 | Very targeted | Very low | Low |
| Top-K=3, threshold=0.7 (default) | Good balance | Low | Low |
| Top-K=5, threshold=0.6 | Higher recall | Moderate | Moderate |
| Top-K=10, threshold=0.5 | Maximum recall | High | Moderate |

For most use cases, the default (top-K=3, threshold=0.7) is the right starting point. Adjust based on your specific needs and the size/quality of your index.
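To see how the two settings interact, the sketch below applies top-K and a similarity threshold to a small in-memory set of vectors. It is an illustration under assumptions, not TARX's implementation: it scores every chunk by brute force rather than through an HNSW index, and `retrieve` is a hypothetical function.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query: list[float], chunks: dict[str, list[float]],
             top_k: int = 3, threshold: float = 0.7) -> list[tuple[str, float]]:
    """Score every chunk, keep the top_k best, then drop any below threshold."""
    scored = [(cid, cosine_similarity(query, vec)) for cid, vec in chunks.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return [(cid, score) for cid, score in scored[:top_k] if score >= threshold]
```

Top-K caps how many chunks can be returned, while the threshold can return fewer than K (or none) when nothing is similar enough. Each returned chunk is injected into the agent's context, which is why a higher top-K with a lower threshold raises token cost.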