Embedding Strategy

TARX uses OpenAI's text-embedding-3-small model for all RAG query embeddings. This page explains the strategy, why it matters, and how to align your indexing pipeline with it.

What TARX Uses

Property	Value
Model	`text-embedding-3-small` (OpenAI)
Dimensions	1536
Distance metric	Cosine similarity
Index type	HNSW (Hierarchical Navigable Small World)
Search type	Approximate Nearest Neighbor (ANN)
Cost to users	Free — TARX covers embedding API costs
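
To make the distance metric concrete, here is a minimal sketch of cosine similarity in plain Python. The function name `cosine_similarity` is illustrative only; in practice your vector database computes this for you over 1536-dimensional vectors.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)
```

Because the score depends only on direction, scaling a vector does not change its similarity to another vector.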

Why text-embedding-3-small

OpenAI's text-embedding-3-small offers an excellent balance of:

Performance — Strong semantic understanding across domains
Cost — One of the cheapest embedding APIs available
Speed — Fast inference (< 100ms per embedding)
Dimensions — 1536 dimensions is a sweet spot between quality and storage

It's the same model that TARX uses for Bridge's own RAG (indexing the TARX documentation).

Aligning Your Indexing Pipeline

Critical: Your indexing pipeline must use the same embedding model as TARX's query embeddings. Vectors from different models are not comparable — mixing them produces meaningless results.

Use for indexing: text-embedding-3-small, 1536 dimensions

Example indexing code (Python):

from openai import OpenAI

client = OpenAI(api_key="your-openai-key")

def embed_chunk(text: str) -> list[float]:
    response = client.embeddings.create(
        model="text-embedding-3-small",
        input=text
    )
    return response.data[0].embedding  # 1536 floats

# Upsert to Pinecone:
from pinecone import Pinecone

pc = Pinecone(api_key="your-pinecone-key")
index = pc.Index("product-docs")

chunks = [
    {"id": "doc1-chunk1", "text": "SSO setup requires...", "source": "https://docs.example.com/sso"},
    {"id": "doc1-chunk2", "text": "SAML configuration needs...", "source": "https://docs.example.com/sso"}
]

vectors = [
    {
        "id": chunk["id"],
        "values": embed_chunk(chunk["text"]),
        "metadata": {"text": chunk["text"], "source": chunk["source"]}
    }
    for chunk in chunks
]

index.upsert(vectors=vectors)
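
Before upserting, a cheap guard can catch a mismatched model or dimension early. The sketch below is one way to do that; `EXPECTED_MODEL`, `EXPECTED_DIMENSIONS`, and `check_vector` are hypothetical names, not part of TARX or any client library.

```python
EXPECTED_MODEL = "text-embedding-3-small"
EXPECTED_DIMENSIONS = 1536


def check_vector(model: str, values: list[float]) -> None:
    """Raise ValueError if a vector was not produced with the expected setup."""
    if model != EXPECTED_MODEL:
        raise ValueError(f"expected model {EXPECTED_MODEL!r}, got {model!r}")
    if len(values) != EXPECTED_DIMENSIONS:
        raise ValueError(
            f"expected {EXPECTED_DIMENSIONS} dimensions, got {len(values)}"
        )
```

Calling `check_vector("text-embedding-3-small", vector)` on each vector before the upsert turns a silent retrieval-quality problem into an immediate error.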

Chunking Strategy

How you chunk your documents significantly impacts retrieval quality:

Recommended Chunk Settings

Parameter	Recommended Value	Notes
Chunk size	300-500 tokens	~250-400 words
Chunk overlap	50-100 tokens	Prevents cutting off context at boundaries
Min chunk size	100 tokens	Discard very short chunks (headings, footnotes)
Max chunk size	800 tokens	Larger chunks dilute relevance scores
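
As a rough illustration of the min and max bounds in the table, the sketch below sorts chunks into three buckets. It assumes a whitespace word count as a stand-in for a real tokenizer, and `estimate_tokens` and `classify_chunks` are hypothetical helper names.

```python
def estimate_tokens(text: str) -> int:
    """Very rough token estimate: whitespace-separated words (an assumption)."""
    return len(text.split())


def classify_chunks(
    chunks: list[str], min_tokens: int = 100, max_tokens: int = 800
) -> dict[str, list[str]]:
    """Group chunks into ok / too_short / too_long by estimated size."""
    result: dict[str, list[str]] = {"ok": [], "too_short": [], "too_long": []}
    for chunk in chunks:
        size = estimate_tokens(chunk)
        if size < min_tokens:
            result["too_short"].append(chunk)   # candidates to discard
        elif size > max_tokens:
            result["too_long"].append(chunk)    # candidates to split again
        else:
            result["ok"].append(chunk)
    return result
```

Chunks in `too_short` are the ones the table suggests discarding, while chunks in `too_long` are better split again than dropped.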

Chunking with LangChain (Python)

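import tiktoken
from langchain.text_splitter import RecursiveCharacterTextSplitter

# text-embedding-3-small uses the cl100k_base encoding
encoding = tiktoken.get_encoding("cl100k_base")

def token_len(text: str) -> int:
    return len(encoding.encode(text))

splitter = RecursiveCharacterTextSplitter(
    chunk_size=400,               # tokens, because length_function counts tokens
    chunk_overlap=50,             # tokens
    length_function=token_len,    # plain len would count characters instead
    separators=["\n\n", "\n", ". ", " ", ""]
)

chunks = splitter.split_text(document_text)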

Semantic Chunking

For better quality, use semantic chunking instead of fixed-size chunks:

Split on paragraph boundaries (double newlines)
Keep code blocks together
Keep list items with their list headers

def semantic_chunks(text: str, max_tokens: int = 400) -> list[str]:
    # Split on double newlines (paragraph boundaries)
    paragraphs = [p.strip() for p in text.split('\n\n') if p.strip()]
    
    chunks = []
    current_chunk = []
    current_len = 0
    
    for para in paragraphs:
        para_len = len(para.split())  # word count as a rough proxy; real token counts run higher
        if current_len + para_len > max_tokens and current_chunk:
            chunks.append('\n\n'.join(current_chunk))
            current_chunk = [para]
            current_len = para_len
        else:
            current_chunk.append(para)
            current_len += para_len
    
    if current_chunk:
        chunks.append('\n\n'.join(current_chunk))
    
    return chunks
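
The function above splits on every blank line, so a code block that contains a blank line would be cut in two. The sketch below shows one way to keep fenced code blocks together, assuming the source documents mark code with triple-backtick fences; `split_keeping_code` is a hypothetical helper whose output could replace the paragraph split in a grouping function like the one above.

```python
FENCE = "`" * 3  # triple-backtick fence marker


def split_keeping_code(text: str) -> list[str]:
    """Split on blank lines, but never inside a fenced code block."""
    blocks: list[str] = []
    current: list[str] = []
    in_code = False
    for line in text.splitlines():
        if line.strip().startswith(FENCE):
            in_code = not in_code          # entering or leaving a code block
        if line.strip() == "" and not in_code:
            if current:                    # blank line closes the current block
                blocks.append("\n".join(current))
                current = []
        else:
            current.append(line)           # blank lines inside code are kept
    if current:
        blocks.append("\n".join(current))
    return blocks
```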

Metadata to Include

Each chunk should include metadata that agents can reference in their responses:

Metadata Field	Example	Use
`source_url`	`https://docs.example.com/sso`	Agents can cite the source
`title`	`"SSO Setup Guide"`	Human-readable source name
`section`	`"Prerequisites"`	Which section of the doc
`created_at`	`"2024-01-15"`	Helps filter for recency
`category`	`"configuration"`	Domain filtering

Configure which metadata fields to include in the RAG Source's Metadata Fields setting. Those fields are returned with each chunk and injected into the agent's context alongside the text.
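
One way to keep these fields consistent across chunks is to build them from a small typed structure. This is a sketch only: `ChunkMetadata` and `build_record` are hypothetical names, and the record shape simply mirrors the id/metadata layout used in the indexing example earlier on this page.

```python
from dataclasses import asdict, dataclass


@dataclass
class ChunkMetadata:
    source_url: str
    title: str
    section: str
    created_at: str  # ISO date string, e.g. "2024-01-15"
    category: str


def build_record(chunk_id: str, text: str, meta: ChunkMetadata) -> dict:
    """Build an id/metadata record; add "values" once the chunk is embedded."""
    return {"id": chunk_id, "metadata": {"text": text, **asdict(meta)}}
```

The field names you store here need to match the names listed in the RAG Source's Metadata Fields setting.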

Re-Indexing

When your source documents change, you need to re-index:

Delete old vectors for changed documents (by doc ID or source URL)
Re-chunk the updated documents
Re-embed with text-embedding-3-small
Upsert new vectors

TARX doesn't manage this process — it's your responsibility to keep the vector DB in sync with your source documents.
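
The four steps above can be sketched as a single function. Everything here is an assumption for illustration: `InMemoryIndex` is a toy stand-in for your vector DB, and `chunk_fn` and `embed_fn` are placeholders for your own chunker and your text-embedding-3-small call.

```python
import hashlib
from typing import Callable


class InMemoryIndex:
    """Toy stand-in for a vector DB; real clients expose their own delete/upsert calls."""

    def __init__(self) -> None:
        self.vectors: dict[str, dict] = {}

    def delete_by_source(self, source_url: str) -> int:
        stale = [vid for vid, vec in self.vectors.items()
                 if vec["metadata"]["source_url"] == source_url]
        for vid in stale:
            del self.vectors[vid]
        return len(stale)

    def upsert(self, vectors: list[dict]) -> None:
        for vec in vectors:
            self.vectors[vec["id"]] = vec


def reindex_document(
    index: InMemoryIndex,
    source_url: str,
    text: str,
    chunk_fn: Callable[[str], list[str]],
    embed_fn: Callable[[str], list[float]],
) -> int:
    """Replace every vector for one document; returns the new chunk count."""
    index.delete_by_source(source_url)                 # 1. delete old vectors
    chunks = chunk_fn(text)                            # 2. re-chunk
    doc_key = hashlib.sha1(source_url.encode("utf-8")).hexdigest()[:12]
    vectors = [
        {
            "id": f"{doc_key}-{n}",
            "values": embed_fn(chunk),                 # 3. re-embed
            "metadata": {"text": chunk, "source_url": source_url},
        }
        for n, chunk in enumerate(chunks)
    ]
    index.upsert(vectors)                              # 4. upsert new vectors
    return len(vectors)
```

Deleting first matters when an updated document produces fewer chunks than before: an upsert alone would overwrite the leading chunks but leave the stale trailing ones in the index.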

Automation tip: Create a TARX workflow with a webhook trigger that fires when your documentation system updates a page. Use an http_request node to call your embedding/index endpoint (or your vector DB's API) so the changed document is re-indexed automatically.

HNSW Index Configuration

For best performance with text-embedding-3-small at 1536 dimensions:

Pinecone

Pinecone manages index configuration automatically, so there are no HNSW parameters to set. Just specify the cosine metric and 1536 dimensions when creating the index.

Weaviate

{
  "vectorIndexConfig": {
    "distance": "cosine",
    "ef": 100,
    "efConstruction": 128,
    "maxConnections": 64
  }
}

Qdrant

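from qdrant_client import QdrantClient
from qdrant_client.models import Distance, HnswConfigDiff, VectorParams

# Assumes a local Qdrant instance; replace the URL with your own deployment
client = QdrantClient(url="http://localhost:6333")

client.create_collection(
    collection_name="product_docs",
    vectors_config=VectorParams(size=1536, distance=Distance.COSINE),
    hnsw_config=HnswConfigDiff(m=16, ef_construct=100)
)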

Azure AI Search

{
  "vectorSearch": {
    "algorithms": [{
      "name": "hnsw-config",
      "kind": "hnsw",
      "hnswParameters": {
        "m": 4,
        "efConstruction": 400,
        "efSearch": 500,
        "metric": "cosine"
      }
    }]
  }
}

Quality vs. Cost Trade-offs

Configuration	Retrieval Quality	Token Cost	Query Latency
Top-K=1, threshold=0.85	Very targeted	Very low	Low
Top-K=3, threshold=0.7 (default)	Good balance	Low	Low
Top-K=5, threshold=0.6	Higher recall	Moderate	Moderate
Top-K=10, threshold=0.5	Maximum recall	High	Moderate

For most use cases, the default (top-K=3, threshold=0.7) is the right starting point. Adjust based on your specific needs and the size/quality of your index.
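
To show how the two settings interact, here is a minimal sketch that applies a threshold and a top-K cut to a list of scored matches. It assumes the threshold is a minimum cosine similarity score; `select_chunks` is a hypothetical helper, not TARX's actual retrieval code.

```python
def select_chunks(
    scored: list[tuple[str, float]], top_k: int = 3, threshold: float = 0.7
) -> list[tuple[str, float]]:
    """Keep at most top_k matches whose similarity score is >= threshold."""
    passing = [item for item in scored if item[1] >= threshold]
    passing.sort(key=lambda item: item[1], reverse=True)  # best match first
    return passing[:top_k]
```

Lowering the threshold or raising top-K both let more chunks through, which is why the lower rows of the table cost more tokens per query.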
