Tuning retrieval
How to pick k, interpret search scores, filter weak matches, and decide when to reach for the lower-level search() primitive.
The 5-minute RAG quickstart calls retrieve() and uses the results as-is. That works for most RAG flows. This page is for debugging answer quality, picking the right k, filtering out marginal matches, or wiring a custom retriever.
search vs retrieve
Aether exposes two retrieval primitives on the SDK. They hit the same vector index; they differ in what comes back.
| You want | Use | Returns |
|---|---|---|
| Passages to drop into an LLM prompt | retrieve() | doc_id, score, title, passage, content |
| Results for a search UI, or IDs to look up later | search() | doc_id, score, title, passage, optional inline content |
| Raw score data for ranking or analytics | search() | Same result shape; higher score is better |
Default to retrieve() for RAG. It returns usable content in one call and deduplicates by doc_id. Use search() when you need a lighter metadata response or want to fetch content later through a different code path.
SDK-first path
The SDK calls the hosted search routes for you. You normally do not need to build HTTP requests or set headers yourself; see Authentication for SDK setup.
Picking k
k is the maximum number of matches to return. The right value depends on how much context your LLM can hold, how dense your documents are, and how much noise your workflow can tolerate.
- Start with k=3 to k=5. This is enough context for most RAG flows without crowding the prompt.
- Increase k when answers feel incomplete. If relevant facts exist but are missing from context, try a larger k.
- Decrease k when marginal matches distract the model. Smaller k forces the model to use only the nearest matches.
- Cap k by your prompt budget. Large documents and large k values can exceed an LLM's context window.
Aether returns at most k results, but can return fewer when your store is small or filters narrow the candidate pool.
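One way to apply the prompt-budget cap is to trim results after retrieval instead of guessing a safe k up front. The sketch below is a minimal illustration, not part of the SDK: `Result` is a stand-in for an SDK result with only the fields used here, `fit_to_budget` is a hypothetical helper, and the four-characters-per-token estimate is a rough assumption that you should replace with your model's tokenizer.

```python
from dataclasses import dataclass


@dataclass
class Result:
    # Stand-in for an SDK result; only the fields this sketch needs.
    doc_id: str
    score: int
    content: str


def estimate_tokens(text: str) -> int:
    # Rough heuristic (assumption): about 4 characters per token.
    return max(1, len(text) // 4)


def fit_to_budget(results, max_tokens):
    """Keep the highest-scoring results whose combined estimate fits max_tokens."""
    kept, used = [], 0
    for r in sorted(results, key=lambda r: r.score, reverse=True):
        cost = estimate_tokens(r.content)
        if used + cost > max_tokens:
            break  # stop at the first result that would overflow the budget
        kept.append(r)
        used += cost
    return kept


results = [
    Result("a", 92, "x" * 400),   # roughly 100 tokens
    Result("b", 75, "y" * 800),   # roughly 200 tokens
    Result("c", 61, "z" * 2000),  # roughly 500 tokens
]
kept = fit_to_budget(results, max_tokens=350)
print([r.doc_id for r in kept])
```

With a 350-token budget the two highest-scoring results fit and the third is dropped. Stopping at the first overflow keeps the context in strict score order; skipping the oversized result and continuing is an equally reasonable choice if you would rather fill the budget.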
Search score
Each result carries a score field: a calibrated relevance integer from 0 to 100, where higher means more relevant.
| Score | Interpretation |
|---|---|
| 90 and above | Very close semantic match. |
| 60 to 89 | Often useful, depending on your corpus. |
| Below 60 | Often weak or tangential. Validate before using as context. |
Validate thresholds on your data
Score distributions shift with your documents, query style, chunk size, and embedding model. Treat these bands as a starting point, not a universal rule.
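A simple way to validate a threshold is to hand-label a sample of results from your own queries and measure precision and recall at a few candidate cutoffs. The sketch below assumes you have collected (score, is_relevant) pairs yourself; the sample data and the `precision_recall_at` helper are illustrative and not part of the SDK.

```python
def precision_recall_at(threshold, labeled):
    """labeled: list of (score, is_relevant) pairs from hand-judged results."""
    kept = [rel for score, rel in labeled if score >= threshold]
    total_relevant = sum(1 for _, rel in labeled if rel)
    true_pos = sum(1 for rel in kept if rel)
    precision = true_pos / len(kept) if kept else 0.0
    recall = true_pos / total_relevant if total_relevant else 0.0
    return precision, recall


# Made-up judgments for illustration only.
labeled = [(95, True), (88, True), (72, False), (64, True), (55, False), (41, False)]

for t in (50, 60, 70, 80):
    p, r = precision_recall_at(t, labeled)
    print(f"threshold={t}: precision={p:.2f} recall={r:.2f}")
```

Raising the threshold usually trades recall for precision. Pick the lowest cutoff at which the kept results are reliably relevant for your corpus.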
Filtering weak matches
Current SDKs expose score on every search and retrieval result. The most portable way to filter weak matches is to request a few more results than you need, then keep only results above your score threshold:
results = client.retrieve("How many vacation days do I get?", k=10)
strong = [r for r in results if r.score >= 60]
if not strong:
    print("No confident match -- escalating to a human.")
else:
    context = "\n\n".join(r.content for r in strong)
The REST API has a max_distance parameter for advanced distance-threshold filtering before scores are returned. Prefer SDK-level score filtering unless you are deliberately working at the REST contract layer.
content vs passage
Each RetrievalResult carries two text fields:
- content - the full document the chunk came from. Always populated by retrieve().
- passage - the matched chunk. Populated when Aether can identify the exact passage that matched the query.
For short entries, content and passage are often effectively the same. For long documents, prefer passage when constructing a compact prompt:
context = "\n\n".join(r.passage or r.content for r in results)
Passing an entire long document into a prompt when only one paragraph is relevant wastes tokens and can dilute the model's answer.
Context formatting
Every SDK ships a context-formatting helper for retrieval results. It prefers the matched passage over the full document content by default.
from aether import AetherClient, format_context
client = AetherClient()
results = client.retrieve("How many vacation days do I get?", k=3)
context = format_context(results)
Customize the per-source template, separator, or passage/content preference:
context = format_context(
    results,
    template="<{title} | score={score}>\n{text}",
    separator="\n---\n",
    prefer_passage=False,
)
Tag filters
Both search() and retrieve() accept tags to narrow search to a subset of your store. Tags use AND semantics: every requested tag must match.
results = client.retrieve("vacation policy", k=3, tags=["hr", "policy"])
Tag filters are useful for tenant, user, project, or document-type scoping. They are not a full metadata query language; see Search & Retrieval API for the current limits.
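To make the AND semantics concrete, the sketch below reproduces the matching rule locally with plain sets. It illustrates the behavior described above and does not show how Aether implements tag filtering; the document IDs, tags, and the `matches_all_tags` helper are made up for the example.

```python
def matches_all_tags(doc_tags, requested_tags):
    # AND semantics: every requested tag must be present on the document.
    return set(requested_tags) <= set(doc_tags)


# Hypothetical documents and their tags.
docs = {
    "handbook": ["hr", "policy"],
    "payroll-faq": ["hr"],
    "security-policy": ["policy", "it"],
}

matched = [
    doc_id
    for doc_id, tags in docs.items()
    if matches_all_tags(tags, ["hr", "policy"])
]
print(matched)
```

Only the document carrying both tags matches. Documents tagged with just "hr" or just "policy" are excluded, so adding more tags to a request can only narrow the candidate pool.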
When to use search() directly
A few cases where search() beats retrieve():
- Search UIs. You are rendering a list of results with previews and do not need full text up front.
- Two-stage retrieval. You want IDs first, then fetch content selectively after an authorization or ranking step.
- Score-only workflows. Recommendation systems, similarity scoring, or analytics where you only care about ranking score.
- Batch workflows. You are running several independent queries and can use batch_search / batchSearch / BatchSearchAsync / BatchSearch.
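The two-stage pattern from the list above can be sketched as follows. Everything here is a stand-in: `Hit` mimics the metadata a search() call returns, and `can_read` and `fetch_content` are hypothetical callbacks for your own authorization check and content lookup. In a real flow the hits would come from client.search().

```python
from dataclasses import dataclass


@dataclass
class Hit:
    # Stand-in for a search() result; only metadata, no content.
    doc_id: str
    score: int
    title: str


def two_stage(hits, can_read, fetch_content, limit=3):
    """Filter hits by permission, then fetch content only for the top few."""
    allowed = [h for h in hits if can_read(h.doc_id)]
    top = sorted(allowed, key=lambda h: h.score, reverse=True)[:limit]
    return [(h, fetch_content(h.doc_id)) for h in top]


# Illustrative data standing in for search() output and a content store.
hits = [
    Hit("d1", 91, "Vacation policy"),
    Hit("d2", 84, "Executive compensation"),
    Hit("d3", 70, "Holiday calendar"),
]
readable = {"d1", "d3"}
store = {"d1": "Vacation text", "d2": "Compensation text", "d3": "Calendar text"}
fetched_ids = []


def fetch(doc_id):
    fetched_ids.append(doc_id)  # record which documents were actually loaded
    return store[doc_id]


pairs = two_stage(hits, can_read=lambda d: d in readable, fetch_content=fetch, limit=2)
print([h.doc_id for h, _ in pairs])
```

Content is fetched only for documents that pass the permission check, so restricted text never enters your process and you pay for fewer lookups.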
search() defaults to k=10; retrieve() defaults to k=5.
Going lower-level: search_by_vector
If you generate embeddings with your own model, call search_by_vector() directly with a raw vector to bypass Aether's built-in query embedding. The vector must match the active index dimension. See Search by vector.
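Because a mismatched vector cannot be searched, it can help to validate the embedding before sending it. The sketch below is a minimal client-side guard, assuming you already know your index dimension; `check_query_vector` and the toy dimension of 3 are illustrative and not part of the SDK.

```python
import math


def check_query_vector(vector, index_dim):
    """Raise ValueError if the vector cannot be used against an index of index_dim."""
    if len(vector) != index_dim:
        raise ValueError(f"expected {index_dim} dimensions, got {len(vector)}")
    if not all(isinstance(x, (int, float)) and math.isfinite(x) for x in vector):
        raise ValueError("vector must contain only finite numbers")
    return list(vector)


# Toy example: a real embedding would have hundreds or thousands of dimensions.
query_vector = check_query_vector([0.12, -0.40, 0.88], index_dim=3)
print(len(query_vector))
```

Pass the validated list to search_by_vector() as your query. Failing early in your own code gives a clearer error than a rejected request.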
Next steps
- Quickstart: 5-minute RAG - the happy path, before tuning
- Search & Retrieval API - full reference for every parameter and field
- Concepts - plain-language explanations of embeddings, vector search, and chunks
- Troubleshooting - fixing common retrieval problems