Tuning retrieval

How to pick k, interpret search scores, filter weak matches, and decide when to reach for the lower-level search() primitive.

The 5-minute RAG quickstart calls retrieve() and uses the results as-is, which works for most RAG flows. Use this page when you are debugging answer quality, choosing k, filtering out marginal matches, or wiring a custom retriever.


search vs retrieve

Aether exposes two retrieval primitives on the SDK. They hit the same vector index; they differ in what comes back.

You want | Use | Returns
Passages to drop into an LLM prompt | retrieve() | doc_id, score, title, passage, content
Results for a search UI, or IDs to look up later | search() | doc_id, score, title, passage, optional inline content
Raw score data for ranking or analytics | search() | Same result shape; higher score is better

Default to retrieve() for RAG. It returns usable content in one call and deduplicates by doc_id. Use search() when you need a lighter metadata response or want to fetch content later through a different code path.

SDK-first path

The SDK calls the hosted search routes for you. You normally do not need to build HTTP requests or set headers yourself; see Authentication for SDK setup.


Picking k

k is the maximum number of matches to return. The right value depends on how much context your LLM can hold, how dense your documents are, and how much noise your workflow can tolerate.

  • Start with k=3 to k=5. This is enough context for most RAG flows without crowding the prompt.
  • Increase k when answers feel incomplete. If relevant facts exist but are missing from context, try a larger k.
  • Decrease k when marginal matches distract the model. Smaller k forces the model to use only the nearest matches.
  • Cap k by your prompt budget. Large documents and large k values can exceed an LLM's context window.

Aether returns at most k results, but can return fewer when your store is small or filters narrow the candidate pool.
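
One way to cap k by prompt budget is to trim results after retrieval. The sketch below is illustrative only: Result is a stand-in dataclass rather than the SDK's result type, fit_to_budget is a hypothetical helper, and the 4-characters-per-token estimate is a rough assumption that your model's tokenizer should replace.

```python
from dataclasses import dataclass


@dataclass
class Result:
    """Stand-in for an SDK result; only the fields used here are modeled."""
    doc_id: str
    score: int
    passage: str


def estimate_tokens(text: str) -> int:
    # Rough assumption: about 4 characters per token.
    return max(1, len(text) // 4)


def fit_to_budget(results: list[Result], max_tokens: int) -> list[Result]:
    """Keep the highest-scoring results whose combined estimate fits max_tokens."""
    kept: list[Result] = []
    used = 0
    for r in sorted(results, key=lambda r: r.score, reverse=True):
        cost = estimate_tokens(r.passage)
        if used + cost > max_tokens:
            break  # stop at the first result that would overflow the budget
        kept.append(r)
        used += cost
    return kept


sample = [
    Result("a", 91, "x" * 400),
    Result("b", 72, "x" * 400),
    Result("c", 55, "x" * 400),
]
fitted = fit_to_budget(sample, max_tokens=250)
print([r.doc_id for r in fitted])
```

Trimming after the call keeps k as a simple upper bound while the budget check decides how many results actually reach the prompt.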


Search score

Each result carries a score field: a calibrated relevance integer from 0 to 100, where higher means more relevant.

Score | Interpretation
90 and above | Very close semantic match.
60 to 89 | Often useful, depending on your corpus.
Below 60 | Often weak or tangential. Validate before using as context.

Validate thresholds on your data

Score distributions shift with your documents, query style, chunk size, and embedding model. Treat these bands as a starting point, not a universal rule.
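
To check a threshold against your own data, one approach is to hand-label a sample of results as relevant or not and compare precision and recall at a few cutoffs. The sketch below assumes you have collected (score, is_relevant) pairs yourself. The labeled list is made-up illustrative data, and precision_recall is a hypothetical helper, not part of the SDK.

```python
def precision_recall(labeled, threshold):
    """labeled: list of (score, is_relevant) pairs from hand-judged results."""
    kept = [relevant for score, relevant in labeled if score >= threshold]
    total_relevant = sum(1 for _, relevant in labeled if relevant)
    true_positives = sum(1 for relevant in kept if relevant)
    precision = true_positives / len(kept) if kept else 0.0
    recall = true_positives / total_relevant if total_relevant else 0.0
    return precision, recall


# Illustrative judgments only; replace with labels from your own corpus.
labeled = [(95, True), (82, True), (71, False), (64, True), (58, False), (40, False)]

for threshold in (50, 60, 70, 80):
    p, r = precision_recall(labeled, threshold)
    print(f"threshold={threshold}: precision={p:.2f} recall={r:.2f}")
```

A higher threshold usually trades recall for precision, so pick the cutoff that matches how costly a missed fact is compared with a distracting one in your workflow.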

Filtering weak matches

Current SDKs expose score on every search and retrieval result. The most portable way to filter weak matches is to request a few more results than you need, then keep only results above your score threshold:

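from aether import AetherClient

client = AetherClient()

# Over-fetch, then keep only results at or above the score threshold.
results = client.retrieve("How many vacation days do I get?", k=10)
strong = [r for r in results if r.score >= 60]

if not strong:
    print("No confident match -- escalating to a human.")
else:
    context = "\n\n".join(r.content for r in strong)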

The REST API has a max_distance parameter for advanced distance-threshold filtering before scores are returned. Prefer SDK-level score filtering unless you are deliberately working at the REST contract layer.
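
For SDK-level filtering, the over-fetch-then-filter pattern can be wrapped in a small helper that also trims back to the number of results you wanted. This is a sketch under assumptions: Result is a stand-in dataclass, and keep_strong, min_score, and limit are hypothetical names rather than SDK parameters.

```python
from dataclasses import dataclass


@dataclass
class Result:
    """Stand-in for an SDK result; only the fields used here are modeled."""
    doc_id: str
    score: int
    content: str


def keep_strong(results, min_score=60, limit=3):
    """Drop results below min_score, then keep the top `limit` by score."""
    strong = [r for r in results if r.score >= min_score]
    strong.sort(key=lambda r: r.score, reverse=True)
    return strong[:limit]


# Pretend these came back from a call with k=10.
fetched = [
    Result("a", 93, "text a"),
    Result("b", 41, "text b"),
    Result("c", 77, "text c"),
    Result("d", 66, "text d"),
    Result("e", 61, "text e"),
]
strong = keep_strong(fetched, min_score=60, limit=3)
print([(r.doc_id, r.score) for r in strong])
```

An empty return value is the signal to fall back, for example by escalating to a human instead of answering from weak context.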


content vs passage

Each RetrievalResult carries two text fields:

  • content - the full document the chunk came from. Always populated by retrieve().
  • passage - the matched chunk. Populated when Aether can identify the exact passage that matched the query.

For short entries, content and passage are often identical or nearly so. For long documents, prefer passage when constructing a compact prompt:

context = "\n\n".join(r.passage or r.content for r in results)

Passing an entire long document into a prompt when only one paragraph is relevant wastes tokens and can dilute the model's answer.
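
When passage is empty and content is long, one option is to truncate the fallback so a single document cannot dominate the prompt. The sketch below is illustrative: Result is a stand-in with only the two text fields, pick_text is a hypothetical helper, and max_chars is an arbitrary cap rather than an SDK setting.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Result:
    """Stand-in for RetrievalResult; only the two text fields are modeled."""
    content: str
    passage: Optional[str] = None


def pick_text(result: Result, max_chars: int = 1200) -> str:
    """Prefer the matched passage; otherwise fall back to truncated content."""
    if result.passage:
        return result.passage
    return result.content[:max_chars]


long_doc = "Intro paragraph. " * 200
results = [
    Result(content=long_doc, passage="Employees accrue 20 vacation days per year."),
    Result(content=long_doc),  # no passage identified for this result
]
texts = [pick_text(r, max_chars=100) for r in results]
print([len(t) for t in texts])
```

Truncating by characters is blunt and can cut mid-sentence; splitting on paragraph boundaries is a reasonable refinement if your documents have them.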


Context formatting

Every SDK ships a context-formatting helper for retrieval results. It prefers the matched passage over the full document content by default.

from aether import AetherClient, format_context

client = AetherClient()
results = client.retrieve("How many vacation days do I get?", k=3)
context = format_context(results)

Customize the per-source template, separator, or passage/content preference:

context = format_context(
    results,
    template="<{title} | score={score}>\n{text}",
    separator="\n---\n",
    prefer_passage=False,
)
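
To make the options concrete, here is a rough stand-in for what a formatter of this kind might do. It is not the SDK's implementation: format_context_sketch and the Result dataclass are hypothetical, and the default template and separator are assumptions. Only the {title}, {score}, and {text} placeholders and the prefer_passage switch are taken from the call shown above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Result:
    """Stand-in for a retrieval result; not the SDK's class."""
    title: str
    score: int
    content: str
    passage: Optional[str] = None


def format_context_sketch(
    results,
    template="[{title}]\n{text}",  # assumed default, for illustration only
    separator="\n\n",              # assumed default, for illustration only
    prefer_passage=True,
):
    blocks = []
    for r in results:
        # Use the matched passage when preferred and available, else full content.
        text = (r.passage or r.content) if prefer_passage else r.content
        blocks.append(template.format(title=r.title, score=r.score, text=text))
    return separator.join(blocks)


results = [
    Result("PTO policy", 88, "Full PTO document text.", "Employees accrue 20 days."),
    Result("Handbook", 64, "Full handbook text."),
]
context = format_context_sketch(
    results,
    template="<{title} | score={score}>\n{text}",
    separator="\n---\n",
)
print(context)
```

Including the score in the template can help when you inspect prompts during debugging, though you may want to drop it in production prompts.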

Tag filters

Both search() and retrieve() accept tags to narrow search to a subset of your store. Tags use AND semantics: every requested tag must match.

results = client.retrieve("vacation policy", k=3, tags=["hr", "policy"])

Tag filters are useful for tenant, user, project, or document-type scoping. They are not a full metadata query language; see Search & Retrieval API for the current limits.
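
AND semantics means a document matches only when its tag set contains every requested tag. The local sketch below shows the rule with plain Python sets. The document IDs and tags are made up, and matches_all is a hypothetical helper that mirrors the described behavior rather than the service's actual code.

```python
def matches_all(doc_tags, requested):
    """True when every requested tag is present on the document."""
    return set(requested).issubset(doc_tags)


# Hypothetical store contents, for illustration only.
docs = {
    "pto-2024": {"hr", "policy"},
    "onboarding": {"hr"},
    "security": {"policy", "it"},
}

hits = sorted(doc_id for doc_id, tags in docs.items() if matches_all(tags, ["hr", "policy"]))
print(hits)
```

Because every extra tag can only narrow the candidate pool, a query with many tags may return fewer than k results.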


When to use search() directly

A few cases where search() beats retrieve():

  • Search UIs. You are rendering a list of results with previews and do not need full text up front.
  • Two-stage retrieval. You want IDs first, then fetch content selectively after an authorization or ranking step.
  • Score-only workflows. Recommendation systems, similarity scoring, or analytics where you only care about the ranking score.
  • Batch workflows. You are running several independent queries and can use batch_search / batchSearch / BatchSearchAsync / BatchSearch.

search() defaults to k=10; retrieve() defaults to k=5.
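
The two-stage pattern can be sketched as follows. Everything here is a stand-in: Hit models only the metadata a search() result might carry, allowed_ids represents the outcome of your own authorization check, and fetch_content is a hypothetical callable for whatever code path loads full text.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """Stand-in for a search() result without inline content."""
    doc_id: str
    score: int
    title: str


def two_stage(hits, allowed_ids, fetch_content, min_score=60):
    """Stage 1: filter hits by authorization and score. Stage 2: fetch content."""
    selected = [h for h in hits if h.doc_id in allowed_ids and h.score >= min_score]
    return {h.doc_id: fetch_content(h.doc_id) for h in selected}


# Illustrative data standing in for search results and a content store.
hits = [Hit("a", 90, "A"), Hit("b", 85, "B"), Hit("c", 40, "C")]
store = {"a": "text a", "b": "text b", "c": "text c"}
fetched_ids = []


def fetch(doc_id):
    fetched_ids.append(doc_id)  # record which documents were actually loaded
    return store[doc_id]


docs = two_stage(hits, allowed_ids={"a", "c"}, fetch_content=fetch)
print(docs, fetched_ids)
```

Content is loaded only for documents that pass both checks, which is the point of separating the stages.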


Going lower-level: search_by_vector

If you generate embeddings with your own model, call search_by_vector() directly with a raw vector to bypass Aether's built-in query embedding. The vector must match the active index dimension. See Search by vector.
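
Since a mismatched dimension is an easy mistake when switching embedding models, a local check before the call can give a clearer error. The sketch below assumes you know the active index dimension; check_dimension is a hypothetical helper, and the dimension value 3 is used only to keep the example small.

```python
def check_dimension(vector, index_dim):
    """Return the vector as floats, or raise if its length does not match."""
    if len(vector) != index_dim:
        raise ValueError(
            f"query vector has {len(vector)} dimensions, index expects {index_dim}"
        )
    return [float(x) for x in vector]


query_vector = check_dimension([0.1, 0.2, 0.3], index_dim=3)
print(query_vector)

try:
    check_dimension([0.1, 0.2], index_dim=3)
except ValueError as err:
    message = str(err)
    print(message)
```

The validated list can then be passed to search_by_vector(); see Search by vector for the actual call signature.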


Next steps