API Reference

Search & Retrieval API

Find relevant documents through the SDK search methods. REST details are included as a contract reference for debugging and advanced integrations.

SDK methods

Most applications should call the SDK. The SDK handles authentication, URL construction, retries, and response parsing.

| Operation | Python | TypeScript | .NET | Go |
| --- | --- | --- | --- | --- |
| Search for metadata and passages | search | search | SearchAsync | Search |
| Retrieve full content for RAG | retrieve | retrieve | RetrieveAsync | Retrieve |
| Search with your own query vector | search_by_vector | searchByVector | SearchByVectorAsync | SearchByVector |
| Run multiple searches in one call | batch_search | batchSearch | BatchSearchAsync | BatchSearch |

Search hits expose score: a calibrated relevance integer from 0 to 100, where higher is better. Treat any cutoff as application-specific and validate it against your own documents.
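One way to validate a candidate cutoff is to measure it against hits you have labeled by hand. The sketch below is illustrative only: LabeledHit and cutoff_report are hypothetical helpers rather than SDK names, and the labeled sample is made up.

```python
from dataclasses import dataclass


@dataclass
class LabeledHit:
    """A search hit you judged by hand (hypothetical helper, not an SDK type)."""
    score: int       # 0-100 relevance score returned by search
    relevant: bool   # your own judgment for this query/document pair


def cutoff_report(hits, cutoff):
    """Return (precision, recall) when keeping only hits with score >= cutoff."""
    kept = [h for h in hits if h.score >= cutoff]
    relevant_total = sum(h.relevant for h in hits)
    relevant_kept = sum(h.relevant for h in kept)
    precision = relevant_kept / len(kept) if kept else 0.0
    recall = relevant_kept / relevant_total if relevant_total else 0.0
    return precision, recall


# Made-up labels for illustration only.
sample = [
    LabeledHit(91, True),
    LabeledHit(74, True),
    LabeledHit(66, False),
    LabeledHit(58, True),
    LabeledHit(40, False),
]

for cutoff in (50, 60, 70):
    precision, recall = cutoff_report(sample, cutoff)
    print(cutoff, round(precision, 2), round(recall, 2))
```

Raising the cutoff usually trades recall for precision, so pick the value that matches how costly a weak match is in your application.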

Search

Use search when you need ranked document IDs, titles, content types, and matched passages without downloading full document content.

results = client.search("deployment best practices", k=10)

for res in results:
    print(res.doc_id, res.score, res.passage)

Each SearchResult has this shape:

TypeScript
interface SearchResult {
  doc_id: string;
  score: number;       // 0-100, higher is more relevant
  title?: string;
  content_type: string;
  content?: string;       // present only when inline content is requested
  passage?: string;       // matched chunk when available
}

Parameters

| Parameter | Type | Default | Applies to | Notes |
| --- | --- | --- | --- | --- |
| query / q | string | required | search, retrieve, batch query | Natural-language query text. |
| k | int | 10 for search, 5 for retrieve | all search methods | Upper bound on returned results. |
| tags | string[] | none | search, retrieve, search_by_vector | AND filter: every listed tag must be present. |
| include_content / includeContent | bool | false | search, search_by_vector, batch query | Requests full document content inline. retrieve sets this for you. |
| embedding | float[] | required | search_by_vector | Pre-computed query vector for BYOE workflows. |

Retrieve for RAG

Use retrieve when you want text to pass to an LLM. It performs search, asks for inline content when the server supports it, deduplicates by doc_id, and falls back to downloading document text if needed.

results = client.retrieve("deployment best practices", k=5)

for res in results:
    print(res.doc_id, res.title, res.score)
    print(res.content[:200])

RetrievalResult is a SearchResult with content guaranteed:

TypeScript
interface RetrievalResult extends SearchResult {
  content: string;
}
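Because content is always present on a RetrievalResult, the results can be joined directly into prompt context. The following is a minimal sketch, not part of the SDK: Doc is a stand-in carrying only the fields used here, and build_context and its max_chars budget are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    """Stand-in with the RetrievalResult fields used here (not the SDK class)."""
    doc_id: str
    score: int
    content: str
    title: str = ""


def build_context(results, max_chars=4000):
    """Join retrieved documents into one context string within a size budget.

    The budget counts block text only; the blank lines between blocks
    are not counted.
    """
    parts = []
    used = 0
    # Highest-scoring documents first so they survive truncation.
    for res in sorted(results, key=lambda r: r.score, reverse=True):
        header = f"[{res.doc_id}] {res.title}".rstrip()
        block = f"{header}\n{res.content}"
        if used + len(block) > max_chars:
            break  # stop at the first document that does not fit
        parts.append(block)
        used += len(block)
    return "\n\n".join(parts)


docs = [
    Doc("doc_1", 62, "Rollbacks are triggered from the release dashboard."),
    Doc("doc_2", 87, "Deploy from a protected branch.", title="Production setup"),
]
print(build_context(docs))
```

With the real SDK, the list returned by client.retrieve(...) could be passed in place of docs, assuming its items expose doc_id, title, score, and content as shown above.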

Filtering

Partition scoping

For multi-tenant apps, scope a search to a single end-client with a partition. Unlike tags, which are a post-filter, a partition is a hard boundary that the server applies before the search runs. A scoped query therefore never considers another partition's documents, and a selective partition keeps full recall. Scope a client once and search through it; there is no per-call partition argument:

acme = client.partition("client_acme")
results = acme.retrieve("billing preferences", k=10)   # only ever Acme's docs

To prove that a scoped search stays in its partition, use search_trace / searchTrace, which returns the partitions a query touched, or the one-line verify_isolation / verifyIsolation self-test. See Provable isolation.

Tags

Tags are the supported post-filter today. Pass tags when inserting documents, then pass the same tags to search or retrieve. A result must match every requested tag.

client.insert_text(
    "Acme prefers invoices in EUR, billed quarterly.",
    filename="acme-billing.txt",
    tags=["customer:acme", "kind:memory"],
)

results = client.retrieve(
    "billing preferences",
    k=10,
    tags=["customer:acme", "kind:memory"],
)

There is no rich metadata query DSL yet: no OR groups, range operators, nested predicates, or arbitrary JSON metadata filters. Use stable tag strings such as customer:acme, user:42, and kind:policy, and keep your own metadata table when you need to audit or enumerate tags.
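A small helper can keep tag strings consistent and record which tags you have used. This is a sketch under stated assumptions: make_tag and TagTable are hypothetical names, lowercasing is this sketch's own convention, and the in-memory table stands in for whatever store you actually keep.

```python
def make_tag(key, value):
    """Build a "key:value" tag string (hypothetical helper, not an SDK call)."""
    tag = f"{str(key).strip()}:{str(value).strip()}".lower()
    # The REST layer encodes tags as a comma-separated string, so a comma
    # inside a tag would split it into two tags.
    if "," in tag:
        raise ValueError(f"tag must not contain a comma: {tag!r}")
    return tag


class TagTable:
    """In-memory stand-in for the metadata table you keep next to the index."""

    def __init__(self):
        self._by_file = {}

    def record(self, filename, tags):
        """Remember the tags attached to a document at insert time."""
        self._by_file[filename] = sorted(set(tags))

    def all_tags(self):
        """Enumerate every tag seen so far."""
        return sorted({tag for tags in self._by_file.values() for tag in tags})

    def files_with(self, tag):
        """List the documents that carry a given tag."""
        return sorted(name for name, tags in self._by_file.items() if tag in tags)


table = TagTable()
tags = [make_tag("customer", "Acme"), make_tag("kind", "memory")]
table.record("acme-billing.txt", tags)
print(table.all_tags())
```

The same tags list could then be passed to both insert_text and retrieve so that the insert-time and query-time strings cannot drift apart.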

Because tags are applied as a post-filter, a filtered search can return fewer than k results when a tag matches only a small slice of your documents, even though more matching documents exist. Request a larger k and filter weak matches by score in your application:

Python
results = client.retrieve("billing preferences", k=10, tags=["customer:acme"])
strong = [r for r in results if r.score >= 60]

The hosted REST API also accepts a max_distance parameter for advanced distance-threshold filtering before scores are returned. Prefer SDK-level over-retrieval plus client-side score filtering unless you are deliberately working at the REST contract layer.
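The over-retrieval pattern can be wrapped in one function. This is a minimal sketch, assuming search_fn accepts the (query, k=..., tags=...) shape of client.search or client.retrieve; top_strong_hits, its factor parameter, and fake_search are hypothetical and exist only so the example runs without a server.

```python
from types import SimpleNamespace


def top_strong_hits(search_fn, query, k, min_score, tags=None, factor=3):
    """Over-retrieve, drop weak matches, and return at most k hits."""
    kwargs = {"k": k * factor}
    if tags:
        kwargs["tags"] = tags
    candidates = search_fn(query, **kwargs)
    strong = [hit for hit in candidates if hit.score >= min_score]
    # Sort defensively so truncation keeps the best-scoring hits.
    strong.sort(key=lambda hit: hit.score, reverse=True)
    return strong[:k]


def fake_search(query, k=10, tags=None):
    """Stand-in for an SDK call; returns fixed made-up scores."""
    scores = [88, 72, 61, 45, 30]
    hits = [SimpleNamespace(doc_id=f"doc_{i}", score=s) for i, s in enumerate(scores)]
    return hits[:k]


hits = top_strong_hits(fake_search, "billing preferences", k=2, min_score=60)
print([(h.doc_id, h.score) for h in hits])
```

With a real client, the call would look like top_strong_hits(client.retrieve, "billing preferences", k=5, min_score=60, tags=["customer:acme"]). Both the factor and the score threshold are values to tune against your own documents.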

Batch search

Use batch search when you have several independent queries and want one network round trip.

from aether import BatchSearchQuery

responses = client.batch_search([
    BatchSearchQuery(q="deployment", k=3),
    BatchSearchQuery(q="billing preferences", k=3),
])

for response in responses:
    print(response.query, [hit.doc_id for hit in response.results])

Batch responses are returned in the same order as the input queries. For filtered searches, prefer search or retrieve with tags until the batch tag encoding is aligned across the SDK models and the REST handler.
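Until filtered batch search is reconciled, several filtered queries can be run as individual calls while keeping the same ordering guarantee. The sketch below assumes search_fn has the (query, k=..., tags=...) shape of client.search; filtered_searches and fake_search are hypothetical names, and the stub only makes the example runnable.

```python
from types import SimpleNamespace


def filtered_searches(search_fn, queries, tags, k=3):
    """Run one tag-filtered search per query and keep the input order."""
    return [(query, search_fn(query, k=k, tags=tags)) for query in queries]


def fake_search(query, k=10, tags=None):
    """Stand-in for client.search so the sketch runs without a server."""
    return [SimpleNamespace(doc_id=f"{query}-{i}", score=90 - i) for i in range(k)]


pairs = filtered_searches(
    fake_search,
    ["deployment", "billing preferences"],
    tags=["customer:acme"],
    k=2,
)
for query, hits in pairs:
    print(query, [hit.doc_id for hit in hits])
```

This costs one round trip per query instead of one in total, which is the trade-off for using the tag encoding that search and retrieve already support.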

Search by vector

Use search_by_vector / searchByVector / SearchByVectorAsync / SearchByVector when you generate the query embedding yourself.

results = client.search_by_vector([0.1, 0.2, 0.3, ...], k=5)

for res in results:
    print(res.doc_id, res.score)

Your vector length must match the active embedding index. The default hosted configuration uses minilm-l6-v2; the node detects the model output dimension and defaults to 384 dimensions for the MiniLM path. A mismatched vector returns 400 Bad Request.
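Checking the vector locally avoids a round trip that would end in 400 Bad Request. This is a minimal sketch: validate_embedding is a hypothetical helper, 384 is the default MiniLM dimension quoted above and should be changed if your index uses another model, and the finite-number check is an extra precaution rather than a documented server rule.

```python
import math

EXPECTED_DIM = 384  # default MiniLM path per the text; adjust for your index


def validate_embedding(vector, expected_dim=EXPECTED_DIM):
    """Check length and numeric values before sending a query vector."""
    if len(vector) != expected_dim:
        raise ValueError(
            f"embedding has {len(vector)} dimensions, expected {expected_dim}"
        )
    if not all(isinstance(x, (int, float)) and math.isfinite(x) for x in vector):
        raise ValueError("embedding must contain only finite numbers")
    return [float(x) for x in vector]


try:
    validate_embedding([0.1, 0.2, 0.3])
except ValueError as exc:
    print(exc)
```

A vector that passes could then be handed to client.search_by_vector(validated, k=5).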

REST contract

The SDKs call these routes internally. Use them directly only for debugging, custom clients, or advanced integrations that cannot use an SDK.

| Method | Path | Purpose |
| --- | --- | --- |
| GET | /search | Search by natural-language query. |
| POST | /search/embed | Search by caller-provided embedding vector. |
| POST | /search/batch | Run multiple natural-language searches. |

GET /search

| Query parameter | Type | Required | Notes |
| --- | --- | --- | --- |
| q | string | yes | Natural-language query. |
| k | int | no | Defaults to 10. |
| include_content | bool | no | Adds content to each result when possible. |
| tags | comma-separated string | no | AND filter. Tag values must not contain commas. |
| max_distance | float | no | Advanced distance threshold. Results outside the threshold are dropped before the response is scored. |
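For custom clients, the query string can be assembled with the standard library. This is a sketch only: build_search_path is a hypothetical helper, sending "true" for include_content is an assumption about how the server parses booleans, and the host and authentication headers are left out because they are not specified here.

```python
from urllib.parse import urlencode


def build_search_path(q, k=10, tags=None, include_content=False, max_distance=None):
    """Return the path and query string for GET /search."""
    params = {"q": q, "k": k}
    if include_content:
        # Assumption: the server accepts "true" for boolean parameters.
        params["include_content"] = "true"
    if tags:
        bad = [tag for tag in tags if "," in tag]
        if bad:
            raise ValueError(f"tag values must not contain commas: {bad}")
        # Tags travel as one comma-separated string on this route.
        params["tags"] = ",".join(tags)
    if max_distance is not None:
        params["max_distance"] = max_distance
    return "/search?" + urlencode(params)


print(build_search_path(
    "deployment best practices",
    k=5,
    tags=["customer:acme", "kind:memory"],
))
```

urlencode percent-encodes the colon and comma inside the tags value, which a standards-compliant server decodes back to the original comma-separated string.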

POST /search/embed

JSON
{
  "embedding": [0.1, 0.2, 0.3],
  "k": 5,
  "include_content": false,
  "tags": ["customer:acme"],
  "max_distance": 0.4
}

POST /search/batch

JSON
{
  "queries": [
    {
      "q": "deployment",
      "k": 3,
      "include_content": false,
      "tags": "customer:acme",
      "max_distance": 0.4
    }
  ]
}

In the current REST handler, batch-query tags are a comma-separated string. The SDK batch models expose tag arrays, so avoid filtered batch search until that contract is reconciled; use individual SDK search / retrieve calls for filtered queries.

Response shape

JSON
{
  "query": "deployment",
  "results": [
    {
      "doc_id": "doc_123",
      "score": 87,
      "title": "Production setup",
      "content_type": "text/plain",
      "passage": "Deploy from a protected branch...",
      "content": "Deploy from a protected branch..."
    }
  ]
}

Batch search wraps one response per query:

JSON
{
  "results": [
    {
      "query": "deployment",
      "results": []
    }
  ]
}
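When working at the REST layer, the two response shapes above can be parsed into typed objects. The following is a minimal sketch using only the fields shown in this section; Hit, parse_search_response, and parse_batch_response are hypothetical names, not SDK classes.

```python
import json
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class Hit:
    """Typed view of one entry in "results" (hypothetical, mirrors SearchResult)."""
    doc_id: str
    score: int
    content_type: str
    title: Optional[str] = None
    passage: Optional[str] = None
    content: Optional[str] = None


_HIT_FIELDS = {f.name for f in fields(Hit)}


def parse_hits(raw_results):
    """Convert raw result dicts to Hit objects, ignoring unknown keys."""
    return [
        Hit(**{key: value for key, value in raw.items() if key in _HIT_FIELDS})
        for raw in raw_results
    ]


def parse_search_response(body):
    """Parse a single-search response into (query, hits)."""
    payload = json.loads(body)
    return payload.get("query"), parse_hits(payload["results"])


def parse_batch_response(body):
    """Parse a batch response into a list of (query, hits), in server order."""
    payload = json.loads(body)
    return [(item["query"], parse_hits(item["results"])) for item in payload["results"]]


body = json.dumps({
    "query": "deployment",
    "results": [{
        "doc_id": "doc_123",
        "score": 87,
        "title": "Production setup",
        "content_type": "text/plain",
        "passage": "Deploy from a protected branch...",
    }],
})
query, hits = parse_search_response(body)
print(query, hits[0].doc_id, hits[0].score, hits[0].content)
```

Optional fields default to None, which matches results where content was not requested or no passage was matched.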

Errors

Search endpoints use the shared API error shape:

JSON
{
  "error": "Embedding dimension mismatch: got 3, expected 384",
  "code": null,
  "request_id": "req_..."
}

Common statuses are 400 for invalid input, 401 for missing or invalid authentication, 402 for plan limits, 429 for rate limits, and 500 / 503 for transient server errors. See Errors for retry guidance.
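A custom client can normalize the shared error shape before deciding what to do next. This is a sketch under assumptions: describe_error is a hypothetical helper, and treating 429, 500, and 503 as retryable is this example's reading of the statuses listed above, not a substitute for the retry guidance in Errors.

```python
import json

# Assumption: rate limits and transient server errors are worth retrying.
RETRYABLE_STATUSES = {429, 500, 503}


def describe_error(status, body):
    """Summarize an error response as a plain dict."""
    try:
        payload = json.loads(body)
    except ValueError:
        payload = {}  # body was not JSON, for example a proxy error page
    if not isinstance(payload, dict):
        payload = {}
    return {
        "status": status,
        "message": payload.get("error", ""),
        "code": payload.get("code"),
        "request_id": payload.get("request_id"),
        "retryable": status in RETRYABLE_STATUSES,
    }


info = describe_error(400, '{"error": "Embedding dimension mismatch: got 3, expected 384", "code": null}')
print(info["message"], info["retryable"])
```

Logging request_id alongside the message makes it easier to correlate a failed call with server-side records.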