Structured queries and typed fields
Filter, sort, and aggregate your documents by their structured values — exact counts, rankings, and ranges — without ever running a semantic search. Structured queries are deterministic: the same query over the same data always returns the same answer, because no embedding is involved.
Semantic search answers "what is this about?". Structured queries answer "how many, which ones, sorted how, grouped by what?" — the questions a spreadsheet or a SQL GROUP BY answers. "How many paid invoices over $500 last quarter, by region, highest total first" is not a similarity question, and no amount of vector search will answer it exactly. Aether lets you ask it directly, over the same store your agent already searches.
When to use structured queries
Reach for a structured query when the answer depends on exact values, not meaning:
- Filtering on a field: status = "paid", amount >= 500, created_at in a date range.
- Sorting by a typed value: highest amount first, oldest due_date last.
- Counting and aggregating: totals, averages, min/max, distinct counts, optionally grouped.
- Exact, complete results: structured queries return the entire matching set (paged), not a top-k by relevance.
Keep using semantic search when the answer depends on meaning — "notes that discuss anxiety", "documents similar to this one". The two compose: the same filter grammar you use here also narrows a search call (see Filtering search).
Step 1 — Declare the fields you want to query
A field is a typed, indexed view over a value your documents already carry. You declare a field once per workspace; Aether then extracts and indexes that value from every existing and future document, so filters and aggregates over it are exact and fast.
Each field has a name, a type, and a source:
| Type | Accepts | Ordering |
|---|---|---|
| string | a text value | lexicographic |
| int | a whole number (or a whole-valued number like 3.0) | numeric |
| float | any number | numeric |
| bool | true / false | false < true |
| datetime | an RFC 3339 timestamp string | chronological |
| string_list | an array of text values (tag-like) | none |
A field's source is where its value comes from:
- { "metadata": "<key>" }: lift the value from a document's structured metadata (the typed key/value map you attach at insert time).
- { "regex": "<pattern>" }: extract the first capture group of a regular expression run over the document's text (for example, pulling a ticket number out of a subject line).
client.schema.declare_fields([
    {"name": "amount", "type": "float", "source": {"metadata": "amount"}},
    {"name": "status", "type": "string", "source": {"metadata": "status"}},
    {"name": "region", "type": "string", "source": {"metadata": "region"}},
    {"name": "labels", "type": "string_list", "source": {"metadata": "labels"}},
    {"name": "ticket", "type": "string", "source": {"regex": "TICKET-(\\d+)"}},
])
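Before declaring a regex source, it can help to preview locally what the pattern would capture. The sketch below uses Python's standard re module and assumes the extraction behaves like re.search followed by the first capture group; the regex dialect Aether actually uses is not specified here, so treat that as an assumption. The helper name first_capture and the sample subject line are hypothetical.

```python
import re
from typing import Optional


def first_capture(pattern: str, text: str) -> Optional[str]:
    """Return the first capture group of the first match, or None if nothing matches.

    The pattern must contain at least one capture group.
    """
    match = re.search(pattern, text)
    return match.group(1) if match else None


subject = "Re: TICKET-4821 login fails after reset"
print(first_capture(r"TICKET-(\d+)", subject))           # 4821
print(first_capture(r"TICKET-(\d+)", "no ticket here"))  # None
```

Note that the captured value is the digits only, not the whole "TICKET-4821" token, because only the parenthesised group is extracted.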
Declaring a field triggers a background backfill over your existing documents. list_fields reports each field's live coverage (how many documents have a value), mismatch_count, and backfill progress:
for field in client.schema.list_fields():
    print(field.name, field.type, field.coverage, field.mismatch_count)
Re-declaring a field name replaces its definition and re-backfills. delete_field(name) removes it and returns the remaining fields.
A bad value never fails ingest
If a document's source value can't be coerced to the field's type — "cheap" for a float field, say — that document is simply treated as not having a value for the field (it's excluded from filters and aggregates on it), and it increments the field's mismatch_count. Inserting a document never fails because of a declared field, so declaring a field can't break your write path.
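To make the coercion rule concrete, here is a rough local model of it in plain Python. It is not Aether's implementation: the function name coerce, the use of datetime.fromisoformat as a stand-in for RFC 3339 parsing, and the choice to treat an absent key as "no value" rather than a mismatch are all assumptions made for illustration.

```python
from datetime import datetime
from typing import Any, Optional


def coerce(value: Any, field_type: str) -> Optional[Any]:
    """Return the typed value, or None when the value does not fit the type."""
    # bool is a subclass of int in Python, so exclude it from the numeric types.
    is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
    if field_type == "string":
        return value if isinstance(value, str) else None
    if field_type == "int":
        if is_number and isinstance(value, int):
            return value
        # Whole-valued floats such as 3.0 are accepted as ints.
        if is_number and value.is_integer():
            return int(value)
        return None
    if field_type == "float":
        return float(value) if is_number else None
    if field_type == "bool":
        return value if isinstance(value, bool) else None
    if field_type == "datetime":
        if not isinstance(value, str):
            return None
        try:
            # Approximation of RFC 3339: older Pythons reject a trailing "Z".
            return datetime.fromisoformat(value.replace("Z", "+00:00"))
        except ValueError:
            return None
    if field_type == "string_list":
        ok = isinstance(value, list) and all(isinstance(v, str) for v in value)
        return value if ok else None
    raise ValueError(f"unknown field type: {field_type}")


docs = [{"amount": 120.5}, {"amount": "cheap"}, {"amount": 3}, {}]
values, mismatch_count = [], 0
for doc in docs:
    if "amount" not in doc:
        continue  # assumption: an absent key is "no value", not a mismatch
    typed = coerce(doc["amount"], "float")
    if typed is None:
        mismatch_count += 1  # bad value is skipped; the insert itself still succeeds
    else:
        values.append(typed)

print(values, mismatch_count)  # [120.5, 3.0] 1
```

A non-zero mismatch_count on a freshly declared field is usually the first hint that the chosen type is too narrow for the data.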
Step 2 — Filter with the unified grammar
The filter grammar is one small JSON shape used everywhere Aether takes a filter. A filter is either a leaf — one comparison — or a combinator that nests other filters.
A leaf: one comparison
{ "field": "amount", "op": "gte", "value": 100 }
field is a declared field or one of the always-available built-ins (see below). op is one of:
| Operator | Meaning | value |
|---|---|---|
| eq | equal to | a scalar |
| neq | not equal to | a scalar |
| in | equal to any of | an array of scalars |
| gt / gte | greater than / or equal | a scalar |
| lt / lte | less than / or equal | a scalar |
| between | within an inclusive range | a 2-element array [low, high] |
| exists | has (or lacks) a value | true (default) or false |
| contains | list contains the value | a string (only on string_list fields) |
| prefix | string starts with the value | a string (only on string fields) |
Comparisons are typed, which is the whole point: int and float fields order numerically, so 9 is less than 10 (not the string surprise where "9" sorts after "10"), and datetime fields order chronologically.
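The "string surprise" is easy to reproduce in plain Python, which is a quick way to see why declaring a numeric type matters. This snippet only illustrates the ordering difference; it does not call Aether.

```python
amounts_as_text = ["9", "10", "100"]
amounts_as_int = [9, 10, 100]

# Strings compare character by character, so "1..." sorts before "9".
print(sorted(amounts_as_text))  # ['10', '100', '9']

# Numbers compare by value.
print(sorted(amounts_as_int))   # [9, 10, 100]

print("9" < "10", 9 < 10)       # False True
```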
Combinators: and, or, not
Nest leaves with and / or (arrays) and not (a single filter). They compose to any depth:
{ "and": [
    { "field": "status", "op": "eq", "value": "paid" },
    { "or": [
        { "field": "amount", "op": "gte", "value": 500 },
        { "field": "region", "op": "in", "value": ["us-east", "us-west"] }
    ]},
    { "not": { "field": "labels", "op": "contains", "value": "test" } }
] }
Built-in fields
Every document exposes these without a declaration:
| Field | Type |
|---|---|
| created_at | datetime |
| updated_at | datetime |
| source | string |
| content_type | string |
| tags | string_list |
| entity_id | string |
Missing values
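A document with no value for a field (never set, or a type mismatch) does not match any comparison on that field. As a consequence, wrapping such a comparison in not does match that document; combine it with an exists leaf if documents without a value should stay out of the result. A comparison therefore never silently coerces missing data into a false positive.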
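The grammar and the missing-value rule can be modelled in a few lines of Python. This is a minimal sketch for reasoning about filters locally, assuming documents are plain dicts of already-typed values; the names matches and OPS and the sample documents are hypothetical, and this is not how Aether evaluates filters internally.

```python
from typing import Any, Dict

OPS = {
    "eq": lambda a, v: a == v,
    "neq": lambda a, v: a != v,
    "in": lambda a, v: a in v,
    "gt": lambda a, v: a > v,
    "gte": lambda a, v: a >= v,
    "lt": lambda a, v: a < v,
    "lte": lambda a, v: a <= v,
    "between": lambda a, v: v[0] <= a <= v[1],  # inclusive on both ends
    "contains": lambda a, v: v in a,            # for list-valued fields
    "prefix": lambda a, v: a.startswith(v),     # for string fields
}


def matches(doc: Dict[str, Any], flt: Dict[str, Any]) -> bool:
    """Evaluate a leaf or combinator filter against one document."""
    if "and" in flt:
        return all(matches(doc, f) for f in flt["and"])
    if "or" in flt:
        return any(matches(doc, f) for f in flt["or"])
    if "not" in flt:
        return not matches(doc, flt["not"])

    present = doc.get(flt["field"]) is not None
    if flt["op"] == "exists":
        return present == flt.get("value", True)
    if not present:
        return False  # a missing value matches no comparison
    return OPS[flt["op"]](doc[flt["field"]], flt["value"])


docs = {
    "a": {"status": "paid", "amount": 700, "labels": ["vip"]},
    "b": {"status": "paid", "amount": 50, "region": "us-east", "labels": ["test"]},
    "c": {"status": "paid"},  # no amount, region, or labels
    "d": {"status": "open", "amount": 900},
}
nested = {"and": [
    {"field": "status", "op": "eq", "value": "paid"},
    {"or": [
        {"field": "amount", "op": "gte", "value": 500},
        {"field": "region", "op": "in", "value": ["us-east", "us-west"]},
    ]},
    {"not": {"field": "labels", "op": "contains", "value": "test"}},
]}
print([k for k, d in docs.items() if matches(d, nested)])  # ['a']

big = {"field": "amount", "op": "gte", "value": 500}
print(matches(docs["c"], big))           # False: no amount, so no match
print(matches(docs["c"], {"not": big}))  # True: the negation matches it
```

In this model, document "c" passes the negated comparison only because it has no amount at all, which is the case to keep in mind when writing not filters.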
Step 3 — Query documents (Mode A)
query with a filter, an optional typed sort, and limit / offset returns the matching documents as a page. sort is a list of { by, dir } keys (dir is asc or desc, default asc); documents missing a sort field come last in either direction. The page carries total (the full matching count) and has_more so you can page through the entire set.
page = client.query(
    filter={"and": [
        {"field": "status", "op": "eq", "value": "paid"},
        {"field": "amount", "op": "gte", "value": 100},
    ]},
    sort=[{"by": "amount", "dir": "desc"}],
    limit=20,
)
print(page.total, page.has_more)
for doc in page:
    print(doc.doc_id, doc.metadata.get("amount"))
Omit filter to page over every document in scope. To walk the whole result set, repeat with offset += limit while has_more is true.
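The offset loop described above can be wrapped in a small generator. The sketch below assumes only what this guide shows about the page object (it is iterable and exposes has_more) and that query accepts limit and offset as keyword arguments. FakePage and fake_query are hypothetical in-memory stand-ins so the example runs without a server; iter_all is likewise an illustrative helper, not part of the client.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Iterator, List


@dataclass
class FakePage:
    """Stand-in for the page object: iterable, with total and has_more."""
    docs: List[Dict[str, Any]]
    total: int
    has_more: bool

    def __iter__(self) -> Iterator[Dict[str, Any]]:
        return iter(self.docs)


def fake_query(limit: int, offset: int = 0, **_: Any) -> FakePage:
    """Hypothetical substitute for client.query over 45 in-memory documents."""
    data = [{"doc_id": f"doc-{i}"} for i in range(45)]
    chunk = data[offset:offset + limit]
    return FakePage(chunk, total=len(data), has_more=offset + limit < len(data))


def iter_all(query: Callable[..., Any], limit: int = 20, **kwargs: Any) -> Iterator[Any]:
    """Yield every matching document, advancing offset until has_more is false."""
    offset = 0
    while True:
        page = query(limit=limit, offset=offset, **kwargs)
        yield from page
        if not page.has_more:
            break
        offset += limit


all_docs = list(iter_all(fake_query, limit=20))
print(len(all_docs))  # 45
```

With a real client the call would presumably look like iter_all(client.query, filter=..., sort=[...]), with the same filter and sort passed on every page.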
Step 4 — Aggregate (Mode B)
Add an aggregate list and query switches to aggregation mode: instead of documents, it returns computed rows. Optionally group_by up to two fields to get one row per group.
Aggregate operators: count, count_distinct, sum, avg, min, max. The numeric operators (sum, avg, min, max) require an int or float field, and each takes an optional "as" key that names its output. A document missing the aggregated field is excluded from that aggregate (a count of the group still counts it).
result = client.query(
    filter={"field": "created_at", "op": "gte", "value": "2026-04-01T00:00:00Z"},
    group_by=["region"],
    aggregate=[
        {"op": "count"},
        {"op": "sum", "field": "amount", "as": "total"},
        {"op": "avg", "field": "amount", "as": "avg_amount"},
    ],
    sort=[{"by": "total", "dir": "desc"}],
    limit=20,
)
for group in result.groups:
    print(group.keys["region"], group.aggregates["total"])
sort in Mode B orders by an aggregate output name or a group key; limit caps the number of groups returned. The result carries total_groups and scanned (how many documents were considered). A sum over an int field stays an integer; sum and avg otherwise accumulate as floating point.
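The aggregation semantics described in this step (count includes every document in the group, numeric aggregates skip documents missing the field, an integer sum stays an integer) can be checked against a small local model. This is a sketch in plain Python over hypothetical invoice dicts; aggregate_by is an illustrative name, and placing documents without the group key under None is an assumption rather than documented behaviour.

```python
from collections import defaultdict
from typing import Any, Dict, List


def aggregate_by(docs: List[Dict[str, Any]], key: str, field: str) -> Dict[Any, Dict[str, Any]]:
    """Group docs by `key`, then compute count, sum, and avg over `field`."""
    groups: Dict[Any, List[Dict[str, Any]]] = defaultdict(list)
    for doc in docs:
        groups[doc.get(key)].append(doc)  # assumption: no key means group None

    rows: Dict[Any, Dict[str, Any]] = {}
    for group_key, members in groups.items():
        # Documents without the aggregated field are left out of sum and avg.
        values = [d[field] for d in members if d.get(field) is not None]
        rows[group_key] = {
            "count": len(members),  # counts every document in the group
            "total": sum(values),   # int inputs give an int result
            "avg_amount": sum(values) / len(values) if values else None,
        }
    return rows


invoices = [
    {"region": "us-east", "amount": 100},
    {"region": "us-east", "amount": 300},
    {"region": "us-east"},  # no amount
    {"region": "eu-west", "amount": 250},
]
rows = aggregate_by(invoices, "region", "amount")
print(rows["us-east"])  # {'count': 3, 'total': 400, 'avg_amount': 200.0}
print(rows["eu-west"])  # {'count': 1, 'total': 250, 'avg_amount': 250.0}
```

Note that the us-east average is 200.0, not 133.3: the document without an amount raises the count but is not part of the average.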
Guardrails fail loud
Structured queries never return a partial answer dressed up as a complete one. Anything Aether can't answer exactly returns a 400 with a precise message, so you fix the query rather than trust a silently truncated result:
- an unknown field (not declared, not a built-in);
- a type-mismatched literal (comparing a float field to "cheap");
- a non-numeric aggregate (sum over a string field);
- exceeding the group cap or the candidate-scan cap (narrow the filter or add a partition).
Exact means exact
Because the guardrails return 400 instead of a truncated 200, a structured query is safe to build a total or a ranking on: if it returns a result, that result is complete and exact. If your filter is too broad to answer within the caps, you get an error telling you to narrow it — never a wrong number.
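If you build queries programmatically, a local pre-check can surface two of these rejections before a request is sent. The sketch below mirrors the documented rules for unknown fields and non-numeric aggregates only; it is not a replacement for the server's validation, it cannot predict the group or scan caps, and the names preflight and schema are hypothetical (schema stands for a name-to-type map you would build yourself, for example from list_fields).

```python
from typing import Any, Dict, List

NUMERIC_OPS = {"sum", "avg", "min", "max"}
NUMERIC_TYPES = {"int", "float"}


def preflight(aggregates: List[Dict[str, Any]], field_types: Dict[str, str]) -> List[str]:
    """Return the problems that the documented guardrails would reject."""
    problems = []
    for agg in aggregates:
        field = agg.get("field")
        if field is None:
            continue  # this sketch only checks aggregates that name a field
        if field not in field_types:
            problems.append(f"unknown field: {field}")
        elif agg["op"] in NUMERIC_OPS and field_types[field] not in NUMERIC_TYPES:
            problems.append(
                f"{agg['op']} needs an int or float field, got {field_types[field]}: {field}"
            )
    return problems


schema = {"amount": "float", "status": "string"}
issues = preflight(
    [
        {"op": "count"},
        {"op": "sum", "field": "status"},
        {"op": "avg", "field": "price"},
        {"op": "sum", "field": "amount", "as": "total"},
    ],
    schema,
)
for issue in issues:
    print(issue)
# sum needs an int or float field, got string: status
# unknown field: price
```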
Filtering search with the same grammar
The grammar above is not exclusive to query. You can pass the same filter to search and to document listings. It is a superset of the simpler metadata filters, so a structured range predicate can narrow a semantic search's pre-filter too:
results = client.search(
    "refund policy",
    k=10,
    filter={"field": "amount", "op": "gte", "value": 500},
)
Need the exact count or a ranking rather than the top-k by relevance? That's a structured query. Need the most relevant matches? That's search. Same filter, two read paths over one store.
Partition scoping
On a partition-scoped handle, query and every schema call are automatically scoped to that partition, exactly like the rest of the client:
scoped = client.partition("client-7")
page = scoped.query(filter={"field": "status", "op": "eq", "value": "open"})
See multi-tenant patterns for how partitions isolate one customer's data from another's.
Reference
The full request/response shapes and every parameter are in the Structured Query API reference.