Aether Docs

Build streaming chat and agentic RAG in Next.js by pairing Aether's vector search with the Vercel AI SDK.

The Vercel AI SDK (v5) gives you a unified interface over language model providers, streaming UI primitives for React, and a tool-calling loop for agents. Aether handles document storage and semantic retrieval. Together they cover the full stack of a knowledge-grounded chat app: ingest documents into Aether, retrieve relevant passages per request, and stream a grounded answer to the browser.

This guide builds up in four steps: a one-shot generateText call, a streaming chat route with useChat, an agent that decides when to search, and per-user scoping with tags.

Install and setup

Install the AI SDK core, a provider package, the React hooks, the Aether SDK, and Zod (used for tool schemas later):

Bash

npm install ai @ai-sdk/anthropic @ai-sdk/react @aether-ai/sdk zod

The examples use Anthropic Claude, but every snippet works with any AI SDK provider — swap @ai-sdk/anthropic for @ai-sdk/openai, @ai-sdk/google, and so on, and change the model line.

Set two environment variables (in .env.local for local Next.js development):

AETHER_API_KEY — your Aether API key. The AetherClient constructor reads it automatically, so you never have to pass it in code.
ANTHROPIC_API_KEY — your provider key. The anthropic() model factory reads it the same way.

Server-side only

Both keys are secrets. Create the AetherClient and call the AI SDK only in route handlers, Server Components, or server actions — never in client components.
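
A missing key usually surfaces as an opaque authentication error on the first request. As a minimal sketch, a startup check can fail fast instead. The missingKeys helper below is hypothetical (it is not part of either SDK) and takes the environment as a plain object so it stays independent of the runtime.

```typescript
// Hypothetical helper, not part of the Aether SDK or the AI SDK:
// report which required secrets are absent from an environment-like object.
const REQUIRED_KEYS = ["AETHER_API_KEY", "ANTHROPIC_API_KEY"] as const;

function missingKeys(env: Record<string, string | undefined>): string[] {
  return REQUIRED_KEYS.filter((key) => {
    const value = env[key];
    // Treat unset and whitespace-only values the same way.
    return value === undefined || value.trim() === "";
  });
}

// Usage on the server, assuming a Node.js runtime:
//   const missing = missingKeys(process.env);
//   if (missing.length > 0) {
//     throw new Error(`Missing environment variables: ${missing.join(", ")}`);
//   }
```

Run a check like this only in server code, for the same reason the clients themselves stay on the server.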

Quick RAG with generateText

The simplest pattern: retrieve relevant documents, format them as context, and pass them to the model in the system prompt. The formatContext() helper ships with the Aether SDK and turns retrieve() results into a numbered, LLM-ready context block so you don't have to hand-roll the string assembly.

TypeScript

import { AetherClient, formatContext } from "@aether-ai/sdk";
import { anthropic } from "@ai-sdk/anthropic";
import { generateText } from "ai";

const aether = new AetherClient(); // reads AETHER_API_KEY from the environment

const question = "What's the company match for 401k?";

// Find the most relevant documents and build a context block
const results = await aether.retrieve(question, 3);
const context = formatContext(results);

const { text } = await generateText({
  model: anthropic("claude-sonnet-4-20250514"),
  system: `Answer using only this context:\n\n${context}`,
  prompt: question,
});

console.log(text);

By default formatContext() renders each result as [Source N] followed by the matched passage. Pass a template to change the shape — placeholders include {i}, {title}, {doc_id}, {text}, and {score}:

TypeScript

const context = formatContext(results, {
  template: "[{title}] (score {score})\n{text}",
});

This assumes you've already inserted documents — see the Documents API if your store is empty.
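
To make the template placeholders concrete, here is a rough stand-in for the kind of substitution a template renderer performs. This is not the SDK's implementation: renderContext, the RetrievedPassage shape, and the exact default layout are assumptions inferred from the placeholder names above.

```typescript
// Illustrative only. The field names mirror the documented placeholders;
// the real result type returned by retrieve() may differ.
interface RetrievedPassage {
  title: string;
  doc_id: string;
  text: string;
  score: number;
}

function renderContext(
  results: RetrievedPassage[],
  template: string = "[Source {i}]\n{text}", // assumed default layout
): string {
  return results
    .map((r, index) => {
      const values = new Map<string, string>([
        ["i", String(index + 1)], // 1-based source number
        ["title", r.title],
        ["doc_id", r.doc_id],
        ["text", r.text],
        ["score", String(r.score)],
      ]);
      // Substitute known {placeholders}; leave unknown ones untouched.
      return template.replace(
        /\{(\w+)\}/g,
        (match: string, key: string) => values.get(key) ?? match,
      );
    })
    .join("\n\n");
}
```

A numbered default is handy when you want the model to cite sources by number; a title-based template reads better when titles are meaningful to end users.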

Streaming chat route

For a chat UI you want streaming. The server side is a Next.js route handler at app/api/chat/route.ts: take the conversation, retrieve context for the latest user message, and return a UI message stream.

TypeScript

// app/api/chat/route.ts
import { AetherClient, formatContext } from "@aether-ai/sdk";
import { anthropic } from "@ai-sdk/anthropic";
import { convertToModelMessages, streamText, type UIMessage } from "ai";

const aether = new AetherClient();

export async function POST(req: Request) {
  const { messages }: { messages: UIMessage[] } = await req.json();

  // Use the latest user message as the retrieval query
  const lastMessage = messages[messages.length - 1];
  const query = lastMessage.parts
    .filter((part) => part.type === "text")
    .map((part) => part.text)
    .join("\n");

  const results = await aether.retrieve(query, 5);
  const context = formatContext(results);

  const result = streamText({
    model: anthropic("claude-sonnet-4-20250514"),
    system: `You are a helpful assistant. Ground your answers in this context:\n\n${context}`,
    messages: convertToModelMessages(messages),
  });

  return result.toUIMessageStreamResponse();
}

Two AI SDK v5 details are worth noting. First, UI messages and model messages are separate types, so the conversation from the browser passes through convertToModelMessages() before reaching the model. Second, toUIMessageStreamResponse() streams the reply back in the format the useChat hook expects.
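
The route above takes the final entry in messages as the retrieval query. If the last entry might not be a user message, or might carry no text, the extraction can be factored into a small helper. This is a sketch under assumptions: latestUserQuery is a hypothetical name, and the local SketchMessage and SketchPart types stand in for UIMessage so the snippet is self-contained.

```typescript
// Minimal local shapes so the sketch stands alone; in a real route,
// use the UIMessage type from "ai" instead.
interface SketchPart {
  type: string;
  text?: string;
}

interface SketchMessage {
  role: "system" | "user" | "assistant";
  parts: SketchPart[];
}

// Returns the text of the most recent user message, or null when there is
// no user message or it contains no text.
function latestUserQuery(messages: SketchMessage[]): string | null {
  for (let i = messages.length - 1; i >= 0; i--) {
    const message = messages[i];
    if (!message || message.role !== "user") continue;
    const text = message.parts
      .filter((part) => part.type === "text" && typeof part.text === "string")
      .map((part) => part.text ?? "")
      .join("\n")
      .trim();
    return text.length > 0 ? text : null;
  }
  return null;
}
```

When the helper returns null, the route can skip retrieval and answer without a context block rather than searching on an empty query.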

On the client, useChat from @ai-sdk/react manages the conversation and posts to /api/chat by default. Message content lives in a parts array:

TypeScript

// app/page.tsx
"use client";

import { useChat } from "@ai-sdk/react";
import { useState } from "react";

export default function Chat() {
  const { messages, sendMessage, status } = useChat();
  const [input, setInput] = useState("");

  return (
    <div>
      {messages.map((message) => (
        <div key={message.id}>
          <strong>{message.role === "user" ? "You: " : "AI: "}</strong>
          {message.parts.map((part, i) =>
            part.type === "text" ? <span key={i}>{part.text}</span> : null,
          )}
        </div>
      ))}
      <form
        onSubmit={(e) => {
          e.preventDefault();
          if (!input.trim()) return;
          sendMessage({ text: input });
          setInput("");
        }}
      >
        <input
          value={input}
          onChange={(e) => setInput(e.target.value)}
          placeholder="Ask a question..."
          disabled={status !== "ready"}
        />
      </form>
    </div>
  );
}

This route retrieves on every message, whether or not the question needs it. That's fine for a knowledge-base assistant, where most questions do need retrieval. When retrieval should be the model's decision, use a tool instead.

Agentic retrieval with tools

Instead of always stuffing context into the system prompt, expose Aether as tools and let the model decide when to search — and what to search for. The model often writes better retrieval queries than the raw user message (it strips filler, splits compound questions, and can search more than once).

In AI SDK v5, tools take an inputSchema (a Zod schema — not parameters, which was the v4 name), and stopWhen: stepCountIs(n) lets the model take multiple tool-call steps before producing its final answer.

TypeScript

// app/api/chat/route.ts
import { AetherClient, formatContext } from "@aether-ai/sdk";
import { anthropic } from "@ai-sdk/anthropic";
import {
  convertToModelMessages,
  stepCountIs,
  streamText,
  tool,
  type UIMessage,
} from "ai";
import { z } from "zod";

const aether = new AetherClient();

export async function POST(req: Request) {
  const { messages }: { messages: UIMessage[] } = await req.json();

  const result = streamText({
    model: anthropic("claude-sonnet-4-20250514"),
    system:
      "You are a helpful assistant. Search the knowledge base before " +
      "answering factual questions. Save durable facts the user shares.",
    messages: convertToModelMessages(messages),
    tools: {
      searchKnowledge: tool({
        description: "Search the knowledge base for relevant documents.",
        inputSchema: z.object({
          query: z.string().describe("A natural-language search query"),
        }),
        execute: async ({ query }) => {
          const results = await aether.retrieve(query, 5);
          if (results.length === 0) return "No matching documents found.";
          return formatContext(results);
        },
      }),
      saveNote: tool({
        description: "Save a note to the knowledge base for later retrieval.",
        inputSchema: z.object({
          note: z.string().describe("The note content to save"),
        }),
        execute: async ({ note }) => {
          const doc = await aether.insertText(note, {
            filename: `note-${Date.now()}.txt`,
            tags: ["kind:note"],
          });
          return { saved: true, docId: doc.doc_id };
        },
      }),
    },
    stopWhen: stepCountIs(5),
  });

  return result.toUIMessageStreamResponse();
}

The same client component works unchanged — tool calls stream to the browser as their own message parts (typed tool-searchKnowledge, tool-saveNote), which you can render for a "searching..." indicator or ignore.
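
If you do want a "searching..." indicator, one option is to map each tool part to a short status label. The sketch below assumes tool parts carry a state field with values such as input-streaming, input-available, output-available, and output-error, which is how AI SDK v5 is understood to report them; verify against your installed version. toolStatusLabel and SketchToolPart are hypothetical names.

```typescript
// Local stand-in for a UI message part; only the fields this sketch reads.
interface SketchToolPart {
  type: string;
  state?: string; // assumed v5 tool part states, see the note above
}

// Returns a label to display while a tool is running, or null to show nothing.
function toolStatusLabel(part: SketchToolPart): string | null {
  if (!part.type.startsWith("tool-")) return null; // not a tool part
  const toolName = part.type.slice("tool-".length);
  switch (part.state) {
    case "input-streaming":
    case "input-available":
      // The call has been issued but no result is back yet.
      return toolName === "searchKnowledge"
        ? "Searching the knowledge base..."
        : `Running ${toolName}...`;
    case "output-error":
      return `${toolName} failed`;
    default:
      // Finished or unrecognized state: render nothing.
      return null;
  }
}
```

In the client component, call this inside the parts map next to the text branch and render the label when it is not null.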

saveNote turns the chat into a memory loop: the model writes facts with insertText() and finds them again on later turns with retrieve(). In searchKnowledge, returning the formatted string (rather than raw result objects) keeps the tool result compact, which matters because tool results are fed back into the model's context.

Scoping documents per user

A shared chat app must not leak one user's notes into another's retrieval. Tag every document with its owner on write, and pass the same tag as a filter on read:

TypeScript

// Writing: tag the document with its owner
await aether.insertText(note, {
  filename: `note-${Date.now()}.txt`,
  tags: ["user:" + userId, "kind:note"],
});

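// Reading: only this user's documents can match.
// Over-fetch (k = 10) because a narrow tag filter can come back short,
// then drop weak matches. 60 is an example cutoff; tune it for your data.
const results = (await aether.retrieve(query, 10, {
  tags: ["user:" + userId],
})).filter((r) => r.score >= 60);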

In the agentic route above, derive userId from your session on the server and close over it in the tools' execute functions — never accept it from the model or the request body.
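
One way to close over userId is to build the search function once per request, so the owner tag is fixed before the model can influence anything. This is a sketch: scopedRetriever is a hypothetical helper, and RetrieveClient describes only the retrieve(query, k, { tags }) call shape used in this guide, not the SDK's full type.

```typescript
// Only the fields and call shape this sketch relies on.
interface RetrieveResult {
  text: string;
  score: number;
}

interface RetrieveClient {
  retrieve(
    query: string,
    k: number,
    options?: { tags?: string[] },
  ): Promise<RetrieveResult[]>;
}

// Binds the owner tag at creation time. Callers (including a tool's execute
// function) can choose the query and k, but never the user.
function scopedRetriever(client: RetrieveClient, userId: string) {
  const ownerTag = "user:" + userId;
  return (query: string, k: number = 10): Promise<RetrieveResult[]> =>
    client.retrieve(query, k, { tags: [ownerTag] });
}

// Usage inside a route handler, with userId taken from the server session:
//   const search = scopedRetriever(aether, userId);
//   ...
//   execute: async ({ query }) => formatContext(await search(query)),
```

Because the tool's input schema still exposes only query, there is no field through which the model or the request body could supply a different user.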

A few rules to know when using tags this way:

Tag filters are AND-ed. A document matches only if it carries every tag in the filter, so tags: ["user:42", "kind:note"] returns only user 42's notes.
Tag values must not contain commas. Stick to a key:value slug convention — user:42, customer:acme, kind:note.
Tags can't be read back. retrieve() and search() results don't include a document's tags, and neither does get(). If you need to enumerate or audit tags, keep your own doc_id → tags mapping in your database.
Narrow tags can return fewer than k results. Filtering happens on a candidate set drawn from the whole store, so when a tag matches only a small slice of your documents, a filtered search can come back short even though more matches exist. Request a larger k when filtering to a narrow tag — that's why the example asks for 10 — and filter weak matches by score rather than padding the context with noise. See Tuning retrieval for picking a cutoff.
update() replaces tags. If you update a document, pass the full tag list again — including the user: tag — or the scoping is lost.
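
Three of the rules above can be sketched as small local helpers. None of these functions exist in the Aether SDK; they are hypothetical and only illustrate the semantics described in this section. The slug pattern in isValidTag is stricter than the documented rule, which only forbids commas.

```typescript
// AND semantics: a document matches only if it carries every tag in the filter.
function matchesAllTags(docTags: string[], filter: string[]): boolean {
  return filter.every((tag) => docTags.includes(tag));
}

// Enforce the key:value slug convention. Commas are rejected because they
// fall outside the allowed character set.
function isValidTag(tag: string): boolean {
  return /^[A-Za-z0-9_-]+:[A-Za-z0-9_.-]+$/.test(tag);
}

// update() replaces tags, so always rebuild the full list with the owner tag
// included. Duplicates are dropped and the owner tag comes first.
function tagsForUpdate(ownerTag: string, otherTags: string[]): string[] {
  return Array.from(new Set([ownerTag, ...otherTags]));
}
```

Validating tags before every write keeps a malformed user ID from silently producing a tag that never matches on read.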

For the broader design space — per-user tags versus a store per tenant, and when to choose which — see Multi-tenant patterns.
