The Vercel AI SDK is the fastest way to add AI to a Next.js app. It handles streaming, manages state on the client, supports tool calls, and works with any AI provider — including Claude. This guide covers everything you need to go from zero to a production-quality AI chat app with Next.js 15.
## Why Vercel AI SDK
Before the SDK existed, adding streaming AI responses to Next.js meant manually handling SSE, parsing chunks, and managing loading state client-side. The SDK abstracts all of that:
- `streamText` — streams a response from the model, handling tokens as they arrive
- `useChat` — a React hook that manages message history, input state, and streaming
- Tool calls — define functions Claude can call, execute them server-side, feed results back
- `generateObject` — get structured JSON output that matches a Zod schema
- Provider-agnostic — swap between Claude, OpenAI, or Mistral by changing one import (sketched below)
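That last point is worth seeing concretely. A minimal sketch of the swap, assuming the `@ai-sdk/openai` package is installed:

```ts
import { anthropic } from '@ai-sdk/anthropic'
// import { openai } from '@ai-sdk/openai'

// Everything else (streamText, useChat) stays identical; only this line changes
const model = anthropic('claude-sonnet-4-6')
// const model = openai('gpt-4o')
```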
## Setup
```bash
npm install ai @ai-sdk/anthropic zod
```

Set your key:

```bash
# .env.local
ANTHROPIC_API_KEY=sk-ant-...
```

## Basic Streaming Chat
### The API Route
Create `app/api/chat/route.ts`:
```ts
import { streamText } from 'ai'
import { anthropic } from '@ai-sdk/anthropic'

export async function POST(req: Request) {
  const { messages } = await req.json()

  const result = await streamText({
    model: anthropic('claude-sonnet-4-6'),
    system: 'You are a helpful assistant. Be concise and accurate.',
    messages,
  })

  return result.toDataStreamResponse()
}
```

That's it for the backend. `toDataStreamResponse()` returns the correct headers and body format that `useChat` expects.
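You can smoke-test the route before wiring up any UI. Expect the SDK's framed data-stream chunks rather than plain text:

```bash
curl -N http://localhost:3000/api/chat \
  -H 'Content-Type: application/json' \
  -d '{"messages":[{"role":"user","content":"Hello"}]}'
```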
### The Chat Component
```tsx
// components/Chat.tsx
'use client'

import { useChat } from 'ai/react'
import { useRef, useEffect } from 'react'

export function Chat() {
  const { messages, input, handleInputChange, handleSubmit, isLoading, stop } = useChat({
    api: '/api/chat',
  })

  const bottomRef = useRef<HTMLDivElement>(null)

  // Keep the newest message in view while the stream comes in
  useEffect(() => {
    bottomRef.current?.scrollIntoView({ behavior: 'smooth' })
  }, [messages])

  return (
    <div className="flex flex-col h-screen max-w-2xl mx-auto p-4">
      <div className="flex-1 overflow-y-auto space-y-4 pb-4">
        {messages.map((m) => (
          <div
            key={m.id}
            className={`flex ${m.role === 'user' ? 'justify-end' : 'justify-start'}`}
          >
            <div
              className={`rounded-2xl px-4 py-2 max-w-[80%] ${
                m.role === 'user'
                  ? 'bg-blue-600 text-white'
                  : 'bg-gray-100 text-gray-900'
              }`}
            >
              {m.content}
            </div>
          </div>
        ))}
        {isLoading && (
          <div className="flex justify-start">
            <div className="bg-gray-100 rounded-2xl px-4 py-2 text-gray-500">
              Thinking...
            </div>
          </div>
        )}
        <div ref={bottomRef} />
      </div>
      <form onSubmit={handleSubmit} className="flex gap-2">
        <input
          value={input}
          onChange={handleInputChange}
          placeholder="Type a message..."
          className="flex-1 rounded-xl border px-4 py-2 focus:outline-none focus:ring-2 focus:ring-blue-500"
          disabled={isLoading}
        />
        {isLoading ? (
          <button
            type="button"
            onClick={stop}
            className="rounded-xl bg-red-500 px-4 py-2 text-white"
          >
            Stop
          </button>
        ) : (
          <button
            type="submit"
            disabled={!input.trim()}
            className="rounded-xl bg-blue-600 px-4 py-2 text-white disabled:opacity-50"
          >
            Send
          </button>
        )}
      </form>
    </div>
  )
}
```

Use it in a page:
```tsx
// app/page.tsx
import { Chat } from '@/components/Chat'

export default function Home() {
  return <Chat />
}
```

## Tool Calls
Tool calls are where the SDK really shines. Claude decides when to call your functions, the SDK executes them server-side, and the results are fed back to the model — all within the same streaming response.
### Define Tools
```ts
// app/api/chat/route.ts
import { streamText, tool } from 'ai'
import { anthropic } from '@ai-sdk/anthropic'
import { z } from 'zod'

export async function POST(req: Request) {
  const { messages } = await req.json()

  const result = await streamText({
    model: anthropic('claude-sonnet-4-6'),
    messages,
    tools: {
      getWeather: tool({
        description: 'Get the current weather for a city',
        parameters: z.object({
          city: z.string().describe('The city name'),
          unit: z.enum(['celsius', 'fahrenheit']).default('celsius'),
        }),
        execute: async ({ city, unit }) => {
          // In production, call a real weather API
          return {
            city,
            temperature: unit === 'celsius' ? 18 : 64,
            condition: 'partly cloudy',
            humidity: 65,
          }
        },
      }),
      searchWeb: tool({
        description: 'Search the web for recent information',
        parameters: z.object({
          query: z.string().describe('The search query'),
        }),
        execute: async ({ query }) => {
          // In production, call a search API (Brave, Tavily, etc.)
          return {
            results: [`Result for: ${query}`],
          }
        },
      }),
    },
    maxSteps: 5, // allow up to 5 tool call rounds
  })

  return result.toDataStreamResponse()
}
```

`maxSteps` lets the model call multiple tools in sequence. Without it, the response stops after the first tool call.
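During development it helps to see what each round actually did. `streamText` accepts an `onStepFinish` callback you can add to the call above for logging (a sketch; the exact callback fields can vary between SDK versions):

```ts
const result = await streamText({
  model: anthropic('claude-sonnet-4-6'),
  messages,
  tools, // the getWeather / searchWeb object from above
  maxSteps: 5,
  // Runs once per step: inspect which tools were called and what they returned
  onStepFinish({ toolCalls, toolResults, finishReason }) {
    console.log(JSON.stringify({ toolCalls, toolResults, finishReason }, null, 2))
  },
})
```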
### Render Tool Calls on the Client
`useChat` exposes tool calls on each message via `toolInvocations`; each invocation has a `state` field that becomes `'result'` once the tool has run:
```tsx
{messages.map((m) => (
  <div key={m.id}>
    {/* Text content */}
    {m.content && <p>{m.content}</p>}

    {/* Tool calls */}
    {m.toolInvocations?.map((tool) => (
      <div key={tool.toolCallId} className="bg-gray-50 rounded-lg p-3 text-sm">
        <p className="font-medium text-gray-500">
          Called: {tool.toolName}
        </p>
        {tool.state === 'result' && (
          <pre className="mt-1 text-gray-700">
            {JSON.stringify(tool.result, null, 2)}
          </pre>
        )}
      </div>
    ))}
  </div>
))}
```
## Structured Output with generateObject

When you need JSON rather than freeform text, use `generateObject`. It validates the output against a Zod schema and throws if the model returns data that doesn't match, so downstream code can trust the shape.
```ts
// app/api/analyze/route.ts
import { generateObject } from 'ai'
import { anthropic } from '@ai-sdk/anthropic'
import { z } from 'zod'

const SentimentSchema = z.object({
  sentiment: z.enum(['positive', 'negative', 'neutral']),
  score: z.number().min(0).max(1).describe('Confidence score'),
  summary: z.string().describe('One-sentence summary'),
  keywords: z.array(z.string()).describe('Key topics mentioned'),
})

export async function POST(req: Request) {
  const { text } = await req.json()

  const { object } = await generateObject({
    model: anthropic('claude-sonnet-4-6'),
    schema: SentimentSchema,
    prompt: `Analyze the sentiment of this text: "${text}"`,
  })

  return Response.json(object)
}
```

Call it from anywhere:
```ts
const response = await fetch('/api/analyze', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({ text: reviewText }),
})
const analysis = await response.json()
// { sentiment: 'positive', score: 0.87, summary: '...', keywords: [...] }
```
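To get type safety on the client as well, move the schema into a shared module and infer its type. A sketch, assuming a `lib/schemas.ts` module of your own:

```ts
// lib/schemas.ts (hypothetical shared module: both the route and the client import from here)
import { z } from 'zod'

export const SentimentSchema = z.object({
  sentiment: z.enum(['positive', 'negative', 'neutral']),
  score: z.number().min(0).max(1),
  summary: z.string(),
  keywords: z.array(z.string()),
})

export type Sentiment = z.infer<typeof SentimentSchema>
```

The client can then type the parsed response as `Sentiment` instead of `any`.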
## useCompletion — Single-Turn Streaming

For one-shot text generation (not chat), `useCompletion` is simpler than `useChat`:
```tsx
'use client'

import { useCompletion } from 'ai/react'

export function TextImprover() {
  const { completion, input, handleInputChange, handleSubmit, isLoading } = useCompletion({
    api: '/api/improve',
    streamProtocol: 'text', // matches toTextStreamResponse() on the server
  })

  return (
    <div>
      <form onSubmit={handleSubmit}>
        <textarea value={input} onChange={handleInputChange} rows={4} />
        <button type="submit" disabled={isLoading}>
          {isLoading ? 'Improving...' : 'Improve text'}
        </button>
      </form>
      {completion && (
        <div className="mt-4 p-4 bg-green-50 rounded-lg">
          {completion}
        </div>
      )}
    </div>
  )
}
```

The API route:
```ts
// app/api/improve/route.ts
import { streamText } from 'ai'
import { anthropic } from '@ai-sdk/anthropic'

export async function POST(req: Request) {
  const { prompt } = await req.json()

  const result = await streamText({
    model: anthropic('claude-sonnet-4-6'),
    prompt: `Improve the following text, keeping the same meaning but making it clearer and more professional:\n\n${prompt}`,
  })

  return result.toTextStreamResponse()
}
```

Note the difference: `toTextStreamResponse()` returns a plain text stream and pairs with `useCompletion` configured for the text protocol, while `useChat` expects `toDataStreamResponse()`.
## Persisting Chat History
By default, `useChat` resets on page reload. To persist conversations, initialize it with stored messages and save them after each response:
```tsx
'use client'

import { useChat } from 'ai/react'
import { useEffect } from 'react'

const STORAGE_KEY = 'chat-history'

export function PersistentChat() {
  // Guard against SSR: localStorage only exists in the browser
  const storedMessages = typeof window !== 'undefined'
    ? JSON.parse(localStorage.getItem(STORAGE_KEY) || '[]')
    : []

  const { messages, input, handleInputChange, handleSubmit, isLoading } = useChat({
    initialMessages: storedMessages,
  })

  // Save after every update so a reload restores the conversation
  useEffect(() => {
    if (messages.length > 0) {
      localStorage.setItem(STORAGE_KEY, JSON.stringify(messages))
    }
  }, [messages])

  // ... rest of UI
}
```

For multi-user apps, store the history in a database (Postgres, Supabase, etc.) and load it server-side via `initialMessages`.
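A sketch of that server-side version, assuming a `loadMessages` helper backed by your database and a `Chat` component that forwards `initialMessages` to `useChat`:

```tsx
// app/chat/[id]/page.tsx
import { Chat } from '@/components/Chat'
import { loadMessages } from '@/lib/db' // assumption: your own DB query helper

export default async function ChatPage({ params }: { params: Promise<{ id: string }> }) {
  const { id } = await params // Next.js 15: params is a Promise
  const initialMessages = await loadMessages(id)
  return <Chat initialMessages={initialMessages} />
}
```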
## Model Selection
The SDK works with multiple Claude models:
```ts
import { anthropic } from '@ai-sdk/anthropic'

// Fast and cheap — customer service, simple tasks
anthropic('claude-haiku-4-5-20251001')

// Best balance — most use cases
anthropic('claude-sonnet-4-6')

// Maximum capability — complex reasoning, long context
anthropic('claude-opus-4-6')
```

For a chat app where users ask general questions, `claude-sonnet-4-6` is the right default. Switch to `claude-haiku-4-5-20251001` if you're handling high volume and need to reduce costs. See the OpenAI vs Claude API comparison for a full breakdown.
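If one app serves both cases, a small helper keeps the choice in one place. A sketch (the `pickModel` name and tiers are illustrative, not part of the SDK):

```ts
// lib/pick-model.ts: hypothetical helper for per-request model selection
import { anthropic } from '@ai-sdk/anthropic'

export function pickModel(tier: 'fast' | 'default' = 'default') {
  return tier === 'fast'
    ? anthropic('claude-haiku-4-5-20251001') // high volume, cost-sensitive
    : anthropic('claude-sonnet-4-6') // general chat default
}
```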
## Error Handling
Add error handling to both the route and the client:
```ts
// route.ts
export async function POST(req: Request) {
  try {
    const { messages } = await req.json()

    const result = await streamText({
      model: anthropic('claude-sonnet-4-6'),
      messages,
    })

    return result.toDataStreamResponse()
  } catch (error) {
    console.error('AI stream error:', error)
    return new Response('Failed to generate response', { status: 500 })
  }
}
```

```tsx
// component
const { messages, error, reload } = useChat()

{error && (
  <div className="text-red-500">
    Error: {error.message}
    <button onClick={reload}>Retry</button>
  </div>
)}
```
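One caveat: the `try/catch` only covers failures before streaming begins. Once chunks are flowing, errors surface inside the stream, and the SDK masks their details from the client by default. Recent SDK versions accept a `getErrorMessage` option on the response to control what the client sees (check your version's docs):

```ts
return result.toDataStreamResponse({
  // Decide what error text, if any, reaches the client
  getErrorMessage: (error) =>
    error instanceof Error ? error.message : 'Unknown error',
})
```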
## Rate Limiting

In production, protect your AI endpoint from abuse:
```ts
// middleware.ts
import { NextRequest, NextResponse } from 'next/server'

const requestCounts = new Map<string, { count: number; resetTime: number }>()

export function middleware(request: NextRequest) {
  if (request.nextUrl.pathname === '/api/chat') {
    const ip = request.headers.get('x-forwarded-for') ?? 'anonymous'
    const now = Date.now()
    const windowMs = 60 * 1000 // 1 minute
    const maxRequests = 20

    const record = requestCounts.get(ip)
    if (!record || now > record.resetTime) {
      requestCounts.set(ip, { count: 1, resetTime: now + windowMs })
    } else if (record.count >= maxRequests) {
      return new NextResponse('Rate limit exceeded', { status: 429 })
    } else {
      record.count++
    }
  }
  return NextResponse.next()
}
```

This in-memory version resets on every deploy and doesn't share state across instances. For production, use Redis-based rate limiting (Upstash is a good fit with Vercel).
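A sketch using `@upstash/ratelimit` and `@upstash/redis`, assuming the `UPSTASH_REDIS_REST_URL` and `UPSTASH_REDIS_REST_TOKEN` env vars are set:

```ts
// middleware.ts
import { Ratelimit } from '@upstash/ratelimit'
import { Redis } from '@upstash/redis'
import { NextRequest, NextResponse } from 'next/server'

// Sliding window: 20 requests per IP per minute, shared across all instances
const ratelimit = new Ratelimit({
  redis: Redis.fromEnv(),
  limiter: Ratelimit.slidingWindow(20, '1 m'),
})

export async function middleware(request: NextRequest) {
  if (request.nextUrl.pathname === '/api/chat') {
    const ip = request.headers.get('x-forwarded-for') ?? 'anonymous'
    const { success } = await ratelimit.limit(ip)
    if (!success) {
      return new NextResponse('Rate limit exceeded', { status: 429 })
    }
  }
  return NextResponse.next()
}
```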
## Complete App Structure
```
app/
  api/
    chat/route.ts     — streaming chat endpoint
    analyze/route.ts  — generateObject endpoint
  page.tsx            — Chat component
components/
  Chat.tsx            — useChat hook + UI
  Message.tsx         — single message with tool calls
lib/
  ai.ts               — shared model config
```

Shared model config:
```ts
// lib/ai.ts
import { anthropic } from '@ai-sdk/anthropic'

export const defaultModel = anthropic('claude-sonnet-4-6')
export const fastModel = anthropic('claude-haiku-4-5-20251001')
```
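Routes then import from one place, so a model upgrade is a one-file change:

```ts
// Any route handler: swap models by editing lib/ai.ts only
import { streamText } from 'ai'
import { defaultModel } from '@/lib/ai'

export async function POST(req: Request) {
  const { messages } = await req.json()
  const result = await streamText({ model: defaultModel, messages })
  return result.toDataStreamResponse()
}
```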
## What to Build Next

- AI-powered search — combine `generateObject` with a search API to return structured results
- Document Q&A — upload files, extract text, use as context in the prompt (see the RAG system guide for the embedding approach)
- Code assistant — system prompt focused on your codebase, tool call to read files
- Form auto-fill — `generateObject` with your form's Zod schema + user description
The Vercel AI SDK removes almost all boilerplate from AI integration: you focus on the product while it handles the streaming protocol, state management, and provider differences. If you're building a Next.js app with AI features, this is the stack to use in 2026.
For auth on top of your AI app, see Next.js authentication with Auth.js v5. For the full-stack foundation this builds on, see the Next.js 15 TypeScript tutorial. For understanding the AI provider options, read the OpenAI vs Claude API comparison.