Workers AI provider for the AI SDK. Run Cloudflare's models for chat, embeddings, image generation, transcription, text-to-speech, reranking, and AI Search — all from a single provider.
import { createWorkersAI } from "workers-ai-provider";
import { streamText } from "ai";
export default {
async fetch(req: Request, env: { AI: Ai }) {
const workersai = createWorkersAI({ binding: env.AI });
const result = streamText({
model: workersai("@cf/meta/llama-4-scout-17b-16e-instruct"),
messages: [{ role: "user", content: "Write a haiku about Cloudflare" }],
});
return result.toTextStreamResponse();
},
};npm install workers-ai-provider aiInside a Cloudflare Worker, pass the env.AI binding directly. No API keys needed.
const workersai = createWorkersAI({ binding: env.AI });Outside of Workers (Node.js, Bun, etc.), use your Cloudflare credentials:
const workersai = createWorkersAI({
accountId: process.env.CLOUDFLARE_ACCOUNT_ID,
apiKey: process.env.CLOUDFLARE_API_TOKEN,
});Route requests through AI Gateway for caching, rate limiting, and observability:
const workersai = createWorkersAI({
binding: env.AI,
gateway: { id: "my-gateway" },
});Browse the full catalog at developers.cloudflare.com/workers-ai/models.
Some good defaults:
| Task | Model | Notes |
|---|---|---|
| Chat | @cf/meta/llama-4-scout-17b-16e-instruct |
Fast, strong tool calling |
| Chat | @cf/meta/llama-3.3-70b-instruct-fp8-fast |
Largest Llama, best quality |
| Chat | @cf/openai/gpt-oss-120b |
OpenAI open-weights, high reason |
| Reasoning | @cf/qwen/qwq-32b |
Emits reasoning_content |
| Embeddings | @cf/baai/bge-base-en-v1.5 |
768-dim, English |
| Embeddings | @cf/google/embeddinggemma-300m |
100+ languages, by Google |
| Images | @cf/black-forest-labs/flux-1-schnell |
Fast image generation |
| Transcription | @cf/openai/whisper-large-v3-turbo |
Best accuracy, multilingual |
| Transcription | @cf/deepgram/nova-3 |
Fast, high accuracy |
| Text-to-Speech | @cf/deepgram/aura-2-en |
Context-aware, natural pacing |
| Reranking | @cf/baai/bge-reranker-base |
Fast document reranking |
import { generateText } from "ai";
const { text } = await generateText({
model: workersai("@cf/meta/llama-3.3-70b-instruct-fp8-fast"),
prompt: "Explain Workers AI in one paragraph",
});Streaming:
import { streamText } from "ai";
const result = streamText({
model: workersai("@cf/meta/llama-4-scout-17b-16e-instruct"),
messages: [{ role: "user", content: "Write a short story" }],
});
for await (const chunk of result.textStream) {
process.stdout.write(chunk);
}Send images to vision-capable models like Llama 4 Scout and Kimi K2.5:
import { generateText } from "ai";
const { text } = await generateText({
model: workersai("@cf/meta/llama-4-scout-17b-16e-instruct"),
messages: [
{
role: "user",
content: [
{ type: "text", text: "What's in this image?" },
{ type: "image", image: imageUint8Array },
],
},
],
});Images can be provided as Uint8Array, base64 strings, or data URLs. Multiple images per message are supported. Works with both the binding and REST API configurations.
import { generateText, stepCountIs } from "ai";
import { z } from "zod";
const { text } = await generateText({
model: workersai("@cf/meta/llama-4-scout-17b-16e-instruct"),
prompt: "What's the weather in London?",
tools: {
getWeather: {
description: "Get the current weather for a city",
inputSchema: z.object({ city: z.string() }),
execute: async ({ city }) => ({ city, temperature: 18, condition: "Cloudy" }),
},
},
stopWhen: stepCountIs(2),
});import { generateText, Output } from "ai";
import { z } from "zod";
const { output } = await generateText({
model: workersai("@cf/meta/llama-3.3-70b-instruct-fp8-fast"),
prompt: "Recipe for spaghetti bolognese",
output: Output.object({
schema: z.object({
name: z.string(),
ingredients: z.array(z.object({ name: z.string(), amount: z.string() })),
steps: z.array(z.string()),
}),
}),
});import { embedMany } from "ai";
const { embeddings } = await embedMany({
model: workersai.textEmbedding("@cf/baai/bge-base-en-v1.5"),
values: ["sunny day at the beach", "rainy afternoon in the city"],
});import { generateImage } from "ai";
const { images } = await generateImage({
model: workersai.image("@cf/black-forest-labs/flux-1-schnell"),
prompt: "A mountain landscape at sunset",
size: "1024x1024",
});
// images[0].uint8Array contains the PNG bytesTranscribe audio using Whisper or Deepgram Nova-3 models.
import { transcribe } from "ai";
import { readFile } from "node:fs/promises";
const { text, segments } = await transcribe({
model: workersai.transcription("@cf/openai/whisper-large-v3-turbo"),
audio: await readFile("./audio.mp3"),
mediaType: "audio/mpeg",
});With language hints (Whisper only):
const { text } = await transcribe({
model: workersai.transcription("@cf/openai/whisper-large-v3-turbo", {
language: "fr",
}),
audio: audioBuffer,
mediaType: "audio/wav",
});Deepgram Nova-3 is also supported and detects language automatically:
const { text } = await transcribe({
model: workersai.transcription("@cf/deepgram/nova-3"),
audio: audioBuffer,
mediaType: "audio/wav",
});Generate spoken audio from text using Deepgram Aura-2.
import { speech } from "ai";
const { audio } = await speech({
model: workersai.speech("@cf/deepgram/aura-2-en"),
text: "Hello from Cloudflare Workers AI!",
voice: "asteria",
});
// audio is a Uint8Array of MP3 bytesReorder documents by relevance to a query — useful for RAG pipelines.
import { rerank } from "ai";
const { results } = await rerank({
model: workersai.reranking("@cf/baai/bge-reranker-base"),
query: "What is Cloudflare Workers?",
documents: [
"Cloudflare Workers lets you run JavaScript at the edge.",
"A cookie is a small piece of data stored in the browser.",
"Workers AI runs inference on Cloudflare's global network.",
],
topN: 2,
});
// results is sorted by relevance scoreAI Search is Cloudflare's managed RAG service. Connect your data and query it with natural language.
// wrangler.jsonc
{
"ai_search": [{ "binding": "AI_SEARCH", "name": "my-search-index" }],
}import { createAISearch } from "workers-ai-provider";
import { generateText } from "ai";
const aisearch = createAISearch({ binding: env.AI_SEARCH });
const { text } = await generateText({
model: aisearch(),
messages: [{ role: "user", content: "How do I setup AI Gateway?" }],
});Streaming works the same way — use streamText instead of generateText.
createAutoRAGstill works but is deprecated. UsecreateAISearchinstead.
| Option | Type | Description |
|---|---|---|
binding |
Ai |
Workers AI binding (env.AI). Use this OR credentials. |
accountId |
string |
Cloudflare account ID. Required with apiKey. |
apiKey |
string |
Cloudflare API token. Required with accountId. |
gateway |
GatewayOptions |
Optional AI Gateway config. |
Returns a provider with model factories. Each factory accepts an optional second argument for per-model settings:
workersai("@cf/meta/llama-3.3-70b-instruct-fp8-fast", {
sessionAffinity: "my-unique-session-id",
});| Setting | Type | Description |
|---|---|---|
safePrompt |
boolean |
Inject a safety prompt before all conversations. |
sessionAffinity |
string |
Routes requests with the same key to the same backend replica for prefix-cache optimization. |
Model factories:
// Chat — for generateText / streamText
workersai(modelId);
workersai.chat(modelId);
// Embeddings — for embedMany / embed
workersai.textEmbedding(modelId);
// Images — for generateImage
workersai.image(modelId);
// Transcription — for transcribe
workersai.transcription(modelId, settings?);
// Text-to-Speech — for speech
workersai.speech(modelId);
// Reranking — for rerank
workersai.reranking(modelId);| Option | Type | Description |
|---|---|---|
binding |
AutoRAG |
AI Search binding (env.AI_SEARCH). |
Returns a callable provider:
aisearch(); // AI Search model (shorthand)
aisearch.chat(); // AI Search model