- #451 `2a62e23` Thanks @mchenco! - Fix reasoning content being concatenated into assistant message content in multi-turn conversations.

  Previously, reasoning parts in assistant messages were concatenated into the `content` string when building message history. This caused models like `kimi-k2.5` and `deepseek-r1` to receive their own internal reasoning as if it were spoken text, corrupting the conversation history and resulting in empty text responses or leaked special tokens on subsequent turns. Reasoning parts are now sent as the `reasoning` field on the assistant message object, which is the field name vLLM expects on input for reasoning models (`kimi-k2.5`, `glm-4.7-flash`).
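  A sketch of the assistant message shape described above, assuming an OpenAI-compatible message object; the `reasoning` field name is the one vLLM expects on input, per this fix. The values are illustrative.

  ```typescript
  // Sketch of the fixed assistant message shape: reasoning is carried in
  // its own field rather than concatenated into `content`.
  const assistantMessage = {
    role: "assistant",
    // Only the spoken reply stays in `content`...
    content: "The answer is 42.",
    // ...while internal reasoning travels in a separate `reasoning` field.
    reasoning: "The user asked for the answer; it is 42.",
  };

  // Before the fix, history-building effectively produced this, leaking
  // the model's reasoning back to it as if it were spoken text:
  const corrupted = assistantMessage.reasoning + assistantMessage.content;
  ```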
- #448 `054ccb8` Thanks @threepointone! - Fix image inputs for vision-capable chat models.

  - Handle all `LanguageModelV3DataContent` variants (Uint8Array, base64 string, data URL) instead of only Uint8Array
  - Send images as OpenAI-compatible `image_url` content parts inline in messages, enabling vision for models like Llama 4 Scout and Kimi K2.5
  - Works with both the binding and REST API paths
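  A hypothetical sketch of the normalization described above: turning any supported image input (Uint8Array, base64 string, or data URL) into an OpenAI-compatible `image_url` content part. Function and variable names here are illustrative, not the provider's actual internals.

  ```typescript
  type ImageInput = Uint8Array | string;

  // Normalize all three input variants into an `image_url` content part.
  function toImageUrlPart(data: ImageInput, mediaType = "image/png") {
    let url: string;
    if (typeof data === "string") {
      // Already a data URL, or a bare base64 string that needs wrapping.
      url = data.startsWith("data:") ? data : `data:${mediaType};base64,${data}`;
    } else {
      // Raw bytes: base64-encode and wrap in a data URL.
      url = `data:${mediaType};base64,${Buffer.from(data).toString("base64")}`;
    }
    return { type: "image_url", image_url: { url } };
  }
  ```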
- #429 `ae24f06` Thanks @michaeldwan! - Pass `tool_choice` through to `binding.run()` so tool selection mode (`auto`, `required`, `none`) is respected when using Workers AI with the binding API
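  An illustrative sketch (not the provider's actual code) of passing the AI SDK's tool-choice mode through to an OpenAI-compatible `tool_choice` parameter instead of dropping it.

  ```typescript
  type ToolChoice =
    | { type: "auto" }
    | { type: "required" }
    | { type: "none" }
    | { type: "tool"; toolName: string };

  function mapToolChoice(choice: ToolChoice) {
    switch (choice.type) {
      case "auto":
      case "required":
      case "none":
        // Pass the mode string straight through to `tool_choice`.
        return choice.type;
      case "tool":
        // Force a specific tool, OpenAI-style.
        return { type: "function", function: { name: choice.toolName } };
    }
  }
  ```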
- #410 `bc2eba3` Thanks @vaibhavshn! - fix: route REST API requests through AI Gateway when the `gateway` option is provided in `createRun()`
- #446 `3c35051` Thanks @threepointone! - Remove `tool_call_id` sanitization that truncated IDs to 9 alphanumeric chars, which caused all tool call IDs to collide after round-trip
- #444 `b1c742b` Thanks @mchenco! - Add `sessionAffinity` setting to send the `x-session-affinity` header for prefix-cache optimization. Also forward `extraHeaders` in the REST API path instead of discarding them.
- #400 `8822603` Thanks @threepointone! - Add early config validation to `createWorkersAI` that throws a clear error when neither a binding nor credentials (`accountId` + `apiKey`) are provided. Widen all model type parameters (`TextGenerationModels`, `ImageGenerationModels`, `EmbeddingModels`, `TranscriptionModels`, `SpeechModels`, `RerankingModels`) to accept arbitrary strings while preserving autocomplete for known models.
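  A minimal sketch of the kind of early validation described above. The option names `binding`, `accountId`, and `apiKey` come from the entry; everything else (the function name, the error message) is an assumption, and the real `createWorkersAI` does more than this.

  ```typescript
  interface WorkersAIConfig {
    binding?: unknown;
    accountId?: string;
    apiKey?: string;
  }

  // Fail fast at construction time instead of on the first request.
  function validateConfig(config: WorkersAIConfig): void {
    const hasCredentials = Boolean(config.accountId && config.apiKey);
    if (!config.binding && !hasCredentials) {
      throw new Error(
        "workers-ai-provider: pass either a `binding` or both `accountId` and `apiKey`",
      );
    }
  }
  ```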
- #396 `2fb3ca8` Thanks @threepointone! -

  - Rewrite README with updated model recommendations (GPT-OSS 120B, EmbeddingGemma 300M, Aura-2 EN)
  - Stream tool calls incrementally using tool-input-start/delta/end events instead of buffering until stream end
  - Fix REST streaming for models that don't support it on `/ai/run/` (GPT-OSS, Kimi) by retrying without streaming
  - Add Aura-2 EN/ES to the `SpeechModels` type
  - Log malformed SSE events with `console.warn` instead of silently swallowing them
- #389 `8538cd5` Thanks @vaibhavshn! - Add transcription, text-to-speech, and reranking support to the Workers AI provider.

  - Transcription (`provider.transcription(model)`) — implements `TranscriptionModelV3`. Supports Whisper models (`@cf/openai/whisper`, `whisper-tiny-en`, `whisper-large-v3-turbo`) and Deepgram Nova-3 (`@cf/deepgram/nova-3`). Handles model-specific input formats: number arrays for basic Whisper, base64 for v3-turbo via REST, and `{ body, contentType }` for Nova-3 via binding or raw binary upload for Nova-3 via REST.
  - Speech / TTS (`provider.speech(model)`) — implements `SpeechModelV3`. Supports Workers AI TTS models including Deepgram Aura-1 (`@cf/deepgram/aura-1`). Accepts `text`, `voice`, and `speed` options. Returns audio as `Uint8Array`. Uses `returnRawResponse` to handle binary audio from the REST path without JSON parsing.
  - Reranking (`provider.reranking(model)`) — implements `RerankingModelV3`. Supports BGE reranker models (`@cf/baai/bge-reranker-base`, `bge-reranker-v2-m3`). Converts the AI SDK's document format to Workers AI's `{ query, contexts, top_k }` input. Handles both text and JSON object documents.
  - AbortSignal passthrough — the `createRun` REST shim now passes the abort signal to `fetch`, enabling request cancellation and timeout handling. Previously the signal was silently dropped.
  - Nova-3 REST support — added a `createRunBinary` utility for models that require raw binary upload instead of JSON (used by Nova-3 transcription via REST).

  ```ts
  import { createWorkersAI } from "workers-ai-provider";
  import { experimental_transcribe, experimental_generateSpeech, rerank } from "ai";

  const workersai = createWorkersAI({ binding: env.AI });

  // Transcription
  const transcript = await experimental_transcribe({
    model: workersai.transcription("@cf/openai/whisper-large-v3-turbo"),
    audio: audioData,
    mediaType: "audio/wav",
  });

  // Speech
  const speech = await experimental_generateSpeech({
    model: workersai.speech("@cf/deepgram/aura-1"),
    text: "Hello world",
    voice: "asteria",
  });

  // Reranking
  const ranked = await rerank({
    model: workersai.reranking("@cf/baai/bge-reranker-base"),
    query: "What is machine learning?",
    documents: ["ML is a branch of AI.", "The weather is sunny."],
  });
  ```
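  A hypothetical sketch of the document conversion mentioned above: the AI SDK hands the reranker a list of documents (strings or JSON objects), while Workers AI expects `{ query, contexts, top_k }`. The function name and the exact shape of each context entry (`{ text }`) are assumptions here, not the provider's internals.

  ```typescript
  function toRerankInput(
    query: string,
    documents: Array<string | object>,
    topK?: number,
  ) {
    return {
      query,
      // Each context carries its document as text; JSON documents are
      // serialized so both kinds round-trip through the same field.
      contexts: documents.map((doc) => ({
        text: typeof doc === "string" ? doc : JSON.stringify(doc),
      })),
      top_k: topK ?? documents.length,
    };
  }
  ```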
- #393 `91b32e0` Thanks @threepointone! - Comprehensive cleanup of the workers-ai-provider package.

  Bug fixes:

  - Fixed phantom dependency on `fetch-event-stream` that caused runtime crashes when installed outside the monorepo. Replaced with a built-in SSE parser.
  - Fixed streaming buffering: responses now stream token-by-token instead of arriving all at once. The root cause was twofold — an eager `ReadableStream` `start()` pattern that buffered all chunks, and a heuristic that silently fell back to non-streaming `doGenerate` whenever tools were defined. Both are fixed. Streaming now uses a proper `TransformStream` pipeline with backpressure.
  - Fixed `reasoning-delta` ID mismatch in simulated streaming — it was using `generateId()` instead of the `reasoningId` from the preceding `reasoning-start` event, causing the AI SDK to drop reasoning content.
  - Fixed the REST API client (`createRun`) silently swallowing HTTP errors. Non-200 responses now throw with status code and response body.
  - Fixed `response_format` being sent as `undefined` on every non-JSON request. Now only included when actually set.
  - Fixed the `json_schema` field evaluating to `false` (a boolean) instead of `undefined` when the schema was missing.
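  A minimal sketch of a built-in SSE parser of the kind described above (the provider's actual parser is more thorough): split a raw SSE buffer into `data:` payloads and stop at the `[DONE]` sentinel.

  ```typescript
  function parseSSE(buffer: string): string[] {
    const payloads: string[] = [];
    // SSE events are separated by a blank line.
    for (const event of buffer.split("\n\n")) {
      for (const line of event.split("\n")) {
        if (!line.startsWith("data:")) continue;
        const data = line.slice("data:".length).trim();
        if (data === "[DONE]") return payloads; // end-of-stream sentinel
        if (data.length > 0) payloads.push(data);
      }
    }
    return payloads;
  }
  ```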
  Workers AI quirk workarounds:

  - Added `sanitizeToolCallId()` — strips non-alphanumeric characters and pads/truncates to 9 chars, fixing tool call round-trips through the binding, which rejects its own generated IDs.
  - Added `normalizeMessagesForBinding()` — converts `content: null` to `""` and sanitizes tool call IDs before every binding call. Only applied on the binding path (REST preserves original IDs).
  - Added null-finalization chunk filtering for streaming tool calls.
  - Added numeric value coercion in native-format streams (Workers AI sometimes returns numbers instead of strings for the `response` field).
  - Improved the image model to handle all output types from `binding.run()`: `ReadableStream`, `Uint8Array`, `ArrayBuffer`, `Response`, and `{ image: base64 }` objects.
  - Graceful degradation: if `binding.run()` returns a non-streaming response despite `stream: true`, the complete response is wrapped as a simulated stream instead of throwing.
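  A sketch of the `sanitizeToolCallId()` behavior described above: strip non-alphanumeric characters, then pad or truncate to exactly 9 characters. This mirrors the described behavior, not the exact source; the choice of `"0"` as the pad character is an assumption.

  ```typescript
  function sanitizeToolCallId(id: string): string {
    // Keep only alphanumeric characters, as the binding rejects others.
    const alnum = id.replace(/[^a-zA-Z0-9]/g, "");
    // Pad short IDs and truncate long ones to a fixed 9-char width.
    return alnum.padEnd(9, "0").slice(0, 9);
  }
  ```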
  Premature stream termination detection:

  - Streams that end without a `[DONE]` sentinel now report `finishReason: "error"` with `raw: "stream-truncated"` instead of silently reporting `"stop"`.
  - Stream read errors are caught and emit `finishReason: "error"` with `raw: "stream-error"`.
  AI Search (formerly AutoRAG):

  - Added `createAISearch` and `AISearchChatLanguageModel` as the canonical exports, reflecting the rename from AutoRAG to AI Search. `createAutoRAG` still works but emits a one-time deprecation warning pointing to `createAISearch`. `createAutoRAG` preserves `"autorag.chat"` as the provider name for backward compatibility.
  - AI Search now warns when tools or JSON response format are provided (unsupported by the `aiSearch` API).
  - Simplified AI Search internals — removed dead tool/response-format processing code.
  Code quality:

  - Removed dead code: `workersai-error.ts` (never imported), `workersai-image-config.ts` (inlined).
  - Consistent file naming: renamed `workers-ai-embedding-model.ts` to `workersai-embedding-model.ts`.
  - Replaced `StringLike` catch-all index signatures with `[key: string]: unknown` on settings types.
  - Replaced `any` types with proper interfaces (`FlatToolCall`, `OpenAIToolCall`, `PartialToolCall`).
  - Tightened `processToolCall` format detection to check `function.name` instead of just the presence of a `function` property.
  - Removed the `@ai-sdk/provider-utils` and `zod` peer dependencies (no longer used in source).
  - Added `imageModel` to the `WorkersAI` interface type for consistency.
  Tests:

  - 149 unit tests across 10 test files (up from 82).
  - New test coverage: `sanitizeToolCallId`, `normalizeMessagesForBinding`, `prepareToolsAndToolChoice`, `processText`, `mapWorkersAIUsage`, image model output types, streaming error scenarios (malformed SSE, premature termination, empty stream), backpressure verification, graceful degradation (non-streaming fallback with text/tools/reasoning), REST API error handling (401/404/500), AI Search warnings, embedding `TooManyEmbeddingValuesForCallError`, message conversion with images and reasoning.
  - Integration tests for REST API and binding across 12 models and 7 categories (chat, streaming, multi-turn, tool calling, tool round-trip, structured output, image generation, embeddings).
  - All tests use the AI SDK's public APIs (`generateText`, `streamText`, `generateImage`, `embedMany`) instead of the internal `.doGenerate()`/`.doStream()` methods.
  README:

  - Rewritten from scratch with concise examples, model recommendations, a configuration guide, and a known-limitations section.
  - Updated to use current AI SDK v6 APIs (`generateText` + `Output.object` instead of the deprecated `generateObject`, `generateImage` instead of `experimental_generateImage`, `stopWhen: stepCountIs(2)` instead of `maxSteps`).
  - Added sections for tool calling, structured output, embeddings, image generation, and AI Search.
  - Uses `wrangler.jsonc` format for configuration examples.
- #390 `41b92a3` Thanks @mchenco! - fix(workers-ai-provider): extract the actual finish reason in streaming instead of a hardcoded `"stop"`.

  Previously, the streaming implementation always returned `finishReason: "stop"` regardless of the actual completion reason. This caused:

  - Tool calling scenarios to incorrectly report `"stop"` instead of `"tool-calls"`
  - Multi-turn tool conversations to fail because the AI SDK couldn't detect when tools were requested
  - Length limit scenarios to show `"stop"` instead of `"length"`
  - Error scenarios to show `"stop"` instead of `"error"`

  The fix extracts the actual `finish_reason` from streaming chunks and uses the existing `mapWorkersAIFinishReason()` function to map it to the AI SDK's finish reason format. This enables proper multi-turn tool calling and accurate completion status reporting.
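  An illustrative sketch of mapping an upstream `finish_reason` string to the AI SDK's finish-reason vocabulary, as the fix above describes. The exact mapping table is an assumption, not the provider's `mapWorkersAIFinishReason()` source.

  ```typescript
  function mapFinishReason(reason: string | null | undefined) {
    switch (reason) {
      case "stop":
        return "stop";
      case "length":
        return "length";
      case "tool_calls":
        return "tool-calls"; // AI SDK spelling differs from the wire format
      case "error":
        return "error";
      default:
        // Unknown or missing reasons surface as "unknown" rather than
        // being silently hardcoded to "stop".
        return "unknown";
    }
  }
  ```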
- #384 `0947ea2` Thanks @mchenco! - fix(workers-ai-provider): preserve tool call IDs in conversation history
- `e5b0138` Thanks @threepointone! - update deps
- #338 `cd9e93c` Thanks @threepointone! - migrate to AI SDK v6
- #339 `ea16584` Thanks @threepointone! - remove blank tags array
- #336 `23aa670` Thanks @threepointone! - update dependencies
- #256 `a538901` Thanks @jahands! - feat: Migrate to AI SDK v5.

  This updates workers-ai-provider and ai-gateway-provider to use the AI SDK v5. Please refer to the official migration guide to migrate your code: https://ai-sdk.dev/docs/migration-guides/migration-guide-5-0
- #216 `26e5fdb` Thanks @wussh! - Improve documentation by adding a `generateText` example to workers-ai-provider and clarifying supported methods in ai-gateway-provider.
- #261 `50fad0f` Thanks @threepointone! - fix: pass a tool call ID and read it back out for tool calls
- #258 `b1ee224` Thanks @threepointone! - fix: don't crash if a model response has only tool calls
- #233 `836bc3d` Thanks @JoaquinGimenez1! - Process text from response content
- #231 `143a384` Thanks @JoaquinGimenez1! - Adds support for getting delta content
- #205 `804804b` Thanks @JoaquinGimenez1! - Adds support for Chat Completions API responses
- `414f85c` Thanks @threepointone! - Trigger a release
- #206 `f7aa30d` Thanks @threepointone! - update dependencies
- #197 `6506faa` Thanks @JoaquinGimenez1! - Add `rawResponse` from Workers AI
- `c9d5636` Thanks @threepointone! - update dependencies
- #181 `9f5562a` Thanks @JoaquinGimenez1! - Adds support for new tool call format during streaming
- `de992e6` Thanks @threepointone! - trigger a release for reverted change
- #170 `4f57e61` Thanks @JoaquinGimenez1! - Support new tool call format on streaming responses
- `7cc3626` Thanks @threepointone! - trigger a release to pick up new deps
- #163 `6b25ed7` Thanks @andyjessop! - feat: adds support for embed and embedMany
- `ac0693d` Thanks @threepointone! - For #126; thanks @jokull for adding AutoRAG support to workers-ai-provider
- #153 `ae5ac12` Thanks @JoaquinGimenez1! - Add support for new tool call format
- `3ba9ac5` Thanks @threepointone! - Update dependencies
- #72 `9b8dfc1` Thanks @andyjessop! - feat: allow passthrough options as model settings
- #65 `b17cf52` Thanks @andyjessop! - fix: gracefully handle a streaming chunk without a response property
- #47 `e000b7c` Thanks @andyjessop! - chore: implement generateImage function
- #41 `5bffa40` Thanks @andyjessop! - feat: adds the ability to use the provider outside of the workerd environment by providing Cloudflare accountId/apiKey credentials
- #39 `9add2b5` Thanks @andyjessop! - Trigger release for recent bug fixes
- #35 `9e74cc9` Thanks @andyjessop! - Ensures that tool call data is available to the model by providing the JSON of the tool call as the content in the assistant message
- #32 `9ffc5b8` Thanks @andyjessop! - Fixes structured outputs
- #29 `762b37b` Thanks @threepointone! - trigger a minor release
- #27 `add4120` Thanks @jiang-zhexin! - Exclude BaseAiTextToImage model
- #23 `b15ad06` Thanks @andyjessop! - Fix streaming output by ensuring that `events` is only called once per stream
- #26 `6868be7` Thanks @andyjessop! - configures AI Gateway to work with streamText
- #21 `6e71dd2` Thanks @andyjessop! - Fixes tool calling for generateText
- `eddaf37` Thanks @threepointone! - update dependencies
- `d16ae4c` Thanks @threepointone! - update readme
- `deacf87` Thanks @threepointone! - fix some types and buffering
- `bc6408c` Thanks @threepointone! - try another release
- `2a470cb` Thanks @threepointone! - publish
- `30e7ead` Thanks @threepointone! - try to trigger a build
- `4e967af` Thanks @threepointone! - fix readme, stray console log
- `66e48bc` Thanks @threepointone! - 🫧
- `3e15260` Thanks @threepointone! - fix example
- `294c9a9` Thanks @threepointone! - try to do a release