- #451 `2a62e23` Thanks @mchenco! - Fix reasoning content being concatenated into assistant message content in multi-turn conversations.

  Previously, reasoning parts in assistant messages were concatenated into the `content` string when building message history. This caused models like `kimi-k2.5` and `deepseek-r1` to receive their own internal reasoning as if it were spoken text, corrupting the conversation history and resulting in empty text responses or leaked special tokens on subsequent turns. Reasoning parts are now sent as the `reasoning` field on the assistant message object, which is the field name vLLM expects on input for reasoning models (`kimi-k2.5`, `glm-4.7-flash`).
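  A sketch of the assistant message shape described above, assuming an OpenAI-compatible message object; the `reasoning` field name is the one vLLM expects on input, per this fix. The values are illustrative.

  ```typescript
  // Sketch of the fixed assistant message shape: reasoning is carried in
  // its own field rather than concatenated into `content`.
  const assistantMessage = {
    role: "assistant",
    // Only the spoken reply stays in `content`...
    content: "The answer is 42.",
    // ...while internal reasoning travels in a separate `reasoning` field.
    reasoning: "The user asked for the answer; it is 42.",
  };

  // Before the fix, history-building effectively produced this, leaking
  // the model's reasoning back to it as if it were spoken text:
  const corrupted = assistantMessage.reasoning + assistantMessage.content;
  ```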
- #448 `054ccb8` Thanks @threepointone! - Fix image inputs for vision-capable chat models.

  - Handle all `LanguageModelV3DataContent` variants (Uint8Array, base64 string, data URL) instead of only Uint8Array
  - Send images as OpenAI-compatible `image_url` content parts inline in messages, enabling vision for models like Llama 4 Scout and Kimi K2.5
  - Works with both the binding and REST API paths
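  A hypothetical sketch of the normalization described above: turning any supported image input (Uint8Array, base64 string, or data URL) into an OpenAI-compatible `image_url` content part. Function and variable names here are illustrative, not the provider's actual internals.

  ```typescript
  type ImageInput = Uint8Array | string;

  // Normalize all three input variants into an `image_url` content part.
  function toImageUrlPart(data: ImageInput, mediaType = "image/png") {
    let url: string;
    if (typeof data === "string") {
      // Already a data URL, or a bare base64 string that needs wrapping.
      url = data.startsWith("data:") ? data : `data:${mediaType};base64,${data}`;
    } else {
      // Raw bytes: base64-encode and wrap in a data URL.
      url = `data:${mediaType};base64,${Buffer.from(data).toString("base64")}`;
    }
    return { type: "image_url", image_url: { url } };
  }
  ```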
- #429 `ae24f06` Thanks @michaeldwan! - Pass `tool_choice` through to `binding.run()` so tool selection mode (`auto`, `required`, `none`) is respected when using Workers AI with the binding API
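  An illustrative sketch (not the provider's actual code) of passing the AI SDK's tool-choice mode through to an OpenAI-compatible `tool_choice` parameter instead of dropping it.

  ```typescript
  type ToolChoice =
    | { type: "auto" }
    | { type: "required" }
    | { type: "none" }
    | { type: "tool"; toolName: string };

  function mapToolChoice(choice: ToolChoice) {
    switch (choice.type) {
      case "auto":
      case "required":
      case "none":
        // Pass the mode string straight through to `tool_choice`.
        return choice.type;
      case "tool":
        // Force a specific tool, OpenAI-style.
        return { type: "function", function: { name: choice.toolName } };
    }
  }
  ```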
- #410 `bc2eba3` Thanks @vaibhavshn! - fix: route REST API requests through AI Gateway when the `gateway` option is provided in `createRun()`
- #446 `3c35051` Thanks @threepointone! - Remove `tool_call_id` sanitization that truncated IDs to 9 alphanumeric chars, which caused all tool call IDs to collide after round-trip
- #444 `b1c742b` Thanks @mchenco! - Add `sessionAffinity` setting to send the `x-session-affinity` header for prefix-cache optimization. Also forward `extraHeaders` in the REST API path instead of discarding them.
- #400 `8822603` Thanks @threepointone! - Add early config validation to `createWorkersAI` that throws a clear error when neither a binding nor credentials (`accountId` + `apiKey`) are provided. Widen all model type parameters (`TextGenerationModels`, `ImageGenerationModels`, `EmbeddingModels`, `TranscriptionModels`, `SpeechModels`, `RerankingModels`) to accept arbitrary strings while preserving autocomplete for known models.
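  A minimal sketch of the kind of early validation described above. The option names `binding`, `accountId`, and `apiKey` come from the entry; everything else (the function name, the error message) is an assumption, and the real `createWorkersAI` does more than this.

  ```typescript
  interface WorkersAIConfig {
    binding?: unknown;
    accountId?: string;
    apiKey?: string;
  }

  // Fail fast at construction time instead of on the first request.
  function validateConfig(config: WorkersAIConfig): void {
    const hasCredentials = Boolean(config.accountId && config.apiKey);
    if (!config.binding && !hasCredentials) {
      throw new Error(
        "workers-ai-provider: pass either a `binding` or both `accountId` and `apiKey`",
      );
    }
  }
  ```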
- #396 `2fb3ca8` Thanks @threepointone! -

  - Rewrite README with updated model recommendations (GPT-OSS 120B, EmbeddingGemma 300M, Aura-2 EN)
  - Stream tool calls incrementally using tool-input-start/delta/end events instead of buffering until stream end
  - Fix REST streaming for models that don't support it on `/ai/run/` (GPT-OSS, Kimi) by retrying without streaming
  - Add Aura-2 EN/ES to the `SpeechModels` type
  - Log malformed SSE events with `console.warn` instead of silently swallowing them
- #389 `8538cd5` Thanks @vaibhavshn! - Add transcription, text-to-speech, and reranking support to the Workers AI provider.

  - Transcription (`provider.transcription(model)`) — implements `TranscriptionModelV3`. Supports Whisper models (`@cf/openai/whisper`, `whisper-tiny-en`, `whisper-large-v3-turbo`) and Deepgram Nova-3 (`@cf/deepgram/nova-3`). Handles model-specific input formats: number arrays for basic Whisper, base64 for v3-turbo via REST, and `{ body, contentType }` for Nova-3 via binding or raw binary upload for Nova-3 via REST.
  - Speech / TTS (`provider.speech(model)`) — implements `SpeechModelV3`. Supports Workers AI TTS models including Deepgram Aura-1 (`@cf/deepgram/aura-1`). Accepts `text`, `voice`, and `speed` options. Returns audio as `Uint8Array`. Uses `returnRawResponse` to handle binary audio from the REST path without JSON parsing.
  - Reranking (`provider.reranking(model)`) — implements `RerankingModelV3`. Supports BGE reranker models (`@cf/baai/bge-reranker-base`, `bge-reranker-v2-m3`). Converts the AI SDK's document format to Workers AI's `{ query, contexts, top_k }` input. Handles both text and JSON object documents.
  - AbortSignal passthrough — the `createRun` REST shim now passes the abort signal to `fetch`, enabling request cancellation and timeout handling. Previously the signal was silently dropped.
  - Nova-3 REST support — added a `createRunBinary` utility for models that require raw binary upload instead of JSON (used by Nova-3 transcription via REST).

  ```ts
  import { createWorkersAI } from "workers-ai-provider";
  import { experimental_transcribe, experimental_generateSpeech, rerank } from "ai";

  const workersai = createWorkersAI({ binding: env.AI });

  // Transcription
  const transcript = await experimental_transcribe({
    model: workersai.transcription("@cf/openai/whisper-large-v3-turbo"),
    audio: audioData,
    mediaType: "audio/wav",
  });

  // Speech
  const speech = await experimental_generateSpeech({
    model: workersai.speech("@cf/deepgram/aura-1"),
    text: "Hello world",
    voice: "asteria",
  });

  // Reranking
  const ranked = await rerank({
    model: workersai.reranking("@cf/baai/bge-reranker-base"),
    query: "What is machine learning?",
    documents: ["ML is a branch of AI.", "The weather is sunny."],
  });
  ```
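  A hypothetical sketch of the document conversion mentioned above: the AI SDK hands the reranker a list of documents (strings or JSON objects), while Workers AI expects `{ query, contexts, top_k }`. The function name and the exact shape of each context entry (`{ text }`) are assumptions here, not the provider's internals.

  ```typescript
  function toRerankInput(
    query: string,
    documents: Array<string | object>,
    topK?: number,
  ) {
    return {
      query,
      // Each context carries its document as text; JSON documents are
      // serialized so both kinds round-trip through the same field.
      contexts: documents.map((doc) => ({
        text: typeof doc === "string" ? doc : JSON.stringify(doc),
      })),
      top_k: topK ?? documents.length,
    };
  }
  ```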
- #393 `91b32e0` Thanks @threepointone! - Comprehensive cleanup of the workers-ai-provider package.

  Bug fixes:

  - Fixed phantom dependency on `fetch-event-stream` that caused runtime crashes when installed outside the monorepo. Replaced with a built-in SSE parser.
  - Fixed streaming buffering: responses now stream token-by-token instead of arriving all at once. The root cause was twofold — an eager `ReadableStream` `start()` pattern that buffered all chunks, and a heuristic that silently fell back to non-streaming `doGenerate` whenever tools were defined. Both are fixed. Streaming now uses a proper `TransformStream` pipeline with backpressure.
  - Fixed `reasoning-delta` ID mismatch in simulated streaming — it was using `generateId()` instead of the `reasoningId` from the preceding `reasoning-start` event, causing the AI SDK to drop reasoning content.
  - Fixed the REST API client (`createRun`) silently swallowing HTTP errors. Non-200 responses now throw with status code and response body.
  - Fixed `response_format` being sent as `undefined` on every non-JSON request. Now only included when actually set.
  - Fixed the `json_schema` field evaluating to `false` (a boolean) instead of `undefined` when the schema was missing.
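  A minimal sketch of a built-in SSE parser of the kind described above (the provider's actual parser is more thorough): split a raw SSE buffer into `data:` payloads and stop at the `[DONE]` sentinel.

  ```typescript
  function parseSSE(buffer: string): string[] {
    const payloads: string[] = [];
    // SSE events are separated by a blank line.
    for (const event of buffer.split("\n\n")) {
      for (const line of event.split("\n")) {
        if (!line.startsWith("data:")) continue;
        const data = line.slice("data:".length).trim();
        if (data === "[DONE]") return payloads; // end-of-stream sentinel
        if (data.length > 0) payloads.push(data);
      }
    }
    return payloads;
  }
  ```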
  Workers AI quirk workarounds:

  - Added `sanitizeToolCallId()` — strips non-alphanumeric characters and pads/truncates to 9 chars, fixing tool call round-trips through the binding, which rejects its own generated IDs.
  - Added `normalizeMessagesForBinding()` — converts `content: null` to `""` and sanitizes tool call IDs before every binding call. Only applied on the binding path (REST preserves original IDs).
  - Added null-finalization chunk filtering for streaming tool calls.
  - Added numeric value coercion in native-format streams (Workers AI sometimes returns numbers instead of strings for the `response` field).
  - Improved the image model to handle all output types from `binding.run()`: `ReadableStream`, `Uint8Array`, `ArrayBuffer`, `Response`, and `{ image: base64 }` objects.
  - Graceful degradation: if `binding.run()` returns a non-streaming response despite `stream: true`, the complete response is wrapped as a simulated stream instead of throwing.
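  A sketch of the `sanitizeToolCallId()` behavior described above: strip non-alphanumeric characters, then pad or truncate to exactly 9 characters. This mirrors the described behavior, not the exact source; the choice of `"0"` as the pad character is an assumption.

  ```typescript
  function sanitizeToolCallId(id: string): string {
    // Keep only alphanumeric characters, as the binding rejects others.
    const alnum = id.replace(/[^a-zA-Z0-9]/g, "");
    // Pad short IDs and truncate long ones to a fixed 9-char width.
    return alnum.padEnd(9, "0").slice(0, 9);
  }
  ```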
  Premature stream termination detection:

  - Streams that end without a `[DONE]` sentinel now report `finishReason: "error"` with `raw: "stream-truncated"` instead of silently reporting `"stop"`.
  - Stream read errors are caught and emit `finishReason: "error"` with `raw: "stream-error"`.
  AI Search (formerly AutoRAG):

  - Added `createAISearch` and `AISearchChatLanguageModel` as the canonical exports, reflecting the rename from AutoRAG to AI Search. `createAutoRAG` still works but emits a one-time deprecation warning pointing to `createAISearch`. `createAutoRAG` preserves `"autorag.chat"` as the provider name for backward compatibility.
  - AI Search now warns when tools or JSON response format are provided (unsupported by the `aiSearch` API).
  - Simplified AI Search internals — removed dead tool/response-format processing code.
  Code quality:

  - Removed dead code: `workersai-error.ts` (never imported), `workersai-image-config.ts` (inlined).
  - Consistent file naming: renamed `workers-ai-embedding-model.ts` to `workersai-embedding-model.ts`.
  - Replaced `StringLike` catch-all index signatures with `[key: string]: unknown` on settings types.
  - Replaced `any` types with proper interfaces (`FlatToolCall`, `OpenAIToolCall`, `PartialToolCall`).
  - Tightened `processToolCall` format detection to check `function.name` instead of just the presence of a `function` property.
  - Removed the `@ai-sdk/provider-utils` and `zod` peer dependencies (no longer used in source).
  - Added `imageModel` to the `WorkersAI` interface type for consistency.
  Tests:

  - 149 unit tests across 10 test files (up from 82).
  - New test coverage: `sanitizeToolCallId`, `normalizeMessagesForBinding`, `prepareToolsAndToolChoice`, `processText`, `mapWorkersAIUsage`, image model output types, streaming error scenarios (malformed SSE, premature termination, empty stream), backpressure verification, graceful degradation (non-streaming fallback with text/tools/reasoning), REST API error handling (401/404/500), AI Search warnings, embedding `TooManyEmbeddingValuesForCallError`, message conversion with images and reasoning.
  - Integration tests for REST API and binding across 12 models and 7 categories (chat, streaming, multi-turn, tool calling, tool round-trip, structured output, image generation, embeddings).
  - All tests use the AI SDK's public APIs (`generateText`, `streamText`, `generateImage`, `embedMany`) instead of the internal `.doGenerate()`/`.doStream()` methods.
  README:

  - Rewritten from scratch with concise examples, model recommendations, a configuration guide, and a known-limitations section.
  - Updated to use current AI SDK v6 APIs (`generateText` + `Output.object` instead of the deprecated `generateObject`, `generateImage` instead of `experimental_generateImage`, `stopWhen: stepCountIs(2)` instead of `maxSteps`).
  - Added sections for tool calling, structured output, embeddings, image generation, and AI Search.
  - Uses `wrangler.jsonc` format for configuration examples.
- #390 `41b92a3` Thanks @mchenco! - fix(workers-ai-provider): extract the actual finish reason in streaming instead of a hardcoded `"stop"`.

  Previously, the streaming implementation always returned `finishReason: "stop"` regardless of the actual completion reason. This caused:

  - Tool calling scenarios to incorrectly report `"stop"` instead of `"tool-calls"`
  - Multi-turn tool conversations to fail because the AI SDK couldn't detect when tools were requested
  - Length limit scenarios to show `"stop"` instead of `"length"`
  - Error scenarios to show `"stop"` instead of `"error"`

  The fix extracts the actual `finish_reason` from streaming chunks and uses the existing `mapWorkersAIFinishReason()` function to map it to the AI SDK's finish reason format. This enables proper multi-turn tool calling and accurate completion status reporting.
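  An illustrative sketch of mapping an upstream `finish_reason` string to the AI SDK's finish-reason vocabulary, as the fix above describes. The exact mapping table is an assumption, not the provider's `mapWorkersAIFinishReason()` source.

  ```typescript
  function mapFinishReason(reason: string | null | undefined) {
    switch (reason) {
      case "stop":
        return "stop";
      case "length":
        return "length";
      case "tool_calls":
        return "tool-calls"; // AI SDK spelling differs from the wire format
      case "error":
        return "error";
      default:
        // Unknown or missing reasons surface as "unknown" rather than
        // being silently hardcoded to "stop".
        return "unknown";
    }
  }
  ```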
- #384 `0947ea2` Thanks @mchenco! - fix(workers-ai-provider): preserve tool call IDs in conversation history
- `e5b0138` Thanks @threepointone! - update deps
- #338 `cd9e93c` Thanks @threepointone! - migrate to AI SDK v6
- #339 `ea16584` Thanks @threepointone! - remove blank tags array
- #336 `23aa670` Thanks @threepointone! - update dependencies
- #256 `a538901` Thanks @jahands! - feat: Migrate to AI SDK v5.

  This updates workers-ai-provider and ai-gateway-provider to use the AI SDK v5. Please refer to the official migration guide to migrate your code: https://ai-sdk.dev/docs/migration-guides/migration-guide-5-0
- #216 `26e5fdb` Thanks @wussh! - Improve documentation by adding a `generateText` example to workers-ai-provider and clarifying supported methods in ai-gateway-provider.
- #261 `50fad0f` Thanks @threepointone! - fix: pass a tool call ID and read it back out for tool calls
- #258 `b1ee224` Thanks @threepointone! - fix: don't crash if a model response has only tool calls
- #233 `836bc3d` Thanks @JoaquinGimenez1! - Process text from response content
- #231 `143a384` Thanks @JoaquinGimenez1! - Adds support for getting delta content
- #205 `804804b` Thanks @JoaquinGimenez1! - Adds support for Chat Completions API responses
- `414f85c` Thanks @threepointone! - Trigger a release
- #206 `f7aa30d` Thanks @threepointone! - update dependencies
- #197 `6506faa` Thanks @JoaquinGimenez1! - Add `rawResponse` from Workers AI
- `c9d5636` Thanks @threepointone! - update dependencies
- #181 `9f5562a` Thanks @JoaquinGimenez1! - Adds support for new tool call format during streaming
- `de992e6` Thanks @threepointone! - trigger a release for reverted change
- #170 `4f57e61` Thanks @JoaquinGimenez1! - Support new tool call format on streaming responses
- `7cc3626` Thanks @threepointone! - trigger a release to pick up new deps
- #163 `6b25ed7` Thanks @andyjessop! - feat: adds support for embed and embedMany
- `ac0693d` Thanks @threepointone! - For #126; thanks @jokull for adding AutoRAG support to workers-ai-provider
- #153 `ae5ac12` Thanks @JoaquinGimenez1! - Add support for new tool call format
- `3ba9ac5` Thanks @threepointone! - Update dependencies
- #72 `9b8dfc1` Thanks @andyjessop! - feat: allow passthrough options as model settings
- #65 `b17cf52` Thanks @andyjessop! - fix: gracefully handle a streaming chunk without a response property
- #47 `e000b7c` Thanks @andyjessop! - chore: implement generateImage function
- #41 `5bffa40` Thanks @andyjessop! - feat: adds the ability to use the provider outside of the workerd environment by providing Cloudflare accountId/apiKey credentials
- #39 `9add2b5` Thanks @andyjessop! - Trigger release for recent bug fixes
- #35 `9e74cc9` Thanks @andyjessop! - Ensures that tool call data is available to the model by providing the JSON of the tool call as the content in the assistant message
- #32 `9ffc5b8` Thanks @andyjessop! - Fixes structured outputs
- #29 `762b37b` Thanks @threepointone! - trigger a minor release
- #27 `add4120` Thanks @jiang-zhexin! - Exclude BaseAiTextToImage model
- #23 `b15ad06` Thanks @andyjessop! - Fix streaming output by ensuring that `events` is only called once per stream
- #26 `6868be7` Thanks @andyjessop! - configures AI Gateway to work with streamText
- #21 `6e71dd2` Thanks @andyjessop! - Fixes tool calling for generateText
- `eddaf37` Thanks @threepointone! - update dependencies
- `d16ae4c` Thanks @threepointone! - update readme
- `deacf87` Thanks @threepointone! - fix some types and buffering
- `bc6408c` Thanks @threepointone! - try another release
- `2a470cb` Thanks @threepointone! - publish
- `30e7ead` Thanks @threepointone! - try to trigger a build
- `4e967af` Thanks @threepointone! - fix readme, stray console log
- `66e48bc` Thanks @threepointone! - 🫧
- `3e15260` Thanks @threepointone! - fix example
- `294c9a9` Thanks @threepointone! - try to do a release