workers-ai-provider

3.1.5

Patch Changes

  • #451 2a62e23 Thanks @mchenco! - Fix reasoning content being concatenated into assistant message content in multi-turn conversations

    Previously, reasoning parts in assistant messages were concatenated into the content string when building message history. This caused models like kimi-k2.5 and deepseek-r1 to receive their own internal reasoning as if it were spoken text, corrupting the conversation history and resulting in empty text responses or leaked special tokens on subsequent turns.

    Reasoning parts are now sent as the reasoning field on the assistant message object, which is the field name vLLM expects on input for reasoning models (kimi-k2.5, glm-4.7-flash).
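    A minimal sketch of the idea behind this fix, assuming simplified part and message shapes. The type names and the `toAssistantMessage` helper are hypothetical and are not the provider's internals; only the `reasoning` field name comes from the entry above.

```typescript
// Hypothetical, simplified shapes; the provider's real types are richer.
type AssistantPart =
	| { type: "text"; text: string }
	| { type: "reasoning"; text: string };

interface AssistantMessage {
	role: "assistant";
	content: string;
	reasoning?: string;
}

// Keep spoken text in `content` and internal reasoning in `reasoning`,
// instead of concatenating both into a single content string.
function toAssistantMessage(parts: AssistantPart[]): AssistantMessage {
	let content = "";
	let reasoning = "";
	for (const part of parts) {
		if (part.type === "text") {
			content += part.text;
		} else {
			reasoning += part.text;
		}
	}
	const message: AssistantMessage = { role: "assistant", content };
	// Only attach the field when the turn actually carried reasoning.
	if (reasoning.length > 0) {
		message.reasoning = reasoning;
	}
	return message;
}
```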

3.1.4

Patch Changes

  • #448 054ccb8 Thanks @threepointone! - Fix image inputs for vision-capable chat models
    • Handle all LanguageModelV3DataContent variants (Uint8Array, base64 string, data URL) instead of only Uint8Array
    • Send images as OpenAI-compatible image_url content parts inline in messages, enabling vision for models like Llama 4 Scout and Kimi K2.5
    • Works with both the binding and REST API paths
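    The three input variants can be normalized into one `image_url` part roughly as follows. This is a sketch under assumptions: `toImageUrlPart` and `bytesToBase64` are hypothetical helpers, and treating any non-`data:` string as bare base64 is a simplification.

```typescript
type ImageData = Uint8Array | string;

interface ImageUrlPart {
	type: "image_url";
	image_url: { url: string };
}

// Encode raw bytes as base64 using the global btoa (available in Workers and Node).
function bytesToBase64(bytes: Uint8Array): string {
	let binary = "";
	for (let i = 0; i < bytes.length; i++) {
		binary += String.fromCharCode(bytes[i]);
	}
	return btoa(binary);
}

// Normalize Uint8Array, bare base64, or a data URL into an OpenAI-style content part.
function toImageUrlPart(data: ImageData, mediaType: string): ImageUrlPart {
	let url: string;
	if (typeof data !== "string") {
		url = `data:${mediaType};base64,${bytesToBase64(data)}`;
	} else if (data.startsWith("data:")) {
		url = data; // already a data URL, pass through unchanged
	} else {
		url = `data:${mediaType};base64,${data}`; // assumed to be bare base64
	}
	return { type: "image_url", image_url: { url } };
}
```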

3.1.3

Patch Changes

  • #429 ae24f06 Thanks @michaeldwan! - Pass tool_choice through to binding.run() so tool selection mode (auto, required, none) is respected when using Workers AI with the binding API

  • #410 bc2eba3 Thanks @vaibhavshn! - fix: route REST API requests through AI Gateway when the gateway option is provided in createRun()

  • #446 3c35051 Thanks @threepointone! - Remove tool_call_id sanitization that truncated IDs to 9 alphanumeric chars, which caused all tool call IDs to collide after round-trip

  • #444 b1c742b Thanks @mchenco! - Add sessionAffinity setting to send x-session-affinity header for prefix-cache optimization. Also forward extraHeaders in the REST API path instead of discarding them.
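    As an illustration of the header handling in #444, the sketch below merges `extraHeaders` into a REST request and adds `x-session-affinity` when `sessionAffinity` is set. The `buildRequestHeaders` helper, the string type of `sessionAffinity`, and the merge order are assumptions, not the provider's actual code.

```typescript
interface HeaderSettings {
	sessionAffinity?: string; // assumed to be a caller-chosen string key
	extraHeaders?: Record<string, string>;
}

function buildRequestHeaders(apiKey: string, settings: HeaderSettings): Record<string, string> {
	const headers: Record<string, string> = {
		// Forward caller-supplied headers instead of discarding them.
		...(settings.extraHeaders ?? {}),
		Authorization: `Bearer ${apiKey}`,
		"Content-Type": "application/json",
	};
	if (settings.sessionAffinity) {
		headers["x-session-affinity"] = settings.sessionAffinity;
	}
	return headers;
}
```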

3.1.2

Patch Changes

  • #400 8822603 Thanks @threepointone! - Add early config validation to createWorkersAI that throws a clear error when neither a binding nor credentials (accountId + apiKey) are provided. Widen all model type parameters (TextGenerationModels, ImageGenerationModels, EmbeddingModels, TranscriptionModels, SpeechModels, RerankingModels) to accept arbitrary strings while preserving autocomplete for known models.
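    Both halves of this change follow common TypeScript patterns, sketched here under assumptions. `validateConfig`, `KnownTextModel`, the two model IDs, and the error message are illustrative only; the `string & {}` trick is one usual way to accept arbitrary strings while keeping editor autocomplete for known literals.

```typescript
// Illustrative subset of known model IDs; the real list is much longer.
type KnownTextModel = "@cf/meta/llama-3.1-8b-instruct" | "@cf/openai/gpt-oss-120b";

// `string & {}` keeps the literal suggestions while still accepting any string.
type TextModelId = KnownTextModel | (string & {});

interface ProviderConfig {
	binding?: unknown;
	accountId?: string;
	apiKey?: string;
}

// Fail early with a clear message instead of at the first model call.
function validateConfig(config: ProviderConfig): void {
	const hasBinding = config.binding != null;
	const hasCredentials = Boolean(config.accountId && config.apiKey);
	if (!hasBinding && !hasCredentials) {
		throw new Error("Provide either a binding or credentials (accountId + apiKey).");
	}
}

// A model ID outside the known list still type-checks.
const customModel: TextModelId = "@cf/example/some-new-model";
```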

3.1.1

Patch Changes

  • #396 2fb3ca8 Thanks @threepointone! - Rewrite README with updated model recommendations (GPT-OSS 120B, EmbeddingGemma 300M, Aura-2 EN)
    • Stream tool calls incrementally using tool-input-start/delta/end events instead of buffering until stream end
    • Fix REST streaming for models that don't support it on /ai/run/ (GPT-OSS, Kimi) by retrying without streaming
    • Add Aura-2 EN/ES to SpeechModels type
    • Log malformed SSE events with console.warn instead of silently swallowing
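    The last item can be pictured with a small SSE line parser. This is a sketch, not the provider's built-in parser: `parseSseData` is a hypothetical helper that handles only single-line `data:` fields and warns on payloads that are not valid JSON.

```typescript
// Parse one SSE line; returns the decoded JSON payload, or undefined when the
// line carries no usable event (non-data line, empty payload, [DONE], bad JSON).
function parseSseData(line: string): unknown {
	if (!line.startsWith("data:")) return undefined;
	const payload = line.slice("data:".length).trim();
	if (payload === "" || payload === "[DONE]") return undefined;
	try {
		return JSON.parse(payload);
	} catch (error) {
		// Surface the problem instead of silently swallowing it.
		console.warn("Skipping malformed SSE event:", payload);
		return undefined;
	}
}
```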

3.1.0

Minor Changes

  • #389 8538cd5 Thanks @vaibhavshn! - Add transcription, text-to-speech, and reranking support to the Workers AI provider.

    New capabilities

    • Transcription (provider.transcription(model)) — implements TranscriptionModelV3. Supports Whisper models (@cf/openai/whisper, whisper-tiny-en, whisper-large-v3-turbo) and Deepgram Nova-3 (@cf/deepgram/nova-3). Handles model-specific input formats: number arrays for basic Whisper, base64 for v3-turbo via REST, and { body, contentType } for Nova-3 via binding or raw binary upload for Nova-3 via REST.

    • Speech / TTS (provider.speech(model)) — implements SpeechModelV3. Supports Workers AI TTS models including Deepgram Aura-1 (@cf/deepgram/aura-1). Accepts text, voice, and speed options. Returns audio as Uint8Array. Uses returnRawResponse to handle binary audio from the REST path without JSON parsing.

    • Reranking (provider.reranking(model)) — implements RerankingModelV3. Supports BGE reranker models (@cf/baai/bge-reranker-base, bge-reranker-v2-m3). Converts AI SDK's document format to Workers AI's { query, contexts, top_k } input. Handles both text and JSON object documents.
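    For reranking, the conversion into `{ query, contexts, top_k }` might look like the sketch below. The `toRerankInput` helper, the `{ text }` shape of each context entry, and serializing JSON object documents with `JSON.stringify` are assumptions made for illustration.

```typescript
type RerankDocument = string | Record<string, unknown>;

interface RerankInput {
	query: string;
	contexts: { text: string }[];
	top_k?: number;
}

function toRerankInput(query: string, documents: RerankDocument[], topN?: number): RerankInput {
	const input: RerankInput = {
		query,
		// Text documents pass through; object documents are serialized (assumption).
		contexts: documents.map((doc) => ({
			text: typeof doc === "string" ? doc : JSON.stringify(doc),
		})),
	};
	if (topN !== undefined) {
		input.top_k = topN;
	}
	return input;
}
```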

    Bug fixes

    • AbortSignal passthrough: the createRun REST shim now passes the abort signal to fetch, enabling request cancellation and timeout handling. Previously the signal was silently dropped.
    • Nova-3 REST support — Added createRunBinary utility for models that require raw binary upload instead of JSON (used by Nova-3 transcription via REST).
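    The AbortSignal fix amounts to forwarding `signal` into the `fetch` options. A minimal sketch, assuming a hypothetical `postJson` helper with an injectable fetch-like function (not the provider's `createRun` signature):

```typescript
interface PostInit {
	method: string;
	body: string;
	signal?: AbortSignal | undefined;
}

type FetchLike = (url: string, init: PostInit) => Promise<unknown>;

// Forward the caller's signal so aborts and timeouts actually cancel the request.
function postJson(
	fetchImpl: FetchLike,
	url: string,
	body: unknown,
	signal?: AbortSignal,
): Promise<unknown> {
	return fetchImpl(url, {
		method: "POST",
		body: JSON.stringify(body),
		signal,
	});
}
```

    In a Worker or Node the global `fetch` can be passed as `fetchImpl`, and calling `controller.abort()` on the matching `AbortController` then rejects the pending request.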

    Usage

    import { createWorkersAI } from "workers-ai-provider";
    import { experimental_transcribe, experimental_generateSpeech, rerank } from "ai";
    
    const workersai = createWorkersAI({ binding: env.AI });
    
    // Transcription
    const transcript = await experimental_transcribe({
    	model: workersai.transcription("@cf/openai/whisper-large-v3-turbo"),
    	audio: audioData,
    	mediaType: "audio/wav",
    });
    
    // Speech
    const speech = await experimental_generateSpeech({
    	model: workersai.speech("@cf/deepgram/aura-1"),
    	text: "Hello world",
    	voice: "asteria",
    });
    
    // Reranking
    const ranked = await rerank({
    	model: workersai.reranking("@cf/baai/bge-reranker-base"),
    	query: "What is machine learning?",
    	documents: ["ML is a branch of AI.", "The weather is sunny."],
    });

3.0.5

Patch Changes

  • #393 91b32e0 Thanks @threepointone! - Comprehensive cleanup of the workers-ai-provider package.

    Bug fixes:

    • Fixed phantom dependency on fetch-event-stream that caused runtime crashes when installed outside the monorepo. Replaced with a built-in SSE parser.
    • Fixed streaming buffering: responses now stream token-by-token instead of arriving all at once. The root cause was twofold — an eager ReadableStream start() pattern that buffered all chunks, and a heuristic that silently fell back to non-streaming doGenerate whenever tools were defined. Both are fixed. Streaming now uses a proper TransformStream pipeline with backpressure.
    • Fixed reasoning-delta ID mismatch in simulated streaming — was using generateId() instead of the reasoningId from the preceding reasoning-start event, causing the AI SDK to drop reasoning content.
    • Fixed REST API client (createRun) silently swallowing HTTP errors. Non-200 responses now throw with status code and response body.
    • Fixed response_format being sent as undefined on every non-JSON request. Now only included when actually set.
    • Fixed json_schema field evaluating to false (a boolean) instead of undefined when schema was missing.
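    The last two fixes share a pattern: set an optional field only when there is a value for it. A sketch under assumptions (`buildInputs` and the simplified `RunInputs` shape are hypothetical):

```typescript
interface ResponseFormat {
	type: "json_schema";
	json_schema: object;
}

interface RunInputs {
	prompt: string;
	response_format?: ResponseFormat;
}

function buildInputs(prompt: string, schema?: object): RunInputs {
	const inputs: RunInputs = { prompt };
	// An explicit check avoids `schema && {...}`, which can evaluate to a falsy
	// value instead of leaving the field out entirely.
	if (schema !== undefined) {
		inputs.response_format = { type: "json_schema", json_schema: schema };
	}
	return inputs;
}
```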

    Workers AI quirk workarounds:

    • Added sanitizeToolCallId() — strips non-alphanumeric characters and pads/truncates to 9 chars, fixing tool call round-trips through the binding which rejects its own generated IDs.
    • Added normalizeMessagesForBinding() — converts content: null to "" and sanitizes tool call IDs before every binding call. Only applied on the binding path (REST preserves original IDs).
    • Added null-finalization chunk filtering for streaming tool calls.
    • Added numeric value coercion in native-format streams (Workers AI sometimes returns numbers instead of strings for the response field).
    • Improved image model to handle all output types from binding.run(): ReadableStream, Uint8Array, ArrayBuffer, Response, and { image: base64 } objects.
    • Graceful degradation: if binding.run() returns a non-streaming response despite stream: true, it wraps the complete response as a simulated stream instead of throwing.
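    The behaviour described for sanitizeToolCallId() can be sketched as below. This is a reconstruction from the description only; the padding character is an assumption, and the function is given a different name to make clear it is not the provider's source.

```typescript
// Strip non-alphanumeric characters, then pad or truncate to exactly 9 chars.
function sanitizeToolCallIdSketch(id: string): string {
	const alphanumeric = id.replace(/[^a-zA-Z0-9]/g, "");
	return alphanumeric.slice(0, 9).padEnd(9, "0"); // "0" padding is an assumption
}

const first = sanitizeToolCallIdSketch("chatcmpl-tool-aaa");
const second = sanitizeToolCallIdSketch("chatcmpl-tool-bbb");
```

    Under this sketch, `first` and `second` come out identical because the two IDs share their first nine alphanumeric characters. That is the collision reported in 3.1.3, where the sanitization was removed.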

    Premature stream termination detection:

    • Streams that end without a [DONE] sentinel now report finishReason: "error" with raw: "stream-truncated" instead of silently reporting "stop".
    • Stream read errors are caught and emit finishReason: "error" with raw: "stream-error".
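    A simplified view of the truncation check, assuming the SSE `data:` payloads have already been collected into an array. `finishForPayloads` is a hypothetical helper; the real provider works on a live stream and also takes the finish reason from the chunks themselves.

```typescript
interface StreamFinish {
	finishReason: "stop" | "error";
	raw?: string;
}

// A stream that never delivered the [DONE] sentinel is treated as truncated.
function finishForPayloads(payloads: string[]): StreamFinish {
	const sawDone = payloads.some((payload) => payload.trim() === "[DONE]");
	if (sawDone) {
		return { finishReason: "stop" };
	}
	return { finishReason: "error", raw: "stream-truncated" };
}
```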

    AI Search (formerly AutoRAG):

    • Added createAISearch and AISearchChatLanguageModel as the canonical exports, reflecting the rename from AutoRAG to AI Search.
    • createAutoRAG still works but emits a one-time deprecation warning pointing to createAISearch.
    • createAutoRAG preserves "autorag.chat" as the provider name for backward compatibility.
    • AI Search now warns when tools or JSON response format are provided (unsupported by the aiSearch API).
    • Simplified AI Search internals — removed dead tool/response-format processing code.
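    A one-time deprecation warning is typically implemented with a module-level flag. The sketch below assumes a hypothetical `warnAutoRAGDeprecated` helper and message text; only the createAutoRAG and createAISearch names come from the entry above.

```typescript
let hasWarned = false;

// Emit the deprecation notice on the first call only.
function warnAutoRAGDeprecated(warn: (message: string) => void = console.warn): void {
	if (hasWarned) return;
	hasWarned = true;
	warn("createAutoRAG is deprecated; use createAISearch instead.");
}
```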

    Code quality:

    • Removed dead code: workersai-error.ts (never imported), workersai-image-config.ts (inlined).
    • Consistent file naming: renamed workers-ai-embedding-model.ts to workersai-embedding-model.ts.
    • Replaced StringLike catch-all index signatures with [key: string]: unknown on settings types.
    • Replaced any types with proper interfaces (FlatToolCall, OpenAIToolCall, PartialToolCall).
    • Tightened processToolCall format detection to check function.name instead of just the presence of a function property.
    • Removed @ai-sdk/provider-utils and zod peer dependencies (no longer used in source).
    • Added imageModel to the WorkersAI interface type for consistency.

    Tests:

    • 149 unit tests across 10 test files (up from 82).
    • New test coverage: sanitizeToolCallId, normalizeMessagesForBinding, prepareToolsAndToolChoice, processText, mapWorkersAIUsage, image model output types, streaming error scenarios (malformed SSE, premature termination, empty stream), backpressure verification, graceful degradation (non-streaming fallback with text/tools/reasoning), REST API error handling (401/404/500), AI Search warnings, embedding TooManyEmbeddingValuesForCallError, message conversion with images and reasoning.
    • Integration tests for REST API and binding across 12 models and 7 categories (chat, streaming, multi-turn, tool calling, tool round-trip, structured output, image generation, embeddings).
    • All tests use the AI SDK's public APIs (generateText, streamText, generateImage, embedMany) instead of internal .doGenerate()/.doStream() methods.

    README:

    • Rewritten from scratch with concise examples, model recommendations, configuration guide, and known limitations section.
    • Updated to use current AI SDK v6 APIs (generateText + Output.object instead of deprecated generateObject, generateImage instead of experimental_generateImage, stopWhen: stepCountIs(2) instead of maxSteps).
    • Added sections for tool calling, structured output, embeddings, image generation, and AI Search.
    • Uses wrangler.jsonc format for configuration examples.

3.0.4

Patch Changes

  • #390 41b92a3 Thanks @mchenco! - fix(workers-ai-provider): extract actual finish reason in streaming instead of hardcoded "stop"

    Previously, the streaming implementation always returned finishReason: "stop" regardless of the actual completion reason. This caused:

    • Tool calling scenarios to incorrectly report "stop" instead of "tool-calls"
    • Multi-turn tool conversations to fail because the AI SDK couldn't detect when tools were requested
    • Length limit scenarios to show "stop" instead of "length"
    • Error scenarios to show "stop" instead of "error"

    The fix extracts the actual finish_reason from streaming chunks and uses the existing mapWorkersAIFinishReason() function to properly map it to the AI SDK's finish reason format. This enables proper multi-turn tool calling and accurate completion status reporting.
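    In outline, the fix tracks the last non-empty finish_reason seen in the stream and maps it at the end. The sketch below is an assumption-laden stand-in: `mapFinishReasonSketch` is not the provider's mapWorkersAIFinishReason(), the raw values are the usual OpenAI-style strings, and the "other" fallback is illustrative.

```typescript
type FinishReason = "stop" | "length" | "tool-calls" | "error" | "other";

function mapFinishReasonSketch(raw: string | null | undefined): FinishReason {
	switch (raw) {
		case "stop":
			return "stop";
		case "length":
			return "length";
		case "tool_calls":
			return "tool-calls";
		case "error":
			return "error";
		default:
			return "other"; // assumed fallback for missing or unknown values
	}
}

// Use the last reported finish_reason rather than a hardcoded "stop".
function finishReasonFromChunks(chunks: { finish_reason?: string | null }[]): FinishReason {
	let raw: string | null | undefined;
	for (const chunk of chunks) {
		if (chunk.finish_reason) {
			raw = chunk.finish_reason;
		}
	}
	return mapFinishReasonSketch(raw);
}
```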

3.0.3

Patch Changes

  • #384 0947ea2 Thanks @mchenco! - fix(workers-ai-provider): preserve tool call IDs in conversation history

3.0.2

Patch Changes

3.0.1

Patch Changes

3.0.0

Major Changes

2.0.2

Patch Changes

2.0.1

Patch Changes

2.0.0

Major Changes

Patch Changes

  • #216 26e5fdb Thanks @wussh! - Improve documentation by adding generateText example to workers-ai-provider and clarifying supported methods in ai-gateway-provider.

0.7.5

Patch Changes

  • #263 7b2745a Thanks @byule! - fix: use correct fieldname and format for tool_call ids

0.7.4

Patch Changes

0.7.3

Patch Changes

0.7.2

Patch Changes

0.7.1

Patch Changes

0.7.0

Minor Changes

0.6.5

Patch Changes

0.6.4

Patch Changes

0.6.3

Patch Changes

0.6.2

Patch Changes

0.6.1

Patch Changes

0.6.0

Minor Changes

0.5.3

Patch Changes

0.5.2

Patch Changes

0.5.1

Patch Changes

0.5.0

Minor Changes

0.4.1

Patch Changes

  • ac0693d Thanks @threepointone! - For #126; thanks @jokull for adding AutoRAG support to workers-ai-provider

0.4.0

Minor Changes

0.3.2

Patch Changes

0.3.1

Patch Changes

0.3.0

Minor Changes

0.2.2

Patch Changes

0.2.1

Patch Changes

0.2.0

Minor Changes

  • #41 5bffa40 Thanks @andyjessop! - feat: adds the ability to use the provider outside of the workerd environment by providing Cloudflare accountId/apiKey credentials.

0.1.3

Patch Changes

0.1.2

Patch Changes

  • #35 9e74cc9 Thanks @andyjessop! - Ensures that tool call data is available to model, by providing the JSON of the tool call as the content in the assistant message.
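    As a rough illustration of this change, the tool calls of an assistant turn can be serialized into the message content. The shapes and the `assistantContentFromToolCalls` helper are hypothetical; the exact JSON layout used by the provider is not stated in this entry.

```typescript
interface ToolCallSketch {
	name: string;
	arguments: Record<string, unknown>;
}

// Put the JSON of the tool calls into `content` so the model can see them.
function assistantContentFromToolCalls(calls: ToolCallSketch[]): { role: "assistant"; content: string } {
	return { role: "assistant", content: JSON.stringify(calls) };
}
```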

0.1.1

Patch Changes

0.1.0

Minor Changes

0.0.13

Patch Changes

0.0.12

Patch Changes

0.0.11

Patch Changes

0.0.10

Patch Changes

0.0.9

Patch Changes

0.0.8

Patch Changes

0.0.7

Patch Changes

0.0.6

Patch Changes

0.0.5

Patch Changes

0.0.4

Patch Changes

0.0.3

Patch Changes