
Support image inputs for vision chat models #448

Merged
threepointone merged 1 commit into main from vision-inputs
Mar 19, 2026


@threepointone
Collaborator

Summary

  • Fix image input handling in workers-ai-provider and @cloudflare/tanstack-ai for vision-capable chat models (Llama 4 Scout, Kimi K2.5, etc.)
  • Handle all LanguageModelV3DataContent variants (Uint8Array, base64 string, data URL). Previously only Uint8Array was handled, and base64 and data URL inputs were silently dropped
  • Send images as OpenAI-compatible image_url content parts inline in messages, which works with both the binding and REST API paths
  • Add Vision sections to both READMEs with usage examples
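The normalisation described in the second bullet can be sketched roughly as follows. This is an illustrative stand-in, not the code from this PR: the `ImageData` type and the `normalizeImageData` and `base64ToBytes` names are hypothetical, and the sketch assumes that string inputs are either bare base64 or a base64-encoded data: URL.

```typescript
// Hypothetical sketch; not the PR's actual toUint8Array implementation.
// Assumption: a string is either bare base64 or a base64 data: URL.
type ImageData = Uint8Array | string;

function base64ToBytes(base64: string): Uint8Array {
  const binary = atob(base64); // one character per decoded byte
  const bytes = new Uint8Array(binary.length);
  for (let i = 0; i < binary.length; i++) {
    bytes[i] = binary.charCodeAt(i);
  }
  return bytes;
}

function normalizeImageData(data: ImageData): Uint8Array {
  // Raw bytes need no conversion.
  if (data instanceof Uint8Array) return data;

  // data:<media type>;base64,<payload>
  if (data.startsWith("data:")) {
    const comma = data.indexOf(",");
    if (comma === -1) {
      throw new Error("Malformed data URL: missing comma");
    }
    if (!data.slice(0, comma).endsWith(";base64")) {
      throw new Error("This sketch only handles base64 data URLs");
    }
    return base64ToBytes(data.slice(comma + 1));
  }

  // Anything else is treated as a bare base64 string.
  return base64ToBytes(data);
}
```

Funnelling every variant into bytes first means the later encoding step only has one input type to deal with, which is presumably why base64 and data URL inputs no longer fall through unhandled.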

What changed

workers-ai-provider

  • convert-to-workersai-chat-messages.ts: Added toUint8Array (normalises all data content types) and uint8ArrayToBase64 (chunked encoder). File parts are now converted to image_url content parts in the messages array.
  • workersai-chat-prompt.ts: Added WorkersAIContentPart type, widened WorkersAIUserMessage.content to string | WorkersAIContentPart[]
  • workersai-chat-language-model.ts: Simplified buildRunInputs — both REST and binding paths pass content arrays through directly
  • Added 17 unit tests for image handling, e2e vision tests for Llama 4 Scout + Kimi K2.5 (REST) and uform-gen2 (binding)
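A chunked base64 encoder like the one mentioned for uint8ArrayToBase64 might look something like the sketch below. The function name `encodeBase64Chunked` and the 0x8000 chunk size are assumptions for illustration, not values taken from the PR. Chunking matters because passing a very large array to `String.fromCharCode` in a single call can exceed the engine's argument limit.

```typescript
// Hypothetical sketch of a chunked encoder; the chunk size is an assumption.
function encodeBase64Chunked(bytes: Uint8Array, chunkSize = 0x8000): string {
  let binary = "";
  for (let offset = 0; offset < bytes.length; offset += chunkSize) {
    // subarray creates a view over the same buffer, so nothing is copied here.
    const chunk = bytes.subarray(offset, offset + chunkSize);
    // Convert one bounded chunk at a time to stay under the argument limit.
    binary += String.fromCharCode.apply(null, Array.from(chunk));
  }
  // btoa expects a "binary string" with one character per byte.
  return btoa(binary);
}
```

The result is independent of the chunk size, since base64 encoding is applied once to the fully assembled binary string rather than per chunk.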

@cloudflare/tanstack-ai

  • Updated normalizeMessagesForBinding docs to reflect that content arrays pass through (binding accepts them at runtime)
  • Updated tests to expect content arrays in binding path

Test plan

  • workers-ai-provider: 210 unit tests pass, tsc --noEmit clean
  • @cloudflare/tanstack-ai: 219 unit tests pass, tsc --noEmit clean
  • E2E: Llama 4 Scout and Kimi K2.5 correctly describe test images via REST API
  • E2E: Confirmed content arrays work through the binding for both Llama 4 Scout and Kimi K2.5


Convert various image sources into OpenAI-compatible image_url parts and send them inline in chat messages so vision-capable models work via both binding and REST paths. Key changes:

- convertToWorkersAIChatMessages: accept LanguageModelV3DataContent (Uint8Array, base64, data URL), normalize to bytes, and emit content arrays with image_url data: URLs; removed the separate images array.
- workers-ai-provider: allow messages.content to be either a string or an array of content parts; binding messages are still normalized, but content arrays pass through at runtime.
- workersai-chat-language-model / create-fetcher: stop extracting a separate image payload and instead include content arrays in inputs; cast inputs for binding runtime use.
- Tests and e2e fixtures: added/updated tests for base64, data URLs, multiple images, REST & binding vision flows; updated mock binding worker to handle vision route.
- Docs: added Vision (Image Inputs) usage examples to READMEs.

This enables sending images (Uint8Array, base64, or data URLs) inline as image_url parts so models like Llama 4 Scout and Kimi K2.5 can perform vision tasks.
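As a rough illustration of the resulting message shape, the sketch below builds a user message whose content is either a plain string or an array mixing a text part with image_url parts that carry data: URLs. The `ContentPart`, `UserMessage`, and `buildUserMessage` names are hypothetical (the PR's own types are WorkersAIContentPart and WorkersAIUserMessage, whose exact definitions are not shown here), and the default media type of image/png is an assumption.

```typescript
// Hypothetical types modelled on OpenAI-compatible content parts.
type ContentPart =
  | { type: "text"; text: string }
  | { type: "image_url"; image_url: { url: string } };

interface UserMessage {
  role: "user";
  content: string | ContentPart[];
}

interface EncodedImage {
  base64: string; // already base64-encoded image bytes
  mediaType?: string; // assumption: defaults to image/png when omitted
}

function buildUserMessage(text: string, images: EncodedImage[] = []): UserMessage {
  // Text-only messages keep the plain string form.
  if (images.length === 0) {
    return { role: "user", content: text };
  }
  const imageParts: ContentPart[] = images.map((image) => ({
    type: "image_url",
    image_url: {
      url: `data:${image.mediaType ?? "image/png"};base64,${image.base64}`,
    },
  }));
  // Images travel inline in the message, not in a separate images array.
  return { role: "user", content: [{ type: "text", text }, ...imageParts] };
}

const message = buildUserMessage("Describe this image", [
  { base64: "aGk=", mediaType: "image/jpeg" },
]);
console.log(JSON.stringify(message, null, 2));
```

Because the images are part of the messages array itself, the same payload shape can be handed to either the binding or the REST path without a separate image field.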
@changeset-bot

changeset-bot bot commented Mar 19, 2026

🦋 Changeset detected

Latest commit: 054ccb8

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 2 packages
Name Type
workers-ai-provider Patch
@cloudflare/tanstack-ai Patch


@pkg-pr-new

pkg-pr-new bot commented Mar 19, 2026


npx https://pkg.pr.new/cloudflare/ai/ai-gateway-provider@448
npx https://pkg.pr.new/cloudflare/ai/@cloudflare/tanstack-ai@448
npx https://pkg.pr.new/cloudflare/ai/workers-ai-provider@448

commit: 054ccb8

@threepointone threepointone merged commit 5f603ff into main Mar 19, 2026
3 checks passed
@threepointone threepointone deleted the vision-inputs branch March 19, 2026 12:54
@github-actions github-actions bot mentioned this pull request Mar 19, 2026