[vllm-omni]: Omni Quant Support #1507

@thuang6

Description

Description

Feature Description

This epic tracks quantization support status for Omni models targeted by vllm-omni.
The main dtypes are INT4 and MXFP8; MXFP4 is a future target.
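As background on the dtypes above: INT4 and the MX formats (MXFP8/MXFP4) are blockwise formats in which each small block of weights shares one scale factor. The sketch below is not vllm-omni code; it is a minimal pure-Python illustration of symmetric blockwise INT4 quantization, assuming the common 32-element block size used by MX-style formats.

```python
# Hedged illustration (not vllm-omni internals): blockwise quantization,
# the idea behind INT4/MXFP8/MXFP4, where each block of values shares a scale.

BLOCK = 32  # assumed block size; MX-style formats commonly use 32 elements


def quantize_int4_block(values):
    """Quantize one block of floats to signed 4-bit ints plus a shared scale."""
    amax = max(abs(v) for v in values) or 1.0
    scale = amax / 7.0  # symmetric INT4 range [-7, 7]
    q = [max(-7, min(7, round(v / scale))) for v in values]
    return q, scale


def dequantize_block(q, scale):
    """Reconstruct approximate floats from quantized ints and the scale."""
    return [x * scale for x in q]


# Example: quantize a block of synthetic weights and check the error bound.
weights = [0.05 * i - 0.8 for i in range(BLOCK)]
q, s = quantize_int4_block(weights)
recon = dequantize_block(q, s)
err = max(abs(a - b) for a, b in zip(weights, recon))
```

The reconstruction error per value is bounded by half the quantization step (`s / 2`), which is why per-block scales give much better accuracy than one scale for a whole tensor.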

Motivation and Use Case

Omni model enablement status:

| model | architecture | owner | INT4 | MXFP8 |
| --- | --- | --- | --- | --- |
| Qwen3-omni | Qwen3OmniMoeForConditionalGeneration | Liang | ok | |
| Qwen2.5-omni | Qwen2_5OmniForConditionalGeneration | Liang | ok | |
| HunyuanImage-3.0 | HunyuanImage3ForCausalMM | Chang | ok | |
| Z-Image | ZImageTransformer2DModel | Xin | ok | |
| Next-Step | ZImageTransformer2DModel | Xin | WIP (feature request in vllm-omni) | |
| MiMo-Audio-7B-Instruct | MiMoAudioForCausalLM | Weiwei | fail | |
| Qwen3-TTS-12Hz-1.7B-CustomVoice | Qwen3TTSForConditionalGeneration | Weiwei | fail | |
| GLM-Image | GlmImageForConditionalGeneration GlmImageTransformer2DModel | Liang | ok | |
| BAGEL-7B-MoT | BagelForConditionalGeneration | Liang | | |
| Ovis-Image | OvisImageTransformer2DModel | Liang | | |

Alternatives Considered

No response

Definition of Done

No response

Additional Context

No response

Metadata

Labels: enhancement (New feature or request)