Open
Description
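I tried FP8 quantization of HunyuanOCR with the default configuration, and the output quality dropped off sharply. Could this be a problem with the dataset used in the default configuration?
I also tried HunyuanOCR-eagle3, but the speed does not seem to improve noticeably: the gain is roughly 1.16x. Is that within the normal range? Here are my model launch parameters: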
python3 -m vllm.entrypoints.openai.api_server \
  --host 0.0.0.0 \
  --port 8090 \
  --model ./huggingface_models/HunyuanOCR \
  --speculative-config '{"method": "eagle3", "model": "./huggingface_models/HunyuanOCR_eagle3", "num_speculative_tokens": 4, "max_model_len": 2048}' \
  --served-model-name HunyuanOCR \
  --pipeline_parallel_size 1 \
  --tensor-parallel-size 2 \
  --trust-remote-code \
  --gpu-memory-utilization 0.7 \
  --max-model-len 8192
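
For context on the 1.16x figure, here is a rough back-of-the-envelope sketch of how speculative decoding speedup relates to the draft acceptance rate. It uses the standard simplified model in which each draft token is accepted independently with probability `alpha`; the function names and the `draft_cost` parameter (draft step time relative to one target forward pass) are illustrative assumptions, not values measured on this setup and not part of vLLM.

```python
def expected_tokens_per_step(alpha: float, k: int) -> float:
    """Expected tokens produced per target-model forward pass.

    Simplified model (assumption): each of the k draft tokens is accepted
    independently with probability alpha, and one extra token always comes
    from the target model itself.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    if k < 0:
        raise ValueError("k must be non-negative")
    if alpha == 1.0:
        # Every draft token is accepted, plus the target model's own token.
        return float(k + 1)
    # Geometric series: 1 + alpha + alpha^2 + ... + alpha^k
    return (1.0 - alpha ** (k + 1)) / (1.0 - alpha)


def estimated_speedup(alpha: float, k: int, draft_cost: float) -> float:
    """Rough wall-clock speedup over plain decoding.

    draft_cost is a hypothetical ratio: time of one draft step divided by
    the time of one target forward pass.
    """
    if draft_cost < 0.0:
        raise ValueError("draft_cost must be non-negative")
    return expected_tokens_per_step(alpha, k) / (1.0 + k * draft_cost)


if __name__ == "__main__":
    k = 4  # matches num_speculative_tokens in the launch command above
    for alpha in (0.2, 0.4, 0.6, 0.8):
        # draft_cost=0.1 is an arbitrary placeholder, not a measurement.
        print(alpha, round(estimated_speedup(alpha, k, draft_cost=0.1), 2))
```

Under this model, a small overall gain would be consistent with a low acceptance rate or a relatively expensive draft step, so the acceptance statistics for this workload would help judge whether 1.16x is expected.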