Skip to content

hunyuanocr量化问题 #227

@Cliveliew

Description

@Cliveliew

用默认配置试了下hunyuanocr的fp8量化,效果断档下降,请问是否是默认配置的数据集问题?
另使用了hunyuanocr-eagle3,速度似乎没有明显提升,可能有1.16倍左右的提升,请问是否在正常范围内?以下是我的模型启动参数:
python3 -m vllm.entrypoints.openai.api_server
--host 0.0.0.0
--port 8090
--model ./huggingface_models/HunyuanOCR
--speculative-config '{"method": "eagle3", "model": "./huggingface_models/HunyuanOCR_eagle3", "num_speculative_tokens": 4, "max_model_len": 2048}'
--served-model-name HunyuanOCR
--pipeline_parallel_size 1
--tensor-parallel-size 2
--trust-remote-code
--gpu-memory-utilization 0.7
--max-model-len 8192

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions