
Lm_eval static generation improved #2241

Merged
regisss merged 7 commits into huggingface:main from 12010486:lm_eval_static_generation
Sep 11, 2025

Conversation

Collaborator

@12010486 12010486 commented Sep 3, 2025

Main changes are as follows:

Precision and device support improvements

  • Added a mixed_precision_dtype parameter to the ModelAdapter class, allowing users to specify the desired precision for HPU autocasting. The torch.autocast context is now used during generation, improving performance and memory usage on HPUs.
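As a rough illustration of the autocast idea (not the actual optimum-habana wiring): `mixed_precision_dtype` is the parameter name from the PR description, and on Gaudi the `device_type` would be `"hpu"`; `"cpu"` is used below so the snippet runs anywhere.

```python
import torch

# Hypothetical value, e.g. parsed from a CLI flag; the PR lets users
# choose the precision used for HPU autocasting.
mixed_precision_dtype = torch.bfloat16

x = torch.randn(4, 8)
w = torch.randn(8, 8)

# On HPU this would be torch.autocast(device_type="hpu", ...),
# wrapped around the generation call.
with torch.autocast(device_type="cpu", dtype=mixed_precision_dtype):
    y = x @ w  # matmul is autocast-eligible and runs in bf16 here
```

Running eligible ops in a lower precision under autocast is what reduces memory pressure during generation.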

Static shape and padding logic

  • Improved static shape bucket management: the code now calculates the required left-padding for input contexts to fit the selected bucket size, ensuring correct input shapes for HPUs. This includes padding both the context and the attention mask.
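The bucket selection and left-padding described above can be sketched in plain Python. The names (`find_bucket`, the bucket list) mirror the PR description, but this is an illustrative simplification, not the exact implementation, which operates on tensors.

```python
# Default input length buckets from run_lm_eval.py (after this PR).
BUCKETS = [16, 32, 64, 128, 189, 284, 384]

def find_bucket(length, buckets=BUCKETS):
    """Return the smallest bucket that fits `length`, else the largest bucket."""
    for b in buckets:
        if b >= length:
            return b
    return buckets[-1]

def left_pad(context, pad_token_id, bucket_length):
    """Left-pad a token list to the bucket size and build the matching attention mask."""
    padding_length = bucket_length - len(context)
    padded = [pad_token_id] * padding_length + context
    # 0 for padding positions, 1 for real tokens
    attention_mask = [0] * padding_length + [1] * len(context)
    return padded, attention_mask

ctx = [101, 2009, 2003, 102]            # 4 input tokens
bucket = find_bucket(len(ctx))          # smallest bucket that fits: 16
padded, mask = left_pad(ctx, 0, bucket) # both now have static length 16
```

Padding on the left keeps the real tokens adjacent to the generated ones, while the static bucket length gives HPU graphs a fixed input shape.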

Argument and API updates

  • Updated the default input length buckets in run_lm_eval.py to remove the largest value (985), which might have caused a mismatch with previous legacy results.
  • Added new arguments (ignore_eos) and updated references for compatibility with the latest version of the underlying evaluation harness.

Note: ignore_eos is passed but not used, as it decreases accuracy results.

How to test the impact

The command below used to produce an OOM error; with this change, memory usage on G2 is ~27 GB.

PT_HPU_LAZY_MODE=1 python run_lm_eval.py --model_name_or_path meta-llama/Llama-3.1-8B-Instruct \
--attn_softmax_bf16 --use_hpu_graphs --limit_hpu_graphs --use_kv_cache --bf16 --sdp_on_bf16 --trim_logits \
--batch_size=4 --tasks gsm8k_cot_llama -o eval_gsm8k.json --num_fewshot=8 \
--fewshot_as_multiturn --apply_chat_template True

@12010486 12010486 requested a review from regisss as a code owner September 3, 2025 13:47
@12010486 12010486 requested review from astachowiczhabana and removed request for regisss September 3, 2025 13:48

Collaborator Author

12010486 commented Sep 3, 2025

Checking this PR on other LM tasks, I've seen a drop with respect to the numbers I was getting before (on mbpp or humaneval, for example), so I'm converting it to a draft while I investigate further.

@12010486 12010486 marked this pull request as draft September 3, 2025 16:31
@12010486 12010486 marked this pull request as ready for review September 5, 2025 14:23
bucket_length = self.find_bucket(context.shape[1])
padding_length = bucket_length - context.shape[1]
max_gen_toks = max_length - context.shape[1]
if padding_length > 0 and self.hpu_graphs:
Collaborator

Hi @12010486
This code is to counteract a tokenizer with hardcoded padding, right?

Collaborator Author


Correct. We could also have done it by modifying this function https://github.com/EleutherAI/lm-evaluation-harness/blob/v0.4.7/lm_eval/models/huggingface.py/#L858, but I wanted to avoid patching another function.

    nargs="+",
    help="Input length buckets to use with static_shapes",
-   default=[16, 32, 64, 128, 189, 284, 384, 985],
+   default=[16, 32, 64, 128, 189, 284, 384],
Collaborator

Is this change intentional?

Collaborator Author

Yes, but thanks for double-checking. I introduced it because on v1.19 there was this commit for Granite accuracy (0222c48), so I wanted to be sure not to reintroduce a known regression.

Collaborator

@regisss regisss left a comment

LGTM

@regisss regisss merged commit 962056c into huggingface:main Sep 11, 2025
4 of 9 checks passed
gplutop7 pushed a commit to HabanaAI/optimum-habana-fork that referenced this pull request Oct 15, 2025
Co-authored-by: Silvia Colabrese <silvia.colabrese@intel.com>

5 participants