Fix `run_clm.py` for streaming datasets by pbielak · Pull Request #2309 · huggingface/optimum-habana

pbielak · 2025-10-16T11:23:00Z

What does this PR do?

The variable `max_eval_samples` on L769 of `run_clm.py` was referenced before assignment. This commit fixes how `eval_samples` is computed for the metrics report by computing it the same way as for the training split.

Note:

- The bug was introduced during the Transformers 4.55 upgrade.
- It applies only to scenarios with `--streaming` enabled.
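The following minimal Python sketch shows the failure mode and one possible fix pattern. It is not the diff from this PR. The function names `eval_samples_buggy` and `eval_samples_fixed` and the parameter `max_eval_samples_arg` are hypothetical. Counting a streaming split by iterating over it, capped by the optional maximum, is an approach assumed here for illustration. A plain Python generator stands in for a streaming dataset because, like an iterable dataset, it has no `__len__`.

```python
import itertools
from typing import Iterable, Optional


def eval_samples_buggy(eval_dataset, max_eval_samples_arg: Optional[int], streaming: bool) -> int:
    # Reproduces the failure pattern: max_eval_samples is bound only on the
    # non-streaming path, so a streaming run raises UnboundLocalError below.
    if not streaming:
        max_eval_samples = (
            max_eval_samples_arg if max_eval_samples_arg is not None else len(eval_dataset)
        )
    return min(max_eval_samples, len(eval_dataset))


def eval_samples_fixed(eval_dataset: Iterable, max_eval_samples_arg: Optional[int], streaming: bool) -> int:
    if streaming:
        # Hypothetical approach: an iterable (streaming) dataset has no __len__,
        # so count the examples actually yielded, stopping at the optional cap.
        # islice(..., None) iterates to the end when no cap is given.
        return sum(1 for _ in itertools.islice(eval_dataset, max_eval_samples_arg))
    # Map-style dataset: same formula as for the training split.
    max_eval_samples = (
        max_eval_samples_arg if max_eval_samples_arg is not None else len(eval_dataset)
    )
    return min(max_eval_samples, len(eval_dataset))


if __name__ == "__main__":
    stream = (x for x in range(10))  # generator: no __len__, like a streaming dataset
    print(eval_samples_fixed(stream, 4, streaming=True))  # 4
    print(eval_samples_fixed(list(range(10)), None, streaming=False))  # 10
    try:
        eval_samples_buggy(iter(range(10)), None, streaming=True)
    except UnboundLocalError as err:
        print(type(err).__name__)  # UnboundLocalError
```

Counting by iteration consumes a one-shot iterator and, for a real streaming dataset, may re-read data from the source, so the actual script may compute the value differently.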


pbielak · 2025-10-16T11:23:53Z

Needs to be cherry-picked to v1.20-release

HuggingFaceDocBuilderDev · 2025-10-16T11:27:38Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

regisss

LGTM

Co-authored-by: Piotr Bielak <pbielak@habana.ai>

regisss · 2025-10-17T16:15:38Z

> Needs to be cherry-picked to v1.20-release

Done

…ce#779) Co-authored-by: Piotr Bielak <pbielak@users.noreply.github.com> Co-authored-by: Piotr Bielak <pbielak@habana.ai>

Fix run_clm.py for streaming datasets

e1a853d


karol-brejna-i assigned pbielak Oct 17, 2025

pbielak marked this pull request as ready for review October 17, 2025 09:27

pbielak requested a review from regisss as a code owner October 17, 2025 09:27

regisss approved these changes Oct 17, 2025


regisss merged commit f5ed4bf into huggingface:main Oct 17, 2025
3 of 5 checks passed

regisss pushed a commit that referenced this pull request Oct 17, 2025

Fix run_clm.py for streaming datasets (#2309)

855b840

Co-authored-by: Piotr Bielak <pbielak@habana.ai>

pbielak deleted the dev/pbielak/fix-clm-script branch October 20, 2025 08:23

regisss merged 1 commit into huggingface:main from HabanaAI:dev/pbielak/fix-clm-script
