Fix run_clm.py for streaming datasets #2309

Merged: regisss merged 1 commit into huggingface:main from HabanaAI:dev/pbielak/fix-clm-script on Oct 17, 2025

pbielak (Collaborator) commented Oct 16, 2025

What does this PR do?

The variable `max_eval_samples` at L769 of run_clm.py was referenced before assignment. This commit fixes the computation of `eval_samples` for the metrics report by computing it the same way as for the training split.

Note:

  • bug introduced during Transformers 4.55 upgrade
  • applies only to scenarios with --streaming enabled
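
The failure mode described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the diff from this PR: the helper name `report_sample_count` and the rule used for streamed datasets (report the configured cap, since a stream has no length) are hypothetical. The point it shows is that the cap is bound on every code path before it is read.

```python
from typing import Iterable, Optional


def report_sample_count(
    dataset: Iterable, max_samples: Optional[int], streaming: bool
) -> Optional[int]:
    """Hypothetical helper: sample count to record in the metrics report."""
    if streaming:
        # A streamed dataset has no len(); the only known bound is the cap,
        # which may itself be None when no limit was configured.
        return max_samples
    # Sized dataset: bind the cap unconditionally before using it, so no
    # branch can reach min() with an unassigned name.
    size = len(dataset)
    cap = max_samples if max_samples is not None else size
    return min(cap, size)


def stream():
    # Stand-in for a streaming dataset: iterable, but without __len__.
    yield from range(1000)


metrics = {}
metrics["train_samples"] = report_sample_count(list(range(10)), None, streaming=False)
metrics["eval_samples"] = report_sample_count(stream(), 64, streaming=True)
```

Using one helper for both splits keeps the train and eval counts consistent, which is the property the fix aims for.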

pbielak (Collaborator, Author) commented Oct 16, 2025

Needs to be cherry-picked to v1.20-release

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

pbielak marked this pull request as ready for review October 17, 2025 09:27
pbielak requested a review from regisss as a code owner October 17, 2025 09:27
regisss (Collaborator) left a comment

LGTM

regisss merged commit f5ed4bf into huggingface:main on Oct 17, 2025
3 of 5 checks passed
regisss pushed a commit that referenced this pull request Oct 17, 2025
Co-authored-by: Piotr Bielak <pbielak@habana.ai>
regisss (Collaborator) commented Oct 17, 2025

> Needs to be cherry-picked to v1.20-release

Done

pbielak deleted the dev/pbielak/fix-clm-script branch October 20, 2025 08:23
gplutop7 pushed a commit to HabanaAI/optimum-habana-fork that referenced this pull request Nov 6, 2025
…ce#779)

Co-authored-by: Piotr Bielak <pbielak@users.noreply.github.com>
Co-authored-by: Piotr Bielak <pbielak@habana.ai>