Lm_eval static generation improved #2241
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
Checking this PR on other lm_eval tasks, I've seen a drop with respect to the numbers I was getting before (on mbpp or humaneval, for example), so I'm converting it to a draft while I investigate further.
```python
bucket_length = self.find_bucket(context.shape[1])
padding_length = bucket_length - context.shape[1]
max_gen_toks = max_length - context.shape[1]
if padding_length > 0 and self.hpu_graphs:
```
Hi @12010486
This code is there to counteract the tokenizer's hardcoded padding, right?
Correct. We could also have done this by modifying this function https://github.com/EleutherAI/lm-evaluation-harness/blob/v0.4.7/lm_eval/models/huggingface.py/#L858 but I wanted to avoid patching yet another function.
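The snippet above pads each prompt up to a fixed bucket length so that static shapes (and HPU graphs) can be reused across prompts of similar size. A minimal sketch of that bucketing idea, with a hypothetical `find_bucket` helper (the PR's actual implementation may differ):

```python
# Sketch only: pick the smallest configured bucket that fits the sequence,
# then pad the sequence up to that bucket length.
from bisect import bisect_left

def find_bucket(buckets, length):
    """Return the smallest bucket >= length, or length itself if none fits."""
    idx = bisect_left(buckets, length)
    return buckets[idx] if idx < len(buckets) else length

buckets = [16, 32, 64, 128, 189, 284, 384]
seq_len = 100
bucket_length = find_bucket(buckets, seq_len)   # 128
padding_length = bucket_length - seq_len        # 28
```

Because every prompt between 65 and 128 tokens is padded to the same length, the compiled graph for that shape is reused instead of being recompiled per prompt.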
```diff
 nargs="+",
 help="Input length buckets to use with static_shapes",
-default=[16, 32, 64, 128, 189, 284, 384, 985],
+default=[16, 32, 64, 128, 189, 284, 384],
```
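For context, a bucket-list option like the one in the diff can be declared with `argparse` roughly as follows (the `--buckets` flag name here is an assumption for illustration):

```python
# Sketch of a CLI option accepting one or more integer bucket sizes.
import argparse

parser = argparse.ArgumentParser()
parser.add_argument(
    "--buckets",
    type=int,
    nargs="+",  # one or more values after the flag
    default=[16, 32, 64, 128, 189, 284, 384],
    help="Input length buckets to use with static_shapes",
)

args = parser.parse_args(["--buckets", "16", "64", "256"])
# args.buckets == [16, 64, 256]
```

With `nargs="+"`, omitting the flag entirely falls back to the full default list.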
Is this change intentional?
Yes, but thanks for double-checking. I introduced it because on v1.19 there was this commit, for Granite accuracy, so I wanted to be sure not to reintroduce a known regression: 0222c48
Co-authored-by: Silvia Colabrese <silvia.colabrese@intel.com>
Main changes are as follows:

**Precision and device support improvements**

- Added a `mixed_precision_dtype` parameter to the `ModelAdapter` class, allowing users to specify the desired precision for HPU autocasting. The `torch.autocast` context is now used during generation, improving performance and memory usage on HPUs.

**Static shape and padding logic**

**Argument and API updates**

- Updated `run_lm_eval.py` to remove the largest value from the default input length buckets; it might have caused a mismatch with previous legacy results.
- Argument updates (`ignore_eos`) and updated references for compatibility with the latest version of the underlying evaluation harness.

**How to test the impact**
The command below was producing an OOM error; now, on G2, memory usage is ~27 GB.
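The autocast usage described in the summary could look roughly like the sketch below. This is not the PR's actual code: `DummyModel`, `generate_with_autocast`, and the parameter names are assumptions for illustration, and on Gaudi the `device_type` would be `"hpu"` rather than `"cpu"`.

```python
# Sketch: wrap generation in torch.autocast so autocast-eligible ops
# (e.g. matmul) run in the requested lower-precision dtype.
import torch

class DummyModel:
    """Stand-in for a generate()-capable model (illustration only)."""
    def generate(self, input_ids):
        # matmul is autocast-eligible, so under autocast it runs in
        # the requested lower-precision dtype
        return input_ids @ input_ids.T

def generate_with_autocast(model, input_ids,
                           mixed_precision_dtype=torch.bfloat16,
                           device_type="cpu"):  # "hpu" on Gaudi
    with torch.autocast(device_type=device_type, dtype=mixed_precision_dtype):
        return model.generate(input_ids)

out = generate_with_autocast(DummyModel(), torch.ones(2, 3))
print(out.dtype)  # torch.bfloat16
```

Running the lower-precision path only inside the `with` block keeps the rest of the pipeline in full precision, which is where the memory savings reported above would come from.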