Remove truncation logic, fix corresponding tests #508
tyler-griggs merged 9 commits into NovaSky-AI:main
Conversation
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
/gemini review
/gemini review
/gemini review
Code Review
This pull request removes complex and buggy truncation logic from the agent_loop and fixes the corresponding tests. The core change is to conditionally append an EOS token based on the stop_reason, which simplifies the code and corrects the behavior for length-limited generations. The tests have been significantly improved by making the mock LLM generation more realistic and fixing the mock environment's behavior to respect max_turns.
My feedback focuses on some TODO comments and commented-out code in the tests that should be cleaned up before merging to improve code clarity and maintainability. Overall, the changes are a significant improvement in terms of correctness and simplicity.
/gemini review
Code Review
This pull request introduces a significant improvement by refactoring the trajectory generation logic in SkyRLGymGenerator. Moving the max_input_length check to the beginning of the agent loop for an early exit is a smart change that simplifies the control flow and improves efficiency by avoiding unnecessary generation and post-truncation. The corresponding removal of the complex truncation logic cleans up the codebase nicely. The conditional appending of the EOS token based on the stop_reason is also a correct and important fix.
The updates to the tests are thorough and correctly reflect the changes in the generator. Fixing the mock environment to respect max_turns and updating the assertions for reward placement and stop reasons in test_agent_loop_truncation_drops_out_of_range_rewards ensures the test is now correctly validating the intended behavior. Similarly, the adjustments in test_apply_overlong_filtering_non_batched make the test cases more consistent and accurate. Overall, this is a well-executed refactoring that improves both the implementation and its test coverage.
**Early exit agent loop on exceeding max_input_length to avoid post-generation truncation**

This PR modifies the SkyRLGymGenerator to stop trajectory generation once the token count exceeds the maximum input length, instead of truncating trajectories after completion. This simplifies control flow and removes redundant truncation logic in skyrl_gym_generator.py. Additionally, the test was failing because the mock environment wasn't respecting the max_turns parameter and was returning incorrect rewards: the environment was hardcoded to run 2 turns regardless of the max_turns setting, and the reward logic placed rewards at the wrong token positions.

**Changes:**
- In `skyrl_gym_generator`, I added early-exit logic to stop trajectory generation once the token count exceeds max_input_length, preventing unnecessary continuation beyond limits.
- In `test_agent_loop_truncation_drops_out_of_range_rewards`, I updated the `TruncEnv.step()` logic to correctly handle the max_turns=1 case (it previously allowed 2 turns to happen, causing a 9 vs. 5 assertion error). I also fixed the reward placement expectation: since the premature truncation step was removed and the EOS token is now appended manually, the reward is expected at the last (EOS) token, i.e. reward=2.0 at index 4. I also changed the stop_reason expectation from "length" to "stop".
- In `test_apply_overlong_filtering_non_batched`, the test expected 5 tokens but got 6 because the EOS token is appended. Since no EOS token should be added when stop_reason = "length", I updated the skyrl_gym_generator logic to account for this. Additionally, another test used stop reason "length" but actually expected an EOS token, so I changed its stop reason to "stop", which resolved the test errors.

In summary: if stop_reason = "length", we do not add the EOS token; if stop_reason = "stop", we do. The loss mask and reward placement are updated accordingly.

---------

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Tyler Griggs <131809874+tyler-griggs@users.noreply.github.com>
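The conditional EOS rule described above can be sketched roughly as follows. This is a minimal illustration, not the actual SkyRLGymGenerator code: the function name `finalize_response` and the flat `rewards` list are hypothetical, and only the stop_reason rule (EOS, loss mask, and reward placement) follows the PR description.

```python
def finalize_response(response_ids, loss_mask, reward, eos_token_id, stop_reason):
    """Append EOS only when generation stopped naturally.

    - stop_reason == "stop": append the EOS token, extend the loss mask,
      and place the reward on the final (EOS) token.
    - stop_reason == "length": leave the sequence as-is; the reward lands
      on the last generated token.
    """
    rewards = [0.0] * len(response_ids)
    if stop_reason == "stop":
        response_ids = response_ids + [eos_token_id]
        loss_mask = loss_mask + [1]  # EOS counts as a model-produced token
        rewards.append(0.0)
    rewards[-1] = reward  # reward sits on the last token either way
    return response_ids, loss_mask, rewards

# Example: a 4-token response that stopped naturally, with eos_token_id=2
ids, mask, rewards = finalize_response([5, 6, 7, 8], [1, 1, 1, 1], 2.0, 2, "stop")
# EOS appended at index 4, reward placed there
```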
Currently, the agent_loop can return trajectories that are "too long": the length of `prompt_ids` + `response_ids` can exceed `max_input_length + max_tokens`. This is because, even when the loop is interrupted by the length check introduced in #508, `response_ids` doesn't _just_ contain the last response generated by the model, but also the content of `new_obs`, which is unbounded. Thus the env's last step can append an arbitrarily long sequence. Since these last observations are not actually needed for the gradient, I fixed the issue by always removing them. Alternatively, the loop could be reworked, for example by keeping new observation tokens in a separate object. Feel free to suggest a different approach. The PR was tested with our code, but not with the tests; the testing is clearly non-exhaustive as it only exercised the "if" branch.
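The fix described above can be sketched as follows. This is a simplified illustration under assumptions, not the real agent_loop: `generate` and `env_step` are stand-in callables, and the point is only the ordering, i.e. checking the length budget before appending the new observation, and never appending the final observation.

```python
def agent_loop(prompt_ids, generate, env_step, max_input_length, max_turns):
    """Generation loop that never appends the final (unneeded) observation."""
    response_ids = []
    for _ in range(max_turns):
        new_ids = generate(prompt_ids + response_ids)
        response_ids += new_ids
        # Early exit (per #508): stop once the input budget is exceeded,
        # *before* any new observation tokens can be appended.
        if len(prompt_ids) + len(response_ids) > max_input_length:
            break
        done, new_obs_ids = env_step(new_ids)
        if done:
            break  # the final observation is not needed for the gradient
        response_ids += new_obs_ids
    return response_ids
```

With this ordering, `response_ids` can exceed `max_input_length` by at most one generation (bounded by `max_tokens`), since observation tokens are only appended while the budget still holds.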