Remove truncation logic, fix corresponding tests #508
tyler-griggs merged 9 commits into NovaSky-AI:main
Conversation
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
/gemini review
/gemini review
/gemini review
Code Review
This pull request removes complex and buggy truncation logic from the agent_loop and fixes the corresponding tests. The core change is to conditionally append an EOS token based on the stop_reason, which simplifies the code and corrects the behavior for length-limited generations. The tests have been significantly improved by making the mock LLM generation more realistic and fixing the mock environment's behavior to respect max_turns.
My feedback focuses on some TODO comments and commented-out code in the tests that should be cleaned up before merging to improve code clarity and maintainability. Overall, the changes are a significant improvement in terms of correctness and simplicity.
/gemini review
Code Review
This pull request introduces a significant improvement by refactoring the trajectory generation logic in SkyRLGymGenerator. Moving the max_input_length check to the beginning of the agent loop for an early exit is a smart change that simplifies the control flow and improves efficiency by avoiding unnecessary generation and post-truncation. The corresponding removal of the complex truncation logic cleans up the codebase nicely. The conditional appending of the EOS token based on the stop_reason is also a correct and important fix.
The updates to the tests are thorough and correctly reflect the changes in the generator. Fixing the mock environment to respect max_turns and updating the assertions for reward placement and stop reasons in test_agent_loop_truncation_drops_out_of_range_rewards ensures the test is now correctly validating the intended behavior. Similarly, the adjustments in test_apply_overlong_filtering_non_batched make the test cases more consistent and accurate. Overall, this is a well-executed refactoring that improves both the implementation and its test coverage.
**Early exit agent loop on exceeding max_input_length to avoid post-generation truncation**

This PR modifies the SkyRLGymGenerator to stop trajectory generation once the token count exceeds the maximum input length, instead of truncating trajectories after completion. This simplifies control flow and removes redundant truncation logic in skyrl_gym_generator.py. Additionally, the test was failing because the mock environment wasn't respecting the max_turns parameter and was returning incorrect rewards: the environment was hardcoded to run 2 turns regardless of the max_turns setting, and the reward logic placed rewards at the wrong token positions.

**Changes:**
- In `skyrl_gym_generator`, I added early-exit logic to stop trajectory generation once the token count exceeds max_input_length, preventing unnecessary continuation beyond limits.
- In `test_agent_loop_truncation_drops_out_of_range_rewards`, I updated the `TruncEnv.step()` logic to correctly handle the max_turns=1 case (it previously allowed 2 turns to happen, causing a 9 vs. 5 assertion error). I also fixed the reward placement expectation: since the premature truncation step was removed and the EOS token is now appended manually, the reward is expected at the last (EOS) token, i.e. reward=2.0 at index 4. I also changed the stop_reason expectation from "length" to "stop".
- In `test_apply_overlong_filtering_non_batched`, the test expected 5 tokens but got 6 because the EOS token is appended. Since no EOS token should be added when stop_reason = "length", I updated the skyrl_gym_generator logic to account for this. Additionally, another test used stop reason "length" but actually expected an EOS token, so I changed its stop reason to "stop", which resolved the test errors.

In summary: if stop_reason = "length", we do not add the EOS token; if stop_reason = "stop", we do. The loss mask and reward placement are updated accordingly.

---------

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Tyler Griggs <131809874+tyler-griggs@users.noreply.github.com>
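The conditional EOS rule described above can be sketched roughly as follows. This is a minimal illustration, not the actual SkyRLGymGenerator code: the function name `finalize_response` and the flat `rewards` list are hypothetical, and only the stop_reason rule (EOS, loss mask, and reward placement) follows the PR description.

```python
def finalize_response(response_ids, loss_mask, reward, eos_token_id, stop_reason):
    """Append EOS only when generation stopped naturally.

    - stop_reason == "stop": append the EOS token, extend the loss mask,
      and place the reward on the final (EOS) token.
    - stop_reason == "length": leave the sequence as-is; the reward lands
      on the last generated token.
    """
    rewards = [0.0] * len(response_ids)
    if stop_reason == "stop":
        response_ids = response_ids + [eos_token_id]
        loss_mask = loss_mask + [1]  # EOS counts as a model-produced token
        rewards.append(0.0)
    rewards[-1] = reward  # reward sits on the last token either way
    return response_ids, loss_mask, rewards

# Example: a 4-token response that stopped naturally, with eos_token_id=2
ids, mask, rewards = finalize_response([5, 6, 7, 8], [1, 1, 1, 1], 2.0, 2, "stop")
# EOS appended at index 4, reward placed there
```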
Currently, the agent_loop can return trajectories that are "too long": the length of `prompt_ids` + `response_ids` can exceed `max_input_length + max_tokens`. This is because, even when the loop is interrupted by the length check introduced in #508, `response_ids` doesn't _just_ contain the last response generated by the model, but also the content of `new_obs`, which is unbounded. Thus the env's last step can append an arbitrarily long sequence. Since these last observations are not actually needed for the gradient, I fixed the issue by always removing them. Alternatively, the loop could be reworked, for example by keeping new observation tokens in a separate object. Feel free to suggest a different approach. The PR was tested with our code, but not with the tests; the testing is clearly non-exhaustive as it only exercised the "if" branch.
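The fix described above can be sketched as follows. This is a simplified illustration under assumptions, not the real agent_loop: `generate` and `env_step` are stand-in callables, and the point is only the ordering, i.e. checking the length budget before appending the new observation, and never appending the final observation.

```python
def agent_loop(prompt_ids, generate, env_step, max_input_length, max_turns):
    """Generation loop that never appends the final (unneeded) observation."""
    response_ids = []
    for _ in range(max_turns):
        new_ids = generate(prompt_ids + response_ids)
        response_ids += new_ids
        # Early exit (per #508): stop once the input budget is exceeded,
        # *before* any new observation tokens can be appended.
        if len(prompt_ids) + len(response_ids) > max_input_length:
            break
        done, new_obs_ids = env_step(new_ids)
        if done:
            break  # the final observation is not needed for the gradient
        response_ids += new_obs_ids
    return response_ids
```

With this ordering, `response_ids` can exceed `max_input_length` by at most one generation (bounded by `max_tokens`), since observation tokens are only appended while the budget still holds.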