
Remove truncation logic, fix corresponding tests#508

Merged
tyler-griggs merged 9 commits into NovaSky-AI:main from devpatelio:devpatel/skyrl-truncation
Oct 24, 2025
Conversation

@devpatelio
Collaborator

@devpatelio devpatelio commented Oct 17, 2025

Early exit agent loop on exceeding max_input_length to avoid post-generation truncation

This PR modifies the SkyRLGymGenerator to stop trajectory generation once the token count exceeds the maximum input length, instead of truncating trajectories after completion. This simplifies control flow and removes redundant truncation logic in skyrl_gym_generator.py. Additionally, the test was failing because the mock environment wasn't respecting the max_turns parameter and was returning incorrect rewards. The environment was hardcoded to do 2 turns regardless of max_turns setting, and the reward logic was placing rewards at the wrong token positions.

Changes:

  • In 'skyrl_gym_generator', I added early-exit logic to stop trajectory generation once the token count exceeds max_input_length, preventing unnecessary continuation beyond limits.

  • In 'test_agent_loop_truncation_drops_out_of_range_rewards', I updated the TruncEnv.step() logic to correctly handle the max_turns=1 case (previously it let 2 turns happen, causing a 9-vs-5 assertion error). I also fixed the reward placement: since the premature truncation step was removed and the EOS token is now appended manually, the reward is expected at the last token, i.e. reward=2.0 at index 4 (the EOS token). Finally, I changed the stop_reason expectation from "length" to "stop".

  • In 'test_apply_overlong_filtering_non_batched', the test expected 5 tokens but got 6 because we now append the EOS token. Since we do not want to add an EOS token when stop_reason = "length", I updated the skyrl_gym_generator logic to account for this. Another test used stop reason "length" but actually expected an EOS token, so I changed its stop reason to "stop", which resolved the test errors.

If the stop reason is "length", we do not add the EOS token; if the stop reason is "stop", we do. The loss mask and reward placement are updated accordingly.
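The behavior described above can be sketched as a minimal, self-contained agent loop. All names here (`agent_loop`, `generate`, `env_step`) are illustrative assumptions and do not match the actual SkyRLGymGenerator API; the sketch only shows the early-exit budget check and the conditional EOS append.

```python
def agent_loop(generate, env_step, prompt_ids, max_input_length, max_turns, eos_token_id=2):
    """Sketch of the early-exit agent loop (hypothetical names, not the real API).

    The token budget is checked *before* each generation turn, so the loop
    exits early instead of truncating the trajectory after the fact. The EOS
    token is appended only for a natural stop."""
    response_ids = []
    stop_reason = "stop"
    for _ in range(max_turns):
        # Early exit: stop before generating once the budget is exhausted.
        if len(prompt_ids) + len(response_ids) >= max_input_length:
            stop_reason = "length"
            break
        output = generate(prompt_ids + response_ids)
        response_ids += output
        obs_tokens, done = env_step(output)
        if done:
            break
        response_ids += obs_tokens
    if stop_reason == "stop":
        # EOS only on a natural stop; the reward is then placed on this
        # final token (hence the test's expectation at the EOS index).
        response_ids.append(eos_token_id)
    return response_ids, stop_reason
```

With a one-turn mock environment this yields an EOS-terminated response and stop_reason "stop"; with a prompt already over budget it returns an empty response with stop_reason "length" and no EOS, matching the test changes above.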


devpatelio and others added 4 commits October 17, 2025 15:56
@devpatelio
Collaborator Author

/gemini review


@devpatelio
Collaborator Author

/gemini review


@devpatelio
Collaborator Author

/gemini review

Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request removes a complex and buggy truncation logic from the agent_loop and fixes the corresponding tests. The core change is to conditionally append an EOS token based on the stop_reason, which simplifies the code and corrects the behavior for length-limited generations. The tests have been significantly improved by making the mock LLM generation more realistic and fixing the mock environment's behavior to respect max_turns.

My feedback focuses on some TODO comments and commented-out code in the tests that should be cleaned up before merging to improve code clarity and maintainability. Overall, the changes are a significant improvement in terms of correctness and simplicity.

@devpatelio
Collaborator Author

/gemini review

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces a significant improvement by refactoring the trajectory generation logic in SkyRLGymGenerator. Moving the max_input_length check to the beginning of the agent loop for an early exit is a smart change that simplifies the control flow and improves efficiency by avoiding unnecessary generation and post-truncation. The corresponding removal of the complex truncation logic cleans up the codebase nicely. The conditional appending of the EOS token based on the stop_reason is also a correct and important fix.

The updates to the tests are thorough and correctly reflect the changes in the generator. Fixing the mock environment to respect max_turns and updating the assertions for reward placement and stop reasons in test_agent_loop_truncation_drops_out_of_range_rewards ensures the test is now correctly validating the intended behavior. Similarly, the adjustments in test_apply_overlong_filtering_non_batched make the test cases more consistent and accurate. Overall, this is a well-executed refactoring that improves both the implementation and its test coverage.

@tyler-griggs tyler-griggs merged commit 8d81f29 into NovaSky-AI:main Oct 24, 2025
3 checks passed
li-boxuan pushed a commit to li-boxuan/SkyRL that referenced this pull request Nov 23, 2025
erictang000 pushed a commit that referenced this pull request Dec 8, 2025
Currently, the agent_loop can return trajectories that are "too long": the length of `prompt_ids` + `response_ids` can exceed `max_input_length + max_tokens`. This is because, even though the loop is interrupted by the "length too long" early exit introduced in #508, `response_ids` doesn't contain _just_ the last response generated by the model, but also the content of `new_obs`, which is unbounded. The env's last step can therefore append an arbitrarily long sequence.

As these last observations are not actually needed for the gradient, I fixed the issue by always removing them.

Alternatively, the loop could be reworked, for example by keeping the new observation tokens in a separate object. Feel free to suggest a different approach.

The PR was tested with our code, but not with the tests. It is clearly non-exhaustive as it only exercised the "if" branch.
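The fix described above can be sketched as dropping the final observation tokens before the trajectory is returned. The helper name and the loss-mask convention here are assumptions for illustration, not the actual SkyRL code.

```python
def trim_last_observation(response_ids, loss_mask, last_obs_len):
    """Drop the env observation tokens appended on the last step
    (hypothetical helper; names do not match the real codebase).

    Observation tokens carry loss_mask == 0, so removing them does not
    change the policy-gradient loss, but it keeps
    len(prompt_ids) + len(response_ids) within the configured budget."""
    if last_obs_len > 0:
        response_ids = response_ids[:-last_obs_len]
        loss_mask = loss_mask[:-last_obs_len]
    return response_ids, loss_mask
```

For example, a response ending in two masked-out observation tokens is cut back to the model-generated prefix, while a trajectory with no trailing observation is returned unchanged.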
dzorlu pushed a commit to fleet-ai/SkyRL that referenced this pull request Feb 4, 2026
dzorlu pushed a commit to fleet-ai/SkyRL that referenced this pull request Feb 4, 2026