[skyrl-train] add option to specify ref model path #623

Merged: erictang000 merged 2 commits into NovaSky-AI:main from erictang000:add_ref_path on Nov 4, 2025

Conversation

@erictang000
Collaborator

Add option to specify ref model path separately from policy model path. Default stays as policy model path.
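
The fallback described above can be sketched roughly as follows. This is a minimal illustration over plain dictionaries, not the code from this PR: the helper name `resolve_ref_model_path`, the `policy.model.path` key layout, and the example model names are assumptions made for the sketch; only `ref.model.path` is named in the change itself.

```python
from typing import Any, Mapping, Optional


def resolve_ref_model_path(cfg: Mapping[str, Any]) -> str:
    """Return the reference model path, falling back to the policy model path.

    Hypothetical helper: the nested key layout is assumed for illustration.
    """
    policy_path: str = cfg["policy"]["model"]["path"]
    # Missing sections or an unset/empty path all fall back to the policy path.
    ref_path: Optional[str] = cfg.get("ref", {}).get("model", {}).get("path")
    return ref_path if ref_path else policy_path


# Default case: no ref path configured, so the policy path is reused.
default_cfg = {"policy": {"model": {"path": "org/policy-model"}}}

# Separate reference model, e.g. a teacher model for distillation.
distill_cfg = {
    "policy": {"model": {"path": "org/student-model"}},
    "ref": {"model": {"path": "org/teacher-model"}},
}

print(resolve_ref_model_path(default_cfg))
print(resolve_ref_model_path(distill_cfg))
```

With this shape, existing configs that never mention a ref path keep their current behavior, and only configs that set one explicitly load a different reference model.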

@erictang000 requested a review from SumanthRH on November 4, 2025 00:47
Contributor

@gemini-code-assist (bot) left a comment

Code Review

This pull request introduces a valuable enhancement by allowing the reference model path to be specified independently from the policy model path. The changes are well-implemented across the configuration, documentation, and trainer logic, ensuring backward compatibility by defaulting the reference model path to the policy model's path. My review includes a minor suggestion to improve the clarity of the documentation. Overall, this is a good change that increases the flexibility of the training setup.

fsdp_size: -1
sequence_parallel_size: 1

- ``ref.model.path``: Path to the reference model. Defaults to the policy model path, but can be separately set (i.e. for on policy distillation, the reference model can be a different model than the policy model).
Contributor

medium

The term "on policy distillation" is a bit ambiguous and could be a typo. While PPO is an on-policy algorithm, using a separate reference model is a concept often associated with off-policy methods or distillation. To improve clarity, I suggest rephrasing this part of the sentence.

For example, you could say "(i.e., for distillation, the reference model can be a different model than the policy model)" or more generally "(e.g., for distillation-based approaches, ...)".

Member

@SumanthRH left a comment

Stamp

@erictang000 merged commit 31f8f51 into NovaSky-AI:main on Nov 4, 2025
3 checks passed
@erictang000 deleted the add_ref_path branch on November 4, 2025 00:53
li-boxuan pushed a commit to li-boxuan/SkyRL that referenced this pull request Nov 23, 2025
Add option to specify ref model path separately from policy model path.
Default stays as policy model path.
dzorlu pushed a commit to fleet-ai/SkyRL that referenced this pull request Feb 4, 2026
Add option to specify ref model path separately from policy model path.
Default stays as policy model path.