[skyrl-train] add option to specify ref model path #623
erictang000 merged 2 commits into NovaSky-AI:main
Conversation
Code Review
This pull request introduces a valuable enhancement by allowing the reference model path to be specified independently from the policy model path. The changes are well-implemented across the configuration, documentation, and trainer logic, ensuring backward compatibility by defaulting the reference model path to the policy model's path. My review includes a minor suggestion to improve the clarity of the documentation. Overall, this is a good change that increases the flexibility of the training setup.
    fsdp_size: -1
    sequence_parallel_size: 1

- ``ref.model.path``: Path to the reference model. Defaults to the policy model path, but can be separately set (i.e. for on policy distillation, the reference model can be a different model than the policy model).
The term "on policy distillation" is a bit ambiguous and could be a typo. While PPO is an on-policy algorithm, using a separate reference model is a concept often associated with off-policy methods or distillation. To improve clarity, I suggest rephrasing this part of the sentence.
For example, you could say "(i.e., for distillation, the reference model can be a different model than the policy model)" or more generally "(e.g., for distillation-based approaches, ...)".
Add option to specify ref model path separately from policy model path. Default stays as policy model path.
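A minimal sketch of what this could look like in a trainer config, assuming a nested YAML layout matching the ``ref.model.path`` key documented above (the surrounding section names and the model paths are illustrative assumptions, not taken from the repository):

```yaml
# Hypothetical config excerpt; key structure follows the documented
# ``ref.model.path`` option, the rest is assumed for illustration.
policy:
  model:
    path: Qwen/Qwen2.5-7B-Instruct      # model being trained
ref:
  model:
    # New in this PR: if left unset, this falls back to the policy
    # model path (preserving the previous behavior). Set it explicitly
    # to use a different reference model, e.g. a separate teacher.
    path: Qwen/Qwen2.5-72B-Instruct
```

Omitting ``ref.model.path`` keeps the old default, so existing configs are unaffected.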