[train] Enable custom chat template for get_response_ids_and_loss_mask_from_messages#981
Conversation
Code Review
This pull request introduces an optional chat_template parameter to several utility functions to allow for custom tokenization, which is crucial for on-policy training with custom agents. The changes are logical and well-implemented, propagating the chat_template through get_generation_prompt_ids, encode_messages_subset, and get_response_ids_and_loss_mask_from_messages. The accompanying tests are thorough, covering both default and custom template behaviors, including edge cases with Qwen3's thinking blocks.
My review focuses on minor code simplifications for improved readability and maintainability. I've suggested simplifying how the chat_template is passed to tokenizer.apply_chat_template and using pathlib for more readable path construction in the tests. Overall, this is a solid contribution.
We add an optional `chat_template` kwarg to `get_response_ids_and_loss_mask_from_messages()`, which is used to tokenize the messages into token IDs for custom agents. The motivation is that if you used a custom chat template to perform the rollout, you should use the same custom chat template to tokenize it.
For more motivation, see the PR description here: mlfoundations#12
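The propagation pattern can be sketched as follows. This is a minimal, hypothetical illustration using a toy tokenizer class, not the actual SkyRL implementation: the key point is that the optional `chat_template` kwarg defaults to `None` and is threaded straight through to `apply_chat_template`, which falls back to the tokenizer's built-in template when no custom one is supplied (mirroring the Hugging Face convention).

```python
from typing import Optional


class ToyTokenizer:
    """Hypothetical stand-in for an HF-style tokenizer (not the real API)."""

    # The tokenizer's built-in default template.
    default_template = "<default>{role}: {content}</default>"

    def apply_chat_template(self, messages, chat_template: Optional[str] = None) -> str:
        # Fall back to the tokenizer's own template when no custom one is given.
        template = chat_template or self.default_template
        return "".join(template.format(**m) for m in messages)


def get_response_ids_and_loss_mask_from_messages(
    tokenizer, messages, chat_template: Optional[str] = None
) -> str:
    # Sketch of the propagation: the optional kwarg is passed straight
    # through to apply_chat_template. The real function would go on to
    # compute token IDs and a loss mask; here we just return the text.
    return tokenizer.apply_chat_template(messages, chat_template=chat_template)


msgs = [{"role": "user", "content": "hi"}]
tok = ToyTokenizer()

# Default behavior: the tokenizer's built-in template is used.
default_out = get_response_ids_and_loss_mask_from_messages(tok, msgs)

# Custom behavior: the caller's template overrides the default, so the
# same template used during rollout can be reused for tokenization.
custom_out = get_response_ids_and_loss_mask_from_messages(
    tok, msgs, chat_template="<custom>{role}: {content}</custom>"
)
```

The design choice worth noting is that `chat_template=None` leaves all existing call sites unchanged, so the kwarg is purely opt-in.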
Note
Token-in-token-out is supported in SkyRLGymGenerator, so this PR is irrelevant to that codepath. For custom agents, truly on-policy training would require step-wise training (support coming soon). But empirically, tokenizing at the end is not catastrophic for many tasks.