Skip to content

[train] Enable custom chat template for get_response_ids_and_loss_mask_from_messages#981

Merged
CharlieFRuan merged 2 commits intomainfrom
charlie/pr012726-custom-chat-template
Jan 28, 2026
Merged

[train] Enable custom chat template for get_response_ids_and_loss_mask_from_messages#981
CharlieFRuan merged 2 commits intomainfrom
charlie/pr012726-custom-chat-template

Conversation

@CharlieFRuan
Copy link
Member

We add an optional chat_template kwarg to get_response_ids_and_loss_mask_from_messages(), which is used to tokenize the messages into token IDs for custom agents.

The motivation is that, if you used a custom chat template to perform rollout, you should use the same custom chat template to tokenize it.

For more motivation, see the PR description here: mlfoundations#12

Note

token-in-token-out is supported in SkyRLGymGenerator, so this PR is irrelevant to that codepath. For custom agent, to really get on-policy training, we would need step-wise training (support coming soon). But empirically tokenizing at the end is not catastrophic for many tasks.

@CharlieFRuan CharlieFRuan changed the title [train] Add custom chat template for get_response_ids_and_loss_mask_from_messages [train] Enable custom chat template for get_response_ids_and_loss_mask_from_messages Jan 28, 2026
Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces an optional chat_template parameter to several utility functions to allow for custom tokenization, which is crucial for on-policy training with custom agents. The changes are logical and well-implemented, propagating the chat_template through get_generation_prompt_ids, encode_messages_subset, and get_response_ids_and_loss_mask_from_messages. The accompanying tests are thorough, covering both default and custom template behaviors, including edge cases with Qwen3's thinking blocks.

My review focuses on minor code simplifications for improved readability and maintainability. I've suggested simplifying how the chat_template is passed to tokenizer.apply_chat_template and using pathlib for more readable path construction in the tests. Overall, this is a solid contribution.

@CharlieFRuan CharlieFRuan merged commit eaeb4a8 into main Jan 28, 2026
4 checks passed
@CharlieFRuan CharlieFRuan deleted the charlie/pr012726-custom-chat-template branch January 28, 2026 07:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant