Conversation
/gemini review
Code Review
This pull request adds entropy loss to the PPO algorithm, which is a valuable addition for encouraging exploration. The implementation is mostly correct, but there are a few areas for improvement regarding efficiency, code duplication, and dead code. I've left specific comments with suggestions to address these points.
(Two earlier review comments on skyrl-train/skyrl_train/workers/megatron/megatron_model_wrapper.py — outdated, resolved.)
/gemini review
Code Review
This pull request adds support for entropy loss regularization in the PPO algorithm. The changes look good overall, introducing the necessary configuration options and modifying the training loop to incorporate the entropy term in the loss function. I've identified a couple of areas for improvement: one is a redundant piece of code in worker.py that can be simplified for clarity, and the other is an inconsistency in how policy_loss is calculated and logged in the Megatron worker compared to the standard worker. Addressing these points will improve the code's clarity and consistency.
```python
if self.cfg.trainer.algorithm.use_entropy_loss:
    policy_loss = policy_loss - self.cfg.trainer.algorithm.entropy_loss_coef * entropy_scalar
```
This in-place modification of policy_loss leads to an inconsistency in logging. The logged policy_loss metric will include the entropy term, which is inconsistent with the non-Megatron worker (worker.py) where policy_loss only contains the PPO loss. For consistent and clearer metrics across implementations, it's better to not modify policy_loss here. Instead, you could add the entropy loss term directly to the final loss variable on line 257.
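The suggested separation can be sketched in plain Python (a minimal illustration, not the project's code — `use_entropy_loss` and `entropy_loss_coef` are the config keys from this PR, everything else is hypothetical):

```python
def combine_losses(policy_loss: float, entropy: float,
                   use_entropy_loss: bool, entropy_loss_coef: float) -> float:
    """Leave policy_loss untouched for logging; fold the entropy
    bonus into the final loss only."""
    loss = policy_loss
    if use_entropy_loss:
        # Subtracting the entropy term rewards higher-entropy policies.
        loss -= entropy_loss_coef * entropy
    return loss

final_loss = combine_losses(policy_loss=2.0, entropy=0.5,
                            use_entropy_loss=True, entropy_loss_coef=0.1)
# policy_loss stays 2.0 in the logged metrics; final_loss is 2.0 - 0.1 * 0.5 = 1.95
```

With this shape, the `policy_loss` metric means the same thing in both workers, and only the optimizer sees the entropy term.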
/gemini review
Code Review
This pull request introduces entropy loss to the PPO training algorithm, adding configurations and updating the model wrappers and workers. The implementation correctly computes entropy and adds it to the loss function to encourage policy exploration. However, I've identified a couple of issues. In both the Megatron and general worker implementations, the entropy loss is not applied conditionally based on the use_entropy_loss flag, which could lead to incorrect behavior if misconfigured. Additionally, the Megatron worker could be optimized to avoid unnecessary gradient calculations for entropy, consistent with the general worker's implementation. I've provided specific comments and code suggestions to address these points.
(Earlier review comment on skyrl-train/skyrl_train/workers/megatron/megatron_model_wrapper.py — outdated, resolved.)
/gemini review
Code Review
This pull request adds support for entropy loss to the PPO training algorithm. The changes include adding configuration options for enabling entropy loss and setting its coefficient, modifying the model wrappers to compute entropy with gradients when needed, and updating the loss calculation in the training workers to include the entropy term.
The implementation is mostly correct and follows the pattern of other loss components. However, I found a critical bug in skyrl_train/workers/worker.py where the entropy loss is multiplied by a boolean flag instead of the configured coefficient. This would cause the entropy loss to be incorrectly scaled. I've provided a specific comment and suggestion to fix this.
Once this issue is addressed, the changes look good to go.
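To make the scaling bug concrete, here is a pure-Python sketch (hypothetical values; the flag and coefficient names match the config keys discussed in this PR):

```python
use_entropy_loss = True   # boolean config flag
entropy_loss_coef = 0.01  # configured coefficient
entropy = 0.5
policy_loss = 2.0

# Bug: multiplying by the boolean flag scales entropy by 1.0, not 0.01,
# so the entropy term dominates regardless of the configured coefficient.
buggy_loss = policy_loss - use_entropy_loss * entropy
# Fix: gate on the flag, but scale by the coefficient.
fixed_loss = policy_loss - (entropy_loss_coef * entropy if use_entropy_loss else 0.0)
# buggy_loss == 1.5 (coefficient ignored); fixed_loss == 1.995
```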
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Signed-off-by: SumanthRH <sumanthrh99@gmail.com>
/gemini review
Code Review
This pull request introduces entropy loss regularization to the PPO algorithm, which is a valuable addition for promoting exploration. The implementation is well-integrated into both the standard and Megatron training workflows, including configuration options and updated loss calculations. The accompanying tests are thorough, covering various combinations of KL and entropy loss.
My review focuses on a critical correctness issue related to device placement for tensors, which could lead to runtime errors, and a suggestion for improving maintainability by refactoring duplicated code. Addressing these points will make the implementation more robust and easier to maintain.
Thanks @pbokc ! I made some updates to include this in the tests for FSDP and Megatron backends, and also added the final loss (after aggregation and normalization with accumulation steps) as a logged metric.
Add entropy loss to address #583
Ran FSDP with and without entropy loss with Qwen2.5-0.5B-Instruct: policy entropy remained roughly stable with entropy loss off, and increased rapidly with entropy loss on with a coefficient of 10.

