Support Flexible Scheduler#142

Merged
tyler-griggs merged 7 commits into NovaSky-AI:main from benji-cannot-code:support-flexible-scheduler
Aug 18, 2025

Conversation

@benji-cannot-code (Contributor) commented Aug 11, 2025

GitHub Issue: Support more learning rate schedulers (NovaSky-AI#140)

Changes:

  1. Changed scheduler for FSDP strategy from hard-coded constant_with_warmup to flexible HF scheduler.
  2. Added scheduler to config
  3. Removed trailing whitespaces in config and docs

Testing:

  1. Ran `uv run --isolated --extra dev pytest tests/cpu -q` sanity tests
  2. vLLM smoke test

```
python - << 'PY'
from vllm import LLM, SamplingParams
m="TinyLlama/TinyLlama-1.1B-Chat-v1.0"
llm=LLM(model=m, tensor_parallel_size=1, trust_remote_code=True)
print(llm.generate(["hiiii im running test"], SamplingParams(max_tokens=16))[0].outputs[0].text.strip())
PY
```
  3. E2E training test

```
uv run --isolated -m skyrl_train.entrypoints.main_base \
  --config examples/gsm8k/minimal.yaml \
  data.gsm8k_path=/skyrl-data/datasets/gsm8k \
  model.name="TinyLlama/TinyLlama-1.1B-Chat-v1.0" \
  generator.max_new_tokens=32 \
  trainer.steps=10 \
  trainer.global_batch_size=2 \
  generator.rollouts_per_iter=4 \
  ray.num_cpus=2 ray.num_gpus=1 \
  logging.output_dir=/skyrl-data/logs/gsm8k-smoke \
  checkpoint.dir=/skyrl-data/checkpoints/gsm8k-smoke
```

@gemini-code-assist (bot) left a comment

Summary of Changes

Hello @benji-cannot-code, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the flexibility of learning rate scheduling within the training framework. Previously, the learning rate scheduler for the FSDP strategy was hard-coded. This change introduces a configurable option, allowing users to select different learning rate schedules via the configuration file. It achieves this by integrating Hugging Face's get_scheduler utility, providing access to a broader array of scheduling strategies. Additionally, the PR includes minor cleanups by removing trailing whitespaces in documentation and configuration files, improving overall code hygiene.

Highlights

  • Configurable Learning Rate Scheduler: The learning rate scheduler for the FSDP strategy is no longer hard-coded to constant_with_warmup. It can now be specified via the optimizer_config.scheduler parameter in the configuration.
  • Integration of Hugging Face Schedulers: The implementation now leverages the transformers.trainer.get_scheduler utility, allowing users to choose from a wider range of pre-defined learning rate schedules provided by Hugging Face.
  • Configuration and Documentation Updates: The ppo_base_config.yaml file has been updated to include the new scheduler parameter in the optimizer configuration, and the config.rst documentation has been revised to reflect this change and explain its usage.
  • Whitespace Cleanup: Various trailing whitespaces were removed from documentation and configuration files, contributing to cleaner and more consistent code formatting.

@gemini-code-assist (bot) left a comment

Code Review

This pull request introduces support for flexible learning rate schedulers, a valuable enhancement. The changes are well-implemented for the FSDP strategy and the DeepSpeed policy worker, and the configuration and documentation are updated accordingly. To make this feature more robust, consider two improvements: 1) Pass num_training_steps to get_scheduler to support a wider range of schedulers (e.g., 'linear', 'cosine'). 2) Update the DeepSpeed critic worker (DeepSpeedCriticWorkerBase) to use the configurable scheduler for consistency, as it appears to remain hardcoded. The removal of trailing whitespaces is also a nice cleanup.

- ``optimizer_config.max_grad_norm``: Gradient clipping parameter. The total L2 norm of the model gradients will be scaled to this value during training.
- ``optimizer_config.offload_after_step``: Whether to offload optimizer state to CPU after step if colocated. When generation and training workers are colocated, we recommend using the default setting of ``true``. In some cases with non-colocation, it can be desirable to leave optimizer state on GPU memory to avoid offloading costs as well as additional CPU memory usage.
- ``optimizer_config.num_warmup_steps``: Number of warmup steps for the learning rate scheduler.
- ``optimizer_config.scheduler``: Which learning rate scheduler to use.
Contributor

medium

To improve clarity for users, consider expanding the description for optimizer_config.scheduler. You could mention that it accepts scheduler types from Hugging Face's transformers library and provide a link to the documentation for available options.

- ``optimizer_config.scheduler``: Which learning rate scheduler to use. See `transformers.SchedulerType <https://huggingface.co/docs/transformers/main_classes/optimizer_schedules#transformers.SchedulerType>`_ for available options.
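To make the suggestion concrete, here is a hypothetical excerpt of what the resulting config could look like. The nesting mirrors the `cfg.trainer.policy.optimizer_config` paths quoted elsewhere in this thread; the values are purely illustrative:

```yaml
# Hypothetical excerpt of ppo_base_config.yaml; nesting mirrors
# cfg.trainer.policy.optimizer_config as referenced in this PR.
trainer:
  policy:
    optimizer_config:
      num_warmup_steps: 100
      # Any transformers.SchedulerType name, e.g. "linear", "cosine",
      # or "constant_with_warmup" (the previous hard-coded default).
      scheduler: "cosine"
```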

Member

Probably don't need to leave a link, but it would be good to mention that this config param is intended to align with schedulers in transformers.SchedulerType

Contributor Author

Done

Comment on lines +279 to 283
```python
actor_lr_scheduler = get_scheduler(
    optim_config.scheduler,
    actor_optimizer,
    num_warmup_steps=optim_config.num_warmup_steps,
)
```
Contributor

medium

Using get_scheduler from Transformers is a great step for flexibility. However, to fully support schedulers like linear, cosine, and polynomial, the num_training_steps argument is required. Without it, these schedulers won't function correctly as they need to know the total duration of training to decay the learning rate.

Consider refactoring to calculate and pass num_training_steps during scheduler initialization. This will make the implementation more robust and truly flexible.
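To see why the total step count matters, here is a minimal standalone sketch of the linear-with-warmup multiplier. It mirrors (rather than imports) the formula used by the Hugging Face `linear` schedule; the step counts are illustrative:

```python
# Standalone sketch of the "linear" schedule's LR multiplier. Without
# num_training_steps there is no way to compute the decay slope.
def linear_with_warmup(step: int, num_warmup_steps: int, num_training_steps: int) -> float:
    if step < num_warmup_steps:
        # Linear ramp from 0 to 1 over the warmup period.
        return step / max(1, num_warmup_steps)
    # Linear decay from 1 down to 0 over the remaining steps.
    return max(0.0, (num_training_steps - step) / max(1, num_training_steps - num_warmup_steps))

mults = [linear_with_warmup(s, num_warmup_steps=10, num_training_steps=100) for s in range(101)]
print(mults[0], mults[10], mults[55], mults[100])  # 0.0 1.0 0.5 0.0
```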

Member

This makes sense to me. We'll need to know the number of training steps for many schedulers. This information is known in the trainer, so it can/should be plumbed down.

Contributor Author

Fixed!

Comment on lines 75 to 79
```diff
 actor_scheduler = get_scheduler(
-    "constant_with_warmup",
+    self.cfg.trainer.policy.optimizer_config.scheduler,
     actor_optim,
     num_warmup_steps=self.cfg.trainer.policy.optimizer_config.num_warmup_steps,
 )
```
Contributor

medium

Similar to the FSDP strategy, the get_scheduler call here is missing the num_training_steps argument. This will prevent schedulers that decay over the course of training (e.g., 'linear', 'cosine') from working as expected. To enable full support for all Hugging Face schedulers, num_training_steps should be provided.

Member

Ditto here

Contributor Author

Fixed

@benji-cannot-code (Contributor Author)

My editor automatically trimmed trailing whitespace in the doc and the config. Is this fine?

@tyler-griggs (Member) left a comment

Hey @benji-cannot-code , this is great, thank you!

Only two comments:

  • Please plumb the num_training_steps argument into get_scheduler, otherwise we can't actually use many of the schedulers.
  • Please test this with a new scheduler (i.e., not constant_with_warmup). I don't believe we have a test covering this, so consider adding one or launching short training runs with ~2 new schedulers and validating that they work.
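A standalone sketch of the kind of validation requested here, using a hand-rolled cosine-with-warmup multiplier that mirrors (rather than imports) the Hugging Face `cosine` formula. The base LR and step counts are illustrative, not from the actual test runs:

```python
import math

# Hand-rolled cosine-with-warmup LR multiplier; note that it needs the
# total step count, unlike constant_with_warmup.
def cosine_with_warmup(step: int, num_warmup_steps: int, num_training_steps: int) -> float:
    if step < num_warmup_steps:
        return step / max(1, num_warmup_steps)
    progress = (step - num_warmup_steps) / max(1, num_training_steps - num_warmup_steps)
    return max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))

lrs = [3e-4 * cosine_with_warmup(s, num_warmup_steps=5, num_training_steps=50) for s in range(51)]
assert lrs[5] == 3e-4                                  # peak right after warmup
assert lrs[50] < 1e-9                                  # decays to ~0 at num_training_steps
assert all(a >= b for a, b in zip(lrs[5:], lrs[6:]))   # monotone decay after warmup
print("cosine schedule sanity check passed")
```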


@benji-cannot-code (Contributor Author)

Hey @benji-cannot-code , this is great, thank you!

Only two comments:

  • Please plumb the num_training_steps argument into get_scheduler, otherwise we can't actually use many of the schedulers.
  • Please test this with a new scheduler (i.e., not constant_with_warmup). I don't believe we have a test covering this, so consider adding one or launching short training runs with ~2 new schedulers and validating that they work.

Done. Tested both FSDP and DeepSpeed with linear and cosine (requires training steps).

@tyler-griggs (Member)

Awesome, thanks @benji-cannot-code. Can you paste a screenshot of the wandb log for the FSDP and DeepSpeed tests, especially the learning rate plot? When we run mini training-run tests, we typically share screenshots to keep track of what the curves look like.

@SumanthRH SumanthRH linked an issue Aug 15, 2025 that may be closed by this pull request
@benji-cannot-code (Contributor Author) commented Aug 16, 2025

Awesome, thanks @benji-cannot-code. Can you paste a screenshot of the wandb log for the FSDP and DeepSpeed tests, especially the learning rate plot? When we run mini training-run tests, we typically share screenshots to keep track of what the curves look like.

Policy graphs (RL graphs match expectations):

DeepSpeed with cosine scheduler:
Screenshot 2025-08-15 at 8 33 40 PM

FSDP with linear scheduler:
Screenshot 2025-08-15 at 8 35 32 PM

@tyler-griggs (Member)

/gemini review

@gemini-code-assist (bot) left a comment

Code Review

This pull request introduces support for flexible learning rate schedulers from Hugging Face Transformers, which is a great improvement over the hardcoded scheduler. The implementation is mostly correct, with changes to the configuration, documentation, and both FSDP and DeepSpeed strategies to accommodate the new flexible scheduler. I've identified one issue in the DeepSpeed critic worker where the scheduler is still hardcoded, preventing the new feature from working correctly in that specific case. Once that is addressed, this PR will be in excellent shape.

```diff
@@ -265,6 +266,7 @@ def init_model(self, model_id_or_path):
     "constant_with_warmup",
```
Contributor

high

The critic scheduler is hardcoded to "constant_with_warmup". To support flexible schedulers for the critic as well, this should use the value from the configuration, similar to how it's done for the policy scheduler.

Suggested change:

```diff
- "constant_with_warmup",
+ self.cfg.trainer.critic.optimizer_config.scheduler,
```

Member

@benji-cannot-code This is a good point ^

Contributor Author

Nice spot, fixed!

@tyler-griggs (Member)

All looks good to me! Once you get a chance to confirm that checkpointing correctly saves-and-resumes the scheduler state (which it should), we can merge.

@benji-cannot-code (Contributor Author)

All looks good to me! Once you get a chance to confirm that checkpointing correctly saves-and-resumes the scheduler state (which it should), we can merge.

Thanks! Just did:

Ran FSDP with the cosine scheduler, checkpointing every 5 steps. Killed the job after the first 5 steps (graph below)
Screenshot 2025-08-18 at 5 59 16 PM
and restarted from the checkpoint; it looks fine (graph below).
Screenshot 2025-08-18 at 6 03 16 PM
After the first pause, I paused again at step 35 and resumed, this time reusing the same WandB run, so it continues the graph above; it also looks fine.
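The save-and-resume property being verified can be sketched with a toy scheduler. This is a standalone, hypothetical illustration (the project presumably round-trips the real scheduler's state through the trainer's checkpoint path), showing that a resumed run must reproduce the uninterrupted LR curve:

```python
import math

# Toy scheduler with save/resume state. A resumed run must continue the
# LR curve exactly where it left off; base LR and step counts are illustrative.
class ToyCosineScheduler:
    def __init__(self, base_lr: float, total_steps: int):
        self.base_lr, self.total_steps, self.step_num = base_lr, total_steps, 0

    def step(self) -> None:
        self.step_num += 1

    def get_lr(self) -> float:
        progress = min(1.0, self.step_num / self.total_steps)
        return self.base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

    def state_dict(self) -> dict:
        return {"step_num": self.step_num}

    def load_state_dict(self, state: dict) -> None:
        self.step_num = state["step_num"]

# Reference: uninterrupted run of 10 steps.
ref = ToyCosineScheduler(3e-4, 10)
ref_lrs = []
for _ in range(10):
    ref.step()
    ref_lrs.append(ref.get_lr())

# Run 5 steps, "checkpoint", then resume in a fresh instance for 5 more.
first = ToyCosineScheduler(3e-4, 10)
lrs = []
for _ in range(5):
    first.step()
    lrs.append(first.get_lr())
ckpt = first.state_dict()

resumed = ToyCosineScheduler(3e-4, 10)
resumed.load_state_dict(ckpt)
for _ in range(5):
    resumed.step()
    lrs.append(resumed.get_lr())

assert lrs == ref_lrs  # resumed curve matches the uninterrupted one
```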

@tyler-griggs (Member) left a comment

Thanks, LGTM!

@tyler-griggs tyler-griggs merged commit c1f6376 into NovaSky-AI:main Aug 18, 2025
3 checks passed
dzorlu referenced this pull request in fleet-ai/SkyRL Feb 4, 2026