Support Flexible Scheduler#142

Merged
tyler-griggs merged 7 commits into NovaSky-AI:main from benji-cannot-code:support-flexible-scheduler
Aug 18, 2025

Conversation

@benji-cannot-code (Contributor) commented Aug 11, 2025

GitHub Issue: Support more learning rate schedulers (NovaSky-AI#140)

Changes:

  1. Changed scheduler for FSDP strategy from hard-coded constant_with_warmup to flexible HF scheduler.
  2. Added scheduler to config
  3. Removed trailing whitespaces in config and docs

Testing:

  1. Ran `uv run --isolated --extra dev pytest tests/cpu -q` sanity tests
  2. vLLM smoke test

```
python - << 'PY'
from vllm import LLM, SamplingParams
m="TinyLlama/TinyLlama-1.1B-Chat-v1.0"
llm=LLM(model=m, tensor_parallel_size=1, trust_remote_code=True)
print(llm.generate(["hiiii im running test"], SamplingParams(max_tokens=16))[0].outputs[0].text.strip())
PY
```
  3. E2E training test

```
uv run --isolated -m skyrl_train.entrypoints.main_base \
  --config examples/gsm8k/minimal.yaml \
  data.gsm8k_path=/skyrl-data/datasets/gsm8k \
  model.name="TinyLlama/TinyLlama-1.1B-Chat-v1.0" \
  generator.max_new_tokens=32 \
  trainer.steps=10 \
  trainer.global_batch_size=2 \
  generator.rollouts_per_iter=4 \
  ray.num_cpus=2 ray.num_gpus=1 \
  logging.output_dir=/skyrl-data/logs/gsm8k-smoke \
  checkpoint.dir=/skyrl-data/checkpoints/gsm8k-smoke
```

@gemini-code-assist (bot) left a comment

Summary of Changes

Hello @benji-cannot-code, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the flexibility of learning rate scheduling within the training framework. Previously, the learning rate scheduler for the FSDP strategy was hard-coded. This change introduces a configurable option, allowing users to select different learning rate schedules via the configuration file. It achieves this by integrating Hugging Face's get_scheduler utility, providing access to a broader array of scheduling strategies. Additionally, the PR includes minor cleanups by removing trailing whitespaces in documentation and configuration files, improving overall code hygiene.

Highlights

  • Configurable Learning Rate Scheduler: The learning rate scheduler for the FSDP strategy is no longer hard-coded to constant_with_warmup. It can now be specified via the optimizer_config.scheduler parameter in the configuration.
  • Integration of Hugging Face Schedulers: The implementation now leverages the transformers.trainer.get_scheduler utility, allowing users to choose from a wider range of pre-defined learning rate schedules provided by Hugging Face.
  • Configuration and Documentation Updates: The ppo_base_config.yaml file has been updated to include the new scheduler parameter in the optimizer configuration, and the config.rst documentation has been revised to reflect this change and explain its usage.
  • Whitespace Cleanup: Various trailing whitespaces were removed from documentation and configuration files, contributing to cleaner and more consistent code formatting.

@gemini-code-assist (bot) left a comment

Code Review

This pull request introduces support for flexible learning rate schedulers, a valuable enhancement. The changes are well-implemented for the FSDP strategy and the DeepSpeed policy worker, and the configuration and documentation are updated accordingly. To make this feature more robust, consider two improvements: 1) Pass num_training_steps to get_scheduler to support a wider range of schedulers (e.g., 'linear', 'cosine'). 2) Update the DeepSpeed critic worker (DeepSpeedCriticWorkerBase) to use the configurable scheduler for consistency, as it appears to remain hardcoded. The removal of trailing whitespaces is also a nice cleanup.

- ``optimizer_config.max_grad_norm``: Gradient clipping parameter. The total L2 norm of the model gradients will be scaled to this value during training.
- ``optimizer_config.offload_after_step``: Whether to offload optimizer state to CPU after step if colocated. When generation and training workers are colocated, we recommend using the default setting of ``true``. In some cases with non-colocation, it can be desirable to leave optimizer state on GPU memory to avoid offloading costs as well as additional CPU memory usage.
- ``optimizer_config.num_warmup_steps``: Number of warmup steps for the learning rate scheduler.
- ``optimizer_config.scheduler``: Which learning rate scheduler to use.
Contributor

medium

To improve clarity for users, consider expanding the description for optimizer_config.scheduler. You could mention that it accepts scheduler types from Hugging Face's transformers library and provide a link to the documentation for available options.

- ``optimizer_config.scheduler``: Which learning rate scheduler to use. See `transformers.SchedulerType <https://huggingface.co/docs/transformers/main_classes/optimizer_schedules#transformers.SchedulerType>`_ for available options.
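To make the suggestion concrete, here is a hypothetical excerpt of what the resulting config could look like. The nesting mirrors the `cfg.trainer.policy.optimizer_config` paths quoted elsewhere in this thread; the values are purely illustrative:

```yaml
# Hypothetical excerpt of ppo_base_config.yaml; nesting mirrors
# cfg.trainer.policy.optimizer_config as referenced in this PR.
trainer:
  policy:
    optimizer_config:
      num_warmup_steps: 100
      # Any transformers.SchedulerType name, e.g. "linear", "cosine",
      # or "constant_with_warmup" (the previous hard-coded default).
      scheduler: "cosine"
```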

Member

Probably don't need to leave a link, but it would be good to mention that this config param is intended to align with schedulers in transformers.SchedulerType

Contributor Author

Done

Comment on lines +279 to 283
```python
actor_lr_scheduler = get_scheduler(
    optim_config.scheduler,
    actor_optimizer,
    num_warmup_steps=optim_config.num_warmup_steps,
)
```
Contributor

medium

Using get_scheduler from Transformers is a great step for flexibility. However, to fully support schedulers like linear, cosine, and polynomial, the num_training_steps argument is required. Without it, these schedulers won't function correctly as they need to know the total duration of training to decay the learning rate.

Consider refactoring to calculate and pass num_training_steps during scheduler initialization. This will make the implementation more robust and truly flexible.
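To see why the total step count matters, here is a minimal standalone sketch of the linear-with-warmup multiplier. It mirrors (rather than imports) the formula used by the Hugging Face `linear` schedule; the step counts are illustrative:

```python
# Standalone sketch of the "linear" schedule's LR multiplier. Without
# num_training_steps there is no way to compute the decay slope.
def linear_with_warmup(step: int, num_warmup_steps: int, num_training_steps: int) -> float:
    if step < num_warmup_steps:
        # Linear ramp from 0 to 1 over the warmup period.
        return step / max(1, num_warmup_steps)
    # Linear decay from 1 down to 0 over the remaining steps.
    return max(0.0, (num_training_steps - step) / max(1, num_training_steps - num_warmup_steps))

mults = [linear_with_warmup(s, num_warmup_steps=10, num_training_steps=100) for s in range(101)]
print(mults[0], mults[10], mults[55], mults[100])  # 0.0 1.0 0.5 0.0
```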

Member

This makes sense to me. We'll need to know the number of training steps for many schedulers. This information is known in the trainer, so it can/should be plumbed down.

Contributor Author

Fixed!

Comment on lines 75 to 79
```diff
 actor_scheduler = get_scheduler(
-    "constant_with_warmup",
+    self.cfg.trainer.policy.optimizer_config.scheduler,
     actor_optim,
     num_warmup_steps=self.cfg.trainer.policy.optimizer_config.num_warmup_steps,
 )
```
Contributor

medium

Similar to the FSDP strategy, the get_scheduler call here is missing the num_training_steps argument. This will prevent schedulers that decay over the course of training (e.g., 'linear', 'cosine') from working as expected. To enable full support for all Hugging Face schedulers, num_training_steps should be provided.

Member

Ditto here

Contributor Author

Fixed

@benji-cannot-code (Contributor Author)

My editor automatically trimmed trailing whitespace in the doc and the config. Is this fine?

@tyler-griggs (Member) left a comment

Hey @benji-cannot-code , this is great, thank you!

Only two comments:

  • Please plumb the num_training_steps argument into get_scheduler, otherwise we can't actually use many of the schedulers.
  • Please test this with a new scheduler (i.e., not constant_with_warmup). I don't believe we have a test covering this, so consider adding one or launching short training runs with ~2 new schedulers and validating that they work.
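A standalone sketch of the kind of validation requested here, using a hand-rolled cosine-with-warmup multiplier that mirrors (rather than imports) the Hugging Face `cosine` formula. The base LR and step counts are illustrative, not from the actual test runs:

```python
import math

# Hand-rolled cosine-with-warmup LR multiplier; note that it needs the
# total step count, unlike constant_with_warmup.
def cosine_with_warmup(step: int, num_warmup_steps: int, num_training_steps: int) -> float:
    if step < num_warmup_steps:
        return step / max(1, num_warmup_steps)
    progress = (step - num_warmup_steps) / max(1, num_training_steps - num_warmup_steps)
    return max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))

lrs = [3e-4 * cosine_with_warmup(s, num_warmup_steps=5, num_training_steps=50) for s in range(51)]
assert lrs[5] == 3e-4                                  # peak right after warmup
assert lrs[50] < 1e-9                                  # decays to ~0 at num_training_steps
assert all(a >= b for a, b in zip(lrs[5:], lrs[6:]))   # monotone decay after warmup
print("cosine schedule sanity check passed")
```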


@benji-cannot-code (Contributor Author)

Hey @benji-cannot-code , this is great, thank you!

Only two comments:

  • Please plumb the num_training_steps argument into get_scheduler, otherwise we can't actually use many of the schedulers.
  • Please test this with a new scheduler (i.e., not constant_with_warmup). I don't believe we have a test covering this, so consider adding one or launching short training runs with ~2 new schedulers and validating that they work.

Done. Tested both FSDP and DeepSpeed with linear and cosine (requires training steps).

@tyler-griggs (Member)

Awesome, thanks @benji-cannot-code. Can you paste a screenshot of the wandb log for the FSDP and DeepSpeed tests, especially the learning rate plot? When we run mini training-run tests, we typically share screenshots to keep track of what the curves look like.

@SumanthRH SumanthRH linked an issue Aug 15, 2025 that may be closed by this pull request
@benji-cannot-code (Contributor Author) commented Aug 16, 2025

Awesome, thanks @benji-cannot-code. Can you paste a screenshot of the wandb log for the FSDP and DeepSpeed tests, especially the learning rate plot? When we run mini training-run tests, we typically share screenshots to keep track of what the curves look like.

Policy graphs (RL graphs match expectations):

DeepSpeed with cosine scheduler:
Screenshot 2025-08-15 at 8 33 40 PM

FSDP with linear scheduler:
Screenshot 2025-08-15 at 8 35 32 PM

@tyler-griggs (Member)

/gemini review

@gemini-code-assist (bot) left a comment

Code Review

This pull request introduces support for flexible learning rate schedulers from Hugging Face Transformers, which is a great improvement over the hardcoded scheduler. The implementation is mostly correct, with changes to the configuration, documentation, and both FSDP and DeepSpeed strategies to accommodate the new flexible scheduler. I've identified one issue in the DeepSpeed critic worker where the scheduler is still hardcoded, preventing the new feature from working correctly in that specific case. Once that is addressed, this PR will be in excellent shape.

```diff
@@ -265,6 +266,7 @@ def init_model(self, model_id_or_path):
     "constant_with_warmup",
```
Contributor

high

The critic scheduler is hardcoded to "constant_with_warmup". To support flexible schedulers for the critic as well, this should use the value from the configuration, similar to how it's done for the policy scheduler.

Suggested change:

```diff
- "constant_with_warmup",
+ self.cfg.trainer.critic.optimizer_config.scheduler,
```

Member

@benji-cannot-code This is a good point ^

Contributor Author

Nice spot, fixed!

@tyler-griggs (Member)

All looks good to me! Once you get a chance to confirm that checkpointing correctly saves-and-resumes the scheduler state (which it should), we can merge.

@benji-cannot-code (Contributor Author)

All looks good to me! Once you get a chance to confirm that checkpointing correctly saves-and-resumes the scheduler state (which it should), we can merge.

Thanks! Just did:

Ran FSDP with the cosine scheduler, checkpointing every 5 steps. Killed the job after the first 5 steps (graph below)
Screenshot 2025-08-18 at 5 59 16 PM
and restarted from the checkpoint; it looks fine (graph below).
Screenshot 2025-08-18 at 6 03 16 PM
After the first pause, I paused again at step 35 and resumed, this time reusing the same WandB run, so it continues the graph above; it also looks fine.
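The save-and-resume property being verified can be sketched with a toy scheduler. This is a standalone, hypothetical illustration (the project presumably round-trips the real scheduler's state through the trainer's checkpoint path), showing that a resumed run must reproduce the uninterrupted LR curve:

```python
import math

# Toy scheduler with save/resume state. A resumed run must continue the
# LR curve exactly where it left off; base LR and step counts are illustrative.
class ToyCosineScheduler:
    def __init__(self, base_lr: float, total_steps: int):
        self.base_lr, self.total_steps, self.step_num = base_lr, total_steps, 0

    def step(self) -> None:
        self.step_num += 1

    def get_lr(self) -> float:
        progress = min(1.0, self.step_num / self.total_steps)
        return self.base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

    def state_dict(self) -> dict:
        return {"step_num": self.step_num}

    def load_state_dict(self, state: dict) -> None:
        self.step_num = state["step_num"]

# Reference: uninterrupted run of 10 steps.
ref = ToyCosineScheduler(3e-4, 10)
ref_lrs = []
for _ in range(10):
    ref.step()
    ref_lrs.append(ref.get_lr())

# Run 5 steps, "checkpoint", then resume in a fresh instance for 5 more.
first = ToyCosineScheduler(3e-4, 10)
lrs = []
for _ in range(5):
    first.step()
    lrs.append(first.get_lr())
ckpt = first.state_dict()

resumed = ToyCosineScheduler(3e-4, 10)
resumed.load_state_dict(ckpt)
for _ in range(5):
    resumed.step()
    lrs.append(resumed.get_lr())

assert lrs == ref_lrs  # resumed curve matches the uninterrupted one
```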

@tyler-griggs (Member) left a comment

Thanks, LGTM!

@tyler-griggs tyler-griggs merged commit c1f6376 into NovaSky-AI:main Aug 18, 2025
3 checks passed
dzorlu referenced this pull request in fleet-ai/SkyRL Feb 4, 2026