[megatron] refactor megatron param/grad offload to directly use new megatron built in functions by erictang000 · Pull Request #1266 · NovaSky-AI/SkyRL

erictang000 · 2026-03-03T20:21:04Z

Upgrading to megatron-core==0.16.0 temporarily broke our grad offloading code in specific cases, because megatron-core now ships its own built-in grad/param buffer offloading logic. This PR refactors our param/grad offload logic to use the Megatron built-ins from NVIDIA/Megatron-LM#3112.

Fixes the `tests/backends/skyrl_train/gpu/gpu_ci/test_worker_dispatch_offload.py::test_dispatch_set_lr` test, which previously hit an error when offloading grads.

The offloading test still passes.
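For readers unfamiliar with buffer offloading, here is a minimal sketch of the general pattern in plain PyTorch. It is not the Megatron built-in API from NVIDIA/Megatron-LM#3112 and not the code in this PR; `offload_buffer` and `restore_buffer` are hypothetical helper names, and the storage-resize approach is an assumption about how such offloading is commonly done.

```python
import torch


def offload_buffer(buf: torch.Tensor) -> torch.Tensor:
    """Copy a flat buffer to host memory, then free its backing storage.

    Hypothetical helper for illustration only. The tensor keeps its shape
    metadata, but its storage is shrunk to zero bytes.
    """
    cpu_copy = buf.detach().to("cpu", copy=True)
    buf.untyped_storage().resize_(0)
    return cpu_copy


def restore_buffer(buf: torch.Tensor, cpu_copy: torch.Tensor) -> None:
    """Re-allocate the buffer's storage and copy the saved values back."""
    nbytes = cpu_copy.numel() * cpu_copy.element_size()
    buf.untyped_storage().resize_(nbytes)
    with torch.no_grad():
        buf.copy_(cpu_copy)


# Runs on CPU as well; in practice the buffer would live on a CUDA device.
device = "cuda" if torch.cuda.is_available() else "cpu"
buf = torch.arange(8, dtype=torch.float32, device=device)
saved = offload_buffer(buf)
freed_bytes = buf.untyped_storage().size()  # size of the storage after offload
restore_buffer(buf, saved)
```

Grad buffers usually do not need their contents preserved across an offload, so a real implementation may skip the host copy for them and only free and re-allocate the storage.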

Before

After

devin-ai-integration

✅ Devin Review: No Issues Found

Devin Review analyzed this PR and found no potential bugs to report.

View in Devin Review to see 4 additional findings.

refactor megatron param/grad offload to directly use new megatron built ins

d882f90

erictang000 changed the title ~~refactor megatron param/grad offload to directly use new megatron built in functions~~ [megatron] refactor megatron param/grad offload to directly use new megatron built in functions Mar 3, 2026

This comment was marked as resolved.

devin-ai-integration bot reviewed Mar 3, 2026

add comments for context

82e9cfa

erictang000 merged commit 16d652c into NovaSky-AI:main Mar 3, 2026
5 of 6 checks passed

erictang000 deleted the fix_megatron_0.16.0_offload branch March 3, 2026 21:16

tyler-griggs pushed a commit that referenced this pull request Mar 5, 2026

Merge main into pr-1280 to get PR #1268 (dp_reshardable checkpoint fix) and PR #1266 (megatron offload refactor)

2bbf71c

erictang000 merged 2 commits into NovaSky-AI:main from erictang000:fix_megatron_0.16.0_offload
