
[megatron] refactor megatron param/grad offload to directly use new megatron built in functions#1266

Merged
erictang000 merged 2 commits into NovaSky-AI:main from erictang000:fix_megatron_0.16.0_offload on Mar 3, 2026

erictang000 (Collaborator) commented on Mar 3, 2026

Upgrading to megatron-core==0.16.0 temporarily broke our grad offloading code in specific cases, because megatron-core now ships its own grad/param buffer offloading logic. This PR refactors our param/grad offload logic to use the Megatron built-ins introduced in NVIDIA/Megatron-LM#3112.
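For readers unfamiliar with the pattern, buffer-level offload can be sketched roughly as below. This is a generic PyTorch illustration of moving a contiguous buffer to host memory and back. It is not the megatron-core API from NVIDIA/Megatron-LM#3112 and not the code in this PR; `FlatBuffer`, `offload`, `reload`, and `roundtrip_demo` are hypothetical names chosen for the example.

```python
import torch


class FlatBuffer:
    """Hypothetical stand-in for a contiguous param or grad buffer."""

    def __init__(self, numel: int, device: torch.device):
        self.device = device
        self.data = torch.zeros(numel, device=device)
        self._cpu_copy = None  # holds the contents while offloaded

    def offload(self) -> None:
        """Copy the buffer to host memory and drop the device tensor."""
        if self._cpu_copy is not None:
            return  # already offloaded
        self._cpu_copy = self.data.detach().to("cpu", copy=True)
        # Swap in an empty tensor so the original device storage can be freed.
        self.data = torch.empty(0, device=self.device)

    def reload(self) -> None:
        """Restore the buffer contents onto the original device."""
        if self._cpu_copy is None:
            return  # nothing to reload
        self.data = self._cpu_copy.to(self.device, copy=True)
        self._cpu_copy = None


def roundtrip_demo() -> bool:
    """Offload and reload a small buffer, then check the values survived."""
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    buf = FlatBuffer(8, device)
    buf.data.copy_(torch.arange(8, dtype=torch.float32))
    expected = buf.data.clone()
    buf.offload()
    buf.reload()
    return torch.equal(buf.data, expected)
```

The sketch keeps offload state on the buffer object itself, so repeated `offload()` or `reload()` calls are no-ops. It falls back to CPU when no GPU is present, which makes the round trip checkable anywhere but frees no device memory in that case.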

Fixes tests/backends/skyrl_train/gpu/gpu_ci/test_worker_dispatch_offload.py::test_dispatch_set_lr, which previously hit an error when offloading grads.

image

Offloading test still passes

Before

image

After

image

@erictang000 erictang000 changed the title refactor megatron param/grad offload to directly use new megatron built in functions [megatron] refactor megatron param/grad offload to directly use new megatron built in functions Mar 3, 2026
gemini-code-assist[bot]

This comment was marked as resolved.

devin-ai-integration bot (Contributor) left a comment

✅ Devin Review: No Issues Found

Devin Review analyzed this PR and found no potential bugs to report.

View in Devin Review to see 4 additional findings.


@erictang000 erictang000 merged commit 16d652c into NovaSky-AI:main Mar 3, 2026
5 of 6 checks passed
@erictang000 erictang000 deleted the fix_megatron_0.16.0_offload branch March 3, 2026 21:16
tyler-griggs pushed a commit that referenced this pull request Mar 5, 2026