[skyrl-train] Add example for 235B LoRA training with Megatron on 4 H100 nodes#1000

Merged
erictang000 merged 6 commits into NovaSky-AI:main from erictang000:235b_lora
Jan 31, 2026
Conversation

@erictang000 (Collaborator) commented Jan 31, 2026

Adds an example script for Qwen3-235B-A22B-Instruct-2507 DAPO training with the Megatron backend + LoRA. Mean@32 increases from ~60% to 67.5%.

Also adds eval generation length metrics to the evaluate function by default; this change affects both the generation and training entry points.
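As a rough illustration of the kind of metric this adds, the sketch below computes summary statistics over per-sample generation lengths. The function name and metric keys are hypothetical, not the actual skyrl-train API.

```python
def generation_length_metrics(response_token_lists):
    """Summary stats over eval rollout generation lengths.

    `response_token_lists` is a list of token-id lists, one per sample.
    Metric key names are illustrative only.
    """
    lengths = [len(toks) for toks in response_token_lists]
    n = len(lengths)
    return {
        "eval/avg_response_length": sum(lengths) / n,
        "eval/max_response_length": max(lengths),
        "eval/min_response_length": min(lengths),
    }
```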

Training Curve

[training curve plots]

Metrics

[metrics screenshot]

@gemini-code-assist bot (Contributor) left a comment

Code Review

This pull request introduces a new example script for 235B LoRA training with Megatron on 4 H100 nodes using DAPO. It also updates an existing Megatron training script with a new configuration and adds LoRA parameters. Additionally, it enhances the evaluation process by logging more detailed rollout metrics.

My review focuses on improving the clarity and correctness of the new and modified shell scripts. I've pointed out a redundant variable definition and inconsistencies between comments and code in the configuration parameters. These changes will help improve the maintainability and readability of the scripts.

Comment on lines +30 to +31
# Qwen3-235B-A22B has 94 blocks, so we need to set the last pipeline stage layer to use 4 blocks
MEGATRON_LAST_PIPELINE_STAGE_LAYER=16

medium

The comment on line 30 states that the last pipeline stage should use 4 blocks, but the value of MEGATRON_LAST_PIPELINE_STAGE_LAYER is set to 16 on line 31. This is inconsistent. Please update either the comment or the value to reflect the correct configuration.
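The arithmetic behind a shorter last pipeline stage can be sketched as follows. This is illustrative only (not the actual Megatron partitioning code): with 94 transformer blocks, if the last stage holds `last` layers, the remaining blocks must divide evenly across the other pipeline stages.

```python
def stage_layers(num_layers, pp_size, last):
    """Split `num_layers` blocks across `pp_size` pipeline stages,
    giving the last stage `last` layers and the rest an even share.

    Illustrative sketch; raises if the split is uneven.
    """
    per_stage = (num_layers - last) // (pp_size - 1)
    if per_stage * (pp_size - 1) + last != num_layers:
        raise ValueError("layers do not split evenly across stages")
    return [per_stage] * (pp_size - 1) + [last]
```

For example, 94 blocks with a 16-stage pipeline and a 4-layer last stage gives 15 stages of 6 layers plus one of 4, which is consistent with the comment's "4 blocks" but not with a value of 16.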

MEGATRON_ETP=1
# Qwen3-235B-A22B has 94 blocks, so we need to set the last pipeline stage layer to use 4 blocks
MEGATRON_LAST_PIPELINE_STAGE_LAYER=4
MEGATRON_LAST_PIPELINE_STAGE_LAYER=10

medium

The comment on line 29 states that the last pipeline stage should use 4 blocks, but the value of MEGATRON_LAST_PIPELINE_STAGE_LAYER is changed to 10. This is inconsistent. Please update either the comment or the value to reflect the correct configuration.

@erictang000 erictang000 merged commit afa60b7 into NovaSky-AI:main Jan 31, 2026
@erictang000 erictang000 deleted the 235b_lora branch January 31, 2026 19:35