Merged
Changes from 53 commits (61 commits total)
bd462d3
use new SkyRL config in the main script; add new InferenceEngineConfi…
SumanthRH Feb 4, 2026
08bbcca
[train] Pythonic Configs 2/N - Migrate all examples, core modules, an…
SumanthRH Feb 6, 2026
ad93f7f
[train] Pythonic Configs 3/N - Remove DictConfig from all type hints,…
SumanthRH Feb 6, 2026
e27339d
latest progress
SumanthRH Feb 6, 2026
e3759c4
[train] Pythonic Configs 4/N - Fix worker subclass config paths and u…
SumanthRH Feb 8, 2026
4a816cf
Merge upstream/main into pythonic-configs-vv1
SumanthRH Feb 17, 2026
bc2dea2
Port pythonic configs migration from skyrl-train to skyrl folder
SumanthRH Feb 18, 2026
c449c90
x
SumanthRH Feb 18, 2026
66745b1
Merge upstream/main into pythonic-configs-vv1
SumanthRH Feb 18, 2026
9792d7d
Fix config field access bugs across skyrl-train and skyrl
SumanthRH Feb 18, 2026
9d356ab
Migrate examples/train scripts to pythonic config, fix config default…
SumanthRH Feb 19, 2026
40c4ca3
x
SumanthRH Feb 19, 2026
5433959
x
SumanthRH Feb 19, 2026
0c7ade5
Merge upstream/main: resolve docs conflicts by accepting upstream del…
SumanthRH Feb 19, 2026
548327f
Update docs to reflect pythonic config migration
SumanthRH Feb 19, 2026
060c54d
Merge remote-tracking branch 'upstream/main' into pythonic-configs-vv1
SumanthRH Feb 19, 2026
ebb58c6
Add configuration API reference docs for pythonic config classes
SumanthRH Feb 19, 2026
c6f1486
Fix cross-field config defaults lost in YAML-to-dataclass migration
SumanthRH Feb 20, 2026
bb7506a
Merge remote-tracking branch 'upstream/main' into pythonic-configs-vv1
SumanthRH Feb 20, 2026
70dfe05
Don't change skyrl-train
SumanthRH Feb 20, 2026
8588f6e
fix cross field defaults
SumanthRH Feb 20, 2026
5c6cfca
x
SumanthRH Feb 20, 2026
b2410f1
fix backend dtypes; fix megatron test
SumanthRH Feb 20, 2026
0b0edce
x
SumanthRH Feb 20, 2026
4bc25da
more fixes
SumanthRH Feb 21, 2026
d1e995d
x
SumanthRH Feb 21, 2026
12711cb
x
SumanthRH Feb 21, 2026
11d12de
fix skyrl train backend
SumanthRH Feb 21, 2026
4153277
renaming
SumanthRH Feb 21, 2026
1cff09d
x
SumanthRH Feb 21, 2026
e6fc36b
x
SumanthRH Feb 21, 2026
5bf149b
Update skyrl/train/config/config.py
SumanthRH Feb 22, 2026
2c8fcb3
Update skyrl/train/config/config.py
SumanthRH Feb 22, 2026
c99183b
Update skyrl/train/config/config.py
SumanthRH Feb 22, 2026
e04bca2
x
SumanthRH Feb 22, 2026
00b3580
Merge remote-tracking branch 'upstream/main' into pythonic-configs-vv1
SumanthRH Feb 23, 2026
f067dad
x
SumanthRH Feb 23, 2026
966bd02
x
SumanthRH Feb 24, 2026
b0c7c92
fix default value for sampling params
SumanthRH Feb 24, 2026
8a29e0c
revert change
SumanthRH Feb 24, 2026
030a568
revert skyrl-train changes
SumanthRH Feb 24, 2026
e31f155
x
SumanthRH Feb 24, 2026
cb523b2
remove prints
SumanthRH Feb 24, 2026
5aab05b
Migrate examples/ off Hydra to pythonic dataclass configs
SumanthRH Feb 24, 2026
f426aff
x
SumanthRH Feb 24, 2026
d07108e
x
SumanthRH Feb 24, 2026
a833669
x
SumanthRH Feb 24, 2026
c73bd9d
x
SumanthRH Feb 24, 2026
c9fd166
x
SumanthRH Feb 24, 2026
14f8f9b
x
SumanthRH Feb 25, 2026
7ec226e
x
SumanthRH Feb 25, 2026
43ac8ab
Merge remote-tracking branch 'upstream/main' into pythonic-configs-vv1
SumanthRH Feb 25, 2026
1655229
x
SumanthRH Feb 25, 2026
16edac0
Update skyrl/backends/skyrl_train/workers/worker.py
SumanthRH Feb 25, 2026
76865e8
x
SumanthRH Feb 25, 2026
77a72ef
fix max seq len test
SumanthRH Feb 25, 2026
9ea2c39
remove model name arg
SumanthRH Feb 25, 2026
397edf7
switch to ray init fixture
SumanthRH Feb 25, 2026
50e8b09
Merge remote-tracking branch 'upstream/main' into pythonic-configs-vv1
SumanthRH Feb 25, 2026
c52fe54
self review
SumanthRH Feb 25, 2026
5e711df
revert skyrl-agent change
SumanthRH Feb 25, 2026
182 changes: 103 additions & 79 deletions docs/content/docs/configuration/config.mdx

Large diffs are not rendered by default.

227 changes: 227 additions & 0 deletions docs/mkdocs/content/config.md
@@ -0,0 +1,227 @@
# Configuration Reference

SkyRL-Train uses Python dataclasses for configuration. The top-level
`SkyRLTrainConfig` mirrors the YAML configuration structure and can be
constructed from YAML files, CLI overrides, or plain dicts.
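
The dotted-override pattern can be sketched in miniature. The dataclasses and the `apply_overrides` helper below are hypothetical stand-ins for illustration only, not SkyRL's actual implementation (SkyRL exposes this via `SkyRLTrainConfig.from_cli_overrides`):

```python
# Illustrative sketch: applying "a.b.c=value" CLI-style overrides onto a
# nested dataclass config tree. All names here are hypothetical.
from dataclasses import dataclass, field


@dataclass
class InferenceEngineConfig:
    num_engines: int = 1
    tensor_parallel_size: int = 1


@dataclass
class GeneratorConfig:
    batched: bool = False
    inference_engine: InferenceEngineConfig = field(default_factory=InferenceEngineConfig)


def apply_overrides(cfg, overrides):
    """Apply 'a.b.c=value' strings onto a nested dataclass tree."""
    for item in overrides:
        path, _, raw = item.partition("=")
        *parents, leaf = path.split(".")
        target = cfg
        for name in parents:
            target = getattr(target, name)  # walk into nested sub-configs
        current = getattr(target, leaf)
        # coerce the raw string using the type of the existing field value
        value = raw.lower() == "true" if isinstance(current, bool) else type(current)(raw)
        setattr(target, leaf, value)
    return cfg


cfg = apply_overrides(
    GeneratorConfig(),
    ["inference_engine.num_engines=4", "batched=true"],
)
print(cfg.inference_engine.num_engines)  # → 4
```

Typed defaults on each field are what make string coercion possible without a separate schema, which is one motivation for moving from free-form YAML to dataclasses.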

## Top-Level Config

::: skyrl.train.config.SkyRLTrainConfig
    options:
      show_root_heading: true
      members_order: source
      show_bases: true

## Trainer

::: skyrl.train.config.TrainerConfig
    options:
      show_root_heading: true
      members_order: source
      show_bases: true

### Placement

::: skyrl.train.config.PlacementConfig
    options:
      show_root_heading: true
      members_order: source
      show_bases: true

### Policy / Critic / Ref

::: skyrl.train.config.PolicyConfig
    options:
      show_root_heading: true
      members_order: source
      show_bases: true

::: skyrl.train.config.CriticConfig
    options:
      show_root_heading: true
      members_order: source
      show_bases: true

::: skyrl.train.config.RefConfig
    options:
      show_root_heading: true
      members_order: source
      show_bases: true

### Model & LoRA

::: skyrl.train.config.ModelConfig
    options:
      show_root_heading: true
      members_order: source
      show_bases: true

::: skyrl.train.config.SkyRLLoraConfig
    options:
      show_root_heading: true
      members_order: source
      show_bases: true

### Optimizer & Mixed Precision

::: skyrl.train.config.OptimizerConfig
    options:
      show_root_heading: true
      members_order: source
      show_bases: true

::: skyrl.train.config.MixedPrecisionConfig
    options:
      show_root_heading: true
      members_order: source
      show_bases: true

### FSDP

::: skyrl.train.config.FSDPConfig
    options:
      show_root_heading: true
      members_order: source
      show_bases: true

### Megatron

::: skyrl.train.config.MegatronConfig
    options:
      show_root_heading: true
      members_order: source
      show_bases: true

::: skyrl.train.config.MegatronDDPConfig
    options:
      show_root_heading: true
      members_order: source
      show_bases: true

::: skyrl.train.config.MegatronLoraConfig
    options:
      show_root_heading: true
      members_order: source
      show_bases: true

::: skyrl.train.config.MegatronTorchProfilerConfig
    options:
      show_root_heading: true
      members_order: source
      show_bases: true

### Algorithm

::: skyrl.train.config.AlgorithmConfig
    options:
      show_root_heading: true
      members_order: source
      show_bases: true

::: skyrl.train.config.SAPOConfig
    options:
      show_root_heading: true
      members_order: source
      show_bases: true

::: skyrl.train.config.CISPOConfig
    options:
      show_root_heading: true
      members_order: source
      show_bases: true

::: skyrl.train.config.ClipCovConfig
    options:
      show_root_heading: true
      members_order: source
      show_bases: true

::: skyrl.train.config.KLCovConfig
    options:
      show_root_heading: true
      members_order: source
      show_bases: true

::: skyrl.train.config.KLCtrlConfig
    options:
      show_root_heading: true
      members_order: source
      show_bases: true

::: skyrl.train.config.DynamicSamplingConfig
    options:
      show_root_heading: true
      members_order: source
      show_bases: true

::: skyrl.train.config.OffPolicyCorrectionConfig
    options:
      show_root_heading: true
      members_order: source
      show_bases: true

### Fully Async

::: skyrl.train.config.FullyAsyncConfig
    options:
      show_root_heading: true
      members_order: source
      show_bases: true

## Generator

::: skyrl.train.config.GeneratorConfig
    options:
      show_root_heading: true
      members_order: source
      show_bases: true

### Inference Engine

::: skyrl.train.config.InferenceEngineConfig
    options:
      show_root_heading: true
      members_order: source
      show_bases: true

### Sampling

::: skyrl.train.config.SamplingParams
    options:
      show_root_heading: true
      members_order: source
      show_bases: true

::: skyrl.train.config.ChatTemplateConfig
    options:
      show_root_heading: true
      members_order: source
      show_bases: true

## Environment

::: skyrl.train.config.EnvironmentConfig
    options:
      show_root_heading: true
      members_order: source
      show_bases: true

::: skyrl.train.config.SkyRLGymConfig
    options:
      show_root_heading: true
      members_order: source
      show_bases: true

## Data

::: skyrl.train.config.DataConfig
    options:
      show_root_heading: true
      members_order: source
      show_bases: true

## Utilities

::: skyrl.train.config.config.make_config
    options:
      show_root_heading: true
6 changes: 3 additions & 3 deletions docs/mkdocs/content/skyrl_train_backend.md
@@ -4,19 +4,19 @@ Backend using the SkyRL-Train distributed training framework (FSDP/Megatron).
 
 ## Configuration
 
-::: skyrl.backends.skyrl_train_backend.SkyRLTrainBackendConfig
+::: skyrl.backends.skyrl_train_backend.SkyRLTrainBackendOverrides
     options:
       show_root_heading: true
       members_order: source
       show_bases: true
 
-::: skyrl.backends.skyrl_train_backend.FSDPBackendConfig
+::: skyrl.backends.skyrl_train_backend.FSDPBackendOverrides
     options:
       show_root_heading: true
       members_order: source
       show_bases: true
 
-::: skyrl.backends.skyrl_train_backend.MegatronBackendConfig
+::: skyrl.backends.skyrl_train_backend.MegatronBackendOverrides
     options:
       show_root_heading: true
       members_order: source
1 change: 1 addition & 0 deletions docs/mkdocs/mkdocs.yaml
@@ -67,6 +67,7 @@ nav:
       - JAX Backend: jax_backend.md
       - "SkyRL-Train Backend":
           - Backend API: skyrl_train_backend.md
+          - Configuration: config.md
       - Data Interface: data.md
       - Generator: generator.md
       - Trainer: trainer.md
14 changes: 7 additions & 7 deletions examples/train/algorithms/cispo/run_cispo_gsm8k.sh
@@ -29,8 +29,8 @@ uv run --isolated --extra fsdp -m skyrl.train.entrypoints.main_base \
   trainer.placement.colocate_all=true \
   trainer.strategy=fsdp2 \
   trainer.placement.policy_num_gpus_per_node=$NUM_GPUS \
-  generator.num_inference_engines=$NUM_GPUS \
-  generator.inference_engine_tensor_parallel_size=1 \
+  generator.inference_engine.num_engines=$NUM_GPUS \
+  generator.inference_engine.tensor_parallel_size=1 \
   trainer.epochs=20 \
   trainer.eval_batch_size=1024 \
   trainer.eval_before_train=true \
@@ -45,14 +45,14 @@ uv run --isolated --extra fsdp -m skyrl.train.entrypoints.main_base \
   generator.sampling_params.max_generate_length=1024 \
   trainer.policy.optimizer_config.lr=1.0e-6 \
   trainer.algorithm.use_kl_loss=$USE_KL_LOSS \
-  generator.backend=vllm \
-  generator.run_engines_locally=true \
-  generator.weight_sync_backend=nccl \
-  generator.async_engine=true \
+  generator.inference_engine.backend=vllm \
+  generator.inference_engine.run_engines_locally=true \
+  generator.inference_engine.weight_sync_backend=nccl \
+  generator.inference_engine.async_engine=true \
   generator.batched=true \
   environment.env_class=gsm8k \
   generator.n_samples_per_prompt=5 \
-  generator.gpu_memory_utilization=0.8 \
+  generator.inference_engine.gpu_memory_utilization=0.8 \
   trainer.logger="$LOGGER" \
   trainer.project_name="cispo_gsm8k" \
   trainer.run_name="cispo_gsm8k_test" \
14 changes: 7 additions & 7 deletions examples/train/algorithms/clip_cov_kl_cov/run_clip_cov.sh
@@ -31,8 +31,8 @@ uv run --isolated --extra fsdp -m skyrl.train.entrypoints.main_base \
   trainer.strategy=fsdp2 \
   trainer.placement.policy_num_gpus_per_node=$NUM_GPUS \
   trainer.placement.ref_num_gpus_per_node=$NUM_GPUS \
-  generator.num_inference_engines=$NUM_GPUS \
-  generator.inference_engine_tensor_parallel_size=1 \
+  generator.inference_engine.num_engines=$NUM_GPUS \
+  generator.inference_engine.tensor_parallel_size=1 \
   trainer.epochs=20 \
   trainer.eval_batch_size=1024 \
   trainer.eval_before_train=true \
@@ -48,14 +48,14 @@ uv run --isolated --extra fsdp -m skyrl.train.entrypoints.main_base \
   trainer.policy.optimizer_config.lr=1.0e-6 \
   trainer.algorithm.use_kl_loss=true \
   trainer.algorithm.kl_loss_coef=0.001 \
-  generator.backend=vllm \
-  generator.run_engines_locally=true \
-  generator.weight_sync_backend=nccl \
-  generator.async_engine=true \
+  generator.inference_engine.backend=vllm \
+  generator.inference_engine.run_engines_locally=true \
+  generator.inference_engine.weight_sync_backend=nccl \
+  generator.inference_engine.async_engine=true \
   generator.batched=true \
   environment.env_class=gsm8k \
   generator.n_samples_per_prompt=5 \
-  generator.gpu_memory_utilization=0.8 \
+  generator.inference_engine.gpu_memory_utilization=0.8 \
   trainer.logger="$LOGGER" \
   trainer.project_name="clip_cov_gsm8k" \
   trainer.run_name="clip_cov_gsm8k_test" \
14 changes: 7 additions & 7 deletions examples/train/algorithms/clip_cov_kl_cov/run_kl_cov.sh
@@ -30,8 +30,8 @@ uv run --isolated --extra fsdp -m skyrl.train.entrypoints.main_base \
   trainer.strategy=fsdp2 \
   trainer.placement.policy_num_gpus_per_node=$NUM_GPUS \
   trainer.placement.ref_num_gpus_per_node=$NUM_GPUS \
-  generator.num_inference_engines=$NUM_GPUS \
-  generator.inference_engine_tensor_parallel_size=1 \
+  generator.inference_engine.num_engines=$NUM_GPUS \
+  generator.inference_engine.tensor_parallel_size=1 \
   trainer.epochs=20 \
   trainer.eval_batch_size=1024 \
   trainer.eval_before_train=true \
@@ -47,14 +47,14 @@ uv run --isolated --extra fsdp -m skyrl.train.entrypoints.main_base \
   trainer.policy.optimizer_config.lr=1.0e-6 \
   trainer.algorithm.use_kl_loss=true \
   trainer.algorithm.kl_loss_coef=0.001 \
-  generator.backend=vllm \
-  generator.run_engines_locally=true \
-  generator.weight_sync_backend=nccl \
-  generator.async_engine=true \
+  generator.inference_engine.backend=vllm \
+  generator.inference_engine.run_engines_locally=true \
+  generator.inference_engine.weight_sync_backend=nccl \
+  generator.inference_engine.async_engine=true \
   generator.batched=true \
   environment.env_class=gsm8k \
   generator.n_samples_per_prompt=5 \
-  generator.gpu_memory_utilization=0.8 \
+  generator.inference_engine.gpu_memory_utilization=0.8 \
   trainer.logger="$LOGGER" \
   trainer.project_name="kl_cov_gsm8k" \
   trainer.run_name="kl_cov_gsm8k_test" \
@@ -2,13 +2,14 @@
 uv run --isolated --extra fsdp -m examples.train.algorithms.custom_advantage_estimator.main_custom_adv_est
 """
 
+import sys
+
 import ray
-import hydra
 import torch
 import numpy as np
-from omegaconf import DictConfig
+from skyrl.train.config import SkyRLTrainConfig
 from skyrl.train.utils import initialize_ray
-from skyrl.train.entrypoints.main_base import BasePPOExp, config_dir, validate_cfg
+from skyrl.train.entrypoints.main_base import BasePPOExp, validate_cfg
 from skyrl.backends.skyrl_train.utils.ppo_utils import AdvantageEstimatorRegistry
@@ -38,13 +39,13 @@ def compute_simple_baseline_advantage(
 
 
 @ray.remote(num_cpus=1)
-def skyrl_entrypoint(cfg: DictConfig):
+def skyrl_entrypoint(cfg: SkyRLTrainConfig):
     exp = BasePPOExp(cfg)
     exp.run()
 
 
-@hydra.main(config_path=config_dir, config_name="ppo_base_config", version_base=None)
-def main(cfg: DictConfig) -> None:
+def main() -> None:
+    cfg = SkyRLTrainConfig.from_cli_overrides(sys.argv[1:])
     # validate the arguments
     validate_cfg(cfg)
 
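The entrypoint migration in the diff above can be shown in miniature: the decorator-driven Hydra entrypoint is replaced by an explicit parse step, so `main` can also be driven programmatically in tests. `MiniConfig` and its `from_cli_overrides` below are illustrative stand-ins for `SkyRLTrainConfig`, not the real class:

```python
# Hypothetical miniature of the Hydra-to-pythonic entrypoint migration.
from dataclasses import dataclass


@dataclass
class MiniConfig:
    epochs: int = 1
    run_name: str = "default"

    @classmethod
    def from_cli_overrides(cls, argv):
        """Build a config from 'key=value' CLI arguments."""
        cfg = cls()
        for item in argv:
            key, _, raw = item.partition("=")
            # coerce the string using the type of the current default
            setattr(cfg, key, type(getattr(cfg, key))(raw))
        return cfg


def main(argv):
    # explicit parse replaces the @hydra.main decorator injection
    cfg = MiniConfig.from_cli_overrides(argv)
    return cfg


cfg = main(["epochs=20", "run_name=cispo_gsm8k_test"])
print(cfg.epochs)  # → 20
```

Because nothing is injected by a decorator, the same `main` works both from a shell script's `key=value` arguments and from a test that passes a list directly.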