[train] Pythonic Configs 2/N: Switch to new dataclasses in entrypoint scripts; Change instantiation signatures #1187

Merged
SumanthRH merged 61 commits into main from pythonic-configs-vv1
Feb 25, 2026

Conversation

@SumanthRH
Member

@SumanthRH SumanthRH commented Feb 20, 2026

What does this PR do?

Part 2 of the configuration refactoring outlined in https://docs.google.com/document/d/1YzLPCmVZpBauRyh4l-17AUhXiXALmOX-N1gXZkA9slA/edit?tab=t.b04hjkpsxoo5#heading=h.3aplhdu3kl77

Changes

  • Complete migration to the new configuration structure
  • New instantiation signatures with fewer cross-config dependencies (e.g., Worker receives only its relevant sub-config instead of the full TrainerConfig)
  • Keep backwards compatibility for old configs - internally translate older configuration objects to the new configuration
  • Migrate all examples/ to the new CLI and config classes
  • Use OmegaConf for CLI parsing; Hydra-style `+` overrides are no longer supported
  • Migrates only the skyrl/ package - skyrl-train will be blipped soon
  • Update docs
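The new entry-point flow can be pictured with a minimal stdlib sketch. The real implementation uses OmegaConf for parsing and the full SkyRLConfig hierarchy; the field names and type-coercion logic below are illustrative, not the actual API:

```python
# Minimal sketch of dataclass configs with dot-list CLI overrides.
# Real SkyRL code parses overrides with OmegaConf; this stdlib version
# only illustrates the shape of SkyRLConfig.from_cli_overrides().
from dataclasses import dataclass, field


@dataclass
class PolicyModelConfig:
    path: str = "Qwen/Qwen2.5-1.5B-Instruct"


@dataclass
class TrainerConfig:
    train_batch_size: int = 1024
    policy_model: PolicyModelConfig = field(default_factory=PolicyModelConfig)


@dataclass
class SkyRLConfig:
    trainer: TrainerConfig = field(default_factory=TrainerConfig)

    @classmethod
    def from_cli_overrides(cls, overrides):
        cfg = cls()
        for item in overrides:  # e.g. "trainer.train_batch_size=256"
            dotted, value = item.split("=", 1)
            *parents, leaf = dotted.split(".")
            target = cfg
            for name in parents:
                target = getattr(target, name)
            # Coerce the string to the declared type of the existing field.
            setattr(target, leaf, type(getattr(target, leaf))(value))
        return cfg


cfg = SkyRLConfig.from_cli_overrides(["trainer.train_batch_size=256"])
```

Note there is no `+` prefix on the override: every field already exists on the dataclass, so overrides are plain dotted paths.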

Test Plan

  • Migrate existing unit and integration tests
  • Add new tests for the configuration dataclasses: backwards compatibility with the older YAML hierarchy, and consistent cross-config defaults (e.g., the ref model path should follow an overridden policy model path)
  • GSM8K E2E test with FSDP
  • GSM8K E2E test with Megatron
  • DAPO Example E2E test
  • Tinker API server + SkyRLTrain backend


SumanthRH and others added 23 commits February 4, 2026 13:49
…g; migrate DAPO

Signed-off-by: SumanthRH <sumanthrh99@gmail.com>
…d tests to use typed dataclass configs

Extend the configuration dataclasses with InferenceEngineConfig and additional fields.
Update all example scripts, core training modules (trainer, workers, generators,
weight sync, utils), and tests to use the new SkyRLConfig system instead of raw
DictConfig/YAML access.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: SumanthRH <sumanthrh99@gmail.com>
… migrate remaining entrypoints, and clean up APIs

- Migrate 12 remaining Python entrypoints from @hydra.main + DictConfig to
  SkyRLConfig.from_cli_overrides() (examples, integrations, scripts, sft_trainer)
- Remove all DictConfig/ListConfig type hints and isinstance branches from core
  library (inference_engines/utils.py, workers/worker.py, config/config.py)
- Remove unused model_name parameter from SkyRLGymGenerator and all callers
- Fix verifiers_generator.py to use nested InferenceEngineConfig field paths
- Update docs and README examples to reflect new config patterns
- Add legacy.py for backward-compatible config translation

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: SumanthRH <sumanthrh99@gmail.com>
…pdate broadcast_to_inference_engines signature

Workers now receive TrainerConfig instead of SkyRLConfig, so update all
self.cfg.trainer.X -> self.cfg.X in fsdp_worker, megatron_worker, and
megatron_model_wrapper. Fix self.cfg.generator.* references to use the
correct new paths (algorithm.temperature, inference_engine_cfg param).
Update broadcast_to_inference_engines to accept inference_engine_cfg as
an explicit parameter across all call sites. Fix policy.lora ->
policy.model.lora in remaining files. Fix flat field reads for nested
InferenceEngineConfig fields across tests and examples.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Resolve conflicts keeping pythonic dataclass config patterns:
- self.cfg.trainer.X -> self.cfg.X in workers
- cfg.generator.inference_engine.X for nested InferenceEngineConfig fields
- policy.model.lora instead of policy.lora
- broadcast_to_inference_engines(client, client.inference_engine_cfg)
- No DictConfig type hints in migrated code
- Accept upstream deletion of test_tinker_api_e2e/integration tests
- Accept upstream terminal_bench -> harbor rename with our config patterns
- Fix DictConfig reference in new importance_sampling_loss from upstream

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replicate the Hydra/YAML DictConfig → Python dataclasses config
migration that was completed in skyrl-train/ across to the skyrl/
folder, adapting import paths accordingly.

Key changes:
- Extract InferenceEngineConfig from flat GeneratorConfig fields
- Workers now accept TrainerConfig instead of full SkyRLConfig
- InferenceEngineClient uses 5-arg constructor (engines, tokenizer,
  model_path, lora_cfg, inference_engine_cfg)
- Add SkyRLConfig.from_cli_overrides() and make_config() entry points
- Add legacy.py for translating old YAML config paths
- Remove DictConfig type hints from non-config modules
- Update all test files to match new config structure

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
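One way to picture the `legacy.py` translation layer described above is a key-mapping table applied before parsing. The mapping entries below are hypothetical examples in the spirit of the migration (flat `generator.*` fields moving under `generator.inference_engine.*`), not the real table:

```python
# Hypothetical sketch of legacy config-path translation.
# The key names are illustrative; the real mapping lives in legacy.py.
LEGACY_KEY_MAP = {
    # old flat generator field -> new nested InferenceEngineConfig path
    "generator.backend": "generator.inference_engine.backend",
    "generator.tensor_parallel_size": "generator.inference_engine.tensor_parallel_size",
}


def translate_legacy_overrides(overrides):
    """Rewrite old dotted config paths to their new locations, pass others through."""
    translated = []
    for item in overrides:
        key, _, value = item.partition("=")
        translated.append(f"{LEGACY_KEY_MAP.get(key, key)}={value}")
    return translated


print(translate_legacy_overrides(["generator.backend=vllm", "trainer.epochs=1"]))
```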
x
Signed-off-by: SumanthRH <sumanthrh99@gmail.com>
Resolve conflicts in skyrl-train test files by taking upstream's
InferenceEngineState context manager pattern. Keep our import cleanup
in skyrl/train/utils/utils.py.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Fix generator.inference_engine.batched -> generator.batched (batched is a GeneratorConfig field)
- Fix flat generator.X accesses to generator.inference_engine.X for IE fields in source and test files
- Fix inference_servers/utils.py to use nested inference_engine config in both packages
- Update all GPU/CPU test files to use correct nested config paths
- Fix run_dapo_gsm8k.sh CLI override for batched field

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…s and validation

- Migrate 82 shell scripts in examples/train/: remove Hydra '+' prefix from
  config overrides, move flat generator fields under generator.inference_engine,
  flatten overlong_buffer dot-notation to underscores
- Fix config defaults to match YAML: tensor_parallel_size=1, logprobs=1,
  critic lr=5e-6, DataConfig with HOME-based paths
- Sync validate_generator_cfg from skyrl-train: allow logprobs=1 (was rejecting
  it), fix auto-set logprobs value, update error messages
- Fix broadcast_to_inference_engines test calls to pass inference_engine_cfg arg

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
x
Signed-off-by: SumanthRH <sumanthrh99@gmail.com>
x
Signed-off-by: SumanthRH <sumanthrh99@gmail.com>
…etion

Upstream PR #1177 migrated docs to docs/mkdocs/ and deleted skyrl-train/docs/.
Accept the deletion and drop local SkyRLConfig→SkyRLTrainConfig renames in the
old RST files (new docs don't reference these).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace Hydra/YAML references (OmegaConf resolvers, DictConfig, @hydra.main)
with new Python dataclass patterns (SkyRLTrainConfig.from_cli_overrides()).
Rename flat generator.* fields to nested generator.inference_engine.* paths.
Remove stale placement fields (colocate_critic_reward, reward_num_*) from docs.
Point config references from ppo_base_config.yaml to config.py.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The legacy YAML config used OmegaConf interpolations (e.g. ${trainer.policy.model.path})
to derive defaults across config sections. These were lost when migrating to dataclasses,
causing the ref model to default to a hardcoded Qwen2.5-1.5B-Instruct instead of
matching the policy model. This caused KL loss blow-up in megatron e2e tests.

Fix by resolving cross-field defaults in SkyRLTrainConfig.__post_init__:
- ref.model.path defaults to policy.model.path (was hardcoded)
- generator.max_input_length defaults to trainer.max_prompt_length (was 512)
- eval max_generate_length defaults to train max_generate_length
- generator rope_scaling/rope_theta default to trainer values

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
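The `__post_init__` fallback pattern described in this commit can be sketched as follows. The field names are simplified from the real config; only the shape of the fix (an unset field inheriting a cross-section default) is what matters:

```python
# Sketch of resolving cross-field defaults in __post_init__, replacing the
# lost OmegaConf interpolation ${trainer.policy.model.path}. Field names
# are simplified stand-ins for the real SkyRLTrainConfig structure.
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ModelConfig:
    path: Optional[str] = None


@dataclass
class SkyRLTrainConfig:
    policy: ModelConfig = field(
        default_factory=lambda: ModelConfig(path="Qwen/Qwen2.5-1.5B-Instruct")
    )
    ref: ModelConfig = field(default_factory=ModelConfig)

    def __post_init__(self):
        # An unset ref model path falls back to the policy model path,
        # instead of silently defaulting to a hardcoded model.
        if self.ref.path is None:
            self.ref.path = self.policy.path


cfg = SkyRLTrainConfig(policy=ModelConfig(path="my-org/my-model"))
```

Because the fallback only fires when the field is `None`, an explicitly provided ref path is left untouched.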
Signed-off-by: SumanthRH <sumanthrh99@gmail.com>
Signed-off-by: SumanthRH <sumanthrh99@gmail.com>
x
Signed-off-by: SumanthRH <sumanthrh99@gmail.com>
Signed-off-by: SumanthRH <sumanthrh99@gmail.com>
@SumanthRH SumanthRH marked this pull request as ready for review February 20, 2026 22:21
x
Signed-off-by: SumanthRH <sumanthrh99@gmail.com>
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

The pull request successfully transitions the configuration system to Python dataclasses, enhancing type safety and reducing cross-config dependencies during worker instantiation. The inclusion of legacy translation ensures backward compatibility with existing YAML configs. However, there are several issues that need to be addressed: hardcoded environment-specific paths in DataConfig and TrainerConfig should be generalized; the default for max_generate_length in SamplingParams being None can lead to crashes in configuration validation; and the use of AlgorithmConfig.from_dict_config when merging loss overrides will cause loss of subclass-specific fields (e.g., for DAPO). Additionally, the weight extractor threshold update mentioned in comments appears to be missing from the implementation.

kl_loss_coef: float = 0.001
use_entropy_loss: bool = False
entropy_loss_coef: float = 0.01
temperature: float = 1.0
Contributor

medium

AlgorithmConfig.temperature currently defaults to 1.0. However, in SkyRLTrainConfig.__post_init__ (line 686), it is unconditionally overwritten by generator.sampling_params.temperature. This makes any CLI override of trainer.algorithm.temperature ineffective. Consider making it Optional[float] = None and only setting it in __post_init__ if it hasn't been explicitly provided.

Suggested change
temperature: float = 1.0
temperature: Optional[float] = None

Member Author

Hmm, CLI overrides for the algorithm temperature are not meant to be supported. Perhaps we can add some validation

self.generator.rope_theta = self.trainer.rope_theta
# Copy temperature from generator sampling params to algorithm config
# so workers can access it without needing the generator config
self.trainer.algorithm.temperature = self.generator.sampling_params.temperature
Contributor

medium

As noted in the AlgorithmConfig feedback, this unconditional overwrite ignores user-provided values for trainer.algorithm.temperature. It should only apply the default from the generator if the algorithm temperature is not set.

Suggested change
self.trainer.algorithm.temperature = self.generator.sampling_params.temperature
if self.trainer.algorithm.temperature is None:
self.trainer.algorithm.temperature = self.generator.sampling_params.temperature

enable_bucketing=self._transfer_strategy_cls is CudaIpcTransferStrategy,
bucket_size_threshold_GB=self.cfg.generator.weight_transfer_threshold_cuda_ipc_GB,
training_dtype=torch.bfloat16 if self.cfg.trainer.bf16 else torch.float32,
bucket_size_threshold_GB=0.0, # Updated in init_weight_transfer_communicator when strategy is known
Contributor

medium

The comment refers to a non-existent method init_weight_transfer_communicator. It should be updated to init_weight_sync_state to match the implementation in the base Worker class.

Suggested change
bucket_size_threshold_GB=0.0, # Updated in init_weight_transfer_communicator when strategy is known
bucket_size_threshold_GB=0.0, # Updated in init_weight_sync_state when strategy is known


Signed-off-by: SumanthRH <sumanthrh99@gmail.com>
x
Signed-off-by: SumanthRH <sumanthrh99@gmail.com>
x
Signed-off-by: SumanthRH <sumanthrh99@gmail.com>

x
Signed-off-by: SumanthRH <sumanthrh99@gmail.com>

SumanthRH and others added 2 commits February 24, 2026 16:51
Co-authored-by: devin-ai-integration[bot] <158243242+devin-ai-integration[bot]@users.noreply.github.com>
x
Signed-off-by: SumanthRH <sumanthrh99@gmail.com>
Signed-off-by: SumanthRH <sumanthrh99@gmail.com>
Signed-off-by: SumanthRH <sumanthrh99@gmail.com>
Signed-off-by: SumanthRH <sumanthrh99@gmail.com>
Signed-off-by: SumanthRH <sumanthrh99@gmail.com>
Signed-off-by: SumanthRH <sumanthrh99@gmail.com>
@SumanthRH SumanthRH merged commit 9cc85fa into main Feb 25, 2026
5 checks passed
@SumanthRH SumanthRH deleted the pythonic-configs-vv1 branch February 26, 2026 23:48
SumanthRH pushed a commit that referenced this pull request Mar 2, 2026
… from skyrl-train migration (#1246)

## Summary

Fix stale test docstrings and remaining docs references left over from
the `skyrl-train` → `skyrl/train` migration (follow-up to #1236, #1187).

### Tests (`tests/backends/skyrl_train/`)

- **`--extra vllm` → `--extra fsdp`** in 11 test file docstrings — the
`vllm` extra was renamed to `fsdp` during the migration
- **`tests/gpu/` → `tests/backends/skyrl_train/gpu/`** in 26 test file
docstrings — paths were stale after the test directory was relocated
- **`inf_engines/` → `inference_engines/`** in 2 test file docstrings —
abbreviated directory name was never updated
- **`skyrl-train/skyrl_train/` → `skyrl/backends/skyrl_train/`** in 1
test file docstring — old package directory reference

### Docs (`docs/content/docs/`)

- **`DictConfig` → `SkyRLTrainConfig`** in entrypoint function
signatures (`custom_algorithms.mdx`, `dapo.mdx`, `new_env.mdx`)
- **Remove `@hydra.main` boilerplate** — replaced with
`SkyRLTrainConfig.from_cli_overrides()` pattern (`new_env.mdx`)
- **Remove `+` prefix** from CLI overrides — no longer needed after
Hydra removal (`dapo.mdx`, `harbor/index.mdx`)
- **`${oc.env:HOME}` → `~`** for default paths — OmegaConf
interpolations replaced with tilde (`checkpointing.mdx`)
- **Fix stale GitHub URL** pointing to old
`skyrl-train/skyrl_train/workers/worker.py` (`overview.mdx`)
- **Remove Hydra searchpath syntax** and `+`/`++` config prefixes from
Harbor example invocation (`harbor/index.mdx`)

## Test plan

- [x] Verified no remaining `--extra vllm` in
`tests/backends/skyrl_train/`
- [x] Verified no remaining `tests/gpu/` paths in test docstrings
- [x] Verified no remaining `DictConfig`, `@hydra.main`, `+generator.`,
`+trainer.`, `${oc.env:HOME}`, or `skyrl-train/skyrl_train/` in docs


Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
erictang000 added a commit that referenced this pull request Mar 11, 2026
…1317)

After #1187, `eval_sampling_params.max_generate_length` was no longer
automatically set to `sampling_params.max_generate_length` when
`eval_sampling_params` was not `None`. ([code
link](https://github.com/NovaSky-AI/SkyRL/blob/073b1f3b626b760885b94e6e53e4a8bce5df1a38/skyrl/train/config/config.py#L515))
This caused example scripts to have matching training behavior but
diverging eval behavior. Manually setting
`eval_sampling_params.max_generate_length` in the example scripts should fix
this.


Examples of divergence due to an unexpectedly lower
`eval_sampling_params.max_generate_length` (cc: @justinvyu):
<img width="2086" height="1332" alt="image"
src="https://github.com/user-attachments/assets/0b1cfee9-1dfd-4a25-b85c-cbeb1991e515"
/>
<img width="696" height="664" alt="image"
src="https://github.com/user-attachments/assets/07c0b300-6785-4b44-b46f-f4b840fa2a0c"
/>

