[train] Pythonic Configs 2/N: Switch to new dataclasses in entrypoint scripts; Change instantiation signatures #1187

Merged
SumanthRH merged 61 commits into main from pythonic-configs-vv1
Feb 25, 2026

Conversation

@SumanthRH
Member

@SumanthRH SumanthRH commented Feb 20, 2026

What does this PR do?

Part 2 of the configuration refactoring outlined in https://docs.google.com/document/d/1YzLPCmVZpBauRyh4l-17AUhXiXALmOX-N1gXZkA9slA/edit?tab=t.b04hjkpsxoo5#heading=h.3aplhdu3kl77

Changes

  • Complete migration to the new configuration structure
  • New instantiation signatures with fewer cross-config dependencies (e.g., Worker receives only its relevant sub-config instead of the full TrainerConfig)
  • Keep backwards compatibility for old configs - internally translate older configuration objects to the new configuration
  • Migrate all examples/ to the new CLI and config classes
  • Use OmegaConf for CLI parsing; Hydra-style `+` overrides are no longer supported
  • Migrates only the skyrl/ package - skyrl-train will be blipped soon
  • Update docs
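The new entry-point flow can be pictured with a minimal stdlib sketch. The real implementation uses OmegaConf for parsing and the full SkyRLConfig hierarchy; the field names and type-coercion logic below are illustrative, not the actual API:

```python
# Minimal sketch of dataclass configs with dot-list CLI overrides.
# Real SkyRL code parses overrides with OmegaConf; this stdlib version
# only illustrates the shape of SkyRLConfig.from_cli_overrides().
from dataclasses import dataclass, field


@dataclass
class PolicyModelConfig:
    path: str = "Qwen/Qwen2.5-1.5B-Instruct"


@dataclass
class TrainerConfig:
    train_batch_size: int = 1024
    policy_model: PolicyModelConfig = field(default_factory=PolicyModelConfig)


@dataclass
class SkyRLConfig:
    trainer: TrainerConfig = field(default_factory=TrainerConfig)

    @classmethod
    def from_cli_overrides(cls, overrides):
        cfg = cls()
        for item in overrides:  # e.g. "trainer.train_batch_size=256"
            dotted, value = item.split("=", 1)
            *parents, leaf = dotted.split(".")
            target = cfg
            for name in parents:
                target = getattr(target, name)
            # Coerce the string to the declared type of the existing field.
            setattr(target, leaf, type(getattr(target, leaf))(value))
        return cfg


cfg = SkyRLConfig.from_cli_overrides(["trainer.train_batch_size=256"])
```

Note there is no `+` prefix on the override: every field already exists on the dataclass, so overrides are plain dotted paths.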

Test Plan

  • Migrate existing unit and integration tests
  • Add new tests for the configuration dataclasses: backwards compatibility with the older YAML hierarchy, and consistent cross-config defaults (e.g., the ref model path should follow an overridden policy model path)
  • GSM8K E2E test with FSDP
  • GSM8K E2E test with Megatron
  • DAPO Example E2E test
  • Tinker API server + SkyRLTrain backend


SumanthRH and others added 23 commits February 4, 2026 13:49
…g; migrate DAPO

Signed-off-by: SumanthRH <sumanthrh99@gmail.com>
…d tests to use typed dataclass configs

Extend the configuration dataclasses with InferenceEngineConfig and additional fields.
Update all example scripts, core training modules (trainer, workers, generators,
weight sync, utils), and tests to use the new SkyRLConfig system instead of raw
DictConfig/YAML access.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: SumanthRH <sumanthrh99@gmail.com>
… migrate remaining entrypoints, and clean up APIs

- Migrate 12 remaining Python entrypoints from @hydra.main + DictConfig to
  SkyRLConfig.from_cli_overrides() (examples, integrations, scripts, sft_trainer)
- Remove all DictConfig/ListConfig type hints and isinstance branches from core
  library (inference_engines/utils.py, workers/worker.py, config/config.py)
- Remove unused model_name parameter from SkyRLGymGenerator and all callers
- Fix verifiers_generator.py to use nested InferenceEngineConfig field paths
- Update docs and README examples to reflect new config patterns
- Add legacy.py for backward-compatible config translation

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: SumanthRH <sumanthrh99@gmail.com>
…pdate broadcast_to_inference_engines signature

Workers now receive TrainerConfig instead of SkyRLConfig, so update all
self.cfg.trainer.X -> self.cfg.X in fsdp_worker, megatron_worker, and
megatron_model_wrapper. Fix self.cfg.generator.* references to use the
correct new paths (algorithm.temperature, inference_engine_cfg param).
Update broadcast_to_inference_engines to accept inference_engine_cfg as
an explicit parameter across all call sites. Fix policy.lora ->
policy.model.lora in remaining files. Fix flat field reads for nested
InferenceEngineConfig fields across tests and examples.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Resolve conflicts keeping pythonic dataclass config patterns:
- self.cfg.trainer.X -> self.cfg.X in workers
- cfg.generator.inference_engine.X for nested InferenceEngineConfig fields
- policy.model.lora instead of policy.lora
- broadcast_to_inference_engines(client, client.inference_engine_cfg)
- No DictConfig type hints in migrated code
- Accept upstream deletion of test_tinker_api_e2e/integration tests
- Accept upstream terminal_bench -> harbor rename with our config patterns
- Fix DictConfig reference in new importance_sampling_loss from upstream

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replicate the Hydra/YAML DictConfig → Python dataclasses config
migration that was completed in skyrl-train/ across to the skyrl/
folder, adapting import paths accordingly.

Key changes:
- Extract InferenceEngineConfig from flat GeneratorConfig fields
- Workers now accept TrainerConfig instead of full SkyRLConfig
- InferenceEngineClient uses 5-arg constructor (engines, tokenizer,
  model_path, lora_cfg, inference_engine_cfg)
- Add SkyRLConfig.from_cli_overrides() and make_config() entry points
- Add legacy.py for translating old YAML config paths
- Remove DictConfig type hints from non-config modules
- Update all test files to match new config structure

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
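One way to picture the `legacy.py` translation layer described above is a key-mapping table applied before parsing. The mapping entries below are hypothetical examples in the spirit of the migration (flat `generator.*` fields moving under `generator.inference_engine.*`), not the real table:

```python
# Hypothetical sketch of legacy config-path translation.
# The key names are illustrative; the real mapping lives in legacy.py.
LEGACY_KEY_MAP = {
    # old flat generator field -> new nested InferenceEngineConfig path
    "generator.backend": "generator.inference_engine.backend",
    "generator.tensor_parallel_size": "generator.inference_engine.tensor_parallel_size",
}


def translate_legacy_overrides(overrides):
    """Rewrite old dotted config paths to their new locations, pass others through."""
    translated = []
    for item in overrides:
        key, _, value = item.partition("=")
        translated.append(f"{LEGACY_KEY_MAP.get(key, key)}={value}")
    return translated


print(translate_legacy_overrides(["generator.backend=vllm", "trainer.epochs=1"]))
```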
x
Signed-off-by: SumanthRH <sumanthrh99@gmail.com>
Resolve conflicts in skyrl-train test files by taking upstream's
InferenceEngineState context manager pattern. Keep our import cleanup
in skyrl/train/utils/utils.py.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Fix generator.inference_engine.batched -> generator.batched (batched is a GeneratorConfig field)
- Fix flat generator.X accesses to generator.inference_engine.X for IE fields in source and test files
- Fix inference_servers/utils.py to use nested inference_engine config in both packages
- Update all GPU/CPU test files to use correct nested config paths
- Fix run_dapo_gsm8k.sh CLI override for batched field

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…s and validation

- Migrate 82 shell scripts in examples/train/: remove Hydra '+' prefix from
  config overrides, move flat generator fields under generator.inference_engine,
  flatten overlong_buffer dot-notation to underscores
- Fix config defaults to match YAML: tensor_parallel_size=1, logprobs=1,
  critic lr=5e-6, DataConfig with HOME-based paths
- Sync validate_generator_cfg from skyrl-train: allow logprobs=1 (was rejecting
  it), fix auto-set logprobs value, update error messages
- Fix broadcast_to_inference_engines test calls to pass inference_engine_cfg arg

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
x
Signed-off-by: SumanthRH <sumanthrh99@gmail.com>
x
Signed-off-by: SumanthRH <sumanthrh99@gmail.com>
…etion

Upstream PR #1177 migrated docs to docs/mkdocs/ and deleted skyrl-train/docs/.
Accept the deletion and drop local SkyRLConfig→SkyRLTrainConfig renames in the
old RST files (new docs don't reference these).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace Hydra/YAML references (OmegaConf resolvers, DictConfig, @hydra.main)
with new Python dataclass patterns (SkyRLTrainConfig.from_cli_overrides()).
Rename flat generator.* fields to nested generator.inference_engine.* paths.
Remove stale placement fields (colocate_critic_reward, reward_num_*) from docs.
Point config references from ppo_base_config.yaml to config.py.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The legacy YAML config used OmegaConf interpolations (e.g. ${trainer.policy.model.path})
to derive defaults across config sections. These were lost when migrating to dataclasses,
causing the ref model to default to a hardcoded Qwen2.5-1.5B-Instruct instead of
matching the policy model. This caused KL loss blow-up in megatron e2e tests.

Fix by resolving cross-field defaults in SkyRLTrainConfig.__post_init__:
- ref.model.path defaults to policy.model.path (was hardcoded)
- generator.max_input_length defaults to trainer.max_prompt_length (was 512)
- eval max_generate_length defaults to train max_generate_length
- generator rope_scaling/rope_theta default to trainer values

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
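The `__post_init__` fallback pattern described in this commit can be sketched as follows. The field names are simplified from the real config; only the shape of the fix (an unset field inheriting a cross-section default) is what matters:

```python
# Sketch of resolving cross-field defaults in __post_init__, replacing the
# lost OmegaConf interpolation ${trainer.policy.model.path}. Field names
# are simplified stand-ins for the real SkyRLTrainConfig structure.
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ModelConfig:
    path: Optional[str] = None


@dataclass
class SkyRLTrainConfig:
    policy: ModelConfig = field(
        default_factory=lambda: ModelConfig(path="Qwen/Qwen2.5-1.5B-Instruct")
    )
    ref: ModelConfig = field(default_factory=ModelConfig)

    def __post_init__(self):
        # An unset ref model path falls back to the policy model path,
        # instead of silently defaulting to a hardcoded model.
        if self.ref.path is None:
            self.ref.path = self.policy.path


cfg = SkyRLTrainConfig(policy=ModelConfig(path="my-org/my-model"))
```

Because the fallback only fires when the field is `None`, an explicitly provided ref path is left untouched.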
Signed-off-by: SumanthRH <sumanthrh99@gmail.com>
Signed-off-by: SumanthRH <sumanthrh99@gmail.com>
x
Signed-off-by: SumanthRH <sumanthrh99@gmail.com>
Signed-off-by: SumanthRH <sumanthrh99@gmail.com>
@SumanthRH SumanthRH marked this pull request as ready for review February 20, 2026 22:21
x
Signed-off-by: SumanthRH <sumanthrh99@gmail.com>
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

The pull request successfully transitions the configuration system to Python dataclasses, enhancing type safety and reducing cross-config dependencies during worker instantiation. The inclusion of legacy translation ensures backward compatibility with existing YAML configs. However, there are several issues that need to be addressed: hardcoded environment-specific paths in DataConfig and TrainerConfig should be generalized; the default for max_generate_length in SamplingParams being None can lead to crashes in configuration validation; and the use of AlgorithmConfig.from_dict_config when merging loss overrides will cause loss of subclass-specific fields (e.g., for DAPO). Additionally, the weight extractor threshold update mentioned in comments appears to be missing from the implementation.

kl_loss_coef: float = 0.001
use_entropy_loss: bool = False
entropy_loss_coef: float = 0.01
temperature: float = 1.0
Contributor

medium

AlgorithmConfig.temperature currently defaults to 1.0. However, in SkyRLTrainConfig.__post_init__ (line 686), it is unconditionally overwritten by generator.sampling_params.temperature. This makes any CLI override of trainer.algorithm.temperature ineffective. Consider making it Optional[float] = None and only setting it in __post_init__ if it hasn't been explicitly provided.

Suggested change
temperature: float = 1.0
temperature: Optional[float] = None

Member Author

Hmm, CLI overrides for the algorithm temperature are not meant to be supported. Perhaps we can add some validation

self.generator.rope_theta = self.trainer.rope_theta
# Copy temperature from generator sampling params to algorithm config
# so workers can access it without needing the generator config
self.trainer.algorithm.temperature = self.generator.sampling_params.temperature
Contributor

medium

As noted in the AlgorithmConfig feedback, this unconditional overwrite ignores user-provided values for trainer.algorithm.temperature. It should only apply the default from the generator if the algorithm temperature is not set.

Suggested change
self.trainer.algorithm.temperature = self.generator.sampling_params.temperature
if self.trainer.algorithm.temperature is None:
self.trainer.algorithm.temperature = self.generator.sampling_params.temperature

enable_bucketing=self._transfer_strategy_cls is CudaIpcTransferStrategy,
bucket_size_threshold_GB=self.cfg.generator.weight_transfer_threshold_cuda_ipc_GB,
training_dtype=torch.bfloat16 if self.cfg.trainer.bf16 else torch.float32,
bucket_size_threshold_GB=0.0, # Updated in init_weight_transfer_communicator when strategy is known
Contributor

medium

The comment refers to a non-existent method init_weight_transfer_communicator. It should be updated to init_weight_sync_state to match the implementation in the base Worker class.

Suggested change
bucket_size_threshold_GB=0.0, # Updated in init_weight_transfer_communicator when strategy is known
bucket_size_threshold_GB=0.0, # Updated in init_weight_sync_state when strategy is known


Signed-off-by: SumanthRH <sumanthrh99@gmail.com>
x
Signed-off-by: SumanthRH <sumanthrh99@gmail.com>
x
Signed-off-by: SumanthRH <sumanthrh99@gmail.com>

x
Signed-off-by: SumanthRH <sumanthrh99@gmail.com>

SumanthRH and others added 2 commits February 24, 2026 16:51
Co-authored-by: devin-ai-integration[bot] <158243242+devin-ai-integration[bot]@users.noreply.github.com>
x
Signed-off-by: SumanthRH <sumanthrh99@gmail.com>
Signed-off-by: SumanthRH <sumanthrh99@gmail.com>
Signed-off-by: SumanthRH <sumanthrh99@gmail.com>
Signed-off-by: SumanthRH <sumanthrh99@gmail.com>
Signed-off-by: SumanthRH <sumanthrh99@gmail.com>
Signed-off-by: SumanthRH <sumanthrh99@gmail.com>
@SumanthRH SumanthRH merged commit 9cc85fa into main Feb 25, 2026
5 checks passed
@SumanthRH SumanthRH deleted the pythonic-configs-vv1 branch February 26, 2026 23:48
SumanthRH pushed a commit that referenced this pull request Mar 2, 2026
… from skyrl-train migration (#1246)

## Summary

Fix stale test docstrings and remaining docs references left over from
the `skyrl-train` → `skyrl/train` migration (follow-up to #1236, #1187).

### Tests (`tests/backends/skyrl_train/`)

- **`--extra vllm` → `--extra fsdp`** in 11 test file docstrings — the
`vllm` extra was renamed to `fsdp` during the migration
- **`tests/gpu/` → `tests/backends/skyrl_train/gpu/`** in 26 test file
docstrings — paths were stale after the test directory was relocated
- **`inf_engines/` → `inference_engines/`** in 2 test file docstrings —
abbreviated directory name was never updated
- **`skyrl-train/skyrl_train/` → `skyrl/backends/skyrl_train/`** in 1
test file docstring — old package directory reference

### Docs (`docs/content/docs/`)

- **`DictConfig` → `SkyRLTrainConfig`** in entrypoint function
signatures (`custom_algorithms.mdx`, `dapo.mdx`, `new_env.mdx`)
- **Remove `@hydra.main` boilerplate** — replaced with
`SkyRLTrainConfig.from_cli_overrides()` pattern (`new_env.mdx`)
- **Remove `+` prefix** from CLI overrides — no longer needed after
Hydra removal (`dapo.mdx`, `harbor/index.mdx`)
- **`${oc.env:HOME}` → `~`** for default paths — OmegaConf
interpolations replaced with tilde (`checkpointing.mdx`)
- **Fix stale GitHub URL** pointing to old
`skyrl-train/skyrl_train/workers/worker.py` (`overview.mdx`)
- **Remove Hydra searchpath syntax** and `+`/`++` config prefixes from
Harbor example invocation (`harbor/index.mdx`)

## Test plan

- [x] Verified no remaining `--extra vllm` in
`tests/backends/skyrl_train/`
- [x] Verified no remaining `tests/gpu/` paths in test docstrings
- [x] Verified no remaining `DictConfig`, `@hydra.main`, `+generator.`,
`+trainer.`, `${oc.env:HOME}`, or `skyrl-train/skyrl_train/` in docs


Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
erictang000 added a commit that referenced this pull request Mar 11, 2026
…1317)

After #1187, `eval_sampling_params.max_generate_length` was no longer
automatically set to `sampling_params.max_generate_length` when
`eval_sampling_params` was not `None`. ([code
link](https://github.com/NovaSky-AI/SkyRL/blob/073b1f3b626b760885b94e6e53e4a8bce5df1a38/skyrl/train/config/config.py#L515))
This caused example scripts to have matching training behavior but
diverging eval behavior. Manually setting
`eval_sampling_params.max_generate_length` in the example scripts should fix
this.


Examples of divergence due to an unexpectedly lower
`eval_sampling_params.max_generate_length` (cc: @justinvyu):
<img width="2086" height="1332" alt="image"
src="https://github.com/user-attachments/assets/0b1cfee9-1dfd-4a25-b85c-cbeb1991e515"
/>
<img width="696" height="664" alt="image"
src="https://github.com/user-attachments/assets/07c0b300-6785-4b44-b46f-f4b840fa2a0c"
/>

