Conversation
Signed-off-by: SumanthRH <sumanthrh99@gmail.com>
Signed-off-by: SumanthRH <sumanthrh99@gmail.com>
Signed-off-by: SumanthRH <sumanthrh99@gmail.com>
Signed-off-by: SumanthRH <sumanthrh99@gmail.com>
|
/gemini review |
There was a problem hiding this comment.
Code Review
This pull request introduces typed dataclasses for configuration, moving away from generic DictConfig objects, which is a significant step towards improving type safety, code clarity, and maintainability. However, it introduces a critical security regression: sensitive information from environment variables (such as API keys) may now be leaked into logs. This occurs because the new configuration system resolves interpolations upfront and logs these resolved values, whereas the previous implementation logged the unresolved configuration. Additionally, a potential bug in fsdp_utils.py and areas for future configuration management improvements have been identified.
| def from_dict_config(cls, cfg: DictConfig) -> "BaseConfig": | ||
| """Construct a typed BaseConfig from a Hydra DictConfig.""" | ||
| raw = OmegaConf.to_container(cfg, resolve=True) | ||
| return build_nested_dataclass(cls, raw) |
There was a problem hiding this comment.
The from_dict_config method resolves configuration interpolations by calling OmegaConf.to_container(cfg, resolve=True). This means that any sensitive information stored in environment variables and referenced in the configuration (e.g., ${env:WANDB_API_KEY}) will be expanded to plaintext when the configuration dataclass is initialized. If these dataclasses are subsequently logged or serialized, the sensitive information will be leaked.
Recommendation: Consider delaying resolution until the values are actually needed for execution, or ensure that sensitive fields are masked before logging.
| def get_cfg_as_str(cfg: Union[SkyRLConfig, DictConfig]) -> str: | ||
| return pprint.pformat(get_config_as_dict(cfg)) | ||
|
|
There was a problem hiding this comment.
The get_cfg_as_str method now logs the resolved configuration by using get_config_as_dict(cfg), which contains resolved interpolations. This is a regression from the previous implementation using OmegaConf.to_yaml(dict_cfg), which did not resolve interpolations by default. This change leads to sensitive information (like API keys) being written to logs in plaintext.
Recommendation: Use a non-resolving method for logging the configuration, or implement masking for sensitive keys.
| @dataclass | ||
| class GSM8kLLMJudgeEnvConfig(BaseConfig): | ||
| model: str = "gpt-4o-mini" | ||
| base_url: Optional[str] = None |
There was a problem hiding this comment.
I see that GSM8kLLMJudgeEnvConfig is redefined here, and the comment explains why. While this might be a necessary workaround for now to avoid circular dependencies with the examples directory, it introduces code duplication which can be a maintenance concern. The definition in skyrl-train/examples/llm_as_a_judge/llm_judge_env.py also doesn't inherit from BaseConfig. It would be good to consolidate these into a single definition in a shared location in a future refactoring, perhaps by moving the example environment into the core library if it's meant to be generally available.
| # Create a copy of the config and apply overrides | ||
| loss_config = OmegaConf.merge(loss_config, OmegaConf.create(loss_fn_config)) | ||
| # NOTE (sumanthrh): This doesn't work like OmegaConf for nested dataclasses - only top level fields are merged | ||
| # TODO: Fix nested overrides | ||
| if isinstance(loss_config, DictConfig): | ||
| loss_config = OmegaConf.merge(loss_config, OmegaConf.create(loss_fn_config)) | ||
| else: | ||
| assert isinstance(loss_config, AlgorithmConfig) | ||
| new_loss_config_as_dict = asdict(loss_config) | ||
| new_loss_config_as_dict.update(loss_fn_config) | ||
| loss_config = AlgorithmConfig(**new_loss_config_as_dict) |
There was a problem hiding this comment.
The comment here correctly points out that the current implementation for updating the dataclass config performs a shallow merge. This could lead to unexpected behavior if a user tries to override a nested configuration object (e.g., kl_ctrl), as the entire nested object would be replaced rather than merged.
To address the TODO, you might consider implementing a recursive update function for dataclasses to mimic the deep merge behavior of OmegaConf.merge. This would make the configuration overrides more intuitive and robust.
There was a problem hiding this comment.
Code Review
This pull request introduces a significant and valuable refactoring by defining typed dataclasses for configurations, moving away from DictConfig. This greatly improves code clarity, maintainability, and developer experience by enabling static analysis and auto-completion. The strategy of preserving backward compatibility and migrating tests first is excellent. The implementation of the configuration dataclasses and the conversion logic is well-executed. I've identified a few minor areas for improvement, mainly around ensuring consistent handling of both DictConfig and the new dataclasses during the transition period to prevent potential runtime errors. Overall, this is a high-quality contribution that modernizes the codebase.
Signed-off-by: SumanthRH <sumanthrh99@gmail.com>
Signed-off-by: SumanthRH <sumanthrh99@gmail.com>
Signed-off-by: SumanthRH <sumanthrh99@gmail.com>
|
/gemini review |
There was a problem hiding this comment.
Code Review
This pull request introduces a significant and valuable refactoring by replacing YAML-based DictConfig with typed Python dataclasses for configuration. This greatly improves type safety, code clarity, and developer experience. The migration is handled thoughtfully, with backward compatibility in mind by using Union types during the transition. The changes are consistently applied across the codebase, including core logic, examples, and tests. The addition of unit tests for the new config system is also a great touch.
I've found a few areas for improvement, mainly related to potential issues with config merging and a code smell due to a duplicated config definition. Please see my detailed comments.
Signed-off-by: SumanthRH <sumanthrh99@gmail.com>
What does this PR do?
1/2 PRs for introducing pythonic configurations in SkyRL-train.
A short design doc is here: https://docs.google.com/document/d/1YzLPCmVZpBauRyh4l-17AUhXiXALmOX-N1gXZkA9slA
This is going to be lightweight in terms of configuration hierarchy changes to keep user experience similar.
Summary:
PolicyConfig,ModelConfig,AlgorithmConfig,PlacementConfig, etc.DictConfig- only migrate tests to new dataclasses.