[tx] Port https://github.com/NovaSky-AI/SkyRL/pull/1008 to skyrl folder #1217

Merged: pcmoritz merged 3 commits into NovaSky-AI:main from pcmoritz:port-1008 on Feb 25, 2026

Conversation

@pcmoritz (Collaborator) commented Feb 25, 2026

See #1008


@gemini-code-assist bot (Contributor) left a comment


Code Review

This pull request ports changes that add support for Manifold-constrained HyperConnections (mHC). The changes are extensive, introducing a new LoRAConnector layer and integrating it into various parts of the system, including model configurations, model implementations, and checkpointing utilities. The PR also includes a comprehensive set of new tests for the added functionality.

I've identified one critical issue regarding parameter filtering that needs to be addressed, along with a couple of medium-severity suggestions to improve code clarity and maintainability. Overall, the changes are well-structured, but the identified issues should be resolved.

Comment on lines +21 to +25
is_lora = any(name in path for name in ("lora_A", "lora_B"))
is_connector = self.config.mhc_expansion_rate > 1 and any(
name in path for name in ("attn_connector", "mlp_connector")
)
return is_lora or is_connector

critical

The path argument received by this method is a tuple of nnx.path.PathEntry objects, not strings. Therefore, using the in operator like name in path will not work as intended to check for the presence of a key. This will cause trainable LoRA and connector parameters to not be identified correctly.

You should normalize the path to a tuple of strings (the keys from the PathEntry objects) before checking for containment, similar to how is_connector_path is implemented.

Suggested change:

-    is_lora = any(name in path for name in ("lora_A", "lora_B"))
-    is_connector = self.config.mhc_expansion_rate > 1 and any(
-        name in path for name in ("attn_connector", "mlp_connector")
-    )
-    return is_lora or is_connector
+    normalized_path = tuple(p.key for p in path if hasattr(p, "key"))
+    is_lora = any(name in normalized_path for name in ("lora_A", "lora_B"))
+    is_connector = self.config.mhc_expansion_rate > 1 and any(
+        name in normalized_path for name in ("attn_connector", "mlp_connector")
+    )
+    return is_lora or is_connector
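The failure mode the reviewer describes can be demonstrated without Flax at all. The sketch below uses a hypothetical `PathEntry` stand-in for the structured path entries produced during NNX graph traversal; the class name and attribute are assumptions for illustration, not the real NNX API.

```python
# Minimal sketch of why `name in path` fails when path entries are objects
# rather than plain strings. `PathEntry` is a hypothetical stand-in for the
# structured entries NNX yields during traversal.
class PathEntry:
    def __init__(self, key):
        self.key = key

path = (PathEntry("layers"), PathEntry("0"), PathEntry("lora_A"))

# Tuple containment compares the string "lora_A" against PathEntry objects
# with `==`, which is always False here, so the check silently never matches.
assert "lora_A" not in path

# Normalizing to the underlying keys restores the intended string check.
normalized = tuple(p.key for p in path if hasattr(p, "key"))
assert "lora_A" in normalized
```

Because the broken check fails silently (it just returns False for every parameter), the bug would surface only indirectly, as LoRA and connector parameters being frozen instead of trained.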


class LoRAConnector(nnx.Module):
"""
Implementation of Manifold constrained HyperConnections (https://arxiv.org/pdf/2512.24880)

medium

The docstring for LoRAConnector refers to an arXiv paper, https://arxiv.org/pdf/2512.24880. This ID looks like a placeholder and may not correspond to a real publication. Please update it to the correct link when available, or remove it if it is not intended to be a real reference.

Comment on lines +462 to 463
hidden_states, residual_norm = self.attn_connector.pre(hidden_states, self.input_layernorm, adapter_indices)
hidden_states = self.input_layernorm(hidden_states)

medium

The input_layernorm is applied inside attn_connector.pre to determine routing based on the normalized input, and then it's applied again to the aggregated output of pre before passing it to the attention block. While functionally correct, this pattern is a bit hard to follow. Consider adding a comment to clarify the data flow, or refactoring the pre method's signature for better readability.
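To make the data flow concrete, here is a rough NumPy sketch of the pattern the reviewer is describing. All names, shapes, and the routing/aggregation math are assumptions for illustration; the real implementation uses NNX modules and the actual mHC formulation from the PR.

```python
import numpy as np

np.random.seed(0)

def rms_norm(x, eps=1e-6):
    # Stand-in for input_layernorm (RMSNorm-style normalization).
    return x / np.sqrt(np.mean(x * x, axis=-1, keepdims=True) + eps)

def connector_pre(streams, norm_fn, router_w):
    """Hypothetical sketch of attn_connector.pre: routing weights are computed
    on the *normalized* input (layernorm use #1), but the returned hidden
    state aggregates the raw streams, so it is not yet normalized."""
    normed = norm_fn(streams)                        # layernorm #1, routing only
    logits = normed.mean(axis=-1) * router_w         # one logit per stream
    weights = np.exp(logits) / np.exp(logits).sum()  # softmax over streams
    hidden = weights @ streams                       # aggregate the raw streams
    residual = streams.sum(axis=0)                   # residual stream for later
    return hidden, residual

# Caller-side pattern from the diff: pre() normalized internally for routing,
# and the caller normalizes the aggregated output again before attention.
streams = np.random.randn(4, 8).astype(np.float32)   # n_streams x d_model
hidden, residual = connector_pre(streams, rms_norm, np.ones(4, dtype=np.float32))
hidden = rms_norm(hidden)                            # layernorm #2, pre-attention
```

Seen this way, the two applications serve different purposes (routing vs. pre-attention normalization), which is exactly the distinction a short code comment at the call site could capture.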

Comment on lines +315 to 316
hidden_states, residual_norm = self.attn_connector.pre(hidden_states, self.input_layernorm, adapter_indices)
hidden_states = self.input_layernorm(hidden_states)

medium

Same observation as for lines +462 to 463 above: input_layernorm is applied inside attn_connector.pre for routing and again on the aggregated output before attention. Consider adding a comment to clarify the data flow, or refactoring the pre method's signature for better readability.

@devin-ai-integration bot (Contributor) left a comment

✅ Devin Review: No Issues Found

Devin Review analyzed this PR and found no bugs or issues to report.


@pcmoritz pcmoritz merged commit b041937 into NovaSky-AI:main Feb 25, 2026
4 of 5 checks passed
