[tx] Add experimental SkyRL-train backend that supports SFT #871
pcmoritz merged 51 commits into NovaSky-AI:main
Conversation
Code Review
This pull request introduces a new SkyRL-train backend for supervised training. The changes include updating project dependencies in pyproject.toml and adding the new backend implementation in skyrl-tx/tx/tinker/backends/skyrl_train.py. While this is a good starting point for the new backend, my review has identified several issues that need to be addressed. The most critical issue is in the forward_backward method, which is currently a stub and does not perform a backward pass or return actual losses, preventing any training from occurring. Other significant issues include the use of hardcoded paths and hyperparameters, potentially incorrect token padding, and breaking encapsulation by accessing private members of a library class. Addressing these points will be crucial for the backend to be functional and maintainable.
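The review's main finding is that forward_backward is a stub that returns no real losses. As a rough, framework-free illustration of the reduction a working forward_backward would need to report (not the actual backend code; the function name and shapes are hypothetical), the weighted per-token loss can be sketched in pure Python:

```python
import math


def weighted_token_loss(logprobs, weights):
    """Weighted mean negative log-likelihood over tokens.

    `logprobs` are per-token log-probabilities of the target tokens,
    `weights` are the per-token loss weights (0.0 masks a token out).
    A real forward_backward would compute this from model logits,
    call .backward() on it, and return it to the caller.
    """
    total = sum(-lp * w for lp, w in zip(logprobs, weights))
    denom = sum(weights)
    return total / denom if denom > 0 else 0.0
```

A backend that returns a constant instead of this quantity cannot drive training, which is why the stub blocks any actual learning.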
        ray.get([actor.save_checkpoint.remote(output_path) for actor in self._actor_group._actor_handlers])

    def load_checkpoint(self, checkpoint_path, model_id: str) -> None:
        if model_id != self._model_id:
            raise ValueError(f"Model {model_id} not found")
        ray.get([actor.load_checkpoint.remote(Path(checkpoint_path)) for actor in self._actor_group._actor_handlers])
Accessing the private member _actor_handlers of PPORayActorGroup breaks encapsulation and makes the code dependent on the internal implementation of the skyrl-train library. This could lead to breakages if the library is updated. It would be more robust to use a public API from PPORayActorGroup for this purpose, or request one if it doesn't exist.
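One way to act on this review comment is to add a public broadcast helper to the actor group so callers never touch the private handler list. A plain-Python stand-in (no Ray; the class and method names here are hypothetical, not skyrl-train's actual API) illustrating the encapsulation pattern:

```python
class ActorGroup:
    """Minimal stand-in for an actor group that exposes a public
    broadcast API instead of its private handler list."""

    def __init__(self, actors):
        self._actor_handlers = list(actors)

    def broadcast(self, method_name, *args, **kwargs):
        # Public entry point: invoke `method_name` on every actor and
        # collect the results. Callers depend only on this method, so the
        # internal storage of handlers can change without breaking them.
        return [getattr(a, method_name)(*args, **kwargs) for a in self._actor_handlers]
```

With such a method upstream, the backend could call something like self._actor_group.broadcast("save_checkpoint", output_path) instead of iterating _actor_handlers directly.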
@pcmoritz is attempting to deploy a commit to the Tyler's projects Team on Vercel. A member of the Team first needs to authorize it.
/gemini review |
Code Review
This pull request introduces an experimental SkyRL-train backend for supervised fine-tuning (SFT). The changes include adding the necessary dependencies and configuration in pyproject.toml, integrating the new backend into the engine and API, and implementing the backend logic in a new skyrl_train.py file.
My review focuses on improving the robustness and maintainability of this new backend. I've identified a few issues:
- A configuration issue in pyproject.toml that unnecessarily restricts the backend to a specific Python version.
- A potential resource leak in the new backend's delete_model implementation.
- Use of a hardcoded padding token ID, which could lead to incorrect behavior with different models.
- Some minor maintainability concerns regarding duplicated dependency configuration and a hardcoded value.
Overall, this is a great addition. Addressing these points will make the new backend more robust and easier to maintain.
/gemini review |
Code Review
This pull request introduces an experimental skyrl_train backend to support Supervised Fine-Tuning (SFT). However, the accompanying API implementation has several critical security flaws that must be addressed. Specifically, the API lacks authentication, exposing sensitive model weights and training controls, the checkpoint download mechanism is vulnerable to Denial of Service (OOM) due to loading large files into memory, and sensitive API keys are exposed via command-line arguments. Beyond these critical security concerns, there are also suggestions to improve maintainability and clarity, including refining a comment in pyproject.toml for git dependency accuracy and resolving naming inconsistencies for action_mask/response_mask within the new backend code.
skyrl-tx/pyproject.toml
Outdated
# For now, just always use the current main branch, later it will be better to pin it to a released version. For development, you
# can set it to your own development branch.
skyrl-train = { git = "https://github.com/pcmoritz/SkyRL", rev = "6d117dd65139fa335d87efa31f056208b123280d", subdirectory = "skyrl-train" }
The comment on line 105 is slightly inaccurate. It says "just always use the current main branch", but rev is pinned to a specific commit hash, which is good practice for stability. I suggest updating the comment to reflect this. Also, as the comment mentions, it's a good idea to track an issue to update this to a released version of skyrl-train in the future to improve stability and reproducibility.
- # For now, just always use the current main branch, later it will be better to pin it to a released version. For development, you
- # can set it to your own development branch.
- skyrl-train = { git = "https://github.com/pcmoritz/SkyRL", rev = "6d117dd65139fa335d87efa31f056208b123280d", subdirectory = "skyrl-train" }
+ # For now, we pin to a specific commit on the main branch.
+ # TODO: Update to a released version of skyrl-train once available.
+ # For development, you can set it to your own development branch.
+ skyrl-train = { git = "https://github.com/pcmoritz/SkyRL", rev = "3cf9c781c5e733a5bb035efc6b4d2f83cac7cc8c", subdirectory = "skyrl-train" }
I'll change the branch before merging
sequences, attention_masks, loss_masks, response_masks = [], [], [], []

for seq, weights in zip(full_sequences, prepared_batch.all_token_weights):
    pad_len = max_seq_len - len(seq)
    sequences.append([self._tokenizer.pad_token_id] * pad_len + list(seq))
    attention_masks.append([0] * pad_len + [1] * len(seq))
    action_pad = max_response_len - len(weights)
    loss_masks.append([0.0] * action_pad + [float(w) for w in weights])
    response_masks.append([0] * action_pad + [1] * len(weights))

sequences_tensor = torch.tensor(sequences, dtype=torch.long)
attention_mask_tensor = torch.tensor(attention_masks, dtype=torch.long)
loss_mask_tensor = torch.tensor(loss_masks, dtype=torch.float32)
response_mask_tensor = torch.tensor(response_masks, dtype=torch.long)

batch = TrainingInputBatch(
    {
        "sequences": sequences_tensor,
        "attention_mask": attention_mask_tensor,
        "loss_mask": loss_mask_tensor,
        "response_mask": response_mask_tensor,
    }
There's a naming inconsistency for the mask. Here, you are creating a response_mask, but in skyrl-train/skyrl_train/workers/worker.py the code expects an action_mask. This inconsistency can be confusing. For better clarity and maintainability, I recommend using a consistent name across the components. Since worker.py is being updated to use action_mask, it seems to be the intended name.
This change would also require updating skyrl-train/training_batch.py to use action_mask in the TrainingInput TypedDict.
sequences, attention_masks, loss_masks, action_masks = [], [], [], []

for seq, weights in zip(full_sequences, prepared_batch.all_token_weights):
    pad_len = max_seq_len - len(seq)
    sequences.append([self._tokenizer.pad_token_id] * pad_len + list(seq))
    attention_masks.append([0] * pad_len + [1] * len(seq))
    action_pad = max_response_len - len(weights)
    loss_masks.append([0.0] * action_pad + [float(w) for w in weights])
    action_masks.append([0] * action_pad + [1] * len(weights))

sequences_tensor = torch.tensor(sequences, dtype=torch.long)
attention_mask_tensor = torch.tensor(attention_masks, dtype=torch.long)
loss_mask_tensor = torch.tensor(loss_masks, dtype=torch.float32)
action_mask_tensor = torch.tensor(action_masks, dtype=torch.long)

batch = TrainingInputBatch(
    {
        "sequences": sequences_tensor,
        "attention_mask": attention_mask_tensor,
        "loss_mask": loss_mask_tensor,
        "action_mask": action_mask_tensor,
    }
erictang000 left a comment
looks good to me, just a couple minor nits (can be fixed later too)
num_gpus = self._cfg.trainer.placement.policy_num_gpus_per_node

pg = placement_group([{"GPU": num_gpus, "CPU": 1}], strategy="PACK")
get_ray_pg_ready_with_timeout(pg, timeout=30)
small nit - this timeout is sometimes not long enough, since installing dependencies on all Ray workers the first time can be expensive.
We have this env var: from skyrl_train.env_vars import SKYRL_RAY_PG_TIMEOUT_IN_S, which we set to 180 by default.
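The suggestion above amounts to reading the timeout from the environment with a generous default instead of hardcoding 30 seconds. A minimal sketch of that convention (the helper name is hypothetical; the variable name and 180s default follow the reviewer's note):

```python
import os


def ray_pg_timeout_s(default: int = 180) -> int:
    """Read the placement-group readiness timeout from the environment,
    following the SKYRL_RAY_PG_TIMEOUT_IN_S convention (180s default),
    so first-time dependency installation on Ray workers has room to finish."""
    return int(os.environ.get("SKYRL_RAY_PG_TIMEOUT_IN_S", default))
```

The call site would then become get_ray_pg_ready_with_timeout(pg, timeout=ray_pg_timeout_s()), or simply import the constant from skyrl_train.env_vars directly.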
num_gpus_per_node=num_gpus,
ray_actor_type=PolicyWorker,
pg=pg,
num_gpus_per_actor=0.2 if pg else 1,
this can probably just be hardcoded to 1 for now since the pg is always created above, but we should revisit it if colocation is going to be supported
The engine can e.g. be started with
and then you can e.g. run