
[tx] Add experimental SkyRL-train backend that supports SFT#871

Merged
pcmoritz merged 51 commits into NovaSky-AI:main from pcmoritz:tx-skyrl-train-backend
Jan 28, 2026

Conversation

@pcmoritz
Collaborator

@pcmoritz pcmoritz commented Jan 13, 2026

The engine can, for example, be started with

uv run --extra gpu --extra tinker -m tx.tinker.api --base-model Qwen/Qwen3-4B --backend "skyrl_train"

and then you can run, for example,

uv run --with wandb --with tinker sl_loop.py base_url=http://localhost:8000 model_name=Qwen/Qwen3-4B lora_rank=1

@pcmoritz pcmoritz added the tx label Jan 13, 2026
Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces a new SkyRL-train backend for supervised training. The changes include updating project dependencies in pyproject.toml and adding the new backend implementation in skyrl-tx/tx/tinker/backends/skyrl_train.py. While this is a good starting point for the new backend, my review has identified several issues that need to be addressed. The most critical issue is in the forward_backward method, which is currently a stub and does not perform a backward pass or return actual losses, preventing any training from occurring. Other significant issues include the use of hardcoded paths and hyperparameters, potentially incorrect token padding, and breaking encapsulation by accessing private members of a library class. Addressing these points will be crucial for the backend to be functional and maintainable.
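For context on the "does not return actual losses" point: whatever the final forward_backward implementation looks like, it has to reduce per-token losses under the token weights into the scalar it reports. A minimal pure-Python sketch of that reduction (function and argument names here are hypothetical illustrations, not the actual backend code):

```python
def weighted_mean_loss(token_nlls, token_weights):
    """Aggregate per-token negative log-likelihoods into one scalar,
    weighting each token by its mask/weight and normalizing by the
    total weight so fully-masked examples do not divide by zero."""
    total = sum(w * nll for w, nll in zip(token_weights, token_nlls))
    denom = sum(token_weights)
    return total / denom if denom else 0.0
```

For instance, with per-token losses `[2.0, 1.0, 4.0]` and weights `[1.0, 0.0, 1.0]`, only the first and third tokens contribute.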

Comment on lines +192 to +197
ray.get([actor.save_checkpoint.remote(output_path) for actor in self._actor_group._actor_handlers])

def load_checkpoint(self, checkpoint_path, model_id: str) -> None:
if model_id != self._model_id:
raise ValueError(f"Model {model_id} not found")
ray.get([actor.load_checkpoint.remote(Path(checkpoint_path)) for actor in self._actor_group._actor_handlers])
Contributor


medium

Accessing the private member _actor_handlers of PPORayActorGroup breaks encapsulation and makes the code dependent on the internal implementation of the skyrl-train library. This could lead to breakages if the library is updated. It would be more robust to use a public API from PPORayActorGroup for this purpose, or request one if it doesn't exist.
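A public fan-out helper on the actor group is one way to keep callers away from the private handle list. A toy sketch of that pattern (the class, the `run_on_all_actors` name, and the plain method calls standing in for Ray remotes are all assumptions for illustration, not the actual skyrl-train API):

```python
class ActorGroup:
    """Stand-in for PPORayActorGroup, illustrating a public fan-out API
    so callers never touch the private actor-handle list directly."""

    def __init__(self, actors):
        self._actor_handlers = actors  # private: internal implementation detail

    def run_on_all_actors(self, method_name, *args, **kwargs):
        # Public API: invoke `method_name` on every actor and collect results.
        # With Ray this would be ray.get([getattr(a, method_name).remote(...)]).
        return [getattr(a, method_name)(*args, **kwargs) for a in self._actor_handlers]


class FakeActor:
    """Minimal actor stub for demonstrating the helper."""

    def save_checkpoint(self, path):
        return f"saved:{path}"
```

With such a helper, the backend's save/load paths could call `group.run_on_all_actors("save_checkpoint", output_path)` instead of iterating `_actor_handlers` itself.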

@vercel

vercel bot commented Jan 25, 2026

@pcmoritz is attempting to deploy a commit to the Tyler's projects Team on Vercel.

A member of the Team first needs to authorize it.

@pcmoritz pcmoritz changed the title from "[tx] [WIP] Add SkyRL-train backend" to "[tx] Add SkyRL-train backend" on Jan 25, 2026
@pcmoritz
Collaborator Author

/gemini review

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces an experimental SkyRL-train backend for supervised fine-tuning (SFT). The changes include adding the necessary dependencies and configuration in pyproject.toml, integrating the new backend into the engine and API, and implementing the backend logic in a new skyrl_train.py file.

My review focuses on improving the robustness and maintainability of this new backend. I've identified a few issues:

  • A configuration issue in pyproject.toml that unnecessarily restricts the backend to a specific Python version.
  • A potential resource leak in the new backend's delete_model implementation.
  • Use of a hardcoded padding token ID, which could lead to incorrect behavior with different models.
  • Some minor maintainability concerns regarding duplicated dependency configuration and a hardcoded value.

Overall, this is a great addition. Addressing these points will make the new backend more robust and easier to maintain.

@pcmoritz
Collaborator Author

/gemini review

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces an experimental skyrl_train backend to support supervised fine-tuning (SFT). However, the accompanying API implementation has several critical security flaws that must be addressed:

  • The API lacks authentication, exposing sensitive model weights and training controls.
  • The checkpoint download mechanism loads large files into memory, making it vulnerable to denial of service (OOM).
  • Sensitive API keys are exposed via command-line arguments.

Beyond these critical security concerns, there are also suggestions to improve maintainability and clarity, including refining a comment in pyproject.toml for git dependency accuracy and resolving naming inconsistencies for action_mask/response_mask within the new backend code.
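The standard mitigation for the OOM concern is to stream the checkpoint in fixed-size chunks rather than reading the whole file into memory. A self-contained sketch of the chunked read a streaming response would wrap (function name and chunk size are assumptions; a real FastAPI endpoint would hand this generator to a StreamingResponse):

```python
import tempfile


def iter_file_chunks(path, chunk_size=1024 * 1024):
    """Yield a file in fixed-size chunks so a download endpoint can stream a
    large checkpoint instead of materializing it in memory at once."""
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            yield chunk


# tiny demo: write a small "checkpoint" and stream it back together
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"checkpoint-bytes" * 3)
    tmp_path = tmp.name

reassembled = b"".join(iter_file_chunks(tmp_path, chunk_size=7))
```

Peak memory then stays bounded by `chunk_size`, independent of checkpoint size.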

Comment on lines +105 to +107
# For now, just always use the current main branch, later it will be better to pin it to a released version. For development, you
# can set it to your own development branch.
skyrl-train = { git = "https://github.com/pcmoritz/SkyRL", rev = "6d117dd65139fa335d87efa31f056208b123280d", subdirectory = "skyrl-train" }
Contributor


medium

The comment on line 105 is slightly inaccurate. It says "just always use the current main branch", but rev is pinned to a specific commit hash, which is good practice for stability. I suggest updating the comment to reflect this. Also, as the comment mentions, it's a good idea to track an issue to update this to a released version of skyrl-train in the future to improve stability and reproducibility.

Suggested change
# For now, just always use the current main branch, later it will be better to pin it to a released version. For development, you
# can set it to your own development branch.
skyrl-train = { git = "https://github.com/pcmoritz/SkyRL", rev = "6d117dd65139fa335d87efa31f056208b123280d", subdirectory = "skyrl-train" }
# For now, we pin to a specific commit on the main branch.
# TODO: Update to a released version of skyrl-train once available.
# For development, you can set it to your own development branch.
skyrl-train = { git = "https://github.com/pcmoritz/SkyRL", rev = "3cf9c781c5e733a5bb035efc6b4d2f83cac7cc8c", subdirectory = "skyrl-train" }

Collaborator Author


I'll change the branch before merging

Comment on lines +126 to +147
        sequences, attention_masks, loss_masks, response_masks = [], [], [], []

        for seq, weights in zip(full_sequences, prepared_batch.all_token_weights):
            pad_len = max_seq_len - len(seq)
            sequences.append([self._tokenizer.pad_token_id] * pad_len + list(seq))
            attention_masks.append([0] * pad_len + [1] * len(seq))
            action_pad = max_response_len - len(weights)
            loss_masks.append([0.0] * action_pad + [float(w) for w in weights])
            response_masks.append([0] * action_pad + [1] * len(weights))

        sequences_tensor = torch.tensor(sequences, dtype=torch.long)
        attention_mask_tensor = torch.tensor(attention_masks, dtype=torch.long)
        loss_mask_tensor = torch.tensor(loss_masks, dtype=torch.float32)
        response_mask_tensor = torch.tensor(response_masks, dtype=torch.long)

        batch = TrainingInputBatch(
            {
                "sequences": sequences_tensor,
                "attention_mask": attention_mask_tensor,
                "loss_mask": loss_mask_tensor,
                "response_mask": response_mask_tensor,
            }
Contributor


medium

There's a naming inconsistency for the mask. Here, you are creating a response_mask, but in skyrl-train/skyrl_train/workers/worker.py the code expects an action_mask. This inconsistency can be confusing. For better clarity and maintainability, I recommend using a consistent name across the components. Since worker.py is being updated to use action_mask, it seems to be the intended name.

This change would also require updating skyrl-train/training_batch.py to use action_mask in the TrainingInput TypedDict.

        sequences, attention_masks, loss_masks, action_masks = [], [], [], []

        for seq, weights in zip(full_sequences, prepared_batch.all_token_weights):
            pad_len = max_seq_len - len(seq)
            sequences.append([self._tokenizer.pad_token_id] * pad_len + list(seq))
            attention_masks.append([0] * pad_len + [1] * len(seq))
            action_pad = max_response_len - len(weights)
            loss_masks.append([0.0] * action_pad + [float(w) for w in weights])
            action_masks.append([0] * action_pad + [1] * len(weights))

        sequences_tensor = torch.tensor(sequences, dtype=torch.long)
        attention_mask_tensor = torch.tensor(attention_masks, dtype=torch.long)
        loss_mask_tensor = torch.tensor(loss_masks, dtype=torch.float32)
        action_mask_tensor = torch.tensor(action_masks, dtype=torch.long)

        batch = TrainingInputBatch(
            {
                "sequences": sequences_tensor,
                "attention_mask": attention_mask_tensor,
                "loss_mask": loss_mask_tensor,
                "action_mask": action_mask_tensor,
            }

Collaborator

@erictang000 erictang000 left a comment


looks good to me, just a couple minor nits (can be fixed later too)

num_gpus = self._cfg.trainer.placement.policy_num_gpus_per_node

pg = placement_group([{"GPU": num_gpus, "CPU": 1}], strategy="PACK")
get_ray_pg_ready_with_timeout(pg, timeout=30)
Collaborator


small nit - this is sometimes not long enough, since installing dependencies on all Ray workers the first time can be expensive

we have this env var:

from skyrl_train.env_vars import SKYRL_RAY_PG_TIMEOUT_IN_S

that we set to 180 by default
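For illustration, an environment-variable-backed default like this is typically read as sketched below (this reimplements the idea for clarity; the real constant lives in skyrl_train.env_vars, and the helper name here is an assumption):

```python
import os


def pg_timeout_seconds(default=180):
    """Read the placement-group readiness timeout from the environment,
    mirroring how SKYRL_RAY_PG_TIMEOUT_IN_S defaults to 180 seconds."""
    return int(os.environ.get("SKYRL_RAY_PG_TIMEOUT_IN_S", default))
```

The call site would then become `get_ray_pg_ready_with_timeout(pg, timeout=pg_timeout_seconds())`, letting deployments with slow first-time dependency installs raise the limit without a code change.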

num_gpus_per_node=num_gpus,
ray_actor_type=PolicyWorker,
pg=pg,
num_gpus_per_actor=0.2 if pg else 1,
Collaborator


this can probably just be hardcoded to 1 for now, since the pg is always created above, but we should revisit this if colocation is going to be supported

@pcmoritz pcmoritz merged commit 642a6a4 into NovaSky-AI:main Jan 28, 2026
4 of 6 checks passed