[skyrl-train][refactor] Inference Server Refactor -- RemoteInferenceClient 2/N by kouroshHakha · Pull Request #904 · NovaSky-AI/SkyRL

kouroshHakha · 2026-01-20T22:03:58Z

Summary: Adds RemoteInferenceClient, a lightweight, fully serializable HTTP client that wraps inference server APIs. This client replaces the old InferenceEngineInterface for HTTP-based inference and can work with any HTTP-compatible inference backend (vLLM, sglang-router, Ray Serve LLM, etc.).

Architecture:

Router (InferenceRouter): Data plane only - routes requests to ONE backend (session-aware or round-robin)
Client (RemoteInferenceClient): Fully responsible for control plane fan-out to all backends

This separation allows using external routers (vllm-router, sglang-router) that only handle data plane.
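The split between the two URL types can be illustrated with a small sketch. This is not the PR's implementation: `ClientSketch`, its method names, and the `/pause` path are hypothetical and chosen only to show where each kind of request would be sent.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ClientSketch:
    """Hypothetical stand-in for RemoteInferenceClient; it holds only URLs."""

    proxy_url: str          # data plane: the router
    server_urls: List[str]  # control plane: every backend server

    def data_plane_targets(self, path: str) -> List[str]:
        # One request goes to the router, which forwards it to ONE backend.
        return [f"{self.proxy_url}{path}"]

    def control_plane_targets(self, path: str) -> List[str]:
        # The client itself fans out to ALL backends, bypassing the router.
        return [f"{url}{path}" for url in self.server_urls]


client = ClientSketch(
    proxy_url="http://router:8080",
    server_urls=["http://server-0:8000", "http://server-1:8000"],
)
print(client.data_plane_targets("/v1/completions"))
print(client.control_plane_targets("/pause"))  # "/pause" is an assumed path
```

Because the router never sees control plane calls in this layout, any router that only forwards data plane requests could be swapped in.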

Key Features:

Serializable: Can be pickled and passed between Ray actors/processes
Two URL types: proxy_url for data plane (router), server_urls for control plane (fan-out to backends)
Data plane: generate(), chat_completion(), completion(), tokenize(), detokenize()
Control plane (fan-out): pause(), resume(), sleep(), wake_up(), reset_prefix_cache()
Weight sync (fan-out): init_weight_transfer(), update_weights(), finalize_weight_update()
PauseMode enum: Forward-compatible with vLLM RFC #32103 pause modes
Built-in retry on abort: Handles stop_reason="abort" during weight sync
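A minimal sketch of the retry-on-abort idea, assuming a response dict that carries a `stop_reason` field as described above. The function name, the `send` callable, and the `max_retries` parameter are hypothetical; the actual client may also resume from partial output rather than simply resending.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict

Response = Dict[str, Any]


async def generate_with_retry(
    send: Callable[[Dict[str, Any]], Awaitable[Response]],
    payload: Dict[str, Any],
    max_retries: int = 3,
) -> Response:
    """Resend the request while the server reports stop_reason == "abort"."""
    for _ in range(max_retries + 1):
        response = await send(payload)
        if response.get("stop_reason") != "abort":
            return response
    raise RuntimeError(f"request still aborted after {max_retries} retries")


def make_fake_send(aborts_before_success: int):
    """Fake transport: aborts a fixed number of times, then succeeds."""
    state = {"calls": 0}

    async def send(payload: Dict[str, Any]) -> Response:
        state["calls"] += 1
        if state["calls"] <= aborts_before_success:
            return {"stop_reason": "abort", "text": ""}
        return {"stop_reason": "stop", "text": "done"}

    return send, state


send, state = make_fake_send(aborts_before_success=2)
result = asyncio.run(generate_with_retry(send, {"prompt": "hi"}))
print(result["stop_reason"], state["calls"])
```

The fake transport stands in for an HTTP call so the sketch runs without a server.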

Comparison vs InferenceEngineInterface + InferenceEngineClient:

Serializable - Just URLs, no Ray actors/tokenizers/thread events
No local tokenizer - Uses /tokenize endpoint instead
Server-side routing - Router handles session routing via X-Session-ID header
Simplified parallelism - Single get_world_size() vs separate tp_size(), pp_size(), dp_size()
No ABC hierarchy - Simple dataclass with async methods
Backend-agnostic - Works with any HTTP server (vLLM, sglang, Ray Serve LLM)
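Server-side session routing could look roughly like the sketch below: requests carrying a session ID (sent in the `X-Session-ID` header) stick to one backend, and all others are spread round-robin. `RouterSketch` and the hashing scheme are assumptions for illustration, not the router's actual code.

```python
import hashlib
import itertools
from typing import List, Optional


class RouterSketch:
    """Hypothetical router: session-aware when an ID is given, else round-robin."""

    def __init__(self, backends: List[str]) -> None:
        self._backends = list(backends)
        self._round_robin = itertools.cycle(self._backends)

    def pick(self, session_id: Optional[str] = None) -> str:
        if session_id is None:
            # No session header: take the next backend in rotation.
            return next(self._round_robin)
        # Stable hash so the same session always maps to the same backend.
        digest = hashlib.sha256(session_id.encode("utf-8")).digest()
        index = int.from_bytes(digest[:8], "big") % len(self._backends)
        return self._backends[index]


router = RouterSketch(["http://server-0:8000", "http://server-1:8000"])
headers = {"X-Session-ID": "trajectory-17"}
print(router.pick(headers.get("X-Session-ID")))
print(router.pick())
```

Keeping a session on one backend lets multi-turn requests reuse that backend's prefix cache, which is presumably why the routing is session-aware.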

Files Added:

skyrl_train/inference_servers/remote_inference_client.py - The client implementation
tests/cpu/inference_servers/test_remote_inference_client.py - Unit tests

Signed-off-by: Kourosh Hakhamaneshi <kourosh@anyscale.com>

gemini-code-assist

Code Review

This pull request refactors the inference engine by replacing the InferenceEngineInterface with a new RemoteInferenceClient for HTTP-based inference, and introduces new modules for common utilities, protocols, server groups, and a robust router. The changes are well-structured and include comprehensive unit and GPU CI tests, but they introduce significant security risks. The most critical issue is the use of pickle.loads in the vLLM worker extension, which provides a direct path to Remote Code Execution (RCE). Additionally, the lack of authentication on sensitive control plane and weight synchronization endpoints in both the router and the server actor exposes the cluster to unauthorized control and potential weight hijacking. These security concerns must be addressed before deployment in untrusted network environments.
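To make the pickle concern concrete, here is a harmless, self-contained illustration of why `pickle.loads` on untrusted bytes is risky: the sender chooses a callable that runs during deserialization. The `Payload` class is hypothetical and unrelated to the PR's code; `int` stands in for whatever an attacker would actually call.

```python
import json
import pickle


class Payload:
    """The sender controls __reduce__, so it controls what loads() calls."""

    def __reduce__(self):
        # A harmless callable stands in for attacker-chosen code.
        return (int, ("42",))


blob = pickle.dumps(Payload())
result = pickle.loads(blob)  # runs int("42") instead of rebuilding a Payload
print(type(result).__name__, result)

# A data-only format such as JSON cannot trigger calls while parsing.
metadata = json.loads('{"name": "layer.weight", "shape": [2, 3]}')
print(metadata["shape"])
```

Restricting such payloads to data-only formats, or authenticating the endpoint that accepts them, are two common ways to close this path.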

skyrl-train/skyrl_train/inference_servers/vllm_worker.py

skyrl-train/skyrl_train/inference_servers/router.py

skyrl-train/skyrl_train/inference_servers/vllm_server_actor.py

skyrl-train/skyrl_train/inference_servers/vllm_worker.py

vercel · 2026-01-27T05:53:15Z

@kouroshHakha is attempting to deploy a commit to the Tyler's projects Team on Vercel.

A member of the Team first needs to authorize it.

CharlieFRuan

Left several comments, thank you!

skyrl-train/skyrl_train/inference_servers/router.py

skyrl-train/skyrl_train/inference_servers/remote_inference_client.py

CharlieFRuan

Thank you!

kouroshHakha added 20 commits January 18, 2026 21:23

40e538b v0
a52b0dc common
6d68e2f vllm_server_actor
d0d2990 pool
d20b4bd wip
07f3d9f group
1a48e61 wip
e290f4b wip
509538f tests
555082b Wip
afcc8de wip
058cb95 Wip
7c8fc0b wip
68dc4ed lint
dce17d2 lint
22c12ad gemini fback
05bfc92 wip
eca0e3d wip
bdd1d8a wip
9bf4173 wip

kouroshHakha changed the title ~~PR 1/N: Inference Server Refactor -- RemoteInferenceClient~~ [skyrl-train][refactor] 2/N Inference Server Refactor -- RemoteInferenceClient Jan 20, 2026

gemini-code-assist bot reviewed Jan 20, 2026

CharlieFRuan self-assigned this Jan 21, 2026

ded5cf2 Merge branch 'main' into kh/inference-2
e102ebd wip

kouroshHakha changed the title ~~[skyrl-train][refactor] 2/N Inference Server Refactor -- RemoteInferenceClient~~ [skyrl-train][refactor] Inference Server Refactor -- RemoteInferenceClient 2/N Jan 27, 2026

CharlieFRuan reviewed Jan 29, 2026

6da662e wip

kouroshHakha requested a review from CharlieFRuan January 29, 2026 23:50

43793df Wip

CharlieFRuan approved these changes Jan 30, 2026

CharlieFRuan merged commit cce41f6 into NovaSky-AI:main Jan 30, 2026
3 of 4 checks passed

kouroshHakha mentioned this pull request Jan 30, 2026

Fix GPU tests after latest pushes on inference revamp #995

Merged

erictang000 pushed a commit that referenced this pull request Jan 30, 2026

Fix GPU tests after latest pushes on inference revamp (#995)

1483bd5

gpu ci failed after #904 got merged. This fixes that. Signed-off-by: Kourosh Hakhamaneshi <kourosh@anyscale.com>

kouroshHakha mentioned this pull request Feb 2, 2026

fix gpu ci #1007

Merged

CharlieFRuan pushed a commit that referenced this pull request Feb 2, 2026

fix gpu ci (#1007)

0d2dd5c

gpu ci failed after #904 got merged. This fixes that. Signed-off-by: Kourosh Hakhamaneshi <kourosh@anyscale.com>

CharlieFRuan merged 24 commits into NovaSky-AI:main from kouroshHakha:kh/inference-2

2 participants
