[skyrl-train][refactor] Inference Server Refactor -- RemoteInferenceClient 2/N #904

Merged: CharlieFRuan merged 24 commits into NovaSky-AI:main from kouroshHakha:kh/inference-2 on Jan 30, 2026
@kouroshHakha (Collaborator) commented Jan 20, 2026

Summary: Adds RemoteInferenceClient, a lightweight, fully serializable HTTP client that wraps inference server APIs. This client replaces the old InferenceEngineInterface for HTTP-based inference and can work with any HTTP-compatible inference backend (vLLM, sglang-router, Ray Serve LLM, etc.).

Architecture:

  • Router (InferenceRouter): Data plane only - routes requests to ONE backend (session-aware or round-robin)
  • Client (RemoteInferenceClient): Fully responsible for control plane fan-out to all backends

This separation allows using external routers (vllm-router, sglang-router) that only handle data plane.
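
The split described above can be sketched as follows. This is a minimal illustration, not the code from this PR: `SketchClient`, `RecordingTransport`, the host names, and the `/v1/completions` and `/pause` paths are all assumptions made for the example, and the injected transport stands in for a real HTTP library.

```python
import asyncio
from dataclasses import dataclass
from typing import Any, Dict, List


class RecordingTransport:
    """Hypothetical stand-in for an HTTP POST; it records every URL it is asked to call."""

    def __init__(self) -> None:
        self.calls: List[str] = []

    async def __call__(self, url: str, payload: Dict[str, Any]) -> Dict[str, Any]:
        self.calls.append(url)
        return {"url": url, "ok": True}


@dataclass
class SketchClient:
    proxy_url: str          # data plane: the router forwards to ONE backend
    server_urls: List[str]  # control plane: the client contacts ALL backends
    transport: RecordingTransport

    async def data_plane(self, path: str, payload: Dict[str, Any]) -> Dict[str, Any]:
        # A single request goes to the router, which picks the backend.
        return await self.transport(self.proxy_url + path, payload)

    async def control_plane(self, path: str, payload: Dict[str, Any]) -> List[Dict[str, Any]]:
        # Fan out the same request to every backend concurrently.
        requests = [self.transport(url + path, payload) for url in self.server_urls]
        return await asyncio.gather(*requests)


async def demo(client: SketchClient):
    one = await client.data_plane("/v1/completions", {"prompt": "hi"})
    many = await client.control_plane("/pause", {})
    return one, many


transport = RecordingTransport()
client = SketchClient(
    proxy_url="http://router:8000",
    server_urls=["http://server-0:8001", "http://server-1:8001"],
    transport=transport,
)
one, many = asyncio.run(demo(client))
```

Because the router never sees the control-plane calls in this sketch, any router that only forwards data-plane traffic would be sufficient, which is the point of the separation.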

Key Features:

  • Serializable: Can be pickled and passed between Ray actors/processes
  • Two URL types: proxy_url for data plane (router), server_urls for control plane (fan-out to backends)
  • Data plane: generate(), chat_completion(), completion(), tokenize(), detokenize()
  • Control plane (fan-out): pause(), resume(), sleep(), wake_up(), reset_prefix_cache()
  • Weight sync (fan-out): init_weight_transfer(), update_weights(), finalize_weight_update()
  • PauseMode enum: Forward-compatible with vLLM RFC #32103 pause modes
  • Built-in retry on abort: Handles stop_reason="abort" during weight sync
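
A minimal sketch of the retry-on-abort idea from the last bullet, under stated assumptions: `generate_with_retry`, `FlakyBackend`, the `max_retries` parameter, and the response shape are hypothetical. The description does not say whether the real client restarts the request or resumes from partial output; this sketch simply resubmits.

```python
import asyncio
from typing import Any, Dict


class FlakyBackend:
    """Hypothetical backend that aborts the first two requests, as if weights were syncing."""

    def __init__(self) -> None:
        self.attempts = 0

    async def __call__(self, payload: Dict[str, Any]) -> Dict[str, Any]:
        self.attempts += 1
        if self.attempts <= 2:
            return {"text": "", "stop_reason": "abort"}
        return {"text": "done", "stop_reason": "stop"}


async def generate_with_retry(send, payload: Dict[str, Any], max_retries: int = 5) -> Dict[str, Any]:
    # Resubmit the request while the server reports stop_reason == "abort".
    for _ in range(max_retries + 1):
        response = await send(payload)
        if response.get("stop_reason") != "abort":
            return response
        await asyncio.sleep(0)  # yield control; a real client might back off here
    raise RuntimeError("request was still aborted after all retries")


backend = FlakyBackend()
result = asyncio.run(generate_with_retry(backend, {"prompt": "hi"}))
```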

Comparison vs InferenceEngineInterface + InferenceEngineClient:

  • Serializable - Just URLs, no Ray actors/tokenizers/thread events
  • No local tokenizer - Uses /tokenize endpoint instead
  • Server-side routing - Router handles session routing via X-Session-ID header
  • Simplified parallelism - Single get_world_size() vs separate tp_size(), pp_size(), dp_size()
  • No ABC hierarchy - Simple dataclass with async methods
  • Backend-agnostic - Works with any HTTP server (vLLM, sglang, Ray Serve LLM)
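
One way server-side session routing via the X-Session-ID header could work is sketched below. The hashing scheme, the `SketchRouter` class, and the backend URLs are assumptions for illustration; the PR description only states that the router is session-aware or round-robin, not how it maps sessions to backends.

```python
import hashlib
from itertools import count
from typing import Dict, List


class SketchRouter:
    """Hypothetical router: sticky by session ID, round-robin otherwise."""

    def __init__(self, backends: List[str]) -> None:
        self.backends = backends
        self._next = count()  # round-robin counter

    def pick(self, headers: Dict[str, str]) -> str:
        session_id = headers.get("X-Session-ID")
        if session_id is not None:
            # A stable hash keeps the same session on the same backend.
            digest = hashlib.sha256(session_id.encode("utf-8")).digest()
            index = int.from_bytes(digest[:8], "big") % len(self.backends)
        else:
            index = next(self._next) % len(self.backends)
        return self.backends[index]


router = SketchRouter(["http://server-0:8001", "http://server-1:8001"])
sticky_first = router.pick({"X-Session-ID": "trajectory-42"})
sticky_second = router.pick({"X-Session-ID": "trajectory-42"})
round_robin = [router.pick({}) for _ in range(4)]
```

Keeping a session on one backend is what would let multi-turn requests reuse that backend's prefix cache, while requests without a session ID are spread evenly.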

Files Added:

  • skyrl_train/inference_servers/remote_inference_client.py - The client implementation
  • tests/cpu/inference_servers/test_remote_inference_client.py - Unit tests

Signed-off-by: Kourosh Hakhamaneshi <kourosh@anyscale.com>
@kouroshHakha changed the title from "PR 1/N: Inference Server Refactor -- RemoteInferenceClient" to "[skyrl-train][refactor] 2/N Inference Server Refactor -- RemoteInferenceClient" on Jan 20, 2026

@gemini-code-assist bot (Contributor) left a comment

Code Review

This pull request refactors the inference engine by replacing the InferenceEngineInterface with a new RemoteInferenceClient for HTTP-based inference, introducing new modules for common utilities, protocols, server groups, and a robust router. While the changes are well-structured and include comprehensive unit and GPU CI tests, it introduces significant security risks. The most critical issue is the use of pickle.loads in the vLLM worker extension, which provides a direct path to Remote Code Execution (RCE). Additionally, the lack of authentication on sensitive control plane and weight synchronization endpoints in both the router and the server actor exposes the cluster to unauthorized control and potential weight hijacking. These security concerns must be addressed before deployment in untrusted network environments.
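
The pickle.loads concern in this review can be illustrated with a deliberately harmless sketch. It does not reproduce the vLLM worker extension code from this PR; the `Payload` class and the evaluated expression are hypothetical. It only shows the general mechanism: whoever produces the bytes chooses which callable runs during unpickling.

```python
import pickle


class Payload:
    """Hypothetical attacker-controlled object."""

    def __reduce__(self):
        # pickle will call eval("6 * 7") on load; an attacker could name
        # any importable callable and arguments here instead.
        return (eval, ("6 * 7",))


blob = pickle.dumps(Payload())

# The receiver never asked for code to run, yet loading runs the expression.
result = pickle.loads(blob)
is_payload = isinstance(result, Payload)
```

Here `result` is the value of the evaluated expression rather than a `Payload` instance, which is why unpickling bytes received over an unauthenticated endpoint is treated as equivalent to running the sender's code.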

@CharlieFRuan self-assigned this on Jan 21, 2026

vercel bot commented Jan 27, 2026

@kouroshHakha is attempting to deploy a commit to the Tyler's projects Team on Vercel.

A member of the Team first needs to authorize it.

@kouroshHakha changed the title from "[skyrl-train][refactor] 2/N Inference Server Refactor -- RemoteInferenceClient" to "[skyrl-train][refactor] Inference Server Refactor -- RemoteInferenceClient 2/N" on Jan 27, 2026

@CharlieFRuan (Member) left a comment

Left several comments, thank you!

@CharlieFRuan (Member) left a comment

Thank you!

@CharlieFRuan merged commit cce41f6 into NovaSky-AI:main on Jan 30, 2026
3 of 4 checks passed
erictang000 pushed a commit that referenced this pull request Jan 30, 2026
GPU CI failed after #904 got merged. This fixes that.

Signed-off-by: Kourosh Hakhamaneshi <kourosh@anyscale.com>
@kouroshHakha mentioned this pull request on Feb 2, 2026
CharlieFRuan pushed a commit that referenced this pull request Feb 2, 2026
GPU CI failed after #904 got merged. This fixes that.

Signed-off-by: Kourosh Hakhamaneshi <kourosh@anyscale.com>