[skyrl-train][refactor] Inference Server Refactor -- RemoteInferenceClient 2/N#904
Signed-off-by: Kourosh Hakhamaneshi <kourosh@anyscale.com>
Code Review
This pull request refactors the inference engine, replacing `InferenceEngineInterface` with a new `RemoteInferenceClient` for HTTP-based inference and introducing new modules for common utilities, protocols, server groups, and a robust router. While the changes are well structured and include comprehensive unit and GPU CI tests, they introduce significant security risks. The most critical issue is the use of `pickle.loads` in the vLLM worker extension, which provides a direct path to remote code execution (RCE). Additionally, the lack of authentication on sensitive control-plane and weight-synchronization endpoints in both the router and the server actor exposes the cluster to unauthorized control and potential weight hijacking. These security concerns must be addressed before deployment in untrusted network environments.
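The `pickle.loads` concern above can be illustrated with a minimal, self-contained sketch. `record` and `Payload` are hypothetical names used only for this demo; the point is that unpickling untrusted bytes invokes whatever callable the sender embedded via `__reduce__`:

```python
import pickle

hits = []

def record(msg):
    # A harmless stand-in; an attacker would embed os.system or similar.
    hits.append(msg)

class Payload:
    def __reduce__(self):
        # pickle.loads will call record("pwned") while deserializing --
        # with an attacker-chosen callable, untrusted bytes mean RCE.
        return (record, ("pwned",))

pickle.loads(pickle.dumps(Payload()))
```

After `pickle.loads`, `record` has been executed without the receiver ever calling it, which is why schema-checked JSON or a fixed-format tensor protocol is the usual replacement on network-facing endpoints.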
@kouroshHakha is attempting to deploy a commit to the Tyler's projects Team on Vercel. A member of the Team first needs to authorize it.
CharlieFRuan
left a comment
Left several comments, thank you!
skyrl-train/skyrl_train/inference_servers/remote_inference_client.py
gpu ci failed after #904 got merged. This fixes that. Signed-off-by: Kourosh Hakhamaneshi <kourosh@anyscale.com>
**Summary:** Adds `RemoteInferenceClient`, a lightweight, fully serializable HTTP client that wraps inference server APIs. This client replaces the old `InferenceEngineInterface` for HTTP-based inference and can work with any HTTP-compatible inference backend (vLLM, sglang-router, Ray Serve LLM, etc.).

**Architecture:**

This separation allows using external routers (vllm-router, sglang-router) that only handle the data plane.

**Key Features:**
- `proxy_url` for the data plane (router), `server_urls` for the control plane (fan-out to backends)
- Generation: `generate()`, `chat_completion()`, `completion()`, `tokenize()`, `detokenize()`
- Control: `pause()`, `resume()`, `sleep()`, `wake_up()`, `reset_prefix_cache()`
- Weight sync: `init_weight_transfer()`, `update_weights()`, `finalize_weight_update()`
- Aborted requests return `stop_reason="abort"` during weight sync

**Comparison vs `InferenceEngineInterface` + `InferenceEngineClient`:**
- Uses the `/tokenize` endpoint instead
- `X-Session-ID` header
- `get_world_size()` vs separate `tp_size()`, `pp_size()`, `dp_size()`

**Files Added:**
- `skyrl_train/inference_servers/remote_inference_client.py` - the client implementation
- `tests/cpu/inference_servers/test_remote_inference_client.py` - unit tests
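The data-plane / control-plane split described above can be sketched with a toy stand-in. `ToyRemoteClient` is hypothetical and records would-be HTTP requests instead of sending them; the real `RemoteInferenceClient` lives in `skyrl_train/inference_servers/` and its exact signatures may differ:

```python
class ToyRemoteClient:
    """Toy illustration of the proxy_url / server_urls split (not the real client)."""

    def __init__(self, proxy_url, server_urls):
        self.proxy_url = proxy_url      # data plane: one router endpoint
        self.server_urls = server_urls  # control plane: fan-out to every backend
        self.calls = []                 # recorded (method, url) pairs

    def completion(self, prompt):
        # Data-plane traffic goes only through the router, which can be an
        # external component (vllm-router, sglang-router) that load-balances.
        self.calls.append(("POST", self.proxy_url + "/v1/completions"))

    def pause(self):
        # Control-plane operations must reach every backend server directly,
        # since an external router does not implement them.
        for url in self.server_urls:
            self.calls.append(("POST", url + "/pause"))

client = ToyRemoteClient(
    proxy_url="http://router:8000",
    server_urls=["http://gpu0:8001", "http://gpu1:8001"],
)
client.completion("hello")
client.pause()
```

Here one `completion()` produces a single router call, while one `pause()` fans out to both backends, which is why the client keeps the two URL sets separate.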