[skyrl-train][dependencies] separate vllm + megatron + bump vllm back to 0.11.0 + pin minimum uv version for extra-build-dependencies (NovaSky-AI#528)
## Separate vllm + megatron deps
After NovaSky-AI#481, there were megatron flashinfer issues with `--extra vllm`. This PR separates the vllm version that megatron relies on from the general vllm version, allowing us to bump vllm to 0.11.0 for the rest of the training stack.
## Update flash-attn installation
Updates the flash-attn installation to use uv's `extra-build-dependencies`
feature, which requires uv version >= 0.8.10. This feature lets us write the
following, removing the need to juggle markers + extras to specify a URL
source for each set of extras:
```toml
[tool.uv.extra-build-dependencies]
flash-attn = [{ requirement = "torch", match-runtime = true }]

[tool.uv.extra-build-variables]
flash-attn = { FLASH_ATTENTION_SKIP_CUDA_BUILD = "TRUE" }

[project.optional-dependencies]
vllm = [
    "vllm==0.11.0",
    "flash-attn==2.8.3",
    ...
]
mcore = [
    "flash-attn==2.7.4.post1",
    ...
]
```
`skyrl-train/docs/examples/megatron.rst` (2 additions, 7 deletions):
```diff
@@ -104,13 +104,8 @@ After following the installation instructions, set the following environment var
 Flash Attention
 ~~~~~~~~~~~~~~~
-Next, in order to use flash attention with the megatron backend, you must use ``flash_attn`` version ``2.7.4.post1`` or lower for compatibility with ``TransformerEngine==2.5.0``.
-You can replace the ``flash-attn`` wheel in the ``pyproject.toml`` file with the following to use the ``2.7.4.post1`` release, and you can find wheels for other versions `here <https://github.com/Dao-AILab/flash-attention/releases>`_.
+In order to use flash attention with the megatron backend, you must use ``flash_attn`` version ``2.7.4.post1`` or lower for compatibility with ``TransformerEngine==2.5.0``.
+This is handled in the ``pyproject.toml`` file for the ``mcore`` extra.
```
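The "``2.7.4.post1`` or lower" constraint is easy to get wrong because of the post-release suffix. A minimal sketch of checking it with the `packaging` library (not part of the PR; the version strings are illustrative):

```python
from packaging.version import Version

# Ceiling for the megatron backend, per the doc change above:
# TransformerEngine==2.5.0 needs flash_attn <= 2.7.4.post1.
max_supported = Version("2.7.4.post1")

# Post-releases sort *after* their base version, so 2.7.4.post1 > 2.7.4.
print(Version("2.7.4.post1") > Version("2.7.4"))  # True

print(Version("2.7.4.post1") <= max_supported)  # True: the mcore pin is fine
print(Version("2.8.3") <= max_supported)        # False: the vllm extra's newer pin is not
```

This is why the two extras need different `flash-attn` pins rather than one shared version.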
`pyproject.toml`:

```diff
 # NOTE (sumanthrh): We explictly use a flashinfer wheel from their index.
-# The wheels on PyPI don't come with pre-compiled kernels and the package will JIT compile them at runtime which is slow.
-# additionally, different inference engines may pin different compatible flashinfer versions, so we provide the option to pin different versions for vllm/sglang
+# We use `flashinfer-jit-cache` to avoid slow JIT compilation on first run.
+# Different inference engines may pin different compatible flashinfer versions, so we provide the option to pin different versions for vllm/sglang
-    { url = "https://download.pytorch.org/whl/cu128/flashinfer/flashinfer_python-0.2.6.post1%2Bcu128torch2.7-cp39-abi3-linux_x86_64.whl", marker = "extra == 'sglang' and extra != 'vllm'" }
+    { url = "https://download.pytorch.org/whl/cu128/flashinfer/flashinfer_python-0.2.6.post1%2Bcu128torch2.7-cp39-abi3-linux_x86_64.whl", marker = "extra == 'mcore' and extra != 'vllm'" },
+    { url = "https://download.pytorch.org/whl/cu128/flashinfer/flashinfer_python-0.2.6.post1%2Bcu128torch2.7-cp39-abi3-linux_x86_64.whl", marker = "extra == 'sglang' and extra != 'mcore' and extra != 'vllm'" }
 ]

 [project.optional-dependencies]
@@ -104,14 +111,17 @@ sandboxes = [
     "litellm[proxy]>=1.67.5",
 ]
 vllm = [
-    "vllm==0.10.1.1",
-    "torch==2.7.1",
+    "vllm==0.11.0",
+    "flash-attn==2.8.3",
+    "torch==2.8.0",
     "flashinfer-python",
+    "flashinfer-jit-cache",
     "torchvision"
 ]
 sglang = [
     "sglang[srt,openai,torch_memory_saver]==0.4.8.post1", # 0.4.9.post1 causes non-colocate weight broadcast to hang
```
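The `extra == … and extra != …` markers above gate each flashinfer wheel to exactly one extra, so conflicting pins never co-install. A quick way to sanity-check such a marker with the `packaging` library (a sketch, not part of the PR):

```python
from packaging.markers import Marker

# The marker used for the sglang-only flashinfer wheel above.
m = Marker("extra == 'sglang' and extra != 'mcore' and extra != 'vllm'")

# Resolvers evaluate markers with the active extra in the environment dict.
print(m.evaluate({"extra": "sglang"}))  # True: this wheel applies
print(m.evaluate({"extra": "vllm"}))    # False: the vllm extra uses its own pin
print(m.evaluate({"extra": "mcore"}))   # False: mcore likewise
```

Evaluating the marker against each extra makes it easy to confirm the three wheel entries partition the extras with no overlap.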