# NOTE (sumanthrh): We explicitly use a flashinfer wheel from their index.
# The wheels on PyPI don't come with pre-compiled kernels, so the package will JIT-compile them at runtime, which is slow.
# Additionally, different inference engines may pin different compatible flashinfer versions, so we provide the option to pin separate versions for vllm/sglang.
"sglang[srt,openai,torch_memory_saver]==0.4.8.post1", # 0.4.9.post1 causes non-colocate weight broadcast to hang
# The flashinfer version is pinned to 0.2.5 because sglang requires it.
# NOTE (sumanthrh): This could be made a common dependency, but different inference engines can pin different compatible flashinfer versions, and that would quickly break.
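The per-engine pinning described above might be laid out as separate optional-dependency groups in `pyproject.toml`. The sketch below is illustrative only: the group names, the vllm pin, and the flashinfer versions shown for vllm are assumptions, not taken from the source; only the sglang pin and its flashinfer 0.2.5 requirement come from the text above.

```toml
# Hypothetical sketch: per-engine extras so vllm and sglang can each pin
# a compatible flashinfer version independently. Versions for the vllm
# group are placeholders, not authoritative.
[project.optional-dependencies]
sglang = [
    "sglang[srt,openai,torch_memory_saver]==0.4.8.post1",  # 0.4.9.post1 causes non-colocate weight broadcast to hang
    "flashinfer-python==0.2.5",  # pinned because this sglang version requires it
]
vllm = [
    "vllm",                      # placeholder: pin to a tested version in practice
    "flashinfer-python",         # placeholder: pin to the version this vllm release supports
]
```

Installing `.[sglang]` or `.[vllm]` then pulls in only the flashinfer pin that matches the chosen engine, avoiding a single shared pin that one engine could break.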