
[DO NOT MERGE YET] DGX Spark integrations#1705

Draft
rltakashige wants to merge 46 commits into main from leo/dgx-spark-integrations

Conversation

@rltakashige (Collaborator) commented Mar 11, 2026

Motivation

TODO BEFORE MERGE:

  • Test for no GLM regression.
  • Make nix run exo just work, or at least nix run exo-cuda.

Part 1 of getting EXO to run well on the DGX Spark and other Linux CUDA machines outside of private betas. Run it with:

  nix run .#exo-cuda

or just:

  sync-cuda && uv run exo
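The two launch paths above can be sketched as a small launcher script. This is illustrative only: the exo-cuda flake output and the sync-cuda helper are named in this PR, but the script itself is not part of the repo.

```shell
#!/usr/bin/env sh
# Sketch only: pick between the two launch paths described in this PR.
# Prefer the cached nix build when nix is available, otherwise fall back
# to syncing dependencies and running via uv.
if command -v nix >/dev/null 2>&1; then
  run_cmd="nix run .#exo-cuda"
else
  run_cmd="sync-cuda && uv run exo"
fi
echo "Would run: $run_cmd"
```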

Changes

  • Cache a nix build of the dependencies.
  • Add setup scripts.
  • Make Linux look nice on the dashboard.

Caveats

The connectivity between a Mac and a Spark is still very unreliable.

[Screenshot taken 2026-03-11 18:33]

rltakashige force-pushed the leo/dgx-spark-integrations branch repeatedly (5930f76 through 7492111) between March 11, 2026 19:24 and March 12, 2026 00:33.
def format_vllm_prompt(
    engine: LLMEngine, params: TextGenerationTaskParams
) -> tuple[list[int], str, int]:
    tokenizer = TokenizerWrapper(engine.get_tokenizer())
Member

We should have our own wrapper so we can just use the tokenizers Tokenizer.

Collaborator Author

Probably a TODO for a separate PR. Added it as a comment.
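A minimal sketch of what such an in-house wrapper might look like. The encode/decode delegation and the stub tokenizer are assumptions for illustration; the real tokenizers.Tokenizer API differs (e.g. encode() returns an Encoding object with .ids).

```python
from dataclasses import dataclass


@dataclass
class TokenizerWrapper:
    """Thin adapter so callers depend on one interface, not one library."""

    _inner: object  # whatever engine.get_tokenizer() returns

    def encode(self, text: str) -> list[int]:
        # Delegate to the wrapped tokenizer; only this class needs to know
        # the underlying library's calling convention.
        return self._inner.encode(text)

    def decode(self, ids: list[int]) -> str:
        return self._inner.decode(ids)


class _StubTokenizer:
    """Stand-in for a real tokenizer, used only to make the sketch runnable."""

    def encode(self, text: str) -> list[int]:
        return [ord(c) for c in text]

    def decode(self, ids: list[int]) -> str:
        return "".join(chr(i) for i in ids)


tok = TokenizerWrapper(_StubTokenizer())
assert tok.decode(tok.encode("exo")) == "exo"
```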

Comment on lines +560 to +561
attention_backend="TRITON_ATTN",
enforce_eager=True,
Member

Maybe make these the default.
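If these do become defaults, one way to keep them overridable is a defaults-merge helper. The keyword names come from the diff above; the helper itself is purely illustrative, not the project's API.

```python
# Reviewed settings as overridable defaults (names taken from the diff).
DEFAULT_ENGINE_KWARGS = {
    "attention_backend": "TRITON_ATTN",
    "enforce_eager": True,
}


def engine_kwargs(**overrides) -> dict:
    """Return engine kwargs with the defaults applied; caller overrides win."""
    return {**DEFAULT_ENGINE_KWARGS, **overrides}


assert engine_kwargs()["attention_backend"] == "TRITON_ATTN"
assert engine_kwargs(enforce_eager=False)["enforce_eager"] is False
```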

@dataclass(eq=False)
class BatchGenerator(InferenceGenerator):
-    model: Model
+    model: Model | None
Member

The batch generator should not own the model if possible. What blockers are there to this?

Collaborator Author

Made an attempt at this by moving generator creation before the warmup phase.

case LoadModel() if (
    (
        isinstance(self.current_status, RunnerConnected)
        isinstance(self.generator, MlxBuilder)
Member

can just be isinstance(Builder)

Collaborator Author

I do want the generator to explicitly be an MlxBuilder if there is an MLX group.
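The trade-off in this thread can be illustrated with a toy hierarchy. Builder and MlxBuilder appear in the diff; CudaBuilder and the hierarchy itself are assumptions for this sketch.

```python
class Builder: ...


class MlxBuilder(Builder): ...


class CudaBuilder(Builder): ...


def accepts_any_builder(gen: object) -> bool:
    # Reviewer's suggestion: any Builder subclass passes this guard.
    return isinstance(gen, Builder)


def requires_mlx(gen: object) -> bool:
    # Author's intent: only the MLX path passes this guard.
    return isinstance(gen, MlxBuilder)


assert accepts_any_builder(MlxBuilder()) and accepts_any_builder(CudaBuilder())
assert requires_mlx(MlxBuilder()) and not requires_mlx(CudaBuilder())
```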
