
[DO NOT MERGE YET] DGX Spark integrations#1705

Draft
rltakashige wants to merge 46 commits into main from leo/dgx-spark-integrations

Conversation

@rltakashige (Collaborator) commented Mar 11, 2026

Motivation

TODO BEFORE MERGE:

  • Test for no GLM regression.
  • Make nix run exo just work, or at least nix run exo-cuda.

Part 1 of getting EXO to run well on the DGX Spark and other Linux CUDA machines outside of private betas. Run it with:

  nix run .#exo-cuda

or just:

  sync-cuda && uv run exo
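The two launch paths above can be sketched as a small launcher script. This is illustrative only: the exo-cuda flake output and the sync-cuda helper are named in this PR, but the script itself is not part of the repo.

```shell
#!/usr/bin/env sh
# Sketch only: pick between the two launch paths described in this PR.
# Prefer the cached nix build when nix is available, otherwise fall back
# to syncing dependencies and running via uv.
if command -v nix >/dev/null 2>&1; then
  run_cmd="nix run .#exo-cuda"
else
  run_cmd="sync-cuda && uv run exo"
fi
echo "Would run: $run_cmd"
```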

Changes

  • Cache a nix build of the dependencies.
  • Add setup scripts.
  • Make Linux look nice on the dashboard.

Caveats

The connectivity between a Mac and a Spark is still very unreliable.

[Screenshot taken 2026-03-11 18:33]

rltakashige force-pushed the leo/dgx-spark-integrations branch repeatedly (5930f76 through 7492111) between March 11, 2026 19:24 and March 12, 2026 00:33.
def format_vllm_prompt(
    engine: LLMEngine, params: TextGenerationTaskParams
) -> tuple[list[int], str, int]:
    tokenizer = TokenizerWrapper(engine.get_tokenizer())
Member

We should have our own wrapper so we can just use the tokenizers Tokenizer.

Collaborator Author

Probably a TODO for a separate PR. Added it as a comment.
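A minimal sketch of what such an in-house wrapper might look like. The encode/decode delegation and the stub tokenizer are assumptions for illustration; the real tokenizers.Tokenizer API differs (e.g. encode() returns an Encoding object with .ids).

```python
from dataclasses import dataclass


@dataclass
class TokenizerWrapper:
    """Thin adapter so callers depend on one interface, not one library."""

    _inner: object  # whatever engine.get_tokenizer() returns

    def encode(self, text: str) -> list[int]:
        # Delegate to the wrapped tokenizer; only this class needs to know
        # the underlying library's calling convention.
        return self._inner.encode(text)

    def decode(self, ids: list[int]) -> str:
        return self._inner.decode(ids)


class _StubTokenizer:
    """Stand-in for a real tokenizer, used only to make the sketch runnable."""

    def encode(self, text: str) -> list[int]:
        return [ord(c) for c in text]

    def decode(self, ids: list[int]) -> str:
        return "".join(chr(i) for i in ids)


tok = TokenizerWrapper(_StubTokenizer())
assert tok.decode(tok.encode("exo")) == "exo"
```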

Comment on lines +560 to +561
attention_backend="TRITON_ATTN",
enforce_eager=True,
Member

Maybe make these the default.
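If these do become defaults, one way to keep them overridable is a defaults-merge helper. The keyword names come from the diff above; the helper itself is purely illustrative, not the project's API.

```python
# Reviewed settings as overridable defaults (names taken from the diff).
DEFAULT_ENGINE_KWARGS = {
    "attention_backend": "TRITON_ATTN",
    "enforce_eager": True,
}


def engine_kwargs(**overrides) -> dict:
    """Return engine kwargs with the defaults applied; caller overrides win."""
    return {**DEFAULT_ENGINE_KWARGS, **overrides}


assert engine_kwargs()["attention_backend"] == "TRITON_ATTN"
assert engine_kwargs(enforce_eager=False)["enforce_eager"] is False
```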

@dataclass(eq=False)
class BatchGenerator(InferenceGenerator):
-    model: Model
+    model: Model | None
Member

The batch generator should not own the model if possible. What blockers are there to this?

Collaborator Author

Made an attempt at this by moving generator creation before the warmup phase.

case LoadModel() if (
    (
        isinstance(self.current_status, RunnerConnected)
        isinstance(self.generator, MlxBuilder)
Member

can just be isinstance(Builder)

Collaborator Author

I do want the generator to explicitly be an MlxBuilder if there is an MLX group.
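The trade-off in this thread can be illustrated with a toy hierarchy. Builder and MlxBuilder appear in the diff; CudaBuilder and the hierarchy itself are assumptions for this sketch.

```python
class Builder: ...


class MlxBuilder(Builder): ...


class CudaBuilder(Builder): ...


def accepts_any_builder(gen: object) -> bool:
    # Reviewer's suggestion: any Builder subclass passes this guard.
    return isinstance(gen, Builder)


def requires_mlx(gen: object) -> bool:
    # Author's intent: only the MLX path passes this guard.
    return isinstance(gen, MlxBuilder)


assert accepts_any_builder(MlxBuilder()) and accepts_any_builder(CudaBuilder())
assert requires_mlx(MlxBuilder()) and not requires_mlx(CudaBuilder())
```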
