Benchmarking SpecExit #229

@dhia680

Description

Hi,
Thank you for the great work on SpecExit! I'm trying to benchmark the method using Qwen3-8B (and Qwen3-14B) as the base model.

Issue

I attempted to use a standard EAGLE3 draft model (AngelSlim/Qwen3-4B_eagle3) with the provided inference code, but encountered a weight shape mismatch:

```
RuntimeError: Error(s) in loading state_dict for Model: size mismatch for fc.weight: copying a param with shape torch.Size([2560, 7680]) from checkpoint, the shape in current model is torch.Size([2563, 7680]).
```

This is expected since standard EAGLE3 models don't include the +3 outputs for the CPR (Confidence-Progress-Remain) prediction head required by SpecExit.
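To make the mismatch concrete, here is a minimal sketch of a possible workaround while official checkpoints are unavailable: pad the standard EAGLE3 `fc.weight` with 3 freshly initialized rows so it matches the SpecExit shape. This is purely hypothetical — the function name is mine, and it assumes the CPR outputs are appended at the end of the `fc` output dimension, which may not match the actual SpecExit layout; the CPR rows would also be untrained.

```python
import torch

def expand_fc_for_cpr(fc_weight: torch.Tensor, num_extra: int = 3) -> torch.Tensor:
    """Pad a standard EAGLE3 fc weight (e.g. [2560, 7680]) to the SpecExit
    shape (e.g. [2563, 7680]) by appending `num_extra` new rows for the
    CPR (Confidence-Progress-Remain) head.
    Assumption: CPR outputs sit at the end of fc's output dimension."""
    extra = torch.empty(num_extra, fc_weight.shape[1])
    torch.nn.init.normal_(extra, std=0.02)  # small random init; these rows are untrained
    return torch.cat([fc_weight, extra], dim=0)

# Shapes matching the error message above
eagle3_fc = torch.zeros(2560, 7680)        # standard EAGLE3 checkpoint shape
specexit_fc = expand_fc_for_cpr(eagle3_fc)
print(specexit_fc.shape)                   # torch.Size([2563, 7680])
```

Even if this loads, the CPR head would predict garbage until fine-tuned, so released checkpoints (or the training recipe) would still be needed for a meaningful benchmark.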

Request

Would it be possible to release pre-trained SpecExit draft models that include the CPR head? Specifically:

  1. Qwen3-based models (e.g., Qwen3-8B + compatible draft model)
  2. Any models used in the paper's benchmarks (for reproduction purposes)

Alternatively, if releasing full models isn't feasible, could you share details of your draft-model training setup (training dataset, hyperparameters, duration, and any useful tips)?
In particular, sharing the sharegpt_train.json and sharegpt_test.json files would be very helpful, since the codebase includes no support for building the training dataset.

Failure details

  • Codebase used: https://anonymous.4open.science/r/SpecExit-B802
  • Running inference with gen_ea_answer.py
  • Benchmark: GSM8K
  • Base model: Qwen/Qwen3-8B
  • Attempted draft model (which turns out to be a standard EAGLE3 model): AngelSlim/Qwen3-4B_eagle3
