Speculation head checkpoints

Pre-trained pipeline speculation head weights. Each .pt file is a single checkpoint produced by training; pair it with the same base model architecture it was trained on (see config["base_model_path"] inside the file).

For inference, evaluation, and training examples, see the official repo:
https://github.com/yuyijiong/speculative_pipeline_decoding

Filename format

Files are named:

{model}_s{num_stages}_l{num_spec_layers}.pt

| Part | Meaning |
| --- | --- |
| {model} | Base model tag from training config (e.g. Qwen3.5-4B, Qwen3.5-9B) |
| s{...} | num_stages: pipeline depth (number of target-model stages) |
| l{...} | num_spec_layers: number of Transformer layers in the speculation module |

Example: Qwen3.5-9B_s16_l2.pt → Qwen3.5-9B base, 16 stages, 2 spec layers.
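
The naming scheme above can be parsed mechanically. The following is a minimal sketch, not code from the official repo: the `CheckpointName` type and the `parse_checkpoint_name` helper are hypothetical names introduced here for illustration, and the regular expression assumes the `_s{num_stages}_l{num_spec_layers}.pt` suffix always closes the filename.

```python
import re
from typing import NamedTuple


class CheckpointName(NamedTuple):
    """Hypothetical container for the three parts of a checkpoint filename."""
    model: str
    num_stages: int
    num_spec_layers: int


# The greedy (.+) lets the model tag contain dots, dashes, or underscores;
# the _s<digits>_l<digits>.pt suffix is anchored at the end of the name.
_PATTERN = re.compile(r"^(?P<model>.+)_s(?P<stages>\d+)_l(?P<layers>\d+)\.pt$")


def parse_checkpoint_name(filename: str) -> CheckpointName:
    """Split '{model}_s{num_stages}_l{num_spec_layers}.pt' into its parts."""
    # Drop any directory prefix (handles both / and \ separators).
    basename = re.split(r"[\\/]", filename)[-1]
    match = _PATTERN.match(basename)
    if match is None:
        raise ValueError(f"not a speculation head checkpoint name: {filename!r}")
    return CheckpointName(match["model"], int(match["stages"]), int(match["layers"]))


name = parse_checkpoint_name("/path/to/Qwen3.5-9B_s16_l2.pt")
print(name.model, name.num_stages, name.num_spec_layers)
```

The parsed values can then be compared with `num_stages` and `num_spec_layers` in the checkpoint's config as a quick sanity check that a file has not been renamed.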

Checkpoint contents

Each file is a PyTorch archive with two top-level keys:

{
    "state_dict": ...,  # weights of the speculation module
    "config": { ... },  # hyperparameters and metadata
}

config fields (always present)

| Field | Description |
| --- | --- |
| base_model_path | Base model path recorded at training time (often a machine-local path; override it at load time, as described under "Loading checkpoints") |
| hidden_size | Hidden size (matches base model) |
| vocab_size | Base model vocabulary size |
| draft_vocab_size | Draft head output size (full vocab or draft subset) |
| num_stages | Pipeline depth (same as s in filename) |
| num_spec_layers | Speculation module depth (same as l in filename) |
| version | Checkpoint format version (10) |
| trained_with_use_deepest | Whether training used deepest-layer features |
| shallow_hidden_layer_indices | Which base layers feed the speculation module |

config fields (optional)

| Field | Description |
| --- | --- |
| spec_init_from_base_layers | Base layers used to initialize the spec module (if any) |
| draft_token_ids | Draft vocabulary token ids (only when trained with a draft vocab subset) |
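
To inspect a checkpoint before running inference, the config can be loaded and checked against the field lists above. This is a rough sketch under stated assumptions: the helper names `missing_config_fields` and `load_checkpoint_config` are hypothetical and not part of the official repo, and the plain `torch.load(path, map_location="cpu")` call assumes the file deserializes with your PyTorch version's default settings.

```python
# Field names copied from the "always present" and "optional" lists above.
REQUIRED_CONFIG_FIELDS = (
    "base_model_path", "hidden_size", "vocab_size", "draft_vocab_size",
    "num_stages", "num_spec_layers", "version",
    "trained_with_use_deepest", "shallow_hidden_layer_indices",
)
OPTIONAL_CONFIG_FIELDS = ("spec_init_from_base_layers", "draft_token_ids")


def missing_config_fields(config: dict) -> list:
    """Return the always-present fields that are absent from `config`."""
    return [name for name in REQUIRED_CONFIG_FIELDS if name not in config]


def load_checkpoint_config(path: str) -> dict:
    """Load a checkpoint on CPU and return its `config` after basic checks."""
    import torch  # imported lazily so the helper above works without PyTorch

    # Assumption: default torch.load settings are sufficient for these files.
    checkpoint = torch.load(path, map_location="cpu")
    for key in ("state_dict", "config"):
        if key not in checkpoint:
            raise KeyError(f"checkpoint is missing top-level key {key!r}")
    missing = missing_config_fields(checkpoint["config"])
    if missing:
        raise KeyError(f"config is missing required fields: {missing}")
    return checkpoint["config"]
```

For example, `load_checkpoint_config("/path/to/Qwen3.5-4B_s4_l2.pt")["num_stages"]` should match the `s` value in the filename.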

Loading checkpoints

config["base_model_path"] is often a local path from the training machine (e.g. /share/models/Qwen3.5-4B). On your machine, pass the correct Hugging Face id or local directory via --base_model_path; it overrides the path stored in the checkpoint:

python pipeline_inference.py \
  --spec_head_ckpt /path/to/Qwen3.5-4B_s4_l2.pt \
  --base_model_path Qwen/Qwen3.5-4B

python eval.py \
  --spec_head_ckpt /path/to/Qwen3.5-4B_s4_l2.pt \
  --base_model_path /your/local/Qwen3.5-4B \
  --data_dir eval_data \
  --output_dir ./eval_output

If --base_model_path is omitted, the value from config["base_model_path"] is used as-is.
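
The override rule described here amounts to a simple fallback. The sketch below illustrates that behavior only; it is not the repo's actual argument handling, and `resolve_base_model_path` and `build_parser` are hypothetical helpers (only the two flag names are taken from the commands above).

```python
import argparse
from typing import Optional


def resolve_base_model_path(config: dict, override: Optional[str] = None) -> str:
    """Prefer the user-supplied path; otherwise use the checkpoint's value as-is."""
    if override is not None:
        return override
    return config["base_model_path"]


def build_parser() -> argparse.ArgumentParser:
    """Illustrative parser exposing the two flags used in the commands above."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--spec_head_ckpt", required=True)
    # default=None marks the flag as omitted, which triggers the fallback.
    parser.add_argument("--base_model_path", default=None)
    return parser


# Stand-in for the config stored in a checkpoint trained on another machine.
stored_config = {"base_model_path": "/share/models/Qwen3.5-4B"}

args = build_parser().parse_args(
    ["--spec_head_ckpt", "/path/to/Qwen3.5-4B_s4_l2.pt",
     "--base_model_path", "Qwen/Qwen3.5-4B"]
)
print(resolve_base_model_path(stored_config, args.base_model_path))
```

Because the stored path is used unchanged when the flag is omitted, loading will typically fail on any machine where that training-time path does not exist.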

For more usage details, see the speculative_pipeline_decoding repository: https://github.com/yuyijiong/speculative_pipeline_decoding

Citation

If you use this repo, please cite our paper:

@misc{yu2026speculativepipelinedecodinghigheraccruacy,
      title={Speculative Pipeline Decoding: Higher-Accuracy and Zero-Bubble Speculation via Pipeline Parallelism},
      author={Yijiong Yu and Huazheng Wang and Shuai Yuan and Ruilong Ren and Ji Pei},
      year={2026},
      eprint={2605.30852},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2605.30852}, 
}