Scaling Smart: Accelerating Large Language Model Pre-training with Small Model Initialization
Paper: arXiv:2409.12903
Initialized from facebook/MobileLLM-R1-140M-base using HyperCloning (Samragh et al., 2024).
| Config | Source | HyperCloned |
|---|---|---|
| hidden_size | 576 | 1152 |
| num_attention_heads | 9 | 18 |
| num_key_value_heads | 3 | 6 |
| head_dim | 64 | 64 |
| intermediate_size | 8192 | 16384 |
| num_layers | 15 | 15 |
| parameters | 140,248,512 | 454,790,016 |
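Below is a minimal sketch (not code from the paper or the model repo; the helper name `hyperclone_config` is hypothetical) of how the HyperCloned column follows from the source config: width-related fields are multiplied by the cloning factor, while head_dim and num_layers stay fixed.

```python
def hyperclone_config(src: dict, n: int = 2) -> dict:
    """Scale width-related config fields by the cloning factor n."""
    dst = dict(src)
    for key in ("hidden_size", "num_attention_heads",
                "num_key_value_heads", "intermediate_size"):
        dst[key] = src[key] * n
    # head_dim and num_layers are unchanged: width grows by adding heads,
    # not by widening each head or deepening the stack.
    return dst

source = {
    "hidden_size": 576, "num_attention_heads": 9, "num_key_value_heads": 3,
    "head_dim": 64, "intermediate_size": 8192, "num_layers": 15,
}
print(hyperclone_config(source))  # matches the HyperCloned column above
```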
Each weight matrix W is expanded via W.repeat(n, n) / n (the paper's W/2 scaling for the 2× case).
The number of attention heads doubles along with the embedding dimension; head_dim is preserved.
Output logits match the source model at initialization.
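A minimal PyTorch sketch of this expansion rule (the helper name `hyperclone_linear` is hypothetical, not from the repo): for a 2× clone, a linear weight is tiled along both axes and divided by the input-duplication factor, so duplicated hidden states reproduce the original activations. Embedding and output-head weights only expand along the hidden axis, which is why vocab-sized outputs, and hence the logits, are unchanged at initialization.

```python
import torch

def hyperclone_linear(w: torch.Tensor, n_out: int = 2, n_in: int = 2) -> torch.Tensor:
    """Tile a linear weight of shape (out_features, in_features) and rescale
    by the input-duplication factor. For the 2x case this is W.repeat(2, 2) / 2,
    i.e. the paper's W/2 scaling. Use n_out=1 for the LM head and n_in=1
    (without rescaling) for embedding tables."""
    return w.repeat(n_out, n_in) / n_in

# Sanity check on a toy layer: the duplicated input through the cloned weight
# yields the original output, duplicated along the width.
torch.manual_seed(0)
w = torch.randn(4, 3)        # (out_features, in_features)
x = torch.randn(3)
w2 = hyperclone_linear(w)    # (8, 6)
x2 = x.repeat(2)             # hidden state duplicated, as in the cloned model
assert torch.allclose(w2 @ x2, (w @ x).repeat(2), atol=1e-6)
```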
This is an initialization checkpoint; further training is needed.
@article{samragh2024hypercloning,
  title={Scaling Smart: Accelerating Large Language Model Pre-training with Small Model Initialization},
  author={Samragh, Mohammad and others},
  journal={arXiv preprint arXiv:2409.12903},
  year={2024}
}