MobileLLM-R1-140M-base: 2× HyperCloned

Initialized from facebook/MobileLLM-R1-140M-base using HyperCloning (Samragh et al., 2024).

Architecture

| | Source | HyperCloned (2×) |
|---|---:|---:|
| hidden_size | 576 | 1152 |
| num_attention_heads | 9 | 18 |
| num_key_value_heads | 3 | 6 |
| head_dim | 64 | 64 |
| intermediate_size | 8192 | 16384 |
| num_layers | 15 | 15 |
| parameters | 140,248,512 | 454,790,016 |
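The scaled dimensions satisfy the usual transformer consistency relations. A quick sanity check, with values copied from the table (note the parameter count grows by roughly 3.24×, not 4×, since embedding tables scale only linearly with hidden_size while matmul weights scale quadratically):

```python
# Dimensions from the table; both configs must be internally consistent.
src = {"hidden_size": 576, "num_attention_heads": 9, "num_key_value_heads": 3,
       "head_dim": 64, "params": 140_248_512}
big = {"hidden_size": 1152, "num_attention_heads": 18, "num_key_value_heads": 6,
       "head_dim": 64, "params": 454_790_016}

for cfg in (src, big):
    # heads * head_dim must equal the embedding dimension
    assert cfg["num_attention_heads"] * cfg["head_dim"] == cfg["hidden_size"]
    # query heads must divide evenly across KV heads (grouped-query attention)
    assert cfg["num_attention_heads"] % cfg["num_key_value_heads"] == 0

# Embeddings scale 2x, matmul weights 4x, so the total lands in between.
print(round(big["params"] / src["params"], 2))
```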

Method

Each weight W is expanded via torch's W.repeat(n, n) / n, i.e. tiled n × n and scaled by 1/n (the paper's W/2 construction for the 2× case). Attention heads double along with the embedding dimension; head_dim is preserved. The expanded model's output logits match the source model's at initialization.
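The function-preserving property can be verified on a toy linear layer. A minimal numpy sketch (illustrative only; the actual checkpoint was expanded layer by layer on torch tensors):

```python
import numpy as np

def hyperclone(W, n=2):
    # Tile the weight n x n and divide by n, per HyperCloning (Samragh et al., 2024).
    return np.tile(W, (n, n)) / n

rng = np.random.default_rng(0)
W = rng.standard_normal((4, 3))   # toy "source" weight (out=4, in=3)
x = rng.standard_normal(3)        # toy input activation

W2 = hyperclone(W, n=2)           # shape (8, 6)
x2 = np.concatenate([x, x])       # the cloned model carries duplicated activations

y = W @ x
y2 = W2 @ x2
# Each output block is (W/2)x + (W/2)x = Wx, so the output is just y duplicated,
# which is why the logits match the source model at initialization.
assert np.allclose(y2, np.concatenate([y, y]))
```

Because every layer's output is an exact duplication of the source layer's output, elementwise nonlinearities applied in between preserve the duplication, and the property composes through the whole network.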

This is an initialization checkpoint; further training is needed.

Citation

```bibtex
@article{samragh2024hypercloning,
  title={Scaling Smart: Accelerating Large Language Model Pre-training with Small Model Initialization},
  author={Samragh, Mohammad and others},
  journal={arXiv preprint arXiv:2409.12903},
  year={2024}
}
```