	---
	base_model: t5-small
	license: apache-2.0
	datasets:
	- open-web-math/open-web-math
	tags:
	- text-generation
	- causal-lm
	- mamba
	- hrm
	- pytorch
	language:
	- en
	pipeline_tag: text-generation
	---

	# CMBA-768M-OpenWebMath

A 768M-parameter Hierarchical Recurrent Memory (HRM) language model trained on high-quality mathematical web text from OpenWebMath. Instead of attention, it uses Mamba2 state-space layers, whose cost grows linearly with sequence length, which makes long-range sequence modeling efficient.
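
To illustrate why a state-space mixer scales linearly, here is a toy, single-channel version of the kind of recurrence a Mamba2 layer computes: a fixed-size state is updated once per token with an input-dependent decay, so total work grows as O(n) rather than O(n²). This is a simplified sketch, not the Mamba2 kernel (which is multi-headed and computed with a parallel algorithm); the function name `ssm_scan`, the per-step inputs `dt`, `B`, `C`, and the single scalar decay `A` are illustrative assumptions.

```python
import math
from typing import List, Sequence


def ssm_scan(
    x: Sequence[float],
    dt: Sequence[float],
    B: Sequence[Sequence[float]],
    C: Sequence[Sequence[float]],
    A: float = -1.0,
) -> List[float]:
    """Toy selective state-space scan for one channel (Mamba2-style scalar A).

    x[t]       : input at step t
    dt[t]      : input-dependent step size (> 0)
    B[t], C[t] : input-dependent vectors of length d_state
    A          : negative scalar decay rate shared across the state
    """
    d_state = len(B[0])
    h = [0.0] * d_state  # fixed-size state: memory does not grow with sequence length
    ys = []
    for t in range(len(x)):
        decay = math.exp(dt[t] * A)  # lies in (0, 1) when A < 0 and dt > 0
        # h_t = exp(dt_t * A) * h_{t-1} + dt_t * B_t * x_t
        h = [decay * h_i + dt[t] * b_i * x[t] for h_i, b_i in zip(h, B[t])]
        # y_t = <C_t, h_t>
        ys.append(sum(c_i * h_i for c_i, h_i in zip(C[t], h)))
    return ys


# Impulse response: with A = ln(0.5) and dt = 1 the state halves at every step
ys = ssm_scan([1.0, 0.0, 0.0], dt=[1.0] * 3, B=[[1.0]] * 3, C=[[1.0]] * 3, A=math.log(0.5))
print(ys)
```

Each token does O(d_state) work on a state of constant size, so a sequence of length n costs O(n · d_state), whereas self-attention compares every token with every earlier token.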

	## Model Architecture

	CMBA (Causal Mamba-based Architecture) implements a hierarchical processing structure:

	- Hierarchical Design: Dual-level processing with H-layers (high-level abstraction) and L-layers (low-level specialists)
- Mamba2 Mixers: State-space layers replace attention, so sequence mixing costs O(n) in sequence length instead of O(n²)
	- Adaptive Computation: Halting mechanism allows variable compute per token (ACT-style pondering)
	- Parameters: ~768M total
	- Context Length: 1024 tokens
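
The hierarchical design and ACT-style halting can be pictured with the sketch below. This card does not show the CMBA code, so everything here is an assumption: the names `hierarchical_step` and `act_ponder`, the ordering (low-level update first, then high-level update), and the use of Graves-style Adaptive Computation Time (keep pondering until the halting probabilities sum to at least 1 − eps, then give the remainder to the last step). The toy update and halting functions are hypothetical stand-ins for the real H-layer and L-layer stacks.

```python
from typing import Callable, List, Tuple

Vector = List[float]


def hierarchical_step(
    h_state: Vector,
    l_state: Vector,
    x: Vector,
    h_fn: Callable[[Vector, Vector], Vector],
    l_fn: Callable[[Vector, Vector, Vector], Vector],
) -> Tuple[Vector, Vector]:
    # Low-level (specialist) update, conditioned on the high-level state and the input
    l_state = l_fn(l_state, h_state, x)
    # High-level (abstraction) update, reading the refreshed low-level state
    h_state = h_fn(h_state, l_state)
    return h_state, l_state


def act_ponder(
    x: Vector,
    h0: Vector,
    l0: Vector,
    h_fn: Callable[[Vector, Vector], Vector],
    l_fn: Callable[[Vector, Vector, Vector], Vector],
    halt_fn: Callable[[Vector], float],
    max_steps: int,
    eps: float = 0.01,
) -> Tuple[Vector, int]:
    """Graves-style ACT: repeat hierarchical steps until the halting
    probabilities sum to at least 1 - eps, or max_steps is reached.
    Returns the halting-weighted mix of high-level states and the step count."""
    h, l = h0, l0
    cumulative = 0.0
    output = [0.0] * len(h0)
    steps = 0
    for n in range(max_steps):
        h, l = hierarchical_step(h, l, x, h_fn, l_fn)
        steps += 1
        p = halt_fn(h)  # halting probability in [0, 1]
        if n == max_steps - 1 or cumulative + p >= 1.0 - eps:
            weight = 1.0 - cumulative  # the remainder goes to the final step
            output = [o + weight * v for o, v in zip(output, h)]
            break
        cumulative += p
        output = [o + p * v for o, v in zip(output, h)]
    return output, steps


# Hypothetical toy update rules and a constant halting probability
def toy_l(l: Vector, h: Vector, x: Vector) -> Vector:
    return [li + xi for li, xi in zip(l, x)]


def toy_h(h: Vector, l: Vector) -> Vector:
    return [hi + li for hi, li in zip(h, l)]


def toy_halt(h: Vector) -> float:
    return 0.6


# With max_steps=1 every token gets exactly one hierarchical step
out_1, steps_1 = act_ponder([1.0], [0.0], [0.0], toy_h, toy_l, toy_halt, max_steps=1)
# With a larger budget, halting fires once the probabilities sum past 1 - eps
out_5, steps_5 = act_ponder([1.0], [0.0], [0.0], toy_h, toy_l, toy_halt, max_steps=5)
print(steps_1, out_1, steps_5, out_5)
```

Under this reading, the training setting "Max halt steps: 1" listed below means the halting loop always stops after a single step, so per-token compute is fixed during training.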

	### Configuration
```text
	Model Dimensions:
	- d_model: 1024
	- n_heads: 16 (for compatibility, not used in Mamba)
	- d_ff: 4096
	- H_layers: 12 (high-level hierarchy)
	- L_layers: 12 (low-level processing)

	Mamba2 Settings:
	- d_state: 128
	- expand: 2
	- headdim: 64
	- d_conv: 4
	- ngroups: 1

	Training:
	- Max halt steps: 1
	- Block size: 1024
	- Batch size: 64 (effective)
	- Learning rate: 3e-05 → 1e-06
	- Weight decay: 0.1
	```
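
The listing above can also be read as a config object. The sketch below is illustrative, not the project's actual config class: `CMBAConfig` and `cosine_lr` are hypothetical names, and the derived sizes assume the usual Mamba2 convention (as in the `mamba_ssm` package) that the inner width is `expand * d_model` and the number of SSM heads is that width divided by `headdim`. The card does not say how the learning rate moves from 3e-05 to 1e-06, so the cosine shape is an assumption.

```python
import math
from dataclasses import dataclass


@dataclass
class CMBAConfig:
    # Values taken from this model card
    d_model: int = 1024
    d_ff: int = 4096
    H_layers: int = 12
    L_layers: int = 12
    d_state: int = 128
    expand: int = 2
    headdim: int = 64
    d_conv: int = 4
    ngroups: int = 1
    block_size: int = 1024
    vocab_size: int = 32100

    @property
    def d_inner(self) -> int:
        # Mamba2 expanded inner width (assumed convention: expand * d_model)
        return self.expand * self.d_model

    @property
    def mamba_heads(self) -> int:
        # Mamba2 SSM heads come from headdim, independent of the unused n_heads field
        return self.d_inner // self.headdim


def cosine_lr(step: int, total_steps: int, lr_max: float = 3e-5, lr_min: float = 1e-6) -> float:
    """Hypothetical cosine decay from lr_max to lr_min; the real schedule is not stated."""
    progress = min(step, total_steps) / total_steps
    return lr_min + 0.5 * (lr_max - lr_min) * (1 + math.cos(math.pi * progress))


cfg = CMBAConfig()
print(cfg.d_inner, cfg.mamba_heads, cosine_lr(0, 1000), cosine_lr(1000, 1000))
```

Under these assumptions each Mamba2 mixer works at width 2048 with 32 SSM heads of size 64, which is separate from the `n_heads: 16` entry that the card marks as unused by Mamba.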

	## Training Data

	- Dataset: [open-web-math/open-web-math](https://huggingface.co/datasets/open-web-math/open-web-math)
	- Tokenizer: `t5-small` (T5 SentencePiece)
	- Vocab Size: 32100

	## Latest Performance (Epoch 0)

	- Validation Loss: `10.3766`
	- Validation Perplexity: `32099.98`
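
The two reported numbers are linked by perplexity = exp(loss). They also sit at the chance baseline: a model that assigns equal probability to all 32,100 vocabulary entries has loss ln(32100) ≈ 10.3766 and perplexity 32,100, so after epoch 0 the model is not yet measurably better than uniform guessing. The snippet below checks this arithmetic, assuming the validation loss is mean per-token cross-entropy in nats (the PyTorch default).

```python
import math

VOCAB_SIZE = 32100  # T5 tokenizer vocabulary size, from this card
val_loss = 10.3766  # reported validation loss (assumed nats per token)

# Perplexity is the exponential of the mean cross-entropy in nats
val_ppl = math.exp(val_loss)

# Cross-entropy of a model that predicts the uniform distribution over the vocabulary
uniform_loss = math.log(VOCAB_SIZE)

print(f"perplexity   = {val_ppl:.2f}")
print(f"uniform loss = {uniform_loss:.4f}")
```

The small gap between `exp(10.3766)` and the reported 32099.98 is consistent with the loss having been rounded to four decimal places.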

	## Usage

	```python
import torch
from transformers import T5Tokenizer
# HRMText1 comes from the project's own module; it is not part of transformers
from hrm_text1_mamba1_donor import HRMText1

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = HRMText1.from_pretrained("Viharikvs/CMBA-768M-OpenWebMath")
model.eval()

# Generate a continuation of the prompt
input_ids = tokenizer("Once upon a time", return_tensors="pt").input_ids
with torch.no_grad():
    outputs = model.generate(input_ids, max_length=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
	```

	## Citation

	If you use this model, please cite:

	```bibtex
	@misc{cmba-768m-openwebmath,
	author = {Vihari},
	title = {CMBA-768M-OpenWebMath: Hierarchical Mamba-based Language Model},
	year = {2025},
	publisher = {HuggingFace},
	url = {https://huggingface.co/Viharikvs/CMBA-768M-OpenWebMath}
	}
	```

	## License

	Apache 2.0