---
language:
- en
license: apache-2.0
library_name: transformers
tags:
- rubirlm
- causal-lm
- base-model
- text-generation
- 1b
- moe
datasets:
- HuggingFaceFW/fineweb
- HuggingFaceH4/ultrachat_200k
pipeline_tag: text-generation
---
# RubiRLM-1B-Base
**RubiRLM-1B-Base** is a **1B-parameter base language model** released by **DevHunterAI**.
- **Model size:** 1B parameters
- **Training datasets:** FineWeb, UltraChat-200k
- **Model type:** Base / pretrained language model
**Important:** This release is a **base model**. It can be used for prompt-based generation and experimental chat-style interaction, but it is **not an instruction-tuned chat assistant**.
## Architecture
![RubiRLM 1B Architecture](architecture.png)
**RubiRLM 1B** uses a recursive language modeling architecture: a recurrent hidden-state flow through shared blocks, Mixture-of-Experts routing, and conditional block execution via a layer-skip router. A toy sketch of these mechanisms follows the feature list below.
## Key Features
- **1B parameters**
- **Recursive Language Model (RLM)** architecture
- **10 recursive blocks**
- **d_model = 1024**
- **16 attention heads**
- **max sequence length = 2048**
- **6 recursive reasoning steps**
- **Mixture-of-Experts: 32 experts, top-1 routing**
- **Layer skip router for conditional execution**
- **Packed execution support**
- **Tied token embedding and LM head**
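The actual model code lives in `RubiRLM.py` (see Files below). Purely as an illustration of the routing ideas named above, here is a minimal, hypothetical PyTorch sketch of one recursive step combining top-1 Mixture-of-Experts routing with a skip router. All class and variable names are invented for this sketch, and the soft sigmoid gate only approximates true conditional block execution.

```python
import torch
import torch.nn as nn

D_MODEL, N_EXPERTS, N_STEPS = 1024, 32, 6  # values from the feature list above

class Top1MoE(nn.Module):
    """Top-1 Mixture-of-Experts: each token is routed to exactly one expert."""
    def __init__(self, d_model: int, n_experts: int):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x):                       # x: (batch, seq, d_model)
        probs = self.router(x).softmax(dim=-1)  # routing distribution
        gate, idx = probs.max(dim=-1)           # top-1 gate value and expert id
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            mask = idx == e                     # tokens assigned to expert e
            if mask.any():
                out[mask] = expert(x[mask])
        return out * gate.unsqueeze(-1)         # scale by gate probability

class RecursiveBlock(nn.Module):
    """One shared block applied recursively; a scalar skip router softly
    down-weights (approximately skips) the update at a given step."""
    def __init__(self, d_model: int, n_experts: int):
        super().__init__()
        self.norm = nn.LayerNorm(d_model)
        self.moe = Top1MoE(d_model, n_experts)
        self.skip_router = nn.Linear(d_model, 1)

    def forward(self, h):
        p_exec = torch.sigmoid(self.skip_router(h))  # per-token execute prob.
        return h + p_exec * self.moe(self.norm(h))   # recurrent state flow

h = torch.randn(2, 16, D_MODEL)  # toy hidden state
block = RecursiveBlock(D_MODEL, N_EXPERTS)
for _ in range(N_STEPS):         # 6 recursive reasoning steps
    h = block(h)
print(h.shape)                   # torch.Size([2, 16, 1024])
```

In this style of design, top-1 routing keeps per-token compute near that of a single FFN while the 32 experts add capacity, and the skip router lets the model spend less computation on some tokens at some steps.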
## Training Data
This model was trained using a mixture of:
- **FineWeb**
- **UltraChat-200k**
## Intended Usage
This model is intended for:
- base language modeling research
- continued pretraining (see the sketch after this list)
- experimental prompt-based generation
- architecture experimentation around recursive and MoE-based language models
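For the continued-pretraining use case above, here is a minimal, untested sketch using the `transformers` Trainer. The repo id `DevHunterAI/RubiRLM-1B-Base` and the presence of a bundled tokenizer are assumptions not confirmed by this card, and WikiText-2 is only a small stand-in corpus.

```python
# Continued-pretraining sketch. Assumptions: repo id and bundled tokenizer.
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

model_id = "DevHunterAI/RubiRLM-1B-Base"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token  # needed for batch padding

raw = load_dataset("wikitext", "wikitext-2-raw-v1", split="train")  # stand-in data
tokenized = raw.map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=2048),
    batched=True,
    remove_columns=raw.column_names,
).filter(lambda row: len(row["input_ids"]) > 1)  # drop empty rows

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="rubirlm-cpt",
                           per_device_train_batch_size=1, max_steps=100),
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```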
## Not Intended As
This release should **not** be treated as:
- a fully aligned assistant
- a safety-tuned production chatbot
- an instruction-following model with guaranteed conversational quality
## Loading
Because this repository includes custom model code, loading may require `trust_remote_code=True` depending on your workflow.
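For example, a minimal loading and generation sketch (the repo id `DevHunterAI/RubiRLM-1B-Base` and the availability of an `AutoTokenizer` are assumptions, not confirmed by this card):

```python
# Hypothetical loading sketch; repo id and tokenizer availability are assumed.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "DevHunterAI/RubiRLM-1B-Base"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)

# Base-model prompting: plain text continuation, not a chat template.
inputs = tokenizer("Mixture-of-Experts routing works by", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=True, top_p=0.9)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Since this is a base model, expect raw text continuations rather than assistant-style answers.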
## Files
- `pytorch_model.bin`: exported RubiRLM weights
- `training_checkpoint.pt`: original training checkpoint
- `config.json`: Hugging Face-facing config
- `rubirlm_config.json`: full RubiRLM architecture config
- `RubiRLM.py`: model implementation
- `xqs_moe.py`, `xqs_stack.py`, `x_quantum_sparse_ops.py`, `rubi_train_stack.py`: supporting code
## Notes
The exported weights were produced from the final training checkpoint and packaged for Hugging Face publication.