---
language:
- en
license: apache-2.0
library_name: transformers
tags:
- rubirlm
- causal-lm
- base-model
- text-generation
- 1b
- moe
datasets:
- HuggingFaceFW/fineweb
- HuggingFaceH4/ultrachat_200k
pipeline_tag: text-generation
---
# RubiRLM-1B-Base
**RubiRLM-1B-Base** is a **1B-parameter base language model** released by **DevHunterAI**.
- **Model size:** 1B parameters
- **Training datasets:** FineWeb, UltraChat-200k
- **Model type:** Base / pretrained language model
**Important:** This release is a **base model**. It can be used for prompt-based generation and experimental chat-style interaction, but it is **not an instruction-tuned chat assistant**.
## Architecture
![RubiRLM 1B Architecture](architecture.png)
**RubiRLM 1B** uses a recursive language modeling architecture: a recurrent hidden-state flow through shared blocks, Mixture-of-Experts routing, and conditional block execution via a layer-skip router. A toy sketch of these mechanisms follows the feature list below.
## Key Features
- **1B parameters**
- **Recursive Language Model (RLM)** architecture
- **10 recursive blocks**
- **d_model = 1024**
- **16 attention heads**
- **max sequence length = 2048**
- **6 recursive reasoning steps**
- **Mixture-of-Experts: 32 experts, top-1 routing**
- **Layer skip router for conditional execution**
- **Packed execution support**
- **Tied token embedding and LM head**
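The actual model code lives in `RubiRLM.py` (see Files below). Purely as an illustration of the routing ideas named above, here is a minimal, hypothetical PyTorch sketch of one recursive step combining top-1 Mixture-of-Experts routing with a skip router. All class and variable names are invented for this sketch, and the soft sigmoid gate only approximates true conditional block execution.

```python
import torch
import torch.nn as nn

D_MODEL, N_EXPERTS, N_STEPS = 1024, 32, 6  # values from the feature list above

class Top1MoE(nn.Module):
    """Top-1 Mixture-of-Experts: each token is routed to exactly one expert."""
    def __init__(self, d_model: int, n_experts: int):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x):                       # x: (batch, seq, d_model)
        probs = self.router(x).softmax(dim=-1)  # routing distribution
        gate, idx = probs.max(dim=-1)           # top-1 gate value and expert id
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            mask = idx == e                     # tokens assigned to expert e
            if mask.any():
                out[mask] = expert(x[mask])
        return out * gate.unsqueeze(-1)         # scale by gate probability

class RecursiveBlock(nn.Module):
    """One shared block applied recursively; a scalar skip router softly
    down-weights (approximately skips) the update at a given step."""
    def __init__(self, d_model: int, n_experts: int):
        super().__init__()
        self.norm = nn.LayerNorm(d_model)
        self.moe = Top1MoE(d_model, n_experts)
        self.skip_router = nn.Linear(d_model, 1)

    def forward(self, h):
        p_exec = torch.sigmoid(self.skip_router(h))  # per-token execute prob.
        return h + p_exec * self.moe(self.norm(h))   # recurrent state flow

h = torch.randn(2, 16, D_MODEL)  # toy hidden state
block = RecursiveBlock(D_MODEL, N_EXPERTS)
for _ in range(N_STEPS):         # 6 recursive reasoning steps
    h = block(h)
print(h.shape)                   # torch.Size([2, 16, 1024])
```

In this style of design, top-1 routing keeps per-token compute near that of a single FFN while the 32 experts add capacity, and the skip router lets the model spend less computation on some tokens at some steps.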
## Training Data
This model was trained using a mixture of:
- **FineWeb**
- **UltraChat-200k**
## Intended Usage
This model is intended for:
- base language modeling research
- continued pretraining (see the sketch after this list)
- experimental prompt-based generation
- architecture experimentation around recursive and MoE-based language models
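For the continued-pretraining use case above, here is a minimal, untested sketch using the `transformers` Trainer. The repo id `DevHunterAI/RubiRLM-1B-Base` and the presence of a bundled tokenizer are assumptions not confirmed by this card, and WikiText-2 is only a small stand-in corpus.

```python
# Continued-pretraining sketch. Assumptions: repo id and bundled tokenizer.
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

model_id = "DevHunterAI/RubiRLM-1B-Base"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token  # needed for batch padding

raw = load_dataset("wikitext", "wikitext-2-raw-v1", split="train")  # stand-in data
tokenized = raw.map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=2048),
    batched=True,
    remove_columns=raw.column_names,
).filter(lambda row: len(row["input_ids"]) > 1)  # drop empty rows

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="rubirlm-cpt",
                           per_device_train_batch_size=1, max_steps=100),
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```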
## Not Intended As
This release should **not** be treated as:
- a fully aligned assistant
- a safety-tuned production chatbot
- an instruction-following model with guaranteed conversational quality
## Loading
Because this repository includes custom model code, loading may require `trust_remote_code=True` depending on your workflow.
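For example, a minimal loading and generation sketch (the repo id `DevHunterAI/RubiRLM-1B-Base` and the availability of an `AutoTokenizer` are assumptions, not confirmed by this card):

```python
# Hypothetical loading sketch; repo id and tokenizer availability are assumed.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "DevHunterAI/RubiRLM-1B-Base"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)

# Base-model prompting: plain text continuation, not a chat template.
inputs = tokenizer("Mixture-of-Experts routing works by", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=True, top_p=0.9)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Since this is a base model, expect raw text continuations rather than assistant-style answers.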
## Files
- `pytorch_model.bin`: exported RubiRLM weights
- `training_checkpoint.pt`: original training checkpoint
- `config.json`: Hugging Face-facing config
- `rubirlm_config.json`: full RubiRLM architecture config
- `RubiRLM.py`: model implementation
- `xqs_moe.py`, `xqs_stack.py`, `x_quantum_sparse_ops.py`, `rubi_train_stack.py`: supporting code
## Notes
The exported weights were produced from the final training checkpoint and packaged for Hugging Face publication.