RtaForge/Anvaya-Rabbit-2.7B

@@ -8,165 +8,57 @@ tags:
 - causal-lm
 - rabbit
 - rtaforge
-- proof-of-concept
 base_model: RtaForge/Anvaya-Rabbit-2.7B
 ---
-# Anvaya-Rabbit 2.7B — v0.1 Alpha
-Rabbit is a 2.7B parameter recurrent State-Space Model (Ṛta-SSM) trained entirely
-from scratch on a single NVIDIA L4 GPU using a custom non-transformer architecture
-and the Gurukul constitutional training protocol. It serves as a technical
-proof-of-concept that capable alternative-architecture models can be developed under
-severe compute constraints. This is the first model in the Anvaya series:
-**Rabbit → Raccoon → Polar Bear**.
-## Overview
-Rabbit demonstrates three proprietary components developed by RtaForge:
-- **Ṛta-SSM** — a custom recurrent state-space architecture with no attention
-  or transformer blocks
-- **Gurukul** — a proposal-validation training loop in which a Sisya proposes
-  weight deltas and a Guru validates them against constitutional constraints before
-  applying
-- **Subsuminator** — cross-architecture weight migration without full retraining,
-  enabling efficient curriculum transfer
-Trained across a phased curriculum on a single consumer GPU, Rabbit shows
-substantial gains over random initialisation on internal scale-invariant metrics.
-It is a deliberate architecture proof at seq_len=64 — not a production model.
-For strategic context, IndiaAI alignment, and full programme roadmap, see the
-[Anvaya Executive Briefing](https://huggingface.co/RtaForge/Anvaya-Rabbit-2.7B/resolve/main/docs/Anvaya-Executive-Briefing-May2026.pdf).
 ## Architecture
 - **Type**: Ṛta-SSM v7.2.2, Fortress Unbroken — recurrent SSM, no attention
-- **Parameters**: ~2.7B (post-subsumination)
 - **Layers**: 64
 - **d_model / d_state**: 2560
 - **Vocabulary**: 50,280 (GPT-NeoX tokenizer)
 - **Precision**: bfloat16
-- **Training seq_len**: 64
 ## Weights
-This repository contains the base pretrained checkpoint
-(`base/Anvaya-Rabbit-2.7B-0.1-alpha-base.pt`) and the SFT imprint checkpoint
-(`imprint/Anvaya-Rabbit-2.7B-0.1-alpha-imprint.pt`).
-Load the imprint weights (base + SFT overlay, recommended for inference):
 ```python
 from white_rabbit.rabbit_model import create_rabbit_model
 from transformers import AutoTokenizer
 import torch
-model = create_rabbit_model(
-    vocab_size=50280,
-    durga_variant="fu-64",  # 64-layer Fortress Unbroken backbone
-)
-sd = torch.load("imprint/Anvaya-Rabbit-2.7B-0.1-alpha-imprint.pt", map_location="cpu")
 model.load_state_dict(sd, strict=False)
 model.eval()
 tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neox-20b")
 ```
-> **Requires**: `rtaforge-substrates` (private repository — contact
-> guha@rtaforge.in for access). This model uses a custom SSM architecture
-> not compatible with standard HuggingFace `AutoModel`.
-**Training infrastructure**: [`Rta-Forge/polaris-revival`](https://github.com/Rta-Forge/polaris-revival) —
-patched ROCm 7.2 runtime restoring native HIP dispatch on gfx803 (RX 560X), with
-fused SSM recurrence kernels. MIT licensed.
-## Training Protocol
-Two proprietary components make this training regime possible:
-**Gurukul** is a constitutional Sisya/Guru proposal-validation loop:
-- The Sisya proposes weight deltas based on the current curriculum phase
-- The Guru validates each proposal against a set of constitutional constraints
-- Accepted proposals update the model; rejected proposals are logged for signal
-- Feedback from each cycle informs the next round of proposals
-**Subsuminator** enables efficient migration of learned weights across architectures,
-supporting curriculum transfer without retraining from scratch.
-Together these components allowed 1,500 accepted proposals across 6 phases to be
-processed in ~7 effective days on a single 24GB GPU.
-**1,500 accepted Gurukul proposals across 6 phases on a single AceCloud L4 (24GB VRAM).
-~7 days effective training time (total elapsed higher due to crash recovery and VRAM
-leak debugging).**
-| Phase | Proposals | Dataset | Focus |
-|-------|-----------|---------|-------|
-| 0 | 125 | CAMEL Physics | Physical reasoning |
-| 1 | 125 | CAMEL Chemistry | Chemical reasoning |
-| 2 | 125 | CAMEL Biology | Biological reasoning |
-| 3 | 250 | Raccoon Phase 1 | General reasoning |
-| 4 | 500 | Rabbit E2 Phase 4 | Extended curriculum |
-| 5 | 375 | Raccoon Phase 3 (consolidation re-run) | Pattern consolidation |
-**Final checkpoint: Step 1,500.** seq_len=64, batch_size=3, optimizer=Lion, lr=1e-5.
-SFT imprint applied using surface-only gate-layer fine-tuning (65 examples, 3 epochs).
-## Evaluation
-### Internal — Scale-Invariant Metrics
-Evaluated using Top-K accuracy and Mean Reciprocal Rank vs. a randomly initialised
-baseline of identical architecture. 50 samples per corpus, seq_len=64.
-| Metric | Random Init | Trained (Step 1,500) | Gain |
-|--------|-------------|----------------------|------|
-| Top-1 Accuracy (aggregate) | 0.24% | **1.90%** | **~8×** |
-| Top-10 Accuracy (aggregate) | 0.24% | **35.84%** | **~149×** |
-| MRR (aggregate) | 0.0026 | **0.1724** | **~66×** |
-| MRR — Deep Math | 0.0084 | **0.186** | **22×** |
-| Top-10 — Biology | ~1.3% | **~12%** | **~10×** |
-| Top-10 — Chemistry | ~1.3% | **~13%** | **~10×** |
-These gains are measured against a randomly initialised model of identical
-architecture — they reflect what the training curriculum taught, not absolute
-capability.
-### Commercial Benchmarks (lm-eval harness)
-> **Standard academic benchmarks are not yet meaningful here.** Rabbit was
-> deliberately trained at seq_len=64 as a pure architecture proof. Standard
-> lm-eval prompts run 150–400 tokens — well beyond Rabbit's training context.
-> Raccoon (seq_len=512) removes this constraint entirely.
-| Benchmark | Score | Notes |
-|-----------|-------|-------|
-| HellaSwag | 25.89% | Prompt exceeds training seq_len |
-| ARC-Challenge | 26.71% | Prompt exceeds training seq_len |
-| MMLU | 26.89% | Prompt exceeds training seq_len |
-| WinoGrande | 48.62% | Prompt exceeds training seq_len |
-| TruthfulQA MC1 | 21.91% | Prompt exceeds training seq_len |
-## Roadmap
-| Model | Params | seq_len | Status |
-|-------|--------|---------|--------|
-| **Rabbit** | ~2.7B | 64 | ✅ This model — v0.1 Alpha |
-| **Raccoon** | ~6.1B | 512 | In training — reasoning curriculum (math ×2, logic ×2) |
-| **Polar Bear** | ~13B | 512 | Planned — STEM + AEVA anti-hallucination layer |
-The delta between Rabbit and Raccoon is the story — same pipeline, same hardware
-philosophy, 8× context length, reasoning-heavy curriculum. Raccoon is intended to
-be the first Ṛta-SSM model trained end-to-end in India on domestic compute
-infrastructure to reach standard benchmark competitiveness.
-**Give us more resources and watch what happens.**
-## Related Resources
-- [Anvaya Executive Briefing — May 2026](https://huggingface.co/RtaForge/Anvaya-Rabbit-2.7B/resolve/main/docs/Anvaya-Executive-Briefing-May2026.pdf) (strategic context & IndiaAI alignment)
-- Training infrastructure: [`Rta-Forge/polaris-revival`](https://github.com/Rta-Forge/polaris-revival)
-- Technical inquiries: guha@rtaforge.in
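The removed Evaluation section reports Top-K accuracy, Mean Reciprocal Rank (MRR), and a gain over a randomly initialised baseline. As a minimal sketch of how such figures are commonly computed, assuming per-position score vectors over the vocabulary (the helper names and the toy scores below are hypothetical and are not taken from the RtaForge evaluation code):

```python
from typing import Sequence


def rank_of_target(scores: Sequence[float], target: int) -> int:
    """1-based rank of the target token when scores are sorted descending."""
    target_score = scores[target]
    # Count tokens scoring strictly higher; ties resolve in the target's favour.
    return 1 + sum(1 for s in scores if s > target_score)


def top_k_accuracy(ranks: Sequence[int], k: int) -> float:
    """Fraction of positions whose target token is ranked within the top k."""
    return sum(1 for r in ranks if r <= k) / len(ranks)


def mean_reciprocal_rank(ranks: Sequence[int]) -> float:
    """Average of 1/rank over all positions."""
    return sum(1.0 / r for r in ranks) / len(ranks)


# Toy example: 3 positions over a 4-token vocabulary (made-up scores).
toy_scores = [
    [0.10, 0.70, 0.10, 0.10],  # target 1 is ranked 1st
    [0.40, 0.30, 0.20, 0.10],  # target 2 is ranked 3rd
    [0.25, 0.05, 0.60, 0.10],  # target 0 is ranked 2nd
]
toy_targets = [1, 2, 0]
ranks = [rank_of_target(s, t) for s, t in zip(toy_scores, toy_targets)]

top1 = top_k_accuracy(ranks, k=1)
mrr = mean_reciprocal_rank(ranks)

# The "Gain" column is read here as the ratio trained / random-init,
# using the aggregate MRR figures from the removed table.
mrr_gain = 0.1724 / 0.0026
```

Because both metrics depend only on the rank of the correct token, they can be compared between a trained and an untrained model of the same architecture, which is how the table frames its gains.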

 - causal-lm
 - rabbit
 - rtaforge
 base_model: RtaForge/Anvaya-Rabbit-2.7B
 ---
+# Anvaya-Rabbit 2.7B
+A 2.7B parameter State-Space Model (SSM) trained by RtaForge using the Gurukul
+constitutional training protocol.
 ## Architecture
 - **Type**: Ṛta-SSM v7.2.2, Fortress Unbroken — recurrent SSM, no attention
+- **Parameters**: ~2.78B
 - **Layers**: 64
 - **d_model / d_state**: 2560
 - **Vocabulary**: 50,280 (GPT-NeoX tokenizer)
 - **Precision**: bfloat16
 ## Weights
+This repository contains the base pretrained checkpoint (`base/Anvaya-Rabbit-2.7B-0.1-alpha-base.pt`)
+and the SFT imprint checkpoint (`imprint/Anvaya-Rabbit-2.7B-0.1-alpha-imprint.pt`).
+Load the base weights directly:
 ```python
 from white_rabbit.rabbit_model import create_rabbit_model
 from transformers import AutoTokenizer
 import torch
+model = create_rabbit_model(vocab_size=50280, durga_variant="fu-64")
+sd = torch.load("base/Anvaya-Rabbit-2.7B-0.1-alpha-base.pt", map_location="cpu")
 model.load_state_dict(sd, strict=False)
 model.eval()
 tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neox-20b")
 ```
+## Benchmarks
+*Benchmarks pending — will be updated after evaluation run completes.*
+| Task | Metric | Score |
+|------|--------|-------|
+| HellaSwag | acc_norm | — |
+| ARC-Challenge | acc_norm | — |
+| MMLU | acc | — |
+| WinoGrande | acc | — |
+| TruthfulQA MC1 | mc1 | — |
+## Training
+Trained with the Anvaya Gurukul protocol: a constitutional Sisya/Guru loop
+where Sisya proposes weight deltas and Guru applies them after validation.
+SFT imprint applied using surface-only gate-layer fine-tuning.
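The Sisya/Guru loop described in the Training section can be illustrated with a toy sketch. Everything below is an assumption made for illustration: the proposer draws random deltas, the single "constitutional constraint" is a bound on weight magnitude, and the function names are hypothetical. The actual Gurukul implementation is proprietary and is not shown in this card.

```python
import random
from typing import List, Tuple

Weights = List[float]
Delta = List[float]


def sisya_propose(weights: Weights, rng: random.Random, scale: float = 0.1) -> Delta:
    """Hypothetical proposer: one small random delta per weight."""
    return [rng.uniform(-scale, scale) for _ in weights]


def guru_validate(weights: Weights, delta: Delta, max_abs: float = 1.0) -> bool:
    """Hypothetical constraint: every updated weight must stay within max_abs."""
    return all(abs(w + d) <= max_abs for w, d in zip(weights, delta))


def gurukul_loop(
    weights: Weights, n_accept: int, seed: int = 0, max_attempts: int = 10_000
) -> Tuple[Weights, int, List[Delta]]:
    """Propose, validate, then apply; rejected proposals are kept as a log."""
    rng = random.Random(seed)
    accepted = 0
    rejected: List[Delta] = []
    attempts = 0
    while accepted < n_accept and attempts < max_attempts:
        attempts += 1
        delta = sisya_propose(weights, rng)
        if guru_validate(weights, delta):
            # Only validated deltas ever touch the weights.
            weights = [w + d for w, d in zip(weights, delta)]
            accepted += 1
        else:
            rejected.append(delta)
    return weights, accepted, rejected


final_weights, accepted, rejected = gurukul_loop([0.0, 0.5, -0.95], n_accept=20)
```

The point of the sketch is the ordering: validation happens before any update is applied, so the weights satisfy the constraint after every accepted proposal.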