tvastr committed on
Commit fbcb549 · verified · 1 Parent(s): d59f716

Upload README.md with huggingface_hub

  1. README.md +22 -130
README.md CHANGED
@@ -8,165 +8,57 @@ tags:
8
  - causal-lm
9
  - rabbit
10
  - rtaforge
11
- - proof-of-concept
12
  base_model: RtaForge/Anvaya-Rabbit-2.7B
13
  ---
14
 
15
- # Anvaya-Rabbit 2.7B — v0.1 Alpha
16
 
17
- Rabbit is a 2.7B parameter recurrent State-Space Model (Ṛta-SSM) trained entirely
18
- from scratch on a single NVIDIA L4 GPU using a custom non-transformer architecture
19
- and the Gurukul constitutional training protocol. It serves as a technical
20
- proof-of-concept that capable alternative-architecture models can be developed under
21
- severe compute constraints. This is the first model in the Anvaya series:
22
- **Rabbit → Raccoon → Polar Bear**.
23
-
24
- ## Overview
25
-
26
- Rabbit demonstrates three proprietary components developed by RtaForge:
27
-
28
- - **Ṛta-SSM** — a custom recurrent state-space architecture with no attention
29
- or transformer blocks
30
- - **Gurukul** — a proposal-validation training loop in which a Sisya proposes
31
- weight deltas and a Guru validates them against constitutional constraints before
32
- applying
33
- - **Subsuminator** — cross-architecture weight migration without full retraining,
34
- enabling efficient curriculum transfer
35
-
36
- Trained across a phased curriculum on a single consumer GPU, Rabbit shows
37
- substantial gains over random initialisation on internal scale-invariant metrics.
38
- It is a deliberate architecture proof at seq_len=64 — not a production model.
39
-
40
- For strategic context, IndiaAI alignment, and full programme roadmap, see the
41
- [Anvaya Executive Briefing](https://huggingface.co/RtaForge/Anvaya-Rabbit-2.7B/resolve/main/docs/Anvaya-Executive-Briefing-May2026.pdf).
42
 
43
  ## Architecture
44
 
45
  - **Type**: Ṛta-SSM v7.2.2, Fortress Unbroken — recurrent SSM, no attention
46
- - **Parameters**: ~2.7B (post-subsumination)
47
  - **Layers**: 64
48
  - **d_model / d_state**: 2560
49
  - **Vocabulary**: 50,280 (GPT-NeoX tokenizer)
50
  - **Precision**: bfloat16
51
- - **Training seq_len**: 64
52
 
53
  ## Weights
54
 
55
- This repository contains the base pretrained checkpoint
56
- (`base/Anvaya-Rabbit-2.7B-0.1-alpha-base.pt`) and the SFT imprint checkpoint
57
- (`imprint/Anvaya-Rabbit-2.7B-0.1-alpha-imprint.pt`).
58
-
59
- Load the imprint weights (base + SFT overlay, recommended for inference):
60
 
61
  ```python
62
  from white_rabbit.rabbit_model import create_rabbit_model
63
  from transformers import AutoTokenizer
64
  import torch
65
 
66
- model = create_rabbit_model(
67
- vocab_size=50280,
68
- durga_variant="fu-64", # 64-layer Fortress Unbroken backbone
69
- )
70
- sd = torch.load("imprint/Anvaya-Rabbit-2.7B-0.1-alpha-imprint.pt", map_location="cpu")
71
  model.load_state_dict(sd, strict=False)
72
  model.eval()
73
 
74
  tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neox-20b")
75
  ```
76
 
77
- > **Requires**: `rtaforge-substrates` (private repository — contact
78
- > guha@rtaforge.in for access). This model uses a custom SSM architecture
79
- > not compatible with standard HuggingFace `AutoModel`.
80
-
81
- **Training infrastructure**: [`Rta-Forge/polaris-revival`](https://github.com/Rta-Forge/polaris-revival) —
82
- patched ROCm 7.2 runtime restoring native HIP dispatch on gfx803 (RX 560X), with
83
- fused SSM recurrence kernels. MIT licensed.
84
-
85
- ## Training Protocol
86
-
87
- Two proprietary components make this training regime possible:
88
-
89
- **Gurukul** is a constitutional Sisya/Guru proposal-validation loop:
90
- - The Sisya proposes weight deltas based on the current curriculum phase
91
- - The Guru validates each proposal against a set of constitutional constraints
92
- - Accepted proposals update the model; rejected proposals are logged for signal
93
- - Feedback from each cycle informs the next round of proposals
94
-
95
- **Subsuminator** enables efficient migration of learned weights across architectures,
96
- supporting curriculum transfer without retraining from scratch.
97
-
98
- Together these components allowed 1,500 accepted proposals across 6 phases to be
99
- processed in ~7 effective days on a single 24GB GPU.
100
-
101
- **1,500 accepted Gurukul proposals across 6 phases on a single AceCloud L4 (24GB VRAM).
102
- ~7 days effective training time (total elapsed higher due to crash recovery and VRAM
103
- leak debugging).**
104
-
105
- | Phase | Proposals | Dataset | Focus |
106
- |-------|-----------|---------|-------|
107
- | 0 | 125 | CAMEL Physics | Physical reasoning |
108
- | 1 | 125 | CAMEL Chemistry | Chemical reasoning |
109
- | 2 | 125 | CAMEL Biology | Biological reasoning |
110
- | 3 | 250 | Raccoon Phase 1 | General reasoning |
111
- | 4 | 500 | Rabbit E2 Phase 4 | Extended curriculum |
112
- | 5 | 375 | Raccoon Phase 3 (consolidation re-run) | Pattern consolidation |
113
-
114
- **Final checkpoint: Step 1,500.** seq_len=64, batch_size=3, optimizer=Lion, lr=1e-5.
115
-
116
- SFT imprint applied using surface-only gate-layer fine-tuning (65 examples, 3 epochs).
117
-
118
- ## Evaluation
119
-
120
- ### Internal — Scale-Invariant Metrics
121
-
122
- Evaluated using Top-K accuracy and Mean Reciprocal Rank vs. a randomly initialised
123
- baseline of identical architecture. 50 samples per corpus, seq_len=64.
124
-
125
- | Metric | Random Init | Trained (Step 1,500) | Gain |
126
- |--------|-------------|----------------------|------|
127
- | Top-1 Accuracy (aggregate) | 0.24% | **1.90%** | **~8×** |
128
- | Top-10 Accuracy (aggregate) | 0.24% | **35.84%** | **~149×** |
129
- | MRR (aggregate) | 0.0026 | **0.1724** | **~66×** |
130
- | MRR — Deep Math | 0.0084 | **0.186** | **22×** |
131
- | Top-10 — Biology | ~1.3% | **~12%** | **~10×** |
132
- | Top-10 — Chemistry | ~1.3% | **~13%** | **~10×** |
133
-
134
- These gains are measured against a randomly initialised model of identical
135
- architecture — they reflect what the training curriculum taught, not absolute
136
- capability.
137
-
138
- ### Commercial Benchmarks (lm-eval harness)
139
-
140
- > **Standard academic benchmarks are not yet meaningful here.** Rabbit was
141
- > deliberately trained at seq_len=64 as a pure architecture proof. Standard
142
- > lm-eval prompts run 150–400 tokens — well beyond Rabbit's training context.
143
- > Raccoon (seq_len=512) removes this constraint entirely.
144
-
145
- | Benchmark | Score | Notes |
146
- |-----------|-------|-------|
147
- | HellaSwag | 25.89% | Prompt exceeds training seq_len |
148
- | ARC-Challenge | 26.71% | Prompt exceeds training seq_len |
149
- | MMLU | 26.89% | Prompt exceeds training seq_len |
150
- | WinoGrande | 48.62% | Prompt exceeds training seq_len |
151
- | TruthfulQA MC1 | 21.91% | Prompt exceeds training seq_len |
152
-
153
- ## Roadmap
154
 
155
- | Model | Params | seq_len | Status |
156
- |-------|--------|---------|--------|
157
- | **Rabbit** | ~2.7B | 64 | ✅ This model — v0.1 Alpha |
158
- | **Raccoon** | ~6.1B | 512 | In training — reasoning curriculum (math ×2, logic ×2) |
159
- | **Polar Bear** | ~13B | 512 | Planned — STEM + AEVA anti-hallucination layer |
160
 
161
- The delta between Rabbit and Raccoon is the story — same pipeline, same hardware
162
- philosophy, 8× context length, reasoning-heavy curriculum. Raccoon is intended to
163
- be the first Ṛta-SSM model trained end-to-end in India on domestic compute
164
- infrastructure to reach standard benchmark competitiveness.
 
 
 
165
 
166
- **Give us more resources and watch what happens.**
167
 
168
- ## Related Resources
169
 
170
- - [Anvaya Executive Briefing May 2026](https://huggingface.co/RtaForge/Anvaya-Rabbit-2.7B/resolve/main/docs/Anvaya-Executive-Briefing-May2026.pdf) (strategic context & IndiaAI alignment)
171
- - Training infrastructure: [`Rta-Forge/polaris-revival`](https://github.com/Rta-Forge/polaris-revival)
172
- - Technical inquiries: guha@rtaforge.in
 
8
  - causal-lm
9
  - rabbit
10
  - rtaforge
 
11
  base_model: RtaForge/Anvaya-Rabbit-2.7B
12
  ---
13
 
14
+ # Anvaya-Rabbit 2.7B
15
 
16
+ A 2.7B parameter State-Space Model (SSM) trained by RtaForge using the Gurukul
17
+ constitutional training protocol.
18
 
19
  ## Architecture
20
 
21
  - **Type**: Ṛta-SSM v7.2.2, Fortress Unbroken — recurrent SSM, no attention
22
+ - **Parameters**: ~2.78B
23
  - **Layers**: 64
24
  - **d_model / d_state**: 2560
25
  - **Vocabulary**: 50,280 (GPT-NeoX tokenizer)
26
  - **Precision**: bfloat16
 
27
 
28
  ## Weights
29
 
30
+ This repository contains the base pretrained checkpoint (`base/Anvaya-Rabbit-2.7B-0.1-alpha-base.pt`)
31
+ and the SFT imprint checkpoint (`imprint/Anvaya-Rabbit-2.7B-0.1-alpha-imprint.pt`).
32
+ Load the base weights directly:
 
 
33
 
34
  ```python
35
  from white_rabbit.rabbit_model import create_rabbit_model
36
  from transformers import AutoTokenizer
37
  import torch
38
 
39
+ model = create_rabbit_model(vocab_size=50280, durga_variant="fu-64")
40
+ sd = torch.load("base/Anvaya-Rabbit-2.7B-0.1-alpha-base.pt", map_location="cpu")
 
 
 
41
  model.load_state_dict(sd, strict=False)
42
  model.eval()
43
 
44
  tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neox-20b")
45
  ```
46
 
47
+ ## Benchmarks
48
 
49
+ *Benchmarks pending; this table will be updated after the evaluation run completes.*
 
 
 
 
50
 
51
+ | Task | Metric | Score |
52
+ |------|--------|-------|
53
+ | HellaSwag | acc_norm | — |
54
+ | ARC-Challenge | acc_norm | — |
55
+ | MMLU | acc | — |
56
+ | WinoGrande | acc | — |
57
+ | TruthfulQA MC1 | mc1 | — |
58
 
 
59
 
60
+ ## Training
61
 
62
+ Trained with the Anvaya Gurukul protocol: a constitutional Sisya/Guru loop
63
+ where Sisya proposes weight deltas and Guru applies them after validation.
64
+ SFT imprint applied using surface-only gate-layer fine-tuning.
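
The Training section describes a loop in which the Sisya proposes weight deltas and the Guru applies them only after validation. The sketch below illustrates that control flow in plain Python on a toy weight vector. It is not RtaForge's implementation: every name in it (`propose_delta`, `guru_accepts`, `gurukul_loop`) is hypothetical, the Sisya is replaced by random proposals, and the two acceptance rules (a bound on the delta norm and a non-increasing loss) are stand-in assumptions for the constitutional constraints, which the README does not specify.

```python
import random
from typing import Callable, List, Tuple

Weights = List[float]


def propose_delta(weights: Weights, rng: random.Random, scale: float = 0.1) -> Weights:
    """Hypothetical Sisya: propose a small random delta (stand-in for a learned proposal)."""
    return [rng.gauss(0.0, scale) for _ in weights]


def apply_delta(weights: Weights, delta: Weights) -> Weights:
    """Return a new weight vector with the delta added element-wise."""
    return [w + d for w, d in zip(weights, delta)]


def guru_accepts(
    weights: Weights,
    delta: Weights,
    loss_fn: Callable[[Weights], float],
    max_norm: float = 1.0,
) -> bool:
    """Hypothetical Guru: both rules are assumed placeholders for the real constraints."""
    norm = sum(d * d for d in delta) ** 0.5
    if norm > max_norm:  # assumed constraint 1: bounded step size
        return False
    # assumed constraint 2: the candidate must not increase the loss
    return loss_fn(apply_delta(weights, delta)) <= loss_fn(weights)


def gurukul_loop(
    weights: Weights,
    loss_fn: Callable[[Weights], float],
    target_accepted: int,
    max_proposals: int,
    seed: int = 0,
) -> Tuple[Weights, int, List[Weights]]:
    """Run propose/validate cycles until enough proposals are accepted or the budget runs out."""
    rng = random.Random(seed)
    accepted = 0
    rejected: List[Weights] = []
    for _ in range(max_proposals):
        if accepted >= target_accepted:
            break
        delta = propose_delta(weights, rng)
        if guru_accepts(weights, delta, loss_fn):
            weights = apply_delta(weights, delta)  # applied only after validation
            accepted += 1
        else:
            rejected.append(delta)  # kept so rejected proposals can be inspected later
    return weights, accepted, rejected


def squared_norm(weights: Weights) -> float:
    """Toy loss used only for this illustration."""
    return sum(w * w for w in weights)


if __name__ == "__main__":
    start = [1.0, -2.0, 0.5]
    final, n_accepted, n_rejected = gurukul_loop(start, squared_norm, 20, 200)
    print(f"accepted={n_accepted} rejected={len(n_rejected)}")
    print(f"loss: {squared_norm(start):.4f} -> {squared_norm(final):.4f}")
```

Because a delta is applied only when the assumed validator accepts it, the toy loss can never increase across the run; a real set of constraints could accept or reject on entirely different grounds.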