Commit 5367fde by tvastr · verified · 1 parent: ad066f6

Upload README.md with huggingface_hub

Files changed (1)
  1. README.md +34 -96
README.md CHANGED
@@ -8,119 +8,57 @@ tags:
  - causal-lm
  - rabbit
  - rtaforge
- - india
- - sovereign-ai
- pipeline_tag: text-generation
  ---

  # Anvaya-Rabbit 2.7B
 
- **India's first sovereign SSM-based language model.**

- Non-transformer architecture. No attention mechanism. Constitutional training via Gurukul. 7 patents filed at IP India.
-
- ---
-
- ## What's in this repo
-
- Three model tiers are available, each built on the same 2.7B parameter base:
-
- | Tier | File | Use this when… |
- |---|---|---|
- | **Base** | `base/Anvaya-Rabbit-2.7B-0.5-alpha-base.pt` | You want raw pretrained weights for your own fine-tuning |
- | **Instruct** | `instruct/Anvaya-Rabbit-2.7B-0.5-alpha-instruct.pt` | You want a general-purpose assistant that follows instructions |
- | **Imprint** | `imprint/Anvaya-Rabbit-2.7B-0.5-alpha-imprint.pt` | You want the full Rabbit persona: opinionated, constitutional, identity-aware |

- If you're not sure which to use, start with **Instruct**.

- ---
 
- ## Quickstart
-
- ```bash
- pip install rtaforge transformers
- ```

  ```python
- from transformers import AutoModelForCausalLM, AutoTokenizer

- model = AutoModelForCausalLM.from_pretrained(
-     "RtaForge/Anvaya-Rabbit-2.7B",
-     trust_remote_code=True,
-     torch_dtype="auto",
-     device_map="auto",
- )
- tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neox-20b")

- inputs = tokenizer("Hello, I am Rabbit.", return_tensors="pt").to(model.device)
- outputs = model.generate(**inputs, max_new_tokens=200)
- print(tokenizer.decode(outputs[0], skip_special_tokens=True))
  ```
 
- > The `rtaforge` runtime package provides the compiled architecture. Source is not distributed.
-
- ---
-
- ## Why SSM?
-
- > Transformers scale quadratically with context length because every token attends to every other token. SSMs replace attention with a fixed-size recurrent state: inference cost stays **constant per token** regardless of context length, VRAM footprint shrinks dramatically, and long-document throughput improves by orders of magnitude, all at the same parameter count.

- ---
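The removed "Why SSM?" paragraph rests on one idea: a recurrent state of fixed size is updated once per token, so per-token work does not grow with context length. The sketch below illustrates that idea with a toy diagonal linear recurrence in plain Python. It is not RtaSSM (whose source is not distributed); the function names `ssm_step`, `run`, and `attention_pair_count`, and the parameters `a`, `b`, `c`, are hypothetical and chosen only for illustration.

```python
from typing import List, Tuple


def ssm_step(state: List[float], x: float,
             a: List[float], b: List[float], c: List[float]) -> Tuple[List[float], float]:
    """One step of a toy diagonal linear SSM (illustrative, not RtaSSM).

    h_t = a * h_{t-1} + b * x_t   (elementwise)
    y_t = sum(c * h_t)
    """
    new_state = [a_i * h_i + b_i * x for a_i, h_i, b_i in zip(a, state, b)]
    y = sum(c_i * h_i for c_i, h_i in zip(c, new_state))
    return new_state, y


def run(xs: List[float], a: List[float], b: List[float],
        c: List[float]) -> Tuple[List[float], List[float]]:
    """Process a sequence token by token; the state never grows."""
    state = [0.0] * len(a)
    ys = []
    for x in xs:
        state, y = ssm_step(state, x, a, b, c)
        ys.append(y)
    return state, ys


def attention_pair_count(n: int) -> int:
    """Token pairs scored by causal self-attention over n tokens: n(n+1)/2."""
    return n * (n + 1) // 2


if __name__ == "__main__":
    final_state, outputs = run([1.0] * 1000, a=[0.5, 0.9], b=[1.0, 1.0], c=[1.0, 1.0])
    # The recurrent state has the same size after 1000 tokens as after 1.
    print("state size:", len(final_state))
    print("attention pairs for 1000 tokens:", attention_pair_count(1000))
```

Each call to `ssm_step` touches only the fixed-size state, whereas the pair count for causal attention grows quadratically with sequence length. Real SSM layers use learned, often input-dependent dynamics, so this shows only the cost structure.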
-
- ## Architecture

- Rabbit is built on **RtaSSM v7.2.2-FU "Fortress Unbroken"**, a custom state-space model developed at RtaForge:

- - **No attention mechanism**: purely recurrent SSM layers with learned state dynamics
- - **64 layers, 2560 hidden dimensions**, 2.7B parameters, bfloat16
- - **Constitutional training**: Gurukul curriculum with wiki pretraining → instruct SFT → persona imprint
- - **Vocabulary** 50,280 tokens (GPT-NeoX tokenizer)
-
- ---
 
  ## Training

- | Stage | Data | Notes |
- |---|---|---|
- | Wiki pretraining | Wikipedia (en) | 732 constitutional proposals via Gurukul |
- | Instruct SFT | ChatML instruction pairs | `gate_only` trainable strategy |
- | Persona imprint | Rabbit constitutional corpus | Identity and value alignment |
-
- ---
-
- ## Evaluation Access
-
- Weights are publicly available. Runtime package is live:
-
- ```bash
- pip install rtaforge
- ```
-
- To evaluate Rabbit or discuss deployment:
- 📧 guha@rtaforge.in
- 🌐 rtaforge.in
-
- Runtime documentation coming soon.
-
- ---
-
- ## Limitations
-
- v0.5-alpha is an early research release. Rabbit has not been evaluated on standard benchmarks. She is small, she is new, and she is learning. Feedback welcome at guha@rtaforge.in.
-
- ---
-
- ## Citation
-
- ```bibtex
- @misc{anvaya-rabbit-2026,
-   title = {Anvaya-Rabbit: A Sovereign SSM Language Model},
-   author = {RtaForge},
-   year = {2026},
-   url = {https://huggingface.co/RtaForge/Anvaya-Rabbit-2.7B}
- }
- ```
-
- ---
-
- *Anvaya (अन्वय): logical connection, coherence. Rabbit: the fast runner.*
 
  - causal-lm
  - rabbit
  - rtaforge
+ base_model: RtaForge/Anvaya-Rabbit-2.7B
  ---

  # Anvaya-Rabbit 2.7B
 
+ A 2.7B parameter State-Space Model (SSM) trained by RtaForge using the Gurukul
+ constitutional training protocol.

+ ## Architecture

+ - **Type**: Ṛta-SSM v7.2.2, Fortress Unbroken (recurrent SSM, no attention)
+ - **Parameters**: ~2.78B
+ - **Layers**: 64
+ - **d_model / d_state**: 2560
+ - **Vocabulary**: 50,280 (GPT-NeoX tokenizer)
+ - **Precision**: bfloat16
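The parameter count and precision listed above are enough for a back-of-the-envelope estimate of weight memory. The sketch below assumes the checkpoint stores every parameter in bfloat16 (2 bytes each) and counts weights only, ignoring activations, the recurrent state, and framework overhead. The helper names `weight_memory_bytes` and `to_gib` are hypothetical.

```python
# Bytes per parameter for common floating-point formats.
BYTES_PER_PARAM = {"float32": 4, "bfloat16": 2, "float16": 2}


def weight_memory_bytes(num_params: float, dtype: str = "bfloat16") -> int:
    """Approximate memory needed to hold the weights alone."""
    return int(num_params * BYTES_PER_PARAM[dtype])


def to_gib(num_bytes: int) -> float:
    """Convert bytes to GiB (2**30 bytes)."""
    return num_bytes / 2**30


if __name__ == "__main__":
    params = 2.78e9  # "~2.78B" from the Architecture list
    for dtype in ("bfloat16", "float32"):
        size = weight_memory_bytes(params, dtype)
        print(f"{dtype}: {to_gib(size):.2f} GiB")
```

Under these assumptions the bfloat16 weights come to roughly 5.2 GiB, and loading them as float32 would double that.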
 
+ ## Weights

+ This repository contains the base pretrained checkpoint (`base/Anvaya-Rabbit-2.7B-0.1-alpha-base.pt`)
+ and the SFT imprint checkpoint (`imprint/Anvaya-Rabbit-2.7B-0.1-alpha-imprint.pt`).
+ Load the base weights directly:

  ```python
+ from white_rabbit.rabbit_model import create_rabbit_model
+ from transformers import AutoTokenizer
+ import torch

+ model = create_rabbit_model(vocab_size=50280, durga_variant="fu-64")
+ sd = torch.load("base/Anvaya-Rabbit-2.7B-0.1-alpha-base.pt", map_location="cpu")
+ model.load_state_dict(sd, strict=False)
+ model.eval()

+ tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neox-20b")
  ```
 
+ ## Benchmarks

+ *Benchmarks pending: will be updated after the evaluation run completes.*
 
+ | Task | Metric | Score |
+ |------|--------|-------|
+ | HellaSwag | acc_norm | pending |
+ | ARC-Challenge | acc_norm | pending |
+ | MMLU | acc | pending |
+ | WinoGrande | acc | pending |
+ | TruthfulQA MC1 | mc1 | pending |
 
  ## Training

+ Trained with the Anvaya Gurukul protocol: a constitutional Sisya/Guru loop
+ where Sisya proposes weight deltas and Guru applies them after validation.
+ SFT imprint applied using surface-only gate-layer fine-tuning.
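The README describes the Gurukul loop only at a high level: one party proposes weight deltas and another applies them after validation. The toy sketch below shows what such a propose/validate/apply loop could look like. It is not RtaForge's implementation. The random proposals, the "accept only if a loss decreases" validation rule, and every name in it (`sisya_propose`, `guru_validate_and_apply`, `train`) are assumptions made for illustration.

```python
import random
from typing import Callable, List, Tuple

Weights = List[float]


def loss(w: Weights, target: Weights) -> float:
    """Toy objective: squared distance to a target weight vector."""
    return sum((wi - ti) ** 2 for wi, ti in zip(w, target))


def sisya_propose(w: Weights, rng: random.Random, scale: float = 0.1) -> Weights:
    """Student side: propose a small delta (here random, as an assumption)."""
    return [rng.uniform(-scale, scale) for _ in w]


def guru_validate_and_apply(w: Weights, delta: Weights,
                            score: Callable[[Weights], float]) -> Tuple[Weights, bool]:
    """Teacher side: apply the delta only if it passes validation.

    Validation here means "strictly lowers the score", a hypothetical rule.
    """
    candidate = [wi + di for wi, di in zip(w, delta)]
    if score(candidate) < score(w):
        return candidate, True
    return w, False


def train(w: Weights, target: Weights, steps: int = 200,
          seed: int = 0) -> Tuple[Weights, int]:
    """Run the propose/validate/apply loop and count accepted proposals."""
    rng = random.Random(seed)
    accepted = 0
    for _ in range(steps):
        delta = sisya_propose(w, rng)
        w, ok = guru_validate_and_apply(w, delta, lambda x: loss(x, target))
        accepted += int(ok)
    return w, accepted


if __name__ == "__main__":
    start, target = [0.0, 0.0], [1.0, -1.0]
    final, accepted = train(start, target)
    print("accepted proposals:", accepted)
    print("loss before/after:", loss(start, target), loss(final, target))
```

Because rejected deltas are discarded, the loss in this sketch can never increase from one step to the next. Whether the actual protocol has a comparable guarantee is not stated in the README.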