sarel committed
Commit bd8c499 · verified · 1 Parent(s): 87f14bb

Create README.md

Files changed (1):
  1. README.md (+87, -0)
README.md ADDED
---
language:
- he
- en
license: apache-2.0
library_name: mamba
tags:
- mamba2
- moe
- hebrew
- finance
- legal
- ssm
model_name: HEBATRON
base_model: nvidia/nemotron-3-nano-30b-base
pipeline_tag: text-generation
---

# 🛡️ HEBATRON: Hebrew-Specialized Mamba2-MoE

HEBATRON is a state-of-the-art language model specialized for Hebrew. Developed through a collaboration between **PwC Israel**, **MAFAT**, and **AWS**, it introduces a hybrid architecture combining **Mamba2** and **Mixture-of-Experts (MoE)**.

## 🚀 Model Summary
HEBATRON is designed to handle the structural and morphological complexities of Hebrew while scaling linearly with sequence length on long-context tasks. It is a localized and enhanced version of the **Nemotron-3-Nano-30B** framework, optimized for native-level reasoning in Hebrew and English.

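This card does not publish HEBATRON's routing configuration, so the sketch below is only a generic illustration of the sparse-MoE idea mentioned above: a learned router sends each token to its top-k experts, so only a fraction of the layer's parameters fire per token. All names and sizes here are illustrative assumptions, not the model's actual architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyTopKMoE(nn.Module):
    """Illustrative sparse MoE layer: each token is processed by only its
    top-k experts, so active parameters per token stay small even when the
    total expert count is large. Sizes and k are made up for illustration,
    not HEBATRON's actual configuration."""
    def __init__(self, d_model=64, d_ff=256, n_experts=8, k=2):
        super().__init__()
        self.k = k
        self.router = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(),
                          nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x):                      # x: (tokens, d_model)
        scores = self.router(x)                # (tokens, n_experts)
        topk = scores.topk(self.k, dim=-1)     # keep only the k best experts
        weights = F.softmax(topk.values, dim=-1)
        out = torch.zeros_like(x)
        # Dispatch each token only to its k selected experts.
        for slot in range(self.k):
            idx = topk.indices[:, slot]
            for e, expert in enumerate(self.experts):
                mask = idx == e
                if mask.any():
                    out[mask] += weights[mask, slot:slot + 1] * expert(x[mask])
        return out

moe = ToyTopKMoE()
tokens = torch.randn(5, 64)
print(moe(tokens).shape)  # torch.Size([5, 64])
```

This per-token sparsity is what lets a model keep a large total parameter count while activating only a small slice of it for each token, as the specification table below quantifies.
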
---

## 📂 Technical Specifications

| Feature | Specification |
| :--- | :--- |
| **Model Name** | HEBATRON |
| **Architecture** | Hybrid Mamba2 (SSM) + Sparse MoE |
| **Total Parameters** | 31.6B |
| **Active Parameters** | ~3B per token |
| **Context Window** | 65,536 (64k) tokens |
| **Hardware** | NVIDIA Blackwell (B300) & H200 GPUs |
| **Precision** | FP8 mixed precision |

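As a quick back-of-the-envelope check using only the figures in the table above:

```python
# Figures taken from the specification table above.
total_params = 31.6e9   # total parameters
active_params = 3.0e9   # approximate active parameters per token

print(f"Active fraction per token: {active_params / total_params:.1%}")       # ~9.5%
print(f"Parameters resting per token: {1 - active_params / total_params:.1%}")  # ~90.5%
```
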
---

## 🧬 Training Curriculum
The model was trained with a three-phase **Curriculum Learning** strategy (the token budgets are totaled in the sketch below):

1. **Phase 1: Formal Foundation (75.5B tokens)**
   Focused on high-quality, structured Hebrew (legal, academic, and literary texts) to establish core grammatical rules.
2. **Phase 2: Colloquial Expansion (3.36B tokens)**
   Integrated social media, forums, and informal web data to handle slang and modern registers.
3. **Phase 3: Long-Context Extension (20.4B tokens)**
   Fine-tuned on dense, long-form documents to stabilize the 64k context window.

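Restating the budgets above in plain Python, the three phases sum to roughly 99.26B training tokens, weighted heavily toward the formal foundation:

```python
# Per-phase token budgets from the curriculum above (billions of tokens).
phases = {
    "Formal Foundation": 75.5,
    "Colloquial Expansion": 3.36,
    "Long-Context Extension": 20.4,
}

total = sum(phases.values())
for name, tokens in phases.items():
    print(f"{name}: {tokens:.2f}B tokens ({tokens / total:.1%})")
print(f"Total: {total:.2f}B tokens")  # 99.26B
```
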
---

## 📊 Performance Evaluation

### Hebrew Reasoning Benchmarks
* **SNLI (Semantic Reasoning):** 91.2% accuracy
* **Israeli Trivia:** 72.1% (+14 pts vs. the base model)
* **Hebrew Reasoning Average:** 73.8% (surpassing DictaLM-3.0-Thinking)
* **GSM8K (Math):** 83.3% accuracy in native Hebrew

### English Reasoning Benchmarks
* **Psychometric Psi (EN):** 91.6%
* **English Reasoning Average:** 86.0%

---

## 🎯 Intended Use & Limitations
* **Intended Use:** Advanced Hebrew document analysis, long-context summarization (legal/technical), and complex bilingual reasoning; a minimal usage sketch follows below.
* **Limitations:** As with any large language model, outputs should be verified for factual accuracy.

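A minimal inference sketch, assuming the checkpoint is published in a transformers-compatible format; the repo id, prompt, and generation settings below are placeholders, not published usage instructions:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "org/HEBATRON"  # placeholder repo id, not the model's actual location

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    torch_dtype="auto",      # pick up the checkpoint's native precision
    device_map="auto",       # shard layers across available GPUs (needs accelerate)
    trust_remote_code=True,  # custom Mamba2/MoE layers, if any, may require this
)

# "Summarize the following document in three sentences:" (Hebrew)
prompt = "סכם את המסמך הבא בשלושה משפטים:\n..."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Long inputs up to the 64k-token context window can be passed the same way.
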
---

## 🤝 Credits

### **Project Leadership**
* **MAFAT Lead:** Tal Geva (Project Lead), Matan Frank
* **Technical Lead:** Sarel Weinberger (PwC Next)

### **Core Teams**
* **PwC Israel Team:** Noam Kayzer, Dan Revital, Ori Bar Joseph, Smadar Arbatz, Or Levi, Kate Zinkovskaia, Zevi Apini, Omer Baruch (PwC Next)
* **MAFAT Team:** Noam Ordan, Nadav Cordova

### **Partners & Collaborators**
* **Partners:** Amir Nissan Hacohen (Origin.ai)
* **Research Collaborators:** Shaltiel Shmidman (Dicta), Mike Erlihson
* **Infrastructure:** Netanel Ilouz (AWS)