---
language:
- he
- en
license: apache-2.0
library_name: mamba
tags:
- mamba2
- moe
- hebrew
- finance
- legal
- ssm
model_name: HEBATRON
base_model: nvidia/nemotron-3-nano-30b-base
pipeline_tag: text-generation
---

# 🛡️ HEBATRON: Hebrew-Specialized Mamba2-MoE

HEBATRON is a state-of-the-art, high-performance language model specialized for the Hebrew language. Developed through a collaboration between **PwC Israel**, **MAFAT**, and **AWS**, it introduces a unique hybrid architecture combining **Mamba2** and **Mixture-of-Experts (MoE)**.

## 🚀 Model Summary

HEBATRON is designed to handle the structural and morphological complexities of Hebrew while providing linear scaling for long-context tasks. It is a localized and enhanced version of the **Nemotron-3-Nano-30B** framework, optimized for native-level reasoning in Hebrew and English.

---
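
## 💻 Usage

The card does not yet include loading instructions, so the snippet below is a minimal inference sketch: it assumes the checkpoint is published with standard `transformers` causal-LM support, and the repository id `pwc-israel/HEBATRON` is a hypothetical placeholder.

```python
def generate(
    prompt: str,
    model_id: str = "pwc-israel/HEBATRON",  # hypothetical repo id, replace with the real one
    max_new_tokens: int = 256,
) -> str:
    """Greedy-decoding sketch for a Mamba2-MoE checkpoint via transformers."""
    # Deferred imports keep this snippet importable even without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype="auto",      # use the precision stored in the checkpoint
        device_map="auto",
        trust_remote_code=True,  # hybrid Mamba2-MoE models often ship custom modeling code
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(outputs[0], skip_special_tokens=True)
```

---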

## 📂 Technical Specifications

| Feature | Specification |
| :--- | :--- |
| **Model Name** | HEBATRON |
| **Architecture** | Hybrid Mamba2 (SSM) + Sparse MoE |
| **Total Parameters** | 31.6B |
| **Active Parameters** | ~3B per token |
| **Context Window** | 65,536 (64k) tokens |
| **Hardware** | NVIDIA Blackwell (B300) & H200 GPUs |
| **Precision** | FP8 Mixed-Precision |

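The gap between total and active parameters is the point of the sparse-MoE design: per token, the router activates only a small fraction of the network. A quick check using the table's figures (the ~3B active count is approximate):

```python
# Sparsity check using the figures from the specification table above.
total_params = 31.6e9   # total parameters
active_params = 3.0e9   # approximate active parameters per token

active_share = active_params / total_params
print(f"Active share per token: {active_share:.1%}")  # roughly 9.5%
```
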
---

## 🧬 Training Curriculum

The model was trained using a three-phase **Curriculum Learning** strategy:

1. **Phase 1: Formal Foundation (75.5B tokens)**
   Focused on high-quality, structured Hebrew (legal, academic, and literary texts) to establish core grammatical rules.
2. **Phase 2: Colloquial Expansion (3.36B tokens)**
   Integration of social media, forums, and informal web data to handle slang and modern registers.
3. **Phase 3: Long-Context Extension (20.4B tokens)**
   Fine-tuning on dense, long-form documents to stabilize the 64k context window.

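Summing the three phases gives the full pre-training budget and shows how strongly it is weighted toward the formal-Hebrew phase:

```python
# Token budgets per curriculum phase, in billions of tokens (from the list above).
phases = {
    "Formal Foundation": 75.5,
    "Colloquial Expansion": 3.36,
    "Long-Context Extension": 20.4,
}

total = sum(phases.values())
print(f"Total curriculum tokens: {total:.2f}B")  # 99.26B
for name, tokens in phases.items():
    print(f"  {name}: {tokens / total:.1%}")
```
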
---

## 📊 Performance Evaluation

### Hebrew Reasoning Benchmarks

* **SNLI (Semantic Reasoning):** 91.2% accuracy
* **Israeli Trivia:** 72.1% (+14 pts vs. base)
* **Hebrew Reasoning Average:** 73.8% (surpassing DictaLM-3.0-Thinking)
* **GSM8K (Math):** 83.3% accuracy in native Hebrew

### English Reasoning Benchmarks

* **Psychometric Psi (EN):** 91.6%
* **English Reasoning Average:** 86.0%

---

## 🎯 Intended Use & Limitations

* **Intended Use:** Advanced Hebrew document analysis, long-context summarization (legal/technical), and complex bilingual reasoning.
* **Limitations:** As with any large language model, outputs should be verified for factual accuracy before use.

---

## 🤝 Credits

### **Project Leadership**

* **MAFAT Lead:** Tal Geva (Project Lead), Matan Frank
* **Technical Lead:** Sarel Weinberger (PwC Next)

### **Core Teams**

* **PwC Israel Team:** Noam Kayzer, Dan Revital, Ori Bar Joseph, Smadar Arbatz, Or Levi, Kate Zinkovskaia, Zevi Apini, Omer Baruch (PwC Next)
* **MAFAT Team:** Noam Ordan, Nadav Cordova

### **Partners & Collaborators**

* **Partners:** Amir Nissan Hacohen (Origin.ai)
* **Research Collaborators:** Shaltiel Shmidman (Dicta), Mike Erlihson
* **Infrastructure:** Netanel Ilouz (AWS)