Covenant-72B: Pre-Training a 72B LLM with Trustless Peers Over-the-Internet
Paper: 2603.08163
Covenant-72B-Chat is the instruction-tuned variant of Covenant-72B, the largest language model trained collaboratively by permissionless peers. It was produced by supervised fine-tuning (SFT) of the 72B-parameter base model.
For more details, see the technical report.
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the chat model in bfloat16, sharding across available devices.
model = AutoModelForCausalLM.from_pretrained(
    "1Covenant/Covenant-72B-Chat",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("1Covenant/Covenant-72B-Chat")

messages = [
    {"role": "user", "content": "Explain general relativity in simple terms."},
]

# Apply the chat template and append the assistant generation prompt.
input_ids = tokenizer.apply_chat_template(
    messages, return_tensors="pt", add_generation_prompt=True
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=256)

# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```
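Loading the full checkpoint in bfloat16 needs roughly 144 GB of accelerator memory (2 bytes per parameter), so multi-GPU nodes are the expected setting. On smaller setups, 4-bit quantized loading via `bitsandbytes` is a common alternative; the settings below are an illustrative sketch, not an official recommendation for this model:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Illustrative 4-bit NF4 quantization config (requires the bitsandbytes package).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "1Covenant/Covenant-72B-Chat",
    quantization_config=bnb_config,
    device_map="auto",
)
```

Quantization trades some output quality for a roughly 4x reduction in weight memory; verify quality on your task before relying on it.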
| Parameter | Value |
|---|---|
| Parameter Size | 72B |
| Architecture | LLaMA-style (LlamaForCausalLM) |
| Number of Layers | 80 |
| Number of Attention Heads | 64 (8 KV heads) |
| Hidden Size | 8192 |
| Intermediate Size | 28672 |
| Head Dimension | 128 |
| Vocabulary Size | 262,144 |
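The table's figures are consistent with the stated 72B parameter count. A rough back-of-the-envelope check (assuming untied embeddings, a SwiGLU MLP, and grouped-query attention, as is standard for LLaMA-style models, and ignoring norm and bias parameters):

```python
# Architecture values from the table above.
hidden, inter, layers = 8192, 28672, 80
heads, kv_heads, head_dim = 64, 8, 128
vocab = 262_144

assert heads * head_dim == hidden  # the query projection is square

# Grouped-query attention: Q and O are hidden x hidden; K and V project
# down to only kv_heads * head_dim = 1024 dimensions.
attn = 2 * hidden * hidden + 2 * hidden * (kv_heads * head_dim)
# SwiGLU MLP: gate, up, and down projections.
mlp = 3 * hidden * inter
# Untied input embeddings plus LM head.
embed = 2 * vocab * hidden

total = layers * (attn + mlp) + embed
print(f"{total / 1e9:.1f}B")  # -> 72.7B, matching the stated 72B
```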
All values are percentages. ARC-C is 25-shot, HellaSwag is 10-shot, BBH CoT is 3-shot, MATH is 4-shot; all others are 5-shot.
| Model | Size | ARC-C | ARC-E | GSM8K* | HellaSwag | MMLU** | OBQA | PIQA | WinoGrande** |
|---|---|---|---|---|---|---|---|---|---|
| Covenant-72B-Chat | 72B | 64.16 | 85.52 | 63.91 | 79.15 | 67.35 | 51.80 | 82.81 | 77.27 |
| LLaMA-2-7B-Chat | 7B | 53.16 | 80.64 | 22.59 | 78.60 | 47.23 | 42.60 | 78.24 | 72.45 |
| LLaMA-2-70B-Chat | 70B | 65.36 | 85.31 | 52.16 | 85.90 | 63.08 | 47.40 | 81.56 | 79.56 |
| K2-Chat | 65B | 61.95 | 85.82 | 79.00 | 79.31 | 67.87 | 48.20 | 83.35 | 79.64 |
*strict; **acc. All others use acc_norm.
| Model | Size | BBH CoT* | IFEval** | MATH* | MMLU-Pro* | MuSR |
|---|---|---|---|---|---|---|
| Covenant-72B-Chat | 72B | 54.97 | 64.70 | 26.28 | 40.91 | 39.68 |
| LLaMA-2-7B-Chat | 7B | 40.42 | 30.87 | 4.82 | 22.88 | 40.21 |
| LLaMA-2-70B-Chat | 70B | 63.22 | 40.67 | 10.66 | 35.20 | 48.68 |
| K2-Chat | 65B | 69.79 | 45.47 | 19.06 | 45.36 | 46.56 |
*exact_match; **prompt_strict. MuSR uses acc_norm.