Covenant-72B: Pre-Training a 72B LLM with Trustless Peers Over-the-Internet
Paper: 2603.08163
Covenant-72B-Chat is the instruction-tuned variant of Covenant-72B, the largest language model trained collaboratively by permissionless peers. It was produced by supervised fine-tuning (SFT) of the 72B-parameter base model.
For more details, see the technical report.
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the chat model in bfloat16, sharding across available devices.
model = AutoModelForCausalLM.from_pretrained(
    "1Covenant/Covenant-72B-Chat",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("1Covenant/Covenant-72B-Chat")

messages = [
    {"role": "user", "content": "Explain general relativity in simple terms."},
]

# Apply the chat template and append the assistant generation prompt.
input_ids = tokenizer.apply_chat_template(
    messages, return_tensors="pt", add_generation_prompt=True
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=256)

# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```
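Loading the full checkpoint in bfloat16 needs roughly 144 GB of accelerator memory (2 bytes per parameter), so multi-GPU nodes are the expected setting. On smaller setups, 4-bit quantized loading via `bitsandbytes` is a common alternative; the settings below are an illustrative sketch, not an official recommendation for this model:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Illustrative 4-bit NF4 quantization config (requires the bitsandbytes package).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "1Covenant/Covenant-72B-Chat",
    quantization_config=bnb_config,
    device_map="auto",
)
```

Quantization trades some output quality for a roughly 4x reduction in weight memory; verify quality on your task before relying on it.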
| Parameter | Value |
|---|---|
| Parameter Size | 72B |
| Architecture | LLaMA-style (LlamaForCausalLM) |
| Number of Layers | 80 |
| Number of Attention Heads | 64 (8 KV heads) |
| Hidden Size | 8192 |
| Intermediate Size | 28672 |
| Head Dimension | 128 |
| Vocabulary Size | 262,144 |
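The table's figures are consistent with the stated 72B parameter count. A rough back-of-the-envelope check (assuming untied embeddings, a SwiGLU MLP, and grouped-query attention, as is standard for LLaMA-style models, and ignoring norm and bias parameters):

```python
# Architecture values from the table above.
hidden, inter, layers = 8192, 28672, 80
heads, kv_heads, head_dim = 64, 8, 128
vocab = 262_144

assert heads * head_dim == hidden  # the query projection is square

# Grouped-query attention: Q and O are hidden x hidden; K and V project
# down to only kv_heads * head_dim = 1024 dimensions.
attn = 2 * hidden * hidden + 2 * hidden * (kv_heads * head_dim)
# SwiGLU MLP: gate, up, and down projections.
mlp = 3 * hidden * inter
# Untied input embeddings plus LM head.
embed = 2 * vocab * hidden

total = layers * (attn + mlp) + embed
print(f"{total / 1e9:.1f}B")  # -> 72.7B, matching the stated 72B
```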
All values are percentages. ARC-C is 25-shot, HellaSwag is 10-shot, BBH CoT is 3-shot, MATH is 4-shot; all others are 5-shot.
| Model | Size | ARC-C | ARC-E | GSM8K* | HellaSwag | MMLU** | OBQA | PIQA | WinoGrande** |
|---|---|---|---|---|---|---|---|---|---|
| Covenant-72B-Chat | 72B | 64.16 | 85.52 | 63.91 | 79.15 | 67.35 | 51.80 | 82.81 | 77.27 |
| LLaMA-2-7B-Chat | 7B | 53.16 | 80.64 | 22.59 | 78.60 | 47.23 | 42.60 | 78.24 | 72.45 |
| LLaMA-2-70B-Chat | 70B | 65.36 | 85.31 | 52.16 | 85.90 | 63.08 | 47.40 | 81.56 | 79.56 |
| K2-Chat | 65B | 61.95 | 85.82 | 79.00 | 79.31 | 67.87 | 48.20 | 83.35 | 79.64 |
*strict; **acc. All others use acc_norm.
| Model | Size | BBH CoT* | IFEval** | MATH* | MMLU-Pro* | MuSR |
|---|---|---|---|---|---|---|
| Covenant-72B-Chat | 72B | 54.97 | 64.70 | 26.28 | 40.91 | 39.68 |
| LLaMA-2-7B-Chat | 7B | 40.42 | 30.87 | 4.82 | 22.88 | 40.21 |
| LLaMA-2-70B-Chat | 70B | 63.22 | 40.67 | 10.66 | 35.20 | 48.68 |
| K2-Chat | 65B | 69.79 | 45.47 | 19.06 | 45.36 | 46.56 |
*exact_match; **prompt_strict. MuSR uses acc_norm.