Covenant-72B-Chat

Model Overview

Covenant-72B-Chat is the instruction-tuned variant of Covenant-72B, the largest language model trained collaboratively over a permissionless network. It was produced by supervised fine-tuning (SFT) of the 72B-parameter base model.

For more details, see the technical report.

Usage

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the model in bfloat16 and shard it across available devices.
model = AutoModelForCausalLM.from_pretrained(
    "1Covenant/Covenant-72B-Chat",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("1Covenant/Covenant-72B-Chat")

# Build a prompt using the model's chat template.
messages = [
    {"role": "user", "content": "Explain general relativity in simple terms."},
]
input_ids = tokenizer.apply_chat_template(
    messages, return_tensors="pt", add_generation_prompt=True
).to(model.device)

# Generate a reply and decode only the newly generated tokens.
output_ids = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```

Model Details

  • Base Model: Covenant-72B
  • Fine-tuning: Supervised fine-tuning (SFT)
  • Model License: Apache 2.0

Technical Specifications

| Parameter | Value |
|---|---|
| Parameter Size | 72B |
| Architecture | LLaMA-style (LlamaForCausalLM) |
| Number of Layers | 80 |
| Number of Attention Heads | 64 (8 KV heads) |
| Hidden Size | 8192 |
| Intermediate Size | 28672 |
| Head Dimension | 128 |
| Vocabulary Size | 262,144 |
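As a rough sanity check, the parameter count implied by a LLaMA-style architecture with the dimensions above can be estimated directly. This sketch assumes grouped-query attention with the stated 8 KV heads, a gate/up/down MLP, and untied input/output embeddings; the last two are assumptions, not stated in the table, and bias and normalization parameters are ignored as negligible.

```python
hidden = 8192
intermediate = 28672
layers = 80
vocab = 262_144
n_heads, n_kv_heads, head_dim = 64, 8, 128

# Attention: Q and O projections map hidden -> heads * head_dim;
# K and V are smaller under grouped-query attention (8 KV heads).
attn = 2 * hidden * n_heads * head_dim + 2 * hidden * n_kv_heads * head_dim

# LLaMA-style MLP has three projections: gate, up, and down.
mlp = 3 * hidden * intermediate

# Token embedding matrix; assume an untied output head of the same shape.
embed = vocab * hidden

total = layers * (attn + mlp) + 2 * embed
print(f"{total / 1e9:.1f}B")  # roughly 72.7B, consistent with the stated 72B
```

Most of the budget sits in the 80 transformer blocks (about 68.5B parameters); the large 262,144-entry vocabulary adds roughly 2.1B parameters per embedding matrix.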

Performance on Benchmarks

All values are percentages. ARC-C is 25-shot, HellaSwag is 10-shot, BBH CoT is 3-shot, MATH is 4-shot; all others are 5-shot.

| Model | Size | ARC-C | ARC-E | GSM8K* | HellaSwag | MMLU** | OBQA | PIQA | WinoGrande** |
|---|---|---|---|---|---|---|---|---|---|
| Covenant-72B-Chat | 72B | 64.16 | 85.52 | 63.91 | 79.15 | 67.35 | 51.80 | 82.81 | 77.27 |
| LLaMA-2-7B-Chat | 7B | 53.16 | 80.64 | 22.59 | 78.60 | 47.23 | 42.60 | 78.24 | 72.45 |
| LLaMA-2-70B-Chat | 70B | 65.36 | 85.31 | 52.16 | 85.90 | 63.08 | 47.40 | 81.56 | 79.56 |
| K2-Chat | 65B | 61.95 | 85.82 | 79.00 | 79.31 | 67.87 | 48.20 | 83.35 | 79.64 |

*strict; **acc. All others use acc_norm.

Additional Benchmarks

| Model | Size | BBH CoT* | IFEval** | MATH* | MMLU-Pro* | MuSR |
|---|---|---|---|---|---|---|
| Covenant-72B-Chat | 72B | 54.97 | 64.70 | 26.28 | 40.91 | 39.68 |
| LLaMA-2-7B-Chat | 7B | 40.42 | 30.87 | 4.82 | 22.88 | 40.21 |
| LLaMA-2-70B-Chat | 70B | 63.22 | 40.67 | 10.66 | 35.20 | 48.68 |
| K2-Chat | 65B | 69.79 | 45.47 | 19.06 | 45.36 | 46.56 |

*exact_match; **prompt_strict. MuSR uses acc_norm.
