OLMES Evaluations - an HCAI-Lab Collection

OLMES Evaluations

Updated May 25

OLMES benchmark evaluation results for OLMo-3-7B and SmolLM3-3B model variants.

HCAI-Lab/olmes-eval-olmo3-7b-base

Updated May 25 • 324

Note: OLMo-3-7B base.

HCAI-Lab/olmes-eval-olmo3-7b-instruct-base

Viewer • Updated May 25 • 30.8k • 221

Note: OLMo-3-7B instruct-base.

HCAI-Lab/olmes-eval-olmo3-7b-instruct-cot

Viewer • Updated May 25 • 21.6k • 314

Note: OLMo-3-7B instruct + chain-of-thought.

HCAI-Lab/olmes-eval-olmo3-7b-thinking

Viewer • Updated May 25 • 17.4k • 82

Note: OLMo-3-7B thinking.

HCAI-Lab/olmes-eval-smollm3-3b-base

Viewer • Updated May 25 • 17.4k • 16

Note: SmolLM3-3B base.