E³

Every Eval Ever

Browse and compare model benchmarks
Leaderboard

hfopenllm_v2

by Hugging Face
IFEval · BBH · MATH Level 5 · GPQA · MUSR · MMLU-PRO

Metric Reference

IFEval ↑: Accuracy on IFEval. Higher is better. Range: [0, 1], continuous.
BBH ↑: Accuracy on BBH. Higher is better. Range: [0, 1], continuous.
MATH Level 5 ↑: Exact Match on MATH Level 5. Higher is better. Range: [0, 1], continuous.
GPQA ↑: Accuracy on GPQA. Higher is better. Range: [0, 1], continuous.
MUSR ↑: Accuracy on MUSR. Higher is better. Range: [0, 1], continuous.
MMLU-PRO ↑: Accuracy on MMLU-PRO. Higher is better. Range: [0, 1], continuous.

Submit via GitHub Pull Request:

  1. Fork evaleval/every_eval_ever
  2. Add JSON files to data/<leaderboard>/<developer>/<model>/
  3. Open a PR — automated validation runs on submission
  4. After merge, data syncs to Hugging Face automatically
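The steps above can be sketched as a shell session. The leaderboard, developer, and model names below (`hfopenllm_v2`, `acme`, `acme-7b`) and the JSON keys are hypothetical placeholders, not the actual submission schema; check the Submission Guide and JSON Schema for the real format.

```shell
# 1. Fork evaleval/every_eval_ever on GitHub, then clone your fork:
#    git clone https://github.com/<your-username>/every_eval_ever.git
#    cd every_eval_ever

# 2. Add JSON files under data/<leaderboard>/<developer>/<model>/
#    (example values below are illustrative only)
mkdir -p data/hfopenllm_v2/acme/acme-7b
cat > data/hfopenllm_v2/acme/acme-7b/results.json <<'EOF'
{
  "model": "acme/acme-7b",
  "results": { "IFEval": 0.71, "BBH": 0.54 }
}
EOF

# 3. Commit, push, and open a PR; automated validation runs on submission:
#    git checkout -b add-acme-7b
#    git add data/ && git commit -m "Add acme-7b results"
#    git push origin add-acme-7b
```

Only the directory and file creation run locally; the git and GitHub steps are shown as comments since they require your own fork.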

Submission Guide · JSON Schema