Evaluation runs
No evaluation runs yet
Each run scores a model against a benchmark dataset. Start your first run and the results will collect here, where you can sort, group, and compare them.
Connect a dataset
Upload an eval set or link one from the Hub.
Configure a run
Pick a model and a benchmark to score it against.
Read the guide
How scoring, grouping, and reruns work.