Evaluation runs

No evaluation runs yet

A run scores a model against a benchmark dataset. Start your first run and the results will collect here, where you can sort, group, and compare them.

Connect a dataset

Upload an eval set or link one from the Hub.

Configure a run

Pick a model and the benchmark to score it against.

Read the guide

How scoring, grouping, and reruns work.