Datasets

Create and manage evaluation datasets and dataset runs from the dashboard, SDK, or API.

Datasets store collections of test cases used to evaluate LLM pipelines. Each item has a required input and, optionally, an expectedOutput, context, and metadata.
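As a rough illustration of the item shape described above, the sketch below models one dataset item in plain Python. It is not the SDK's class: the `DatasetItem` name, the `to_payload` helper, and the field types (in particular treating `metadata` as a key/value dict and `context` as any value) are assumptions made for this example. Only the field names input, expectedOutput, context, and metadata come from the description above.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class DatasetItem:
    """Hypothetical model of one dataset item (illustrative, not an SDK type)."""

    input: Any                             # required
    expected_output: Optional[Any] = None  # optional reference answer
    context: Optional[Any] = None          # optional; type is an assumption
    metadata: Optional[dict] = None        # optional; dict shape is an assumption

    def to_payload(self) -> dict:
        """Build a dict with the documented field names, omitting unset optionals."""
        payload = {"input": self.input}
        if self.expected_output is not None:
            payload["expectedOutput"] = self.expected_output
        if self.context is not None:
            payload["context"] = self.context
        if self.metadata is not None:
            payload["metadata"] = self.metadata
        return payload


# An item with a reference answer, and a minimal item with only an input.
item = DatasetItem(input="What is the capital of France?", expected_output="Paris")
minimal = DatasetItem(input="Summarize this paragraph.")
```

Leaving unset optional fields out of the payload, rather than sending them as null, is a design choice of this sketch. Check the Datasets API reference for the format the service actually expects.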

You can manage datasets from:

Dashboard UI: open Datasets in the left sidebar to create datasets, add items (individually or by CSV import), run evaluations, and export results.
SDK: manage datasets programmatically with the TypeScript, Python, or Java SDK.
REST API: see the Datasets API reference.

Manage Datasets

Create, list, rename, duplicate, and delete datasets.

Dataset Items

Add items individually, upload from CSV, archive, and export.

Dataset Runs

Create runs with prompts or APIs, upload CSV results, and compare runs.
