Datasets

Create and manage evaluation datasets and dataset runs from the dashboard, SDK, or API.

A dataset is a collection of items used to evaluate LLM pipelines, typically input/expected-output pairs. Each item has an input and, optionally, an expectedOutput, context, and metadata.
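As a rough illustration of the item shape described above, the sketch below models a single item in plain Python. Only the four field names (input, expectedOutput, context, metadata) come from this page. The `DatasetItem` class, the `to_payload` helper, and the field types are assumptions made for illustration and are not part of any SDK.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class DatasetItem:
    """Hypothetical item model; field types are assumed, not taken from the SDK."""
    input: Any
    expectedOutput: Optional[Any] = None
    context: Optional[Any] = None
    metadata: Optional[dict] = None


def to_payload(item: DatasetItem) -> dict:
    """Build a dict that always has 'input' and only the optional fields that were set."""
    payload = {"input": item.input}
    for name in ("expectedOutput", "context", "metadata"):
        value = getattr(item, name)
        if value is not None:
            payload[name] = value
    return payload


# An item with an expected output and metadata, but no context.
item = DatasetItem(
    input="What is the capital of France?",
    expectedOutput="Paris",
    metadata={"source": "manual"},
)
payload = to_payload(item)
```

Here `payload` carries input, expectedOutput, and metadata, and omits context because it was never set.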

You can manage datasets from:

  • Dashboard UI — create datasets, add items (single or CSV import), run evaluations, and export results from Datasets in the left sidebar.
  • SDK — TypeScript, Python, and Java SDKs for programmatic dataset management.
  • REST API — see the Datasets API reference.
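Before a CSV import, it can help to check locally how rows map to items. The sketch below uses only Python's standard csv module. The column headers (input, expectedOutput, context) are an assumption based on the item fields named on this page; the layout the dashboard actually expects may differ, so check the import dialog before relying on it. The `rows_to_items` helper is hypothetical.

```python
import csv
import io

# Assumed CSV layout: one column per item field. Not confirmed by the product docs.
CSV_TEXT = """input,expectedOutput,context
What is 2 + 2?,4,arithmetic
Name a primary color.,,
"""


def rows_to_items(csv_text: str) -> list:
    """Turn CSV rows into item dicts, skipping rows with no input and dropping empty cells."""
    items = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        if not row.get("input"):
            continue  # an item without an input is not usable
        items.append({key: value for key, value in row.items() if key and value})
    return items


items = rows_to_items(CSV_TEXT)
```

The second row yields an item with only an input, which matches the description of expectedOutput and context as optional.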
