Systematically evaluate AI pipeline outputs against a dataset using prompts, APIs, or existing runs.

Experiments

Experiments let you systematically evaluate the outputs of an AI pipeline against a dataset, using one or more evaluators. Each run produces scored results you can compare and analyze in the dashboard.

How to Provide Outputs

Every experiment needs outputs to score. The platform supports three ways to provide them:

Prompt + Dataset — the platform runs the given prompt against each dataset item to generate outputs, then evaluates them. Use when you want to test a prompt end-to-end.
API (Dashboard only) — configure an HTTP endpoint that receives each dataset item and returns an output. The platform calls your API, collects the response, and evaluates it. Use when your AI pipeline isn't just a single prompt (e.g., RAG, agents, custom workflows).
Dataset Run Tag — point the experiment at an existing tagged dataset run that already has outputs (generated from your own code or trace pipeline). Use when you've already run the pipeline and just want to evaluate the results.

All three options use the same evaluator lists and produce results in the same dashboard view.
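For the API option, the endpoint you configure could look like the minimal sketch below, written with Python's standard library only. The request and response shapes are assumptions made for illustration, not the platform's documented contract: the sketch assumes each dataset item arrives as a JSON object with an "input" key and that the output is returned under an "output" key. The names run_pipeline, build_response, and ExperimentHandler are hypothetical, so check the Experiments API reference for the actual payload format.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def run_pipeline(item: dict) -> str:
    """Stand-in for your real pipeline (RAG, agent, custom workflow)."""
    # Assumption: the dataset item carries its text under an "input" key.
    question = str(item.get("input", ""))
    return f"echo: {question}"


def build_response(raw_body: bytes) -> tuple[int, dict]:
    """Turn a raw request body into (status code, JSON-serializable payload)."""
    try:
        item = json.loads(raw_body or b"{}")
    except ValueError:
        # Covers malformed JSON as well as bodies that cannot be decoded.
        return 400, {"error": "request body is not valid JSON"}
    if not isinstance(item, dict):
        return 400, {"error": "expected a JSON object"}
    # Assumption: the platform reads the result from an "output" key.
    return 200, {"output": run_pipeline(item)}


class ExperimentHandler(BaseHTTPRequestHandler):
    def do_POST(self) -> None:
        # Read exactly the number of bytes the caller declared.
        length = int(self.headers.get("Content-Length", 0))
        status, payload = build_response(self.rfile.read(length))
        body = json.dumps(payload).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


if __name__ == "__main__":
    # Local address and port chosen for illustration only.
    HTTPServer(("127.0.0.1", 8000), ExperimentHandler).serve_forever()
```

Keeping the request handling in build_response, separate from the HTTP handler, lets you test the pipeline logic without starting a server.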

Where to Create Experiments

Dashboard UI — configure and launch from Evaluation > Experiments. Supports all three output options above, including the API-based experiment.
SDK — TypeScript, Python, and Java SDKs support Prompt + Dataset and Dataset Run Tag via client.experiments.create() / client.experimentRuns.create().
REST API — direct HTTP access to the same endpoints. See the Experiments API reference.

Create an Experiment

Create and configure experiments with prompts, datasets, and evaluators.

Experiment Runs

Start runs, poll for completion, and view results.

Auto-Stop & Pre-Run Checks

Automatically stop runs at a failure threshold and verify API endpoints before starting.

Experiments API

REST endpoints for creating and managing experiments programmatically.
