BrowserStack AI Evals

Datasets API

Create and manage datasets, dataset items, dataset runs, and run items.

Datasets are collections of test cases (items) used to run experiments and evaluate LLM pipelines. Each dataset can have multiple runs, and each run contains items that link back to traces.

Datasets

Create Dataset

POST /api/public/datasets

Field        Type    Required  Description
name         string  Yes       Unique dataset name
description  string  No        Human-readable description
metadata     object  No        Arbitrary metadata

curl -X POST https://evals-api.browserstack.com/api/public/datasets \
  -u "pk-lf-...:sk-lf-..." \
  -H "Content-Type: application/json" \
  -d '{ "name": "qa-dataset-v1", "description": "QA test cases" }'

Response:

{
  "id": "ds-uuid-1",
  "name": "qa-dataset-v1",
  "description": "QA test cases",
  "metadata": null,
  "projectId": "proj-xyz",
  "createdAt": "2026-04-03T10:00:00.000Z",
  "updatedAt": "2026-04-03T10:00:00.000Z"
}
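
The same request can be built from Python. The following is a minimal sketch using only the standard library; the helper name `build_create_dataset_request` is hypothetical, and the request is constructed but not sent. It assumes that curl's `-u` flag maps to HTTP Basic authentication, which is how curl behaves by default.

```python
import base64
import json
import urllib.request

BASE_URL = "https://evals-api.browserstack.com"  # host taken from the curl examples


def build_create_dataset_request(public_key, secret_key, name,
                                 description=None, metadata=None):
    """Build (but do not send) a POST /api/public/datasets request."""
    body = {"name": name}
    # Optional fields are left out entirely when not provided.
    if description is not None:
        body["description"] = description
    if metadata is not None:
        body["metadata"] = metadata
    # curl -u "user:password" sends "Authorization: Basic base64(user:password)".
    token = base64.b64encode(f"{public_key}:{secret_key}".encode()).decode()
    return urllib.request.Request(
        f"{BASE_URL}/api/public/datasets",
        data=json.dumps(body).encode(),
        headers={
            "Authorization": f"Basic {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_create_dataset_request(
    "pk-lf-...", "sk-lf-...", "qa-dataset-v1", description="QA test cases"
)
print(req.get_method(), req.full_url)
```

Passing `req` to `urllib.request.urlopen` would send it; replace the placeholder keys with real credentials first.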

List Datasets

GET /api/public/datasets

Parameter  Type     Description
page       integer  Page number
limit      integer  Items per page

curl "https://evals-api.browserstack.com/api/public/datasets?page=1&limit=20" \
  -u "pk-lf-...:sk-lf-..."

Get Dataset

GET /api/public/datasets/{datasetName}

curl "https://evals-api.browserstack.com/api/public/datasets/qa-dataset-v1" \
  -u "pk-lf-...:sk-lf-..."
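
Because the dataset name is part of the URL path, a name containing spaces or slashes needs percent-encoding on the client side. The sketch below shows one way to do that; `dataset_url` is a hypothetical helper. Whether the service accepts such names at all is not stated here, so treat the second call as an illustration of the encoding only.

```python
from urllib.parse import quote

BASE_URL = "https://evals-api.browserstack.com"  # host taken from the curl examples


def dataset_url(dataset_name, *suffix):
    """Return the URL for a dataset, optionally followed by extra path segments."""
    segments = [dataset_name, *suffix]
    # safe="" also escapes "/", so a name cannot spill into another path segment.
    encoded = "/".join(quote(segment, safe="") for segment in segments)
    return f"{BASE_URL}/api/public/datasets/{encoded}"


print(dataset_url("qa-dataset-v1"))
print(dataset_url("qa dataset/v1", "runs", "run-2026-04-03"))
```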

Get Dataset Tags

GET /api/public/datasets/{datasetName}/tags

Returns all run tags associated with a dataset.

curl "https://evals-api.browserstack.com/api/public/datasets/qa-dataset-v1/tags" \
  -u "pk-lf-...:sk-lf-..."

Response:

{
  "tags": [
    { "id": "tag-uuid-1", "tagName": "baseline", "datasetRunId": "run-uuid-1" }
  ]
}
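
A response of this shape can be turned into a lookup from tag name to run ID. This is a small sketch that parses the sample response shown above; `runs_by_tag` is a hypothetical helper, and it assumes each tag name appears at most once per dataset.

```python
import json

# Sample body copied from the response above.
response_body = """
{
  "tags": [
    { "id": "tag-uuid-1", "tagName": "baseline", "datasetRunId": "run-uuid-1" }
  ]
}
"""


def runs_by_tag(body):
    """Map each tagName in a tags response to the run ID it points at."""
    payload = json.loads(body)
    # Assumption: tag names are unique, so later duplicates would overwrite earlier ones.
    return {tag["tagName"]: tag["datasetRunId"] for tag in payload["tags"]}


print(runs_by_tag(response_body))
```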

Dataset Items

Create Dataset Item

POST /api/public/dataset-items

Field                Type    Required  Description
datasetName          string  Yes       Target dataset name
input                any     No        Input data for this test case
expectedOutput       any     No        Expected output for evaluation
context              any     No        Context information
metadata             any     No        Arbitrary metadata
id                   string  No        Custom item ID
sourceTraceId        string  No        Trace this item was sourced from
sourceObservationId  string  No        Observation this item was sourced from
status               string  No        ACTIVE (default) or ARCHIVED

curl -X POST https://evals-api.browserstack.com/api/public/dataset-items \
  -u "pk-lf-...:sk-lf-..." \
  -H "Content-Type: application/json" \
  -d '{
    "datasetName": "qa-dataset-v1",
    "input": { "question": "What is the capital of France?" },
    "expectedOutput": "Paris",
    "metadata": { "category": "geography" }
  }'

Response:

{
  "id": "item-uuid-1",
  "datasetId": "ds-uuid-1",
  "datasetName": "qa-dataset-v1",
  "status": "ACTIVE",
  "input": { "question": "What is the capital of France?" },
  "expectedOutput": "Paris",
  "context": null,
  "metadata": { "category": "geography" },
  "sourceTraceId": null,
  "sourceObservationId": null,
  "createdAt": "2026-04-03T10:00:00.000Z",
  "updatedAt": "2026-04-03T10:00:00.000Z"
}
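
When a dataset is populated from existing test cases, it can help to generate the request bodies in a loop. The sketch below builds one payload per question/answer pair using the field names from the table above; `build_item_payload` and the sample cases are illustrative, and nothing is sent over the network.

```python
import json

VALID_STATUSES = {"ACTIVE", "ARCHIVED"}  # values listed for the status field


def build_item_payload(dataset_name, input_data=None, expected_output=None,
                       metadata=None, status=None):
    """Build the JSON body for POST /api/public/dataset-items."""
    if status is not None and status not in VALID_STATUSES:
        raise ValueError(f"status must be one of {sorted(VALID_STATUSES)}")
    payload = {"datasetName": dataset_name}
    optional = {
        "input": input_data,
        "expectedOutput": expected_output,
        "metadata": metadata,
        "status": status,
    }
    # Only include optional fields that were actually provided.
    payload.update({key: value for key, value in optional.items() if value is not None})
    return payload


cases = [
    ("What is the capital of France?", "Paris"),
    ("What is the capital of Japan?", "Tokyo"),
]
payloads = [
    build_item_payload("qa-dataset-v1", {"question": question}, answer,
                       {"category": "geography"})
    for question, answer in cases
]
for payload in payloads:
    print(json.dumps(payload))
```

Each printed line can be used as the `-d` body of the curl command above.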

List Dataset Items

GET /api/public/dataset-items

Parameter            Type     Description
datasetName          string   Filter by dataset name
sourceTraceId        string   Filter by source trace ID
sourceObservationId  string   Filter by source observation ID
page                 integer  Page number
limit                integer  Items per page

curl "https://evals-api.browserstack.com/api/public/dataset-items?datasetName=qa-dataset-v1" \
  -u "pk-lf-...:sk-lf-..."
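
The filters and pagination parameters combine into a single query string. As a rough sketch, the hypothetical helper below assembles the URL from whichever parameters are set and leaves the rest out, so the server applies its own defaults (which are not documented here).

```python
from urllib.parse import urlencode

BASE_URL = "https://evals-api.browserstack.com"  # host taken from the curl examples


def list_items_url(dataset_name=None, source_trace_id=None,
                   source_observation_id=None, page=None, limit=None):
    """Build the GET /api/public/dataset-items URL from optional filters."""
    params = {
        "datasetName": dataset_name,
        "sourceTraceId": source_trace_id,
        "sourceObservationId": source_observation_id,
        "page": page,
        "limit": limit,
    }
    # Drop unset parameters; urlencode percent-encodes the remaining values.
    query = urlencode({key: value for key, value in params.items() if value is not None})
    url = f"{BASE_URL}/api/public/dataset-items"
    return f"{url}?{query}" if query else url


print(list_items_url(dataset_name="qa-dataset-v1", page=2, limit=50))
```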

Get Dataset Item

GET /api/public/dataset-items/{datasetItemId}

curl "https://evals-api.browserstack.com/api/public/dataset-items/item-uuid-1" \
  -u "pk-lf-...:sk-lf-..."

Delete Dataset Item

DELETE /api/public/dataset-items/{datasetItemId}

curl -X DELETE "https://evals-api.browserstack.com/api/public/dataset-items/item-uuid-1" \
  -u "pk-lf-...:sk-lf-..."

Response:

{ "message": "Dataset item successfully deleted" }

Dataset Runs

A dataset run represents one execution of a pipeline against a dataset.

Create Dataset Run

POST /api/public/datasets/{datasetName}/runs

Field        Type      Required  Description
name         string    No        Run name (auto-generated if omitted)
description  string    No        Description
tags         string[]  No        Tags ([a-zA-Z0-9_-], max 50 chars each)

curl -X POST "https://evals-api.browserstack.com/api/public/datasets/qa-dataset-v1/runs" \
  -u "pk-lf-...:sk-lf-..." \
  -H "Content-Type: application/json" \
  -d '{ "name": "run-2026-04-03", "tags": ["baseline"] }'

Response:

{
  "id": "run-uuid-1",
  "name": "run-2026-04-03",
  "description": null,
  "metadata": null,
  "datasetId": "ds-uuid-1",
  "type": "MUTABLE",
  "status": "OPEN",
  "tags": [{ "tagName": "baseline" }],
  "createdAt": "2026-04-03T10:00:00.000Z",
  "updatedAt": "2026-04-03T10:00:00.000Z"
}
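
The tag constraint (characters `[a-zA-Z0-9_-]`, at most 50 per tag) can be checked on the client before the run is created. The sketch below is one way to do that; `invalid_tags` is a hypothetical helper, and rejecting empty tags is an assumption, since no minimum length is stated.

```python
import re

# Allowed characters and the 50-character limit come from the tags field description.
# Assumption: a tag must contain at least one character.
TAG_PATTERN = re.compile(r"[a-zA-Z0-9_-]{1,50}")


def invalid_tags(tags):
    """Return the tags that do not satisfy the documented constraint."""
    return [tag for tag in tags if not TAG_PATTERN.fullmatch(tag)]


candidates = ["baseline", "v1.2", "nightly_run-3", "x" * 51]
print(invalid_tags(candidates))
```

Here `v1.2` is rejected because of the dot, and the last candidate because it is 51 characters long.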

List Dataset Runs

GET /api/public/datasets/{datasetName}/runs

Parameter  Type     Description
page       integer  Page number
limit      integer  Items per page
name       string   Filter by run name
type       string   IMMUTABLE or MUTABLE

curl "https://evals-api.browserstack.com/api/public/datasets/qa-dataset-v1/runs" \
  -u "pk-lf-...:sk-lf-..."

Get Dataset Run

GET /api/public/datasets/{datasetName}/runs/{runName}

Returns the run together with its run items.

curl "https://evals-api.browserstack.com/api/public/datasets/qa-dataset-v1/runs/run-2026-04-03" \
  -u "pk-lf-...:sk-lf-..."

Delete Dataset Run

DELETE /api/public/datasets/{datasetName}/runs/{runName}

curl -X DELETE "https://evals-api.browserstack.com/api/public/datasets/qa-dataset-v1/runs/run-2026-04-03" \
  -u "pk-lf-...:sk-lf-..."

Dataset Run Items

A run item links a dataset item to the trace or observation produced for it in a specific run.

Create Dataset Run Item

POST /api/public/dataset-run-items

Field           Type    Required     Description
runName         string  Yes          Dataset run name
datasetItemId   string  Yes          Dataset item ID
traceId         string  Conditional  Trace ID (required if no observationId)
observationId   string  Conditional  Observation ID (required if no traceId)
runDescription  string  No           Run description
metadata        object  No           Metadata

curl -X POST https://evals-api.browserstack.com/api/public/dataset-run-items \
  -u "pk-lf-...:sk-lf-..." \
  -H "Content-Type: application/json" \
  -d '{
    "runName": "run-2026-04-03",
    "datasetItemId": "item-uuid-1",
    "traceId": "trace-uuid-1"
  }'
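
The conditional requirement on `traceId` and `observationId` means that at least one of the two must be present. A client-side check might look like the sketch below; `build_run_item_payload` is a hypothetical helper, and it assumes that sending both IDs together is also acceptable, which the field table does not rule out.

```python
def build_run_item_payload(run_name, dataset_item_id, trace_id=None,
                           observation_id=None, run_description=None,
                           metadata=None):
    """Build the JSON body for POST /api/public/dataset-run-items."""
    # traceId is required if there is no observationId, and vice versa.
    if trace_id is None and observation_id is None:
        raise ValueError("provide traceId, observationId, or both")
    payload = {"runName": run_name, "datasetItemId": dataset_item_id}
    optional = {
        "traceId": trace_id,
        "observationId": observation_id,
        "runDescription": run_description,
        "metadata": metadata,
    }
    # Only include optional fields that were actually provided.
    payload.update({key: value for key, value in optional.items() if value is not None})
    return payload


print(build_run_item_payload("run-2026-04-03", "item-uuid-1", trace_id="trace-uuid-1"))
```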

Get Run Tag

GET /api/public/datasets/{datasetName}/runs/tags/{tagName}

Returns the run associated with a specific tag.

curl "https://evals-api.browserstack.com/api/public/datasets/qa-dataset-v1/runs/tags/baseline" \
  -u "pk-lf-...:sk-lf-..."

Run Status Values

Status     Run type  Description
OPEN       MUTABLE   Run is accepting new items
PENDING    Both      Queued, not yet started
RUNNING    Both      Currently executing
COMPLETED  Both      Finished successfully
FAILED     Both      Finished with errors
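
Client code that polls a run can encode this table directly. The sketch below reads "Both" as covering the MUTABLE and IMMUTABLE run types, and treats COMPLETED and FAILED as the finished states based on their descriptions; the enum and helper names are illustrative, not part of the API.

```python
from enum import Enum


class RunType(Enum):
    MUTABLE = "MUTABLE"
    IMMUTABLE = "IMMUTABLE"


class RunStatus(Enum):
    OPEN = "OPEN"
    PENDING = "PENDING"
    RUNNING = "RUNNING"
    COMPLETED = "COMPLETED"
    FAILED = "FAILED"


BOTH = {RunType.MUTABLE, RunType.IMMUTABLE}

# Which run types each status applies to, as listed in the table above.
ALLOWED_TYPES = {
    RunStatus.OPEN: {RunType.MUTABLE},
    RunStatus.PENDING: BOTH,
    RunStatus.RUNNING: BOTH,
    RunStatus.COMPLETED: BOTH,
    RunStatus.FAILED: BOTH,
}

# Statuses described as "Finished ..." in the table.
FINISHED = {RunStatus.COMPLETED, RunStatus.FAILED}


def is_valid(status, run_type):
    """Return True if the status is listed for the given run type."""
    return run_type in ALLOWED_TYPES[status]


def is_finished(status):
    """Return True once a run has completed or failed."""
    return status in FINISHED


print(is_valid(RunStatus("OPEN"), RunType("MUTABLE")))
print(is_finished(RunStatus("RUNNING")))
```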