BrowserStack AI Evals

Experiments API

Create and manage experiments and experiment runs.

An experiment ties together a dataset, an evaluator list, and optionally a prompt, so that LLM pipeline quality can be measured systematically. Each experiment can have multiple runs.

Create Experiment

POST /api/public/experiments

Two creation modes are supported:

Mode 1 — by dataset run tag (use a tagged dataset run as ground truth):

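| Field | Type | Required | Description |
| --- | --- | --- | --- |
| name | string | Yes | Experiment name |
| description | string | No | Experiment description |
| datasetRunTagId | string | Yes | Tag ID from a dataset run |
| evaluatorListId | string | Yes | Evaluator list ID |
| concurrency | integer | No | Maximum number of concurrent evaluation executions |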

Mode 2 — by prompt + dataset (run evaluations against a prompt on a dataset):

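| Field | Type | Required | Description |
| --- | --- | --- | --- |
| name | string | Yes | Experiment name |
| description | string | No | Experiment description |
| promptId | string | Yes | Prompt ID |
| datasetId | string | Yes | Dataset ID |
| evaluatorListId | string | Yes | Evaluator list ID |
| concurrency | integer | No | Maximum number of concurrent evaluation executions |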
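
The two modes differ only in which identifiers accompany `name` and `evaluatorListId`. As a minimal sketch, the hypothetical helper below (`build_experiment_payload` is an illustrative name, not part of any official SDK) assembles a request body for either mode. It assumes the modes are mutually exclusive, which the tables above suggest but do not state outright.

```python
from typing import Any, Dict, Optional


def build_experiment_payload(
    name: str,
    evaluator_list_id: str,
    *,
    dataset_run_tag_id: Optional[str] = None,
    prompt_id: Optional[str] = None,
    dataset_id: Optional[str] = None,
    description: Optional[str] = None,
    concurrency: Optional[int] = None,
) -> Dict[str, Any]:
    """Build a JSON body for POST /api/public/experiments (hypothetical helper)."""
    tag_mode = dataset_run_tag_id is not None
    prompt_mode = prompt_id is not None or dataset_id is not None

    # Assumption: exactly one mode may be used per request.
    if tag_mode == prompt_mode:
        raise ValueError(
            "provide either dataset_run_tag_id, or prompt_id together with dataset_id"
        )
    if prompt_mode and (prompt_id is None or dataset_id is None):
        raise ValueError("prompt-based mode needs both prompt_id and dataset_id")

    payload: Dict[str, Any] = {"name": name, "evaluatorListId": evaluator_list_id}
    if tag_mode:
        payload["datasetRunTagId"] = dataset_run_tag_id
    else:
        payload["promptId"] = prompt_id
        payload["datasetId"] = dataset_id

    # Optional fields are sent only when the caller sets them.
    if description is not None:
        payload["description"] = description
    if concurrency is not None:
        payload["concurrency"] = concurrency
    return payload
```

Validating on the client side like this surfaces a malformed body before any request is made; the server remains the authority on what it accepts.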

cURL Example (tag-based)

curl -X POST https://evals-api.browserstack.com/api/public/experiments \
  -u "pk-lf-...:sk-lf-..." \
  -H "Content-Type: application/json" \
  -d '{
    "name": "rag-eval-baseline",
    "datasetRunTagId": "tag-uuid-1",
    "evaluatorListId": "eval-list-uuid-1"
  }'

cURL Example (prompt-based)

curl -X POST https://evals-api.browserstack.com/api/public/experiments \
  -u "pk-lf-...:sk-lf-..." \
  -H "Content-Type: application/json" \
  -d '{
    "name": "prompt-eval-v2",
    "promptId": "prompt-uuid-1",
    "datasetId": "ds-uuid-1",
    "evaluatorListId": "eval-list-uuid-1"
  }'
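
For callers working in Python rather than cURL, the sketch below builds an equivalent request with the standard library only. `build_request` is a hypothetical helper name; it relies on the fact that cURL's `-u` flag sends HTTP Basic authentication, and it only constructs the request without sending it.

```python
import base64
import json
import urllib.request
from typing import Any, Dict, Optional

BASE_URL = "https://evals-api.browserstack.com"


def build_request(
    method: str,
    path: str,
    public_key: str,
    secret_key: str,
    body: Optional[Dict[str, Any]] = None,
) -> urllib.request.Request:
    """Build (but do not send) an authenticated request (hypothetical helper)."""
    # Equivalent of curl's -u "public:secret": HTTP Basic authentication.
    token = base64.b64encode(f"{public_key}:{secret_key}".encode("utf-8")).decode("ascii")
    headers = {"Authorization": f"Basic {token}"}

    data = None
    if body is not None:
        # Equivalent of -H "Content-Type: application/json" plus -d '<json>'.
        data = json.dumps(body).encode("utf-8")
        headers["Content-Type"] = "application/json"

    return urllib.request.Request(BASE_URL + path, data=data, headers=headers, method=method)
```

Passing the returned object to `urllib.request.urlopen` would send it; that call is left out so the sketch has no network side effects.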

Response

{
  "id": "exp-uuid-1",
  "name": "rag-eval-baseline",
  "description": null,
  "concurrency": 10,
  "projectId": "proj-xyz",
  "datasetId": "ds-uuid-1",
  "promptId": null,
  "datasetRunTagId": "tag-uuid-1",
  "experimentEvaluatorId": "eval-list-uuid-1",
  "createdBy": "user-123",
  "createdAt": "2026-04-03T10:00:00.000Z",
  "updatedAt": "2026-04-03T10:00:00.000Z"
}

List Experiments

GET /api/public/experiments

| Parameter | Type | Description |
| --- | --- | --- |
| page | integer | Page number |
| limit | integer | Items per page |

curl "https://evals-api.browserstack.com/api/public/experiments?page=1&limit=20" \
  -u "pk-lf-...:sk-lf-..."

Response:

{
  "experiments": [
    {
      "id": "exp-uuid-1",
      "name": "rag-eval-baseline",
      "projectId": "proj-xyz",
      "datasetId": "ds-uuid-1",
      "promptId": null,
      "datasetRunTagId": "tag-uuid-1",
      "experimentEvaluatorId": "eval-list-uuid-1",
      "concurrency": 10,
      "createdBy": "user-123",
      "createdAt": "2026-04-03T10:00:00.000Z",
      "updatedAt": "2026-04-03T10:00:00.000Z"
    }
  ],
  "totalCount": 1
}
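
Collecting every experiment means walking the pages in order. The following sketch shows one way to do that; it assumes pages are 1-indexed (as in the cURL example) and that `totalCount` is the total across all pages, neither of which is stated explicitly here. `fetch_page` is a caller-supplied function that performs the actual GET and returns the decoded JSON body.

```python
from typing import Any, Callable, Dict, Iterator


def iter_experiments(
    fetch_page: Callable[[int, int], Dict[str, Any]],
    limit: int = 20,
) -> Iterator[Dict[str, Any]]:
    """Yield experiments page by page (hypothetical helper).

    fetch_page(page, limit) must return a body shaped like the response above.
    """
    page = 1  # Assumption: the first page is page=1.
    seen = 0
    while True:
        body = fetch_page(page, limit)
        items = body.get("experiments", [])
        for item in items:
            yield item
        seen += len(items)
        # Stop on an empty page, or once totalCount items have been seen.
        # A missing totalCount is treated as 0, which stops after one page.
        if not items or seen >= body.get("totalCount", 0):
            break
        page += 1
```

Injecting `fetch_page` keeps the paging logic independent of the HTTP client and makes it easy to exercise with canned data.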

Get Experiment

GET /api/public/experiments/{experimentId}

curl "https://evals-api.browserstack.com/api/public/experiments/exp-uuid-1" \
  -u "pk-lf-...:sk-lf-..."

Delete Experiment

DELETE /api/public/experiments/{experimentId}

curl -X DELETE "https://evals-api.browserstack.com/api/public/experiments/exp-uuid-1" \
  -u "pk-lf-...:sk-lf-..."

Experiment Runs

Create Experiment Run

POST /api/public/experiment-runs

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| experimentId | string | Yes | Experiment ID |
| name | string | No | Run name |
| description | string | No | Run description |
| metadata | object | No | Metadata |

curl -X POST https://evals-api.browserstack.com/api/public/experiment-runs \
  -u "pk-lf-...:sk-lf-..." \
  -H "Content-Type: application/json" \
  -d '{
    "experimentId": "exp-uuid-1",
    "name": "run-2026-04-03"
  }'

Response:

{
  "id": "run-uuid-1",
  "experimentId": "exp-uuid-1",
  "name": "run-2026-04-03",
  "status": "PENDING",
  "createdAt": "2026-04-03T10:00:00.000Z",
  "updatedAt": "2026-04-03T10:00:00.000Z"
}
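
A new run starts in the `PENDING` status, so a client that needs the outcome has to check back later. The sketch below polls until the status changes. The full set of status values is not listed here, so the in-progress set is a parameter defaulting to `PENDING` only; `get_run` is a caller-supplied function wrapping the Get Experiment Run endpoint, and all names are hypothetical.

```python
import time
from typing import Any, Callable, Dict, FrozenSet


def wait_for_run(
    get_run: Callable[[str], Dict[str, Any]],
    run_id: str,
    in_progress: FrozenSet[str] = frozenset({"PENDING"}),
    interval_s: float = 5.0,
    timeout_s: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Dict[str, Any]:
    """Poll a run until its status leaves the in-progress set (hypothetical helper)."""
    deadline = clock() + timeout_s
    while True:
        run = get_run(run_id)
        # Assumption: any status outside in_progress means the run has moved on.
        if run.get("status") not in in_progress:
            return run
        if clock() >= deadline:
            raise TimeoutError(f"run {run_id} still in progress after {timeout_s}s")
        sleep(interval_s)
```

If the service reports further intermediate statuses, adding them to `in_progress` keeps the loop waiting through them.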

List Experiment Runs

GET /api/public/experiment-runs

| Parameter | Type | Description |
| --- | --- | --- |
| experimentId | string | Filter by experiment ID |
| page | integer | Page number |
| limit | integer | Items per page |

curl "https://evals-api.browserstack.com/api/public/experiment-runs?experimentId=exp-uuid-1" \
  -u "pk-lf-...:sk-lf-..."
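
When the filter and paging parameters are set programmatically, building the query string by hand is error-prone. This small sketch (with a hypothetical function name) assembles the URL using the standard library and leaves out any parameter that was not supplied.

```python
from typing import Optional
from urllib.parse import urlencode

BASE_URL = "https://evals-api.browserstack.com"


def experiment_runs_url(
    experiment_id: Optional[str] = None,
    page: Optional[int] = None,
    limit: Optional[int] = None,
) -> str:
    """Build the List Experiment Runs URL (hypothetical helper)."""
    params = {"experimentId": experiment_id, "page": page, "limit": limit}
    # Drop unset parameters so they are not sent as the literal string "None".
    query = urlencode({k: v for k, v in params.items() if v is not None})
    url = f"{BASE_URL}/api/public/experiment-runs"
    return f"{url}?{query}" if query else url
```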

Get Experiment Run

GET /api/public/experiment-runs/{runId}

curl "https://evals-api.browserstack.com/api/public/experiment-runs/run-uuid-1" \
  -u "pk-lf-...:sk-lf-..."

Delete Experiment Run

DELETE /api/public/experiment-runs/{runId}

curl -X DELETE "https://evals-api.browserstack.com/api/public/experiment-runs/run-uuid-1" \
  -u "pk-lf-...:sk-lf-..."

Typical Workflow

# 1. Create or identify a dataset
curl -X POST https://evals-api.browserstack.com/api/public/datasets \
  -u "pk-lf-...:sk-lf-..." \
  -H "Content-Type: application/json" \
  -d '{ "name": "my-dataset" }'

# 2. Upload items to the dataset (see Datasets API)

# 3. Create an evaluator list (see Evaluators API)

# 4. Create an experiment
curl -X POST https://evals-api.browserstack.com/api/public/experiments \
  -u "pk-lf-...:sk-lf-..." \
  -H "Content-Type: application/json" \
  -d '{
    "name": "my-experiment",
    "datasetId": "<dataset-id>",
    "promptId": "<prompt-id>",
    "evaluatorListId": "<eval-list-id>"
  }'

# 5. Trigger a run
curl -X POST https://evals-api.browserstack.com/api/public/experiment-runs \
  -u "pk-lf-...:sk-lf-..." \
  -H "Content-Type: application/json" \
  -d '{ "experimentId": "<experiment-id>", "name": "run-1" }'
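
Steps 4 and 5 are linked by the experiment ID returned from the first call. As a rough sketch, the same chain can be written in Python as follows; `post` is a caller-supplied function that sends an authenticated JSON POST to the given path and returns the decoded response, and every name here is hypothetical.

```python
from typing import Any, Callable, Dict

# post(path, body) -> decoded JSON response; supplied by the caller.
Post = Callable[[str, Dict[str, Any]], Dict[str, Any]]


def run_prompt_experiment(
    post: Post,
    dataset_id: str,
    prompt_id: str,
    evaluator_list_id: str,
    experiment_name: str,
    run_name: str,
) -> Dict[str, Any]:
    """Create a prompt-based experiment, then trigger one run for it."""
    # Step 4: create the experiment (prompt + dataset mode).
    experiment = post(
        "/api/public/experiments",
        {
            "name": experiment_name,
            "datasetId": dataset_id,
            "promptId": prompt_id,
            "evaluatorListId": evaluator_list_id,
        },
    )
    # Step 5: trigger a run, using the "id" field from the create response.
    return post(
        "/api/public/experiment-runs",
        {"experimentId": experiment["id"], "name": run_name},
    )
```

Keeping the transport behind `post` means the sequence can be checked with a stub before it is pointed at the live API.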