BrowserStack AI Evals

Scoring & Labels

Configure score types, rubrics, and view score analytics for annotation queues.

Each queue is linked to one or more score configurations that define what reviewers score. Annotators fill in these scores for every item they complete.

Score Types

| Type | Description | Example |
|---|---|---|
| Numeric | A number, optionally bounded by minValue and maxValue | Relevance: 0.0–1.0 |
| Categorical | A selection from predefined labels, each mapped to a numeric value | Quality: PASS (1), FAIL (0) |
| Boolean | A categorical with two options: true/false | Correct: Yes / No |
| Text | A free-text comment or note | Reviewer notes |

Configuring Score Configs

Score configs are created in Settings → Score Configurations. Each config has:

  • A name (e.g. "Relevance")
  • A data type (NUMERIC, CATEGORICAL, BOOLEAN, or TEXT)
  • For numeric: optional minValue and maxValue constraints
  • For categorical: a list of { label, value } pairs
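
To make the fields above concrete, here is a minimal Python sketch of how a score config and a submitted value could be modeled and checked. It is an illustration only, not the product's schema or API: the `ScoreConfig` class, the `validate_score` helper, and the snake_case field names are hypothetical, and treating minValue and maxValue as inclusive bounds is an assumption.

```python
from dataclasses import dataclass, field
from typing import Optional

# The four data types named in this section.
DATA_TYPES = {"NUMERIC", "CATEGORICAL", "BOOLEAN", "TEXT"}


@dataclass
class ScoreConfig:
    """Hypothetical in-memory model of a score config (not the product's schema)."""
    name: str
    data_type: str
    min_value: Optional[float] = None  # NUMERIC only, optional
    max_value: Optional[float] = None  # NUMERIC only, optional
    categories: list = field(default_factory=list)  # CATEGORICAL only: {label, value} pairs

    def __post_init__(self):
        if self.data_type not in DATA_TYPES:
            raise ValueError(f"unknown data type: {self.data_type}")


def validate_score(config, value):
    """Return True if `value` is acceptable for `config`."""
    if config.data_type == "NUMERIC":
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            return False
        # Assumption: bounds are inclusive; the source does not say.
        if config.min_value is not None and value < config.min_value:
            return False
        if config.max_value is not None and value > config.max_value:
            return False
        return True
    if config.data_type == "CATEGORICAL":
        return any(c["label"] == value for c in config.categories)
    if config.data_type == "BOOLEAN":
        return isinstance(value, bool)
    return isinstance(value, str)  # TEXT


relevance = ScoreConfig("Relevance", "NUMERIC", min_value=0.0, max_value=1.0)
quality = ScoreConfig(
    "Quality",
    "CATEGORICAL",
    categories=[{"label": "PASS", "value": 1}, {"label": "FAIL", "value": 0}],
)
```

With these definitions, `validate_score(relevance, 0.7)` accepts the value, while `validate_score(relevance, 1.5)` and `validate_score(quality, "MAYBE")` reject it.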

Attach score configs to a queue during creation or via the queue edit dialog.

How Scores Appear in Analytics

After review, scores are aggregated per run:

  • Numeric scores — average, distribution histogram, individual values
  • Categorical scores — count per category, percentage breakdown
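
The aggregates listed above can be sketched in a few lines of Python. This is a rough illustration of the arithmetic, not the product's implementation: the function names, the fixed-width binning, and the default 0.0 to 1.0 range are assumptions, and both functions expect a non-empty list.

```python
from collections import Counter
from statistics import mean


def summarize_numeric(values, bins=5, lo=0.0, hi=1.0):
    """Average plus a fixed-width histogram over [lo, hi] (hypothetical helper)."""
    counts = [0] * bins
    width = (hi - lo) / bins
    for v in values:
        # Clamp the index so out-of-range values and v == hi land in an edge bin.
        idx = max(0, min(int((v - lo) / width), bins - 1))
        counts[idx] += 1
    return {"average": mean(values), "histogram": counts, "values": list(values)}


def summarize_categorical(labels):
    """Count and percentage share for each category label (hypothetical helper)."""
    counts = Counter(labels)
    total = len(labels)
    return {
        label: {"count": n, "percent": 100.0 * n / total}
        for label, n in counts.items()
    }


numeric_summary = summarize_numeric([0.0, 0.5, 1.0], bins=2)
categorical_summary = summarize_categorical(["PASS", "PASS", "FAIL", "PASS"])
```

Here `numeric_summary["average"]` is 0.5 with a histogram of `[1, 2]`, and `categorical_summary["PASS"]` reports a count of 3 at 75.0 percent.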

These analytics are visible in the queue detail view. For external sharing, they are also available through the secret link analytics endpoint.