BrowserStack AI Evals
Settings & Configuration

Automated Rules

Configure online evaluation, human review routing, and dataset automation rules.

Access via Settings → Automated Rules in your project.

Automated Rules let you define what happens automatically as traces arrive — scoring them with evaluators, routing them to human review queues, or extracting them into datasets.

Rule Types

The Automated Rules page has three tabs:

Tab                   Description
Online Evaluations    Score incoming traces with evaluators as they arrive
Human Review          Route traces to annotation queues for human scoring
Datasets              Automatically extract traces matching criteria into a dataset

Online Evaluation Rules

Online evaluation rules attach evaluators to incoming traces. When a matching trace is ingested, the rule runs the configured evaluator and stores a score.

Creating an Online Evaluation Rule

  1. Go to Settings → Automated Rules → Online Evaluations
  2. Click Create rule
  3. Configure:
    • Rule name — identifier shown in the rules list
    • Description — optional description
    • Evaluator — which evaluator to run (LLM-as-judge, RAGAS metric, code-based)
    • Filters — optional trace filters (e.g., only traces tagged production, or with a specific score below a threshold)
    • Sampling rate — percentage of matching traces to evaluate (0–100%)
    • Concurrency — maximum parallel evaluations
  4. Click Save and activate
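
The fields above can be pictured as a small configuration record. The sketch below is illustrative only: `OnlineEvalRule` and its field names are hypothetical and do not correspond to a documented BrowserStack API. It only encodes the constraints stated in the list (a sampling rate between 0 and 100) plus one assumption, that concurrency must be at least 1.

```python
from dataclasses import dataclass, field


@dataclass
class OnlineEvalRule:
    """Hypothetical in-memory model of an online evaluation rule."""
    name: str
    evaluator: str                               # e.g. "llm-as-judge", "ragas", "code"
    description: str = ""                        # optional
    filters: dict = field(default_factory=dict)  # optional trace filters
    sampling_rate: float = 100.0                 # percent of matching traces, 0-100
    concurrency: int = 1                         # maximum parallel evaluations

    def __post_init__(self):
        if not self.name:
            raise ValueError("rule name is required")
        if not 0 <= self.sampling_rate <= 100:
            raise ValueError("sampling_rate must be between 0 and 100")
        if self.concurrency < 1:
            # Assumption: at least one evaluation must be allowed to run.
            raise ValueError("concurrency must be at least 1")


rule = OnlineEvalRule(
    name="prod-faithfulness",
    evaluator="llm-as-judge",
    filters={"tags": ["production"]},
    sampling_rate=10,
    concurrency=4,
)
```

Validating the record up front means an out-of-range sampling rate is rejected before a rule is saved, rather than surfacing later as unexpected evaluation volume.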

Rule Status

A rule is always in one of three states:

  • Active — running, evaluating incoming traces
  • Paused — rule exists but is not processing traces
  • Inactive — rule has been disabled

Toggle status with the play/pause button in the rules list.

Sampling Rate

Set the sampling rate to evaluate a fraction of traffic. This is useful when:

  • Evaluation cost is high (e.g., GPT-4o-based judge)
  • You want statistical coverage rather than full coverage
  • Traffic volume is very high

A 10% sampling rate means that about 1 in 10 matching traces is evaluated.
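
To make the effect of a sampling rate concrete, here is a minimal sketch that assumes each matching trace is sampled independently at random. How the product actually selects traces is not described here, and `should_evaluate` is a hypothetical helper, not a product function.

```python
import random


def should_evaluate(sampling_rate: float, rng: random.Random) -> bool:
    """Return True for roughly `sampling_rate` percent of calls."""
    # rng.random() is in [0, 1), so a rate of 0 never fires and 100 always does.
    return rng.random() * 100 < sampling_rate


rng = random.Random(42)  # seeded only to make the sketch reproducible
total = 10_000
evaluated = sum(should_evaluate(10, rng) for _ in range(total))
print(f"evaluated {evaluated} of {total} matching traces")
```

Under this assumption the evaluated count lands close to, but not exactly at, 10% of the matching traces, so evaluator cost scales roughly linearly with the sampling rate.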

Token Exhaustion

A token exhaustion health bar shows how much of your evaluation token budget has been consumed. If the budget is exhausted, online evaluations pause automatically until the next billing period begins or you increase the budget.
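
The relationship between consumption and pausing can be sketched as follows. This is an illustration under assumptions: `budget_status`, `tokens_used`, and `token_budget` are hypothetical names, and the exact point at which the product treats a budget as exhausted is assumed to be when usage reaches the budget.

```python
from typing import Tuple


def budget_status(tokens_used: int, token_budget: int) -> Tuple[float, bool]:
    """Return (percent of budget consumed, whether evaluations should pause)."""
    if token_budget <= 0:
        raise ValueError("token_budget must be positive")
    # Cap the displayed value at 100% even if usage overshoots the budget.
    consumed = min(tokens_used / token_budget, 1.0) * 100
    # Assumption: the budget counts as exhausted once usage reaches it.
    paused = tokens_used >= token_budget
    return consumed, paused


print(budget_status(750_000, 1_000_000))    # (75.0, False)
print(budget_status(1_000_000, 1_000_000))  # (100.0, True)
```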


Human Review Rules

Human review rules route traces to annotation queues for manual scoring.

Creating a Human Review Rule

  1. Go to Settings → Automated Rules → Human Review
  2. Click Create rule
  3. Configure:
    • Rule name and description
    • Queue — which annotation queue to route traces into
    • Filters — trace filters (score thresholds, tags, metadata)
    • Sampling rate — percentage of matching traces to route
  4. Click Save and activate

Human review rules are useful for:

  • Routing low-scoring traces (flagged by online evaluators) to human review
  • Randomly sampling a percentage of production traffic for quality audits
  • Building high-quality labeled datasets from production traffic
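
The first use case, routing low-scoring traces to a queue, can be sketched as below. The trace shape (`id`, `scores`), the score name, and the queue name are all assumptions made for illustration, and sampling is left out for brevity.

```python
from collections import defaultdict


def route_for_review(traces, score_name, threshold, queue_name, queues):
    """Append IDs of traces whose `score_name` score is below `threshold`."""
    for trace in traces:
        score = trace.get("scores", {}).get(score_name)
        # Traces without the score are skipped rather than routed.
        if score is not None and score < threshold:
            queues[queue_name].append(trace["id"])
    return queues


traces = [
    {"id": "t1", "scores": {"faithfulness": 0.92}},
    {"id": "t2", "scores": {"faithfulness": 0.41}},
    {"id": "t3", "scores": {}},  # not scored yet, so it is not routed
]
queues = route_for_review(
    traces, "faithfulness", 0.5, "low-faithfulness", defaultdict(list)
)
print(dict(queues))  # {'low-faithfulness': ['t2']}
```

Because the filter depends on a score, a rule like this only makes sense when an online evaluation rule is already producing that score for the same traces.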

Dataset Automation Rules

Dataset rules automatically extract traces that match specified criteria into a named dataset.

Creating a Dataset Rule

  1. Go to Settings → Automated Rules → Datasets
  2. Click Create rule
  3. Configure:
    • Rule name and description
    • Target dataset — which dataset to add matching traces to (created automatically if it doesn't exist)
    • Filters — trace filters (score ranges, tags, user ID, session ID, metadata fields)
    • Sampling rate — percentage of matching traces to include
  4. Click Save and activate

Dataset rules are useful for:

  • Continuously building a dataset of production failures (low-score traces)
  • Sampling a representative slice of production traffic for regression testing
  • Capturing traces for specific users or sessions
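
As a rough sketch of how filters and a target dataset combine, the example below matches traces on tags, user ID, and a score range, then appends the matches to a dataset that is created if it is missing. The filter keys (`tags`, `user_id`, `score_range`) and the trace fields are hypothetical, and sampling is omitted for brevity.

```python
def matches(trace, filters):
    """Check a trace against hypothetical filter keys; absent keys match everything."""
    if "user_id" in filters and trace.get("user_id") != filters["user_id"]:
        return False
    if "tags" in filters and not set(filters["tags"]) <= set(trace.get("tags", [])):
        return False
    if "score_range" in filters:
        low, high = filters["score_range"]
        score = trace.get("score")
        if score is None or not (low <= score <= high):
            return False
    return True


def apply_dataset_rule(traces, filters, target, datasets):
    # setdefault mirrors "created automatically if it doesn't exist".
    dataset = datasets.setdefault(target, [])
    dataset.extend(t["id"] for t in traces if matches(t, filters))
    return datasets


traces = [
    {"id": "t1", "user_id": "u-7", "tags": ["production"], "score": 0.2},
    {"id": "t2", "user_id": "u-7", "tags": ["staging"], "score": 0.1},
    {"id": "t3", "user_id": "u-9", "tags": ["production"], "score": 0.9},
]
datasets = apply_dataset_rule(
    traces,
    {"tags": ["production"], "score_range": (0.0, 0.5)},
    "production-failures",
    {},
)
print(datasets)  # {'production-failures': ['t1']}
```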

Permissions

Action                          Required Role
View rules                      MEMBER and above
Create / edit / delete rules    MEMBER and above
Pause / resume rules            MEMBER and above