Metrics

Reference for all available metrics, dimensions, aggregation functions, and filter operators in the dashboard query engine.

Metrics Reference

Dashboard widgets query one of five data views. Each view exposes a set of metrics you can aggregate and dimensions you can group by or filter on.

Data Views

Traces

One row per trace. Use this view for request-level monitoring.

Metrics

Metric              Description
count               Number of traces
latency             End-to-end trace duration (ms)
totalTokens         Sum of input + output tokens across all observations in the trace
totalCost           Estimated cost (USD) across all observations
uniqueUserCount     Distinct user IDs
observationsCount   Number of observations (spans/generations) per trace
scoresCount         Number of evaluation scores attached

Dimensions

Dimension           Description
name                Trace name set by the SDK
userId              User identifier passed at ingestion
sessionId           Session identifier
environment         Environment tag (e.g. production, staging)
release             Release version string
version             Application version
tags                Free-form tags array

Observations

One row per span, generation, or event. Use this view for model-level and operation-level analysis.

Metrics

Metric                  Description
count                   Number of observations
latency                 Observation duration (ms)
streamingLatency        Streaming-specific latency (ms)
timeToFirstToken        Time from request to first token (ms)
tokensPerSecond         Overall throughput (tokens/s)
outputTokensPerSecond   Output-only throughput (tokens/s)
inputTokens             Input token count
outputTokens            Output token count
totalTokens             Input + output tokens
inputCost               Estimated input cost (USD)
outputCost              Estimated output cost (USD)
totalCost               Total estimated cost (USD)
countScores             Number of scores attached

Dimensions

Dimension           Description
name                Observation name
type                Observation type (GENERATION, SPAN, EVENT)
level               Log level (DEFAULT, DEBUG, WARNING, ERROR)
providedModelName   Model name as provided by the SDK
promptName          Name of the prompt template used
environment         Environment tag

Sessions

One row per session (a group of related traces). Use this view for multi-turn conversation analytics.

Metrics

Metric              Description
count               Number of sessions
traceCount          Average number of traces per session
latency             Session duration (ms)
totalTokens         Total tokens across all traces in the session
totalCost           Total cost (USD) across the session

Dimensions

Dimension           Description
sessionId           Session identifier
userId              User identifier
environment         Environment tag
release             Release version string
version             Application version
tags                Free-form tags

Scores (Numeric)

One row per numeric evaluation score. Use this view to track evaluation quality over time.

Metrics

Metric              Description
count               Number of scores
avg                 Mean score value
min                 Minimum score value
max                 Maximum score value
p50                 Median score
p75                 75th percentile
p90                 90th percentile
p95                 95th percentile
p99                 99th percentile

Dimensions

Dimension           Description
name                Score name (evaluator name)
source              Score source (API, EVAL, HUMAN)
dataType            Data type (NUMERIC, BOOLEAN)
environment         Environment tag

Scores (Categorical)

One row per categorical evaluation score.

Metrics

Metric              Description
count               Number of scores

Dimensions

Same as Scores (Numeric): name, source, dataType, environment.
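
The view catalog above can be written down as a lookup table, which is handy for checking a widget definition before saving it. This is a minimal sketch only: the view keys (`traces`, `scores-numeric`, and so on) and the `validate_widget` helper are hypothetical names chosen for illustration, not identifiers from the product's API.

```python
# Hypothetical catalog transcribed from the metric and dimension tables above.
# View keys and the helper name are illustrative assumptions, not an official API.
SCORE_DIMENSIONS = {"name", "source", "dataType", "environment"}

CATALOG = {
    "traces": {
        "metrics": {"count", "latency", "totalTokens", "totalCost",
                    "uniqueUserCount", "observationsCount", "scoresCount"},
        "dimensions": {"name", "userId", "sessionId", "environment",
                       "release", "version", "tags"},
    },
    "observations": {
        "metrics": {"count", "latency", "streamingLatency", "timeToFirstToken",
                    "tokensPerSecond", "outputTokensPerSecond", "inputTokens",
                    "outputTokens", "totalTokens", "inputCost", "outputCost",
                    "totalCost", "countScores"},
        "dimensions": {"name", "type", "level", "providedModelName",
                       "promptName", "environment"},
    },
    "sessions": {
        "metrics": {"count", "traceCount", "latency", "totalTokens", "totalCost"},
        "dimensions": {"sessionId", "userId", "environment",
                       "release", "version", "tags"},
    },
    "scores-numeric": {
        "metrics": {"count", "avg", "min", "max",
                    "p50", "p75", "p90", "p95", "p99"},
        "dimensions": SCORE_DIMENSIONS,
    },
    "scores-categorical": {
        "metrics": {"count"},
        "dimensions": SCORE_DIMENSIONS,
    },
}


def validate_widget(view, metric, dimension=None):
    """Return a list of problems; an empty list means the combination is listed."""
    if view not in CATALOG:
        return [f"unknown view: {view}"]
    problems = []
    if metric not in CATALOG[view]["metrics"]:
        problems.append(f"{view} has no metric {metric}")
    if dimension is not None and dimension not in CATALOG[view]["dimensions"]:
        problems.append(f"{view} has no dimension {dimension}")
    return problems


print(validate_widget("traces", "latency", "environment"))  # → []
print(validate_widget("sessions", "timeToFirstToken"))
```

The second call reports a problem because `timeToFirstToken` is listed only for the Observations view.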


Aggregation Functions

Function            Description
count               Row count
sum                 Sum of values
avg                 Arithmetic mean
unique              Count of distinct values
min                 Minimum value
max                 Maximum value
p50                 50th percentile (median)
p75                 75th percentile
p90                 90th percentile
p95                 95th percentile
p99                 99th percentile

Percentile aggregations (p50 through p99) are most meaningful for latency and cost metrics. They are computed using ClickHouse's quantile function.
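
To show why percentiles suit skewed metrics such as latency, the sketch below computes an exact, linearly interpolated percentile in plain Python and compares it with the mean. The `percentile` helper and the sample latencies are made up for illustration. Dashboard values may differ slightly from exact results on large datasets, since ClickHouse's `quantile` returns an approximate value.

```python
def percentile(values, p):
    """Exact percentile (0-100) with linear interpolation between closest ranks."""
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    rank = (len(ordered) - 1) * p / 100      # fractional index into the sorted list
    lower = int(rank)
    upper = min(lower + 1, len(ordered) - 1)
    fraction = rank - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * fraction


# Illustrative latencies in ms; two slow outliers pull the mean upward.
latencies_ms = [120, 135, 150, 180, 210, 240, 300, 450, 900, 2400]

mean = sum(latencies_ms) / len(latencies_ms)
print(f"avg = {mean:.1f}")                          # → avg = 508.5
print(f"p50 = {percentile(latencies_ms, 50):.1f}")  # → p50 = 225.0
print(f"p95 = {percentile(latencies_ms, 95):.1f}")  # → p95 = 1725.0
```

Here the mean is more than double the median, while p95 exposes the slow tail that the mean only hints at.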

Filters

You can add one or more filter conditions to any widget to restrict which rows are included in the query.

Filter Operators

Operator        Applies to       Description
equals          String, number   Exact match
not equals      String, number   Exclude exact match
contains        String           Substring match
not contains    String           Exclude substring match
starts with     String           Prefix match
ends with       String           Suffix match
greater than    Number           Numeric comparison
less than       Number           Numeric comparison
is null         Any              Field is absent/null
is not null     Any              Field is present

Multiple filters on the same widget are combined with AND logic.
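
The AND semantics can be illustrated with a small in-memory evaluator. This is a sketch under assumptions, not the query engine's implementation: the `(field, operator, argument)` tuple shape, the `matches` helper, and the sample rows are hypothetical, and the treatment of missing values by comparison operators is assumed rather than documented.

```python
def _present(check):
    # Assumption: comparison operators never match a missing (None) value.
    return lambda value, arg: value is not None and check(value, arg)


# Operator names follow the table above; the implementations are illustrative.
OPERATORS = {
    "equals": _present(lambda v, a: v == a),
    "not equals": _present(lambda v, a: v != a),
    "contains": _present(lambda v, a: a in v),
    "not contains": _present(lambda v, a: a not in v),
    "starts with": _present(lambda v, a: v.startswith(a)),
    "ends with": _present(lambda v, a: v.endswith(a)),
    "greater than": _present(lambda v, a: v > a),
    "less than": _present(lambda v, a: v < a),
    "is null": lambda v, a: v is None,
    "is not null": lambda v, a: v is not None,
}


def matches(row, filters):
    """Keep a row only if every (field, operator, argument) filter passes."""
    return all(OPERATORS[op](row.get(field), arg) for field, op, arg in filters)


rows = [
    {"name": "chat-completion", "environment": "production", "latency": 1200},
    {"name": "chat-completion", "environment": "staging", "latency": 3100},
    {"name": "embed-docs", "environment": "production", "latency": 90, "userId": "u1"},
]
filters = [
    ("environment", "equals", "production"),
    ("name", "starts with", "chat"),
    ("latency", "greater than", 1000),
]

kept = [row for row in rows if matches(row, filters)]
print(kept)  # only the first row satisfies all three conditions
```

Each additional filter can only narrow the result, so a widget with no filters includes every row in the selected time range.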

Time Range and Granularity

All widgets operate on a time range (start date to end date). You can set a date preset per widget:

Preset          Window
Last 1 hour     1 h
Last 24 hours   24 h
Last 7 days     7 d (default)
Last 30 days    30 d
Last 90 days    90 d
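
Each preset resolves to a concrete start and end timestamp. The sketch below shows one way to do that, assuming the window ends at the current time in UTC; the `PRESETS` mapping and `resolve_range` helper are hypothetical names used only for illustration.

```python
from datetime import datetime, timedelta, timezone

# Preset labels and window lengths taken from the table above.
PRESETS = {
    "Last 1 hour": timedelta(hours=1),
    "Last 24 hours": timedelta(hours=24),
    "Last 7 days": timedelta(days=7),
    "Last 30 days": timedelta(days=30),
    "Last 90 days": timedelta(days=90),
}
DEFAULT_PRESET = "Last 7 days"


def resolve_range(preset=DEFAULT_PRESET, now=None):
    """Return (start, end) for a preset; assumes the window ends at `now` (UTC)."""
    end = now or datetime.now(timezone.utc)
    return end - PRESETS[preset], end


fixed_now = datetime(2024, 5, 15, 12, 0, tzinfo=timezone.utc)
start, end = resolve_range(now=fixed_now)
print(start.isoformat(), end.isoformat())
# → 2024-05-08T12:00:00+00:00 2024-05-15T12:00:00+00:00
```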

For time-series widgets, granularity controls how data is bucketed along the time axis:

Granularity     Bucket size
auto            Chosen automatically based on date range
hour            1 hour buckets
day             1 day buckets
week            7 day buckets
month           Calendar month buckets
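
Bucketing amounts to truncating each timestamp to the start of its bucket and then aggregating per bucket. The following is a minimal sketch with a hypothetical `bucket_start` helper; it assumes weeks begin on Monday, which the table does not specify, and it leaves out `auto` because the selection rule is not documented here.

```python
from collections import Counter
from datetime import datetime, timedelta


def bucket_start(ts, granularity):
    """Truncate a timestamp to the start of its hour, day, week, or month bucket."""
    if granularity == "hour":
        return ts.replace(minute=0, second=0, microsecond=0)
    midnight = ts.replace(hour=0, minute=0, second=0, microsecond=0)
    if granularity == "day":
        return midnight
    if granularity == "week":
        # Assumption: 7-day buckets aligned to Monday.
        return midnight - timedelta(days=midnight.weekday())
    if granularity == "month":
        return midnight.replace(day=1)
    raise ValueError(f"unsupported granularity: {granularity}")


# Illustrative trace timestamps.
timestamps = [
    datetime(2024, 5, 15, 9, 12),
    datetime(2024, 5, 15, 9, 48),
    datetime(2024, 5, 15, 10, 5),
    datetime(2024, 5, 17, 23, 59),
]

# A count metric at hourly and daily granularity.
hourly = Counter(bucket_start(ts, "hour") for ts in timestamps)
daily = Counter(bucket_start(ts, "day") for ts in timestamps)
print(hourly[datetime(2024, 5, 15, 9)])  # → 2
print(daily[datetime(2024, 5, 15)])      # → 3
```

Coarser granularity yields fewer, larger buckets, which is usually what a long date range needs to stay readable.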