BrowserStack AI Evals

Introduction

Welcome to BrowserStack AI Evals — the comprehensive platform for testing and evaluating AI applications.

BrowserStack AI Evals

BrowserStack AI Evals combines LLM observability, evaluation pipelines, and testing capabilities in one platform to help teams build reliable AI-powered products.

What You Can Do

Trace LLM calls — capture inputs, outputs, latency, and token usage across your AI stack
Run evaluations — score responses with built-in and custom evaluators (RAGAS, LLM-as-judge, rule-based)
Manage datasets — create curated test sets and run regression experiments
Compare experiments — benchmark prompt changes, model swaps, and configuration updates
Monitor in production — track quality metrics over time with dashboards and alerts
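
To make the first two capabilities concrete, here is a minimal, library-free sketch of what tracing a call and scoring it with a rule-based evaluator can look like. It does not use the BrowserStack AI Evals SDK: TraceRecord, traced_call, contains_keyword, and fake_model are hypothetical names invented for illustration, and the whitespace token count is a crude stand-in for a real tokenizer.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class TraceRecord:
    """Hypothetical record of one LLM call (not the product's actual schema)."""
    prompt: str
    output: str
    latency_ms: float
    input_tokens: int
    output_tokens: int
    scores: Dict[str, float] = field(default_factory=dict)


def traced_call(model: Callable[[str], str], prompt: str) -> TraceRecord:
    """Call the model and capture input, output, latency, and token usage."""
    start = time.perf_counter()
    output = model(prompt)
    latency_ms = (time.perf_counter() - start) * 1000
    # Assumption: whitespace splitting approximates token counts.
    return TraceRecord(
        prompt=prompt,
        output=output,
        latency_ms=latency_ms,
        input_tokens=len(prompt.split()),
        output_tokens=len(output.split()),
    )


def contains_keyword(record: TraceRecord, keyword: str) -> float:
    """Rule-based evaluator: 1.0 if the output mentions the keyword, else 0.0."""
    return 1.0 if keyword.lower() in record.output.lower() else 0.0


def fake_model(prompt: str) -> str:
    """Stand-in for a real LLM client so the sketch runs offline."""
    return "Paris is the capital of France."


record = traced_call(fake_model, "What is the capital of France?")
record.scores["contains_paris"] = contains_keyword(record, "Paris")
print(record)
```

In the platform itself, the SDK and dashboard handle capture and scoring; the sketch only shows the shape of the data involved.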

Choose Your Path

Platform

Getting Started

Set up your project and run your first evaluation in minutes.

Tracing

View traces, sessions, and observations in the dashboard.

Evaluation

Datasets, experiments, evaluators, prompts, and tools for systematic AI testing.

Settings

Configure projects, teams, and integrations.

SDK

SDK Setup

Install the SDK and initialize the client for TypeScript, Python, or Java.

Auto-Instrumentation

Auto-trace LLM calls for all major providers and frameworks.

Manual Tracing

Create traces, spans, generations, and scores explicitly.

Distributed Tracing

Link traces across microservices with context propagation.

CLI

aievals CLI

Manage datasets, run experiments, and score responses from the terminal and CI.

Experiment Workflow

Run a full evaluation end to end from the command line.

API

API Reference

Complete REST API documentation organized by resource domain.

