Introduction
Welcome to BrowserStack AI Evals — the comprehensive platform for testing and evaluating AI applications.
BrowserStack AI Evals
BrowserStack AI Evals combines LLM observability, evaluation pipelines, and testing tools to help teams build reliable AI-powered products.
What You Can Do
- Trace LLM calls — capture inputs, outputs, latency, and token usage across your AI stack
- Run evaluations — score responses with built-in and custom evaluators (RAGAS, LLM-as-judge, rule-based)
- Manage datasets — create curated test sets and run regression experiments
- Compare experiments — benchmark prompt changes, model swaps, and configuration updates
- Monitor in production — track quality metrics over time with dashboards and alerts
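To make the first two capabilities concrete, here is a minimal sketch of what tracing an LLM call and scoring it with a rule-based evaluator involve. It uses only the Python standard library and does not use the AI Evals SDK: `TraceRecord`, `traced_call`, `contains_all`, and `fake_model` are hypothetical names chosen for illustration, and the whitespace token count is a crude stand-in for a real tokenizer.

```python
import time
from dataclasses import dataclass
from typing import Callable


@dataclass
class TraceRecord:
    """One captured LLM call (hypothetical shape, not the SDK's schema)."""
    prompt: str
    output: str
    latency_ms: float
    input_tokens: int
    output_tokens: int


def count_tokens(text: str) -> int:
    # Assumption: whitespace-separated words approximate tokens.
    # Real tokenizers produce different counts.
    return len(text.split())


def traced_call(model: Callable[[str], str], prompt: str) -> TraceRecord:
    """Call the model and record input, output, latency, and token usage."""
    start = time.perf_counter()
    output = model(prompt)
    latency_ms = (time.perf_counter() - start) * 1000.0
    return TraceRecord(
        prompt=prompt,
        output=output,
        latency_ms=latency_ms,
        input_tokens=count_tokens(prompt),
        output_tokens=count_tokens(output),
    )


def contains_all(record: TraceRecord, required: list[str]) -> float:
    """Rule-based evaluator: fraction of required phrases found in the output."""
    if not required:
        return 1.0
    text = record.output.lower()
    hits = sum(1 for phrase in required if phrase.lower() in text)
    return hits / len(required)


def fake_model(prompt: str) -> str:
    # Stand-in for a real LLM call so the sketch runs offline.
    return "Paris is the capital of France."


record = traced_call(fake_model, "What is the capital of France?")
score = contains_all(record, ["Paris", "France"])
print(record.input_tokens, record.output_tokens, score)
```

In practice the SDK's auto-instrumentation would capture these fields for you; the sketch only shows the kind of data a trace holds and how a deterministic rule can turn an output into a score.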
Choose Your Path
Platform
Getting Started
Set up your project and run your first evaluation in minutes.
Tracing
View traces, sessions, and observations in the dashboard.
Evaluation
Datasets, experiments, evaluators, prompts, and tools for systematic AI testing.
Settings
Configure projects, teams, and integrations.
SDK
SDK Setup
Install the SDK and initialize the client for TypeScript, Python, or Java.
Auto-Instrumentation
Auto-trace LLM calls for all major providers and frameworks.
Manual Tracing
Create traces, spans, generations, and scores explicitly.
Distributed Tracing
Link traces across microservices with context propagation.
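As a rough illustration of context propagation, the sketch below passes a trace identifier between services in a header formatted like the W3C Trace Context `traceparent` header. This is an assumption made for illustration: the page does not say which propagation format the SDK uses, and `new_traceparent` and `child_traceparent` are hypothetical helpers, not SDK functions.

```python
import re
import secrets
from typing import Optional

# version "00", 32-hex trace id, 16-hex parent span id, 2-hex flags
TRACEPARENT_RE = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


def new_traceparent() -> str:
    """Start a new trace: fresh trace id and span id, sampled flag set."""
    trace_id = secrets.token_hex(16)  # 16 bytes -> 32 hex characters
    span_id = secrets.token_hex(8)    # 8 bytes -> 16 hex characters
    return f"00-{trace_id}-{span_id}-01"


def child_traceparent(incoming: Optional[str]) -> str:
    """Continue the caller's trace if the header is well formed, else start a new one.

    Simplification: all-zero ids, which the W3C spec treats as invalid,
    are not rejected here.
    """
    match = TRACEPARENT_RE.match(incoming or "")
    if match is None:
        return new_traceparent()
    trace_id, _parent_span_id, flags = match.groups()
    # Keep the trace id, mint a new span id for this service's work.
    return f"00-{trace_id}-{secrets.token_hex(8)}-{flags}"


# Service A starts a trace and sends the header with its outgoing request.
outgoing_headers = {"traceparent": new_traceparent()}

# Service B reads the header and continues the same trace.
downstream = child_traceparent(outgoing_headers.get("traceparent"))
print(outgoing_headers["traceparent"])
print(downstream)
```

Because both services share the same trace id, their spans can be stitched into a single end-to-end trace, which is the idea behind linking traces across microservices.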