SDK-side evaluation operations — datasets, experiments, evaluators, prompts, and tools across TypeScript, Python, and Java.

Evaluation

The evaluation SDK lets you manage datasets, run experiments, configure evaluators, and work with versioned prompts and tools, all from code.

For UI-based evaluation workflows in the dashboard, see Platform Evaluation.

Pages in This Section

Create datasets, add test items, and track dataset runs via the SDK.

Create experiments, start runs, and poll for results via the SDK.

Build evaluator lists and run inline evaluations via the SDK.

Fetch versioned prompts and compile Mustache templates via the SDK.
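
To make "compile Mustache templates" concrete, here is a minimal sketch of what compiling a prompt template involves: substituting `{{variable}}` tags with supplied values. It does not use the SDK. `compileTemplate` is a hypothetical helper written for illustration, and it handles only simple variable tags (no sections, partials, or HTML escaping). It also leaves unknown tags in place, whereas standard Mustache renders them as empty strings. The SDK's own prompt API may differ.

```typescript
// Hypothetical helper for illustration only; not part of the SDK.
// Replaces simple {{name}} tags with values from `variables`.
function compileTemplate(
  template: string,
  variables: Record<string, string>
): string {
  return template.replace(
    /\{\{\s*([\w.]+)\s*\}\}/g,
    (match: string, key: string): string =>
      // Only substitute keys explicitly provided by the caller;
      // unknown tags are left untouched so they are easy to spot.
      Object.prototype.hasOwnProperty.call(variables, key)
        ? variables[key]
        : match
  );
}

// Example prompt text and variable values are made up for this sketch.
const prompt: string = compileTemplate(
  "Summarize {{topic}} for {{audience}}.",
  { topic: "dataset runs", audience: "new users" }
);

console.log(prompt); // prints "Summarize dataset runs for new users."
```

Leaving unresolved tags visible is a deliberate choice in this sketch: a missing variable shows up in the compiled prompt instead of silently disappearing.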