Evaluation
SDK-side evaluation operations (datasets, experiments, evaluators, prompts, and tools) across TypeScript, Python, and Java.
The evaluation SDK lets you manage datasets, run experiments, configure evaluators, and work with versioned prompts and tools, all from code.
For UI-based evaluation workflows in the dashboard, see Platform Evaluation.
Pages in This Section
Datasets
Create datasets, add test items, and track dataset runs via SDK.
Experiments
Create experiments, start runs, and poll for results via SDK.
Evaluators
Build evaluator lists and run inline evaluations via SDK.
Prompts
Fetch versioned prompts and compile Mustache templates via SDK.
Tools
Register and compile LLM tool schemas across providers via SDK.
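To give a sense of what "compiling a Mustache template" means for a prompt, here is a minimal sketch in plain Python. It does not use the SDK: `compile_prompt` is a hypothetical helper, and it handles only simple `{{variable}}` tags (no sections, partials, or HTML escaping). See the Prompts page for the actual SDK calls.

```python
import re

# Matches simple Mustache variable tags such as {{name}} or {{ name }}.
# Assumption: only plain variable tags are supported in this sketch.
_TAG = re.compile(r"\{\{\s*([A-Za-z_][A-Za-z0-9_]*)\s*\}\}")


def compile_prompt(template: str, variables: dict) -> str:
    """Substitute {{variable}} tags with values from `variables`.

    Hypothetical helper for illustration; not part of the SDK.
    Raises KeyError when the template references a missing variable.
    """
    def replace(match):
        key = match.group(1)
        if key not in variables:
            raise KeyError(f"missing template variable: {key}")
        return str(variables[key])

    return _TAG.sub(replace, template)


if __name__ == "__main__":
    template = "Summarize the following {{doc_type}} in {{ max_words }} words."
    print(compile_prompt(template, {"doc_type": "article", "max_words": 50}))
    # prints: Summarize the following article in 50 words.
```

Standard Mustache renders a missing variable as an empty string. This sketch raises an error instead, so that an incomplete prompt is caught before it reaches a model.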