Human Review
Set up annotation queues for human evaluation, scoring, and quality review of AI outputs.
Human review lets your team manually score and label AI outputs. Reviewers work through queues of traces or observations, applying structured scores that feed back into your evaluation pipeline as ground truth.
How it fits in the evaluation pipeline
Live traces (production or experiment)
↓
Annotation Queue (filtered set for review)
↓
Human Reviewers score each item
↓
Scores stored → Golden datasets / model feedback
Use human review to:
- Build ground truth — create labeled datasets from real traffic for fine-tuning or benchmarking.
- Quality assurance — catch systematic failures that automated evaluators miss.
- Model feedback loops — surface low-quality outputs for targeted improvement.
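The flow described above (traces, annotation queue, reviewer scores, golden dataset) can be sketched in a few lines of plain Python. This is an illustrative model only, not the product's SDK: the `Trace` and `AnnotationQueue` classes, the `build_queue` and `golden_dataset` helpers, the `auto_score` field, and the thresholds are all hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass, field
from statistics import mean


@dataclass
class Trace:
    """A recorded model interaction (hypothetical shape, not a real SDK type)."""
    trace_id: str
    input: str
    output: str
    auto_score: float  # assumed score from an automated evaluator, 0.0 to 1.0


@dataclass
class AnnotationQueue:
    """A filtered set of traces awaiting human review (hypothetical)."""
    name: str
    items: list[Trace] = field(default_factory=list)
    # Maps trace_id to the list of scores given by human reviewers.
    scores: dict[str, list[float]] = field(default_factory=dict)

    def add_score(self, trace_id: str, value: float) -> None:
        """Record one reviewer's score for a queued trace."""
        self.scores.setdefault(trace_id, []).append(value)


def build_queue(name: str, traces: list[Trace], threshold: float) -> AnnotationQueue:
    """Route traces whose automated score is below the threshold into a queue."""
    return AnnotationQueue(name, [t for t in traces if t.auto_score < threshold])


def golden_dataset(queue: AnnotationQueue, min_score: float) -> list[dict]:
    """Keep reviewed items whose mean human score is at least min_score."""
    rows = []
    for trace in queue.items:
        human = queue.scores.get(trace.trace_id)
        if human and mean(human) >= min_score:
            rows.append({
                "input": trace.input,
                "expected_output": trace.output,
                "score": mean(human),
            })
    return rows


# Live traces (made-up sample data).
traces = [
    Trace("t1", "What is 2 + 2?", "4", auto_score=0.4),
    Trace("t2", "Capital of France?", "Paris", auto_score=0.95),
    Trace("t3", "Capital of Spain?", "Lisbon", auto_score=0.3),
]

# Annotation queue: only low-confidence traces are sent to reviewers.
queue = build_queue("low-confidence", traces, threshold=0.5)

# Human reviewers score each item.
queue.add_score("t1", 1.0)
queue.add_score("t1", 1.0)
queue.add_score("t3", 0.0)

# Scores stored, then turned into a golden dataset.
dataset = golden_dataset(queue, min_score=0.5)
print(dataset)
```

In this sketch the trace rejected by reviewers (`t3`) stays out of the golden dataset but remains in the queue's scores, where it could be used for quality assurance or as feedback for model improvement.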
Annotation Queues
Create and manage queues for routing traces to human reviewers.
Scoring & Labels
Configure score types, rubrics, and view score analytics.
Custom Layouts
Control how items are presented to reviewers with custom view layouts.
Shared Views
Share annotation queues with external reviewers via secret links.