Human Review

Set up annotation queues for human evaluation, scoring, and quality review of AI outputs.

Human review lets your team manually score and label AI outputs. Reviewers work through queues of traces or observations, applying structured scores that feed back into your evaluation pipeline as ground truth.

How it fits in the evaluation pipeline

Live traces (production or experiment)
    ↓
Annotation Queue (filtered set for review)
    ↓
Human Reviewers score each item
    ↓
Scores stored → Golden datasets / model feedback
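The flow above can be modeled as a small data structure: traces are filtered into a queue, and reviewers attach scores only to items in that queue. This is a minimal sketch. The `Trace`, `Score`, and `AnnotationQueue` types, the predicate-based filter, and the field names are hypothetical and are not the product's SDK or API.

```python
from dataclasses import dataclass, field

# Hypothetical types for illustration only; not the product's SDK.
@dataclass
class Trace:
    trace_id: str
    output: str
    environment: str  # e.g. "production" or "experiment"

@dataclass
class Score:
    trace_id: str
    name: str
    value: float
    reviewer: str

@dataclass
class AnnotationQueue:
    name: str
    items: list[Trace] = field(default_factory=list)
    scores: list[Score] = field(default_factory=list)

    def enqueue(self, traces, predicate):
        """Add only the traces that match this queue's filter."""
        self.items.extend(t for t in traces if predicate(t))

    def score(self, trace_id, name, value, reviewer):
        """Record a reviewer's score; only items in the queue can be scored."""
        if not any(t.trace_id == trace_id for t in self.items):
            raise KeyError(f"{trace_id} is not in queue {self.name!r}")
        s = Score(trace_id, name, value, reviewer)
        self.scores.append(s)
        return s

traces = [
    Trace("t1", "Paris is the capital of France.", "production"),
    Trace("t2", "The capital of France is Lyon.", "production"),
    Trace("t3", "Draft answer", "experiment"),
]

# Route only production traces into the review queue.
queue = AnnotationQueue("prod-review")
queue.enqueue(traces, lambda t: t.environment == "production")
queue.score("t1", "correctness", 1.0, "alice")
queue.score("t2", "correctness", 0.0, "alice")
print([t.trace_id for t in queue.items])  # ['t1', 't2']
```

Keeping the filter as a plain predicate mirrors the idea of a queue as a "filtered set for review": the same trace store can feed several queues with different filters.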

Use human review to:

Build ground truth: create labeled datasets from real traffic for fine-tuning or benchmarking.
Assure quality: catch systematic failures that automated evaluators miss.
Close model feedback loops: surface low-quality outputs for targeted improvement.
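To illustrate the first and third uses, the sketch below averages reviewer scores per trace and splits traces into ground-truth candidates and low-quality outputs. The `(trace_id, score_name, value)` tuple layout, the `aggregate_scores` and `split_by_threshold` helpers, and the 0.8 threshold are all hypothetical choices, not product behavior.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical export of human review results: (trace_id, score_name, value).
# Several reviewers may score the same trace.
scores = [
    ("t1", "correctness", 1.0),
    ("t1", "correctness", 1.0),
    ("t2", "correctness", 0.0),
    ("t2", "correctness", 0.5),
    ("t3", "correctness", 1.0),
    ("t3", "tone", 0.0),  # other score names are ignored below
]

def aggregate_scores(rows, name):
    """Average all values of one score name per trace."""
    by_trace = defaultdict(list)
    for trace_id, score_name, value in rows:
        if score_name == name:
            by_trace[trace_id].append(value)
    return {trace_id: mean(values) for trace_id, values in by_trace.items()}

def split_by_threshold(averages, threshold):
    """Return (ground-truth candidates, low-quality outputs), each sorted by trace ID."""
    golden = sorted(t for t, v in averages.items() if v >= threshold)
    low = sorted(t for t, v in averages.items() if v < threshold)
    return golden, low

averages = aggregate_scores(scores, "correctness")
golden, low = split_by_threshold(averages, threshold=0.8)
print(golden)  # ['t1', 't3']
print(low)     # ['t2']
```

Averaging is only one way to reconcile multiple reviewers; majority vote or requiring agreement before an item enters a golden dataset are equally plausible choices.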

Annotation Queues

Create and manage queues for routing traces to human reviewers.

Scoring & Labels

Configure score types, rubrics, and view score analytics.

Custom Layouts

Control how items are presented to reviewers with custom view layouts.

Shared Views

Share annotation queues with external reviewers via secret links.