Score Playground outputs with your project's evaluators without creating a full experiment.

Inline Evaluations

You can run evaluators on Playground outputs directly, without setting up a formal experiment. This gives you instant scoring feedback while you're iterating on prompts or comparing models.

Scoring a Single Output

Run your prompt

Set up your prompt and click Run. Wait for the response to finish streaming.

Open the Evaluate panel

Click Evaluate in the output panel (the scale icon). A panel opens listing the evaluators available in your project.

Select evaluators

Check one or more evaluators to apply. You can select individual evaluators or an evaluator list.

View scores

The selected evaluators run against the output. Each evaluator's score and reasoning appear inline, directly below the response. For LLM-as-a-Judge evaluators, you also see the judge's explanation.

Scoring Dataset Run Results

When you've run a dataset through the Playground, evaluators can score every row:

Add evaluators

In the Evaluators panel (accessible during a dataset run), select the evaluators or evaluator list to apply. You can configure this before or after starting the run.

Review per-row scores

After the batch run completes, the results table shows a score column for each evaluator. Click any row to see the full output, score, and reasoning for that row.

Switch to the Evaluators tab

Switch to the Evaluators tab in the results area to see the average score for each evaluator, aggregated across all rows and windows.
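To make the aggregation concrete, here is a minimal Python sketch of averaging per-row scores by evaluator. It does not use the product's API. The evaluator names, the score values, and the `average_scores` helper are all hypothetical, and the Playground's actual aggregation may differ (for example, in how it treats rows with missing scores).

```python
from collections import defaultdict
from statistics import mean

# Hypothetical per-row scores, shaped like the results table:
# one dict per dataset row, keyed by evaluator name.
rows = [
    {"relevance": 0.8, "conciseness": 0.6},
    {"relevance": 0.9, "conciseness": 0.4},
    {"relevance": 0.7, "conciseness": 0.8},
]


def average_scores(rows):
    """Return the mean score per evaluator across all rows."""
    by_evaluator = defaultdict(list)
    for row in rows:
        for name, score in row.items():
            by_evaluator[name].append(score)
    return {name: mean(scores) for name, scores in by_evaluator.items()}


averages = average_scores(rows)
for name, value in sorted(averages.items()):
    print(f"{name}: {value:.2f}")
```

Each evaluator gets a single number, so a row that scores poorly on one evaluator can be hidden by the average. The per-row view remains the place to find those cases.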

Iterating with Evaluations

Adjust your prompt or model configuration based on the scores and reasoning.

Re-run the Playground.

Re-evaluate to see whether scores improved.

Combine model comparison with evaluations to compare two prompt variants or models quantitatively across the same dataset. Scores appear side by side in the results table.
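As an illustration of what a side-by-side comparison tells you, the following Python sketch computes the mean score per evaluator for two variants and the difference between them. The variant data, the evaluator names, and the `compare` helper are hypothetical. This is not how the Playground computes or displays its results, only one way to reason about them.

```python
from statistics import mean

# Hypothetical per-row scores for two prompt variants run on the same dataset rows.
variant_a = {"relevance": [0.6, 0.7, 0.8], "conciseness": [0.9, 0.8, 0.7]}
variant_b = {"relevance": [0.8, 0.9, 0.7], "conciseness": [0.6, 0.7, 0.8]}


def compare(a, b):
    """Return {evaluator: (mean_a, mean_b, delta)} for evaluators present in both."""
    result = {}
    for name in sorted(a.keys() & b.keys()):
        mean_a, mean_b = mean(a[name]), mean(b[name])
        result[name] = (mean_a, mean_b, mean_b - mean_a)
    return result


comparison = compare(variant_a, variant_b)
for name, (mean_a, mean_b, delta) in comparison.items():
    print(f"{name:<12} A={mean_a:.2f}  B={mean_b:.2f}  delta={delta:+.2f}")
```

A positive delta on one evaluator and a negative delta on another, as in this made-up data, signals a trade-off between variants and not a clear winner.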
