Inline Evaluations
Score Playground outputs with your project's evaluators without creating a full experiment.
You can run evaluators on Playground outputs directly, without setting up a formal experiment. This gives you instant scoring feedback while you're iterating on prompts or comparing models.
Scoring a Single Output
Run your prompt
Set up your prompt and click Run. Wait for the response to finish streaming.
Open the Evaluate panel
Click Evaluate in the output panel (the scale icon). A panel opens listing the evaluators available in your project.
Select evaluators
Check one or more evaluators to apply. You can select individual evaluators or an evaluator list.
View scores
The selected evaluators run against the output. Each evaluator's score and reasoning appear inline, directly below the response. For LLM-as-a-Judge evaluators, you also see the judge's explanation.
Scoring Dataset Run Results
When you've run a dataset through the Playground, evaluators can score every row:
Add evaluators
In the Evaluators panel (accessible during a dataset run), select the evaluators or evaluator list to apply. You can make this selection before or after starting the run.
Review per-row scores
After the batch run completes, the results table shows a score column for each evaluator. Click any row to see the full output, score, and reasoning for that row.
Switch to the Evaluators tab
Toggle to the Evaluators tab in the results area to see the average score for each evaluator across all rows and windows.
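To make the aggregation concrete, the following Python sketch shows one way per-row scores could be averaged per evaluator. It is only an illustration: the `rows` data, the evaluator names, and the `average_scores` helper are hypothetical and are not part of the product or any export format it provides.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical per-row results: each row maps an evaluator name to a numeric score.
rows = [
    {"accuracy": 1.0, "helpfulness": 0.8},
    {"accuracy": 0.0, "helpfulness": 0.6},
    {"accuracy": 1.0, "helpfulness": 0.7},
]


def average_scores(rows):
    """Return the mean score per evaluator across all rows."""
    by_evaluator = defaultdict(list)
    for row in rows:
        for name, score in row.items():
            by_evaluator[name].append(score)
    return {name: mean(scores) for name, scores in by_evaluator.items()}


averages = average_scores(rows)
for name, avg in sorted(averages.items()):
    print(f"{name}: {avg:.2f}")
```

A single average can hide a few badly failing rows, so it is worth checking the per-row scores as well.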
Iterating with Evaluations
Adjust your prompt or model configuration based on the scores and reasoning.
Re-run the Playground.
Re-evaluate to see whether scores improved.
Use model comparison with evaluations to compare two prompt variants or models quantitatively across the same dataset. Scores appear side by side in the results table.
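As a rough illustration of what a side-by-side comparison conveys, the sketch below computes the mean score per evaluator for two variants and the difference between them. The score lists, evaluator names, and the `compare` helper are assumptions made for this example; they do not reflect how the Playground stores or computes results.

```python
from statistics import mean

# Hypothetical scores for two prompt variants over the same three dataset rows.
variant_a = {"accuracy": [1.0, 0.0, 1.0], "helpfulness": [0.5, 0.5, 0.5]}
variant_b = {"accuracy": [1.0, 1.0, 1.0], "helpfulness": [0.5, 0.25, 0.75]}


def compare(a, b):
    """Return {evaluator: (mean_a, mean_b, mean_b - mean_a)} for shared evaluators."""
    result = {}
    for name in sorted(a.keys() & b.keys()):
        # Both variants must be scored on the same rows for a fair comparison.
        if len(a[name]) != len(b[name]):
            raise ValueError(f"row count mismatch for {name!r}")
        mean_a, mean_b = mean(a[name]), mean(b[name])
        result[name] = (mean_a, mean_b, mean_b - mean_a)
    return result


comparison = compare(variant_a, variant_b)
for name, (mean_a, mean_b, delta) in comparison.items():
    print(f"{name:<12} A={mean_a:.2f}  B={mean_b:.2f}  delta={delta:+.2f}")
```

A positive delta means variant B scored higher on that evaluator. With only a few rows, small differences may not be meaningful.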