Appearance
Push datasets
A PUSH leaderboard accepts examples from your application through an HTTP webhook. This lets production traffic, user feedback, or previous LLM responses become future evaluation and training data.
Create a PUSH leaderboard
In Create Leaderboard → Dataset, choose Push dataset. Configure:
- Auto-limit dataset size: keep a bounded consolidated dataset.
- Maximum samples to keep: cap the final dataset size.
- Daily accepted-row limit: protect against accidental spikes.
- Monthly accepted-row limit: control long-term ingestion.
- Consolidate every N events: compact pending rows after enough new rows arrive.
- Consolidate every N hours: compact based on time.
- End date: optional final date for accepting new rows.
Create a dataset token
Open the leaderboard Detail tab and use the PUSH dataset controls to create a token. Dataset push tokens start with hpd_ and can push rows to one leaderboard only.
You can also push rows with a Dr.Gero API token that has leaderboards:write.
Push a row
bash
export API_BASE="https://dr-gero-frontend-99142474693.europe-west1.run.app"
export DRGERO_TOKEN="drgero_REPLACE_WITH_TOKEN_FROM_SETTINGS"
export PUSH_LEADERBOARD_ID="241b151a-407b-4dfb-9bad-e95906567647"
curl -sS -X POST "$API_BASE/v1/leaderboard/$PUSH_LEADERBOARD_ID/dataset/push" \
-H "Authorization: Bearer $DRGERO_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"input": "What did the user ask?",
"output": "The answer your current LLM returned"
}'Supported row formats
The push endpoint accepts:
- A single row object.
- An array of row objects.
- An object with
rows,events, orsamples.
Common row fields:
| Field | Purpose |
|---|---|
input, prompt, question, query | User input or task input. |
messages | Chat messages; converted to text input when input is absent. |
output, response, completion, answer, expected | Expected/current answer. |
rubric | Optional judge rubric. |
id, example_id, trace_id | Stable row identifier. |
metadata | Free-form JSON metadata. |
tags, user_id, session_id, conversation_id | Common metadata fields. |
latency_ms, cost_usd, model, provider | Operational metadata from production calls. |
Consolidation
Pushed rows first become pending events. Consolidation deduplicates rows, enforces max sample limits, and writes the dataset that leaderboard runs can evaluate.
You can trigger consolidation from the API, or let configured event/time thresholds handle it automatically.
Status
The status endpoint shows accepted/pending/consolidated counts, tokens, recent batches, and dataset URL metadata.