Skip to content

Push datasets

A PUSH leaderboard accepts examples from your application through an HTTP webhook. This lets production traffic, user feedback, or previous LLM responses become future evaluation and training data.

Create a PUSH leaderboard

In Create Leaderboard → Dataset, choose Push dataset. Configure:

  • Auto-limit dataset size: keep a bounded consolidated dataset.
  • Maximum samples to keep: cap the final dataset size.
  • Daily accepted-row limit: protect against accidental spikes.
  • Monthly accepted-row limit: control long-term ingestion.
  • Consolidate every N events: compact pending rows after enough new rows arrive.
  • Consolidate every N hours: compact based on time.
  • End date: optional final date for accepting new rows.

Create a dataset token

Open the leaderboard Detail tab and use the PUSH dataset controls to create a token. Dataset push tokens start with hpd_ and can push rows to one leaderboard only.

You can also push rows with a Dr.Gero API token that has leaderboards:write.

Push a row

bash
export API_BASE="https://dr-gero-frontend-99142474693.europe-west1.run.app"
export DRGERO_TOKEN="drgero_REPLACE_WITH_TOKEN_FROM_SETTINGS"
export PUSH_LEADERBOARD_ID="241b151a-407b-4dfb-9bad-e95906567647"

curl -sS -X POST "$API_BASE/v1/leaderboard/$PUSH_LEADERBOARD_ID/dataset/push" \
  -H "Authorization: Bearer $DRGERO_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "input": "What did the user ask?",
    "output": "The answer your current LLM returned"
  }'

Supported row formats

The push endpoint accepts:

  • A single row object.
  • An array of row objects.
  • An object with rows, events, or samples.

Common row fields:

FieldPurpose
input, prompt, question, queryUser input or task input.
messagesChat messages; converted to text input when input is absent.
output, response, completion, answer, expectedExpected/current answer.
rubricOptional judge rubric.
id, example_id, trace_idStable row identifier.
metadataFree-form JSON metadata.
tags, user_id, session_id, conversation_idCommon metadata fields.
latency_ms, cost_usd, model, providerOperational metadata from production calls.

Consolidation

Pushed rows first become pending events. Consolidation deduplicates rows, enforces max sample limits, and writes the dataset that leaderboard runs can evaluate.

You can trigger consolidation from the API, or let configured event/time thresholds handle it automatically.

Status

The status endpoint shows accepted/pending/consolidated counts, tokens, recent batches, and dataset URL metadata.