Skip to content

Overview

Dr.Gero is an evaluation and inference-routing layer for LLM applications. You define a task, attach a dataset, compare multiple models, and then call a stable inference endpoint that routes to the model selected by the leaderboard.

Typical workflow

  1. Create an account and workspace. Sign in with email/password, then use the workspace selector in the header.
  2. Connect integrations. Add an OpenRouter key before creating leaderboards or running evaluations. Add a Hugging Face key when using private or gated datasets.
  3. Create a leaderboard. Choose a name, system prompt, dataset mode, and evaluation type.
  4. Add models. Add OpenRouter, Custom, Hugging Face, or Dr.Gero models. You can also auto-select OpenRouter models based on cost, latency, and open-source constraints.
  5. Run a ranking. Dr.Gero evaluates the candidate models against the dataset and records run logs, costs, scores, and traces.
  6. Serve inference. Create an API token and call /v1/leaderboard/{leaderboard_id}/inference.
  7. Iterate. Push new production examples, inspect traces, schedule recurring rankings, and fine-tune Dr.Gero models.

Base environment

Use this shell setup for the examples in this docs site:

bash
export API_BASE="https://dr-gero-frontend-99142474693.europe-west1.run.app"
export DRGERO_TOKEN="drgero_REPLACE_WITH_TOKEN_FROM_SETTINGS"
export LEADERBOARD_ID="b60fe691-06a3-4261-bec3-6080380dc72d"

The DRGERO_TOKEN is an API token created in Settings → Tokens. Keep it server-side and never expose it in browser code.

UI versus API

Dr.Gero exposes two families of endpoints:

Endpoint familyAuthPurpose
/v1/leaderboard/...Dr.Gero API token or push tokenPublic runtime APIs for inference, traces, and push datasets.
/api/leaderboards, /api/modelsDr.Gero API token for resource APIs; Supabase session for some UI-only actionsManage leaderboards, models, runs, and fine-tuning.
/api/tokens, /api/invite-user, /api/integrations/validateSupabase user sessionWorkspace administration from the signed-in UI.

Minimum setup checklist

  • OpenRouter integration saved and validated.
  • A leaderboard with either a Hugging Face JSONL dataset or a PUSH dataset.
  • At least two candidate models before running a ranking.
  • A completed ranking before using the inference endpoint.
  • A Dr.Gero API token with the right scopes for automation.