Overview
Dr.Gero is an evaluation and inference-routing layer for LLM applications. You define a task, attach a dataset, compare multiple models, and then call a stable inference endpoint that routes to the model selected by the leaderboard.
Typical workflow
- Create an account and workspace. Sign in with email/password, then use the workspace selector in the header.
- Connect integrations. Add an OpenRouter key before creating leaderboards or running evaluations. Add a Hugging Face key when using private or gated datasets.
- Create a leaderboard. Choose a name, system prompt, dataset mode, and evaluation type.
- Add models. Add OpenRouter, Custom, Hugging Face, or Dr.Gero models. You can also auto-select OpenRouter models based on cost, latency, and open-source constraints.
- Run a ranking. Dr.Gero evaluates the candidate models against the dataset and records run logs, costs, scores, and traces.
- Serve inference. Create an API token and call /v1/leaderboard/{leaderboard_id}/inference.
- Iterate. Push new production examples, inspect traces, schedule recurring rankings, and fine-tune Dr.Gero models.
Base environment
Use this shell setup for the examples in this docs site:
```bash
# Base URL of the Dr.Gero deployment
export API_BASE="https://dr-gero-frontend-99142474693.europe-west1.run.app"
# API token from Settings → Tokens (server-side only)
export DRGERO_TOKEN="drgero_REPLACE_WITH_TOKEN_FROM_SETTINGS"
# ID of the leaderboard whose inference endpoint you call
export LEADERBOARD_ID="b60fe691-06a3-4261-bec3-6080380dc72d"
```

DRGERO_TOKEN is an API token created in Settings → Tokens. Keep it server-side and never expose it in browser code.
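With these variables set, a call to the inference endpoint can be sketched in shell as follows. This is a minimal sketch under stated assumptions: it assumes the endpoint accepts a POST request with the token in an `Authorization: Bearer` header and an OpenAI-style JSON body containing a `messages` array. These docs do not confirm the method, header format, or request schema, so treat them as placeholders. The helper names `require_env`, `check_token`, `inference_url`, and `drgero_infer` are hypothetical.

```bash
# Minimal sketch. Assumed, not confirmed by these docs: POST method,
# "Authorization: Bearer" header, and an OpenAI-style {"messages": [...]} body.

# Return non-zero (with a message on stderr) if any named variable is unset or empty.
require_env() {
  for name in "$@"; do
    eval "value=\${$name-}"
    if [ -z "$value" ]; then
      echo "missing required variable: $name" >&2
      return 1
    fi
  done
}

# Return non-zero if DRGERO_TOKEN still holds the placeholder from the setup snippet.
check_token() {
  case "${DRGERO_TOKEN-}" in
    *REPLACE_WITH_TOKEN*)
      echo "DRGERO_TOKEN is still the placeholder value" >&2
      return 1
      ;;
  esac
}

# Build the stable inference URL; a trailing slash on API_BASE is dropped.
inference_url() {
  printf '%s/v1/leaderboard/%s/inference\n' "${API_BASE%/}" "$LEADERBOARD_ID"
}

# Hypothetical helper: send one user message ($1) and print the raw response.
# $1 is spliced into the JSON without escaping, so avoid quotes and backslashes.
drgero_infer() {
  require_env API_BASE DRGERO_TOKEN LEADERBOARD_ID || return 1
  check_token || return 1
  curl -sS -X POST "$(inference_url)" \
    -H "Authorization: Bearer $DRGERO_TOKEN" \
    -H "Content-Type: application/json" \
    -d "{\"messages\":[{\"role\":\"user\",\"content\":\"$1\"}]}"
}
```

After sourcing the snippet, `drgero_infer "Summarize this ticket"` prints the response body. The preflight checks run before any network call, so a missing variable or an unreplaced placeholder token fails locally. For arbitrary input text, build the body with a JSON tool such as `jq -n --arg c "$text" '{messages:[{role:"user",content:$c}]}'` instead of string splicing.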
UI versus API
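Dr.Gero exposes two families of endpoints: public runtime APIs under /v1/leaderboard/ and management APIs under /api/. Within /api/, the required authentication depends on the resource: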
| Endpoint family | Auth | Purpose |
|---|---|---|
| /v1/leaderboard/... | Dr.Gero API token or push token | Public runtime APIs for inference, traces, and push datasets. |
| /api/leaderboards, /api/models | Dr.Gero API token for resource APIs; Supabase session for some UI-only actions | Manage leaderboards, models, runs, and fine-tuning. |
| /api/tokens, /api/invite-user, /api/integrations/validate | Supabase user session | Workspace administration from the signed-in UI. |
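Automation scripts that touch several endpoints can encode this table as a small lookup that picks a credential type from the request path. The helper `auth_for_path` and the labels it returns are hypothetical. For /api/leaderboards and /api/models it reports the API token, even though some UI-only actions on those paths need a Supabase session instead.

```bash
# Hypothetical helper mirroring the endpoint table: map a request path to the
# kind of credential it needs. Paths not covered by the table return "unknown".
auth_for_path() {
  case "$1" in
    /v1/leaderboard/*)
      echo "api-or-push-token" ;;
    /api/tokens*|/api/invite-user*|/api/integrations/validate*)
      echo "supabase-session" ;;
    /api/leaderboards*|/api/models*)
      # Resource APIs; some UI-only actions here still need a session.
      echo "api-token" ;;
    *)
      echo "unknown" ;;
  esac
}
```

A script can branch on the result, for example refusing to call a `supabase-session` path with DRGERO_TOKEN, since those endpoints are meant for the signed-in UI.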
Minimum setup checklist
- OpenRouter integration saved and validated.
- A leaderboard with either a Hugging Face JSONL dataset or a PUSH dataset.
- At least two candidate models before running a ranking.
- A completed ranking before using the inference endpoint.
- A Dr.Gero API token with the right scopes for automation.