Overview

Dr.Gero is an evaluation and inference-routing layer for LLM applications. You define a task, attach a dataset, compare multiple models, and then call a stable inference endpoint that routes to the model selected by the leaderboard.

Typical workflow

1. Create an account and workspace. Sign in with email/password, then use the workspace selector in the header.
2. Connect integrations. Add an OpenRouter key before creating leaderboards or running evaluations. Add a Hugging Face key when using private or gated datasets.
3. Create a leaderboard. Choose a name, system prompt, dataset mode, and evaluation type.
4. Add models. Add OpenRouter, Custom, Hugging Face, or Dr.Gero models. You can also auto-select OpenRouter models based on cost, latency, and open-source constraints.
5. Run a ranking. Dr.Gero evaluates the candidate models against the dataset and records run logs, costs, scores, and traces.
6. Serve inference. Create an API token and call `/v1/leaderboard/{leaderboard_id}/inference`.
7. Iterate. Push new production examples, inspect traces, schedule recurring rankings, and fine-tune Dr.Gero models.

Base environment

Use this shell setup for the examples in this docs site:

```bash
export API_BASE="https://dr-gero-frontend-99142474693.europe-west1.run.app"
export DRGERO_TOKEN="drgero_REPLACE_WITH_TOKEN_FROM_SETTINGS"
export LEADERBOARD_ID="b60fe691-06a3-4261-bec3-6080380dc72d"
```

The DRGERO_TOKEN is an API token created in Settings → Tokens. Keep it server-side and never expose it in browser code.
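
Before running any example, it can help to confirm that the three variables are actually set. The following is a minimal sketch, not part of Dr.Gero: `check_drgero_env` is a hypothetical helper name, and the placeholder check only assumes that the sample value `drgero_REPLACE_WITH_TOKEN_FROM_SETTINGS` shown above was never replaced.

```bash
# Hypothetical helper (not shipped with Dr.Gero): fail fast when the
# base environment from this page is incomplete.
check_drgero_env() {
  missing=0
  for name in API_BASE DRGERO_TOKEN LEADERBOARD_ID; do
    # Read the variable whose name is stored in $name (POSIX-compatible).
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing: $name" >&2
      missing=1
    fi
  done
  # Catch the sample placeholder token copied verbatim from the docs.
  case "${DRGERO_TOKEN:-}" in
    *REPLACE_WITH*)
      echo "DRGERO_TOKEN still holds the placeholder value" >&2
      missing=1
      ;;
  esac
  # Returns 0 when everything is set, 1 otherwise.
  return "$missing"
}
```

Calling `check_drgero_env || exit 1` at the top of a script stops it early instead of sending a request with an empty or placeholder token.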

UI versus API

Dr.Gero exposes three families of endpoints:

Endpoint family	Auth	Purpose
`/v1/leaderboard/...`	Dr.Gero API token or push token	Public runtime APIs for inference, traces, and push datasets.
`/api/leaderboards`, `/api/models`	Dr.Gero API token for resource APIs; Supabase session for some UI-only actions	Manage leaderboards, models, runs, and fine-tuning.
`/api/tokens`, `/api/invite-user`, `/api/integrations/validate`	Supabase user session	Workspace administration from the signed-in UI.
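
For the runtime family in the first row, a call to the inference endpoint might look like the sketch below. Only the path `/v1/leaderboard/{leaderboard_id}/inference` and the use of a Dr.Gero API token come from this page. The `Authorization: Bearer` header scheme, the POST method, the JSON content type, and the helper names `inference_url` and `drgero_infer` are assumptions made for illustration, so check the API reference for the actual contract.

```bash
# Build the inference URL from the path template given in this overview.
# $1 = API base URL, $2 = leaderboard ID. One trailing slash on the base is trimmed.
inference_url() {
  printf '%s/v1/leaderboard/%s/inference\n' "${1%/}" "$2"
}

# Hypothetical helper: send a JSON payload ($1) to the inference endpoint.
# ASSUMPTIONS: POST method, Bearer token header, and JSON body are not
# confirmed by this page.
drgero_infer() {
  curl --fail --silent --show-error \
    -X POST "$(inference_url "$API_BASE" "$LEADERBOARD_ID")" \
    -H "Authorization: Bearer ${DRGERO_TOKEN}" \
    -H "Content-Type: application/json" \
    --data "$1"
}
```

With the base environment exported, `inference_url "$API_BASE" "$LEADERBOARD_ID"` prints the full endpoint URL, and `drgero_infer` takes the request body as a single JSON string. The body schema is not described in this overview, so it is left to the caller here.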

Minimum setup checklist

- OpenRouter integration saved and validated.
- A leaderboard with either a Hugging Face JSONL dataset or a PUSH dataset.
- At least two candidate models before running a ranking.
- A completed ranking before using the inference endpoint.
- A Dr.Gero API token with the right scopes for automation.
