Custom model endpoints
Custom models let you add any HTTPS endpoint as a leaderboard candidate.
What Dr.Gero sends
For custom, Hugging Face, and Dr.Gero endpoints, Dr.Gero sends a JSON POST request built from the rendered leaderboard prompt or messages.
A typical request looks like:
```json
{
  "messages": [
    {"role": "user", "content": "Answer the support question.\n\nInput:\nHow do I reset my password?"}
  ],
  "temperature": 0.2,
  "max_tokens": 300
}
```

Your endpoint should return either an OpenAI-compatible chat-completions response or another JSON or text response from which Dr.Gero can extract the output.
Recommended response shape
```json
{
  "id": "cmpl_123",
  "object": "chat.completion",
  "choices": [
    {
      "index": 0,
      "message": {"role": "assistant", "content": "Open Settings, then Security, and choose a new password."},
      "finish_reason": "stop"
    }
  ],
  "usage": {"prompt_tokens": 42, "completion_tokens": 17, "total_tokens": 59},
  "cost_usd": 0.00012
}
```

Including usage and cost fields helps Dr.Gero display accurate run and token budget metrics.
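As a rough illustration of the shape above, the following Python sketch shows how an endpoint might turn an incoming request body into the recommended response. It is not Dr.Gero's own code: `build_chat_completion`, `handle_request`, and the `generate` callback are hypothetical names, the whitespace-based token counts are a crude stand-in for a real tokenizer, and only the `messages` request shape shown earlier is handled.

```python
import json
import uuid


def build_chat_completion(content, prompt_tokens, completion_tokens, cost_usd=None):
    """Build a response dict in the recommended OpenAI-compatible shape."""
    response = {
        "id": f"cmpl_{uuid.uuid4().hex[:12]}",
        "object": "chat.completion",
        "choices": [
            {
                "index": 0,
                "message": {"role": "assistant", "content": content},
                "finish_reason": "stop",
            }
        ],
        "usage": {
            "prompt_tokens": prompt_tokens,
            "completion_tokens": completion_tokens,
            "total_tokens": prompt_tokens + completion_tokens,
        },
    }
    # cost_usd is optional; include it only when the endpoint knows the cost.
    if cost_usd is not None:
        response["cost_usd"] = cost_usd
    return response


def handle_request(raw_body, generate):
    """Parse a JSON POST body and return (status_code, response_bytes).

    `generate` is a hypothetical callback mapping the prompt string to the
    model's answer string.
    """
    try:
        payload = json.loads(raw_body)
        prompt = payload["messages"][-1]["content"]
        if not isinstance(prompt, str):
            raise TypeError("content must be a string")
    except (ValueError, KeyError, IndexError, TypeError):
        # Malformed input is a true failure, so answer with a non-2xx status.
        error = {"error": "expected a JSON body with a non-empty 'messages' list"}
        return 400, json.dumps(error).encode()

    answer = generate(prompt)
    # Assumption: word counts approximate token counts for illustration only.
    body = build_chat_completion(answer, len(prompt.split()), len(answer.split()))
    return 200, json.dumps(body).encode()
```

Keeping the handler a plain function makes it easy to wire into whichever HTTP framework serves the endpoint, and to exercise without a network.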
Auth methods
When adding a custom model, choose one of:
| Method | Header sent |
|---|---|
| Bearer | `Authorization: Bearer <token>` |
| X-API-Key | `X-API-Key: <token>` |
| X-Dr.Gero-API-Key | `X-Dr.Gero-API-Key: <token>` |
| Authorization raw | `Authorization: <token>` |
| Custom header | `<auth_header_name>: <token>` |
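To test an endpoint locally, it can help to reproduce the header each method produces. The sketch below mirrors the table in Python; `build_auth_header` is a hypothetical helper, not part of Dr.Gero, and the method strings are simply the labels from the table.

```python
def build_auth_header(method, token, custom_header_name=None):
    """Return the single auth header implied by the chosen method.

    Hypothetical helper that mirrors the table above so a local client can
    send the same header the endpoint should expect.
    """
    fixed = {
        "Bearer": ("Authorization", f"Bearer {token}"),
        "X-API-Key": ("X-API-Key", token),
        "X-Dr.Gero-API-Key": ("X-Dr.Gero-API-Key", token),
        "Authorization raw": ("Authorization", token),
    }
    if method == "Custom header":
        if not custom_header_name:
            raise ValueError("custom_header_name is required for 'Custom header'")
        return {custom_header_name: token}
    if method not in fixed:
        raise ValueError(f"unknown auth method: {method!r}")
    name, value = fixed[method]
    return {name: value}
```

For example, `build_auth_header("Bearer", "abc")` yields `{"Authorization": "Bearer abc"}`, while the raw variant sends the token with no scheme prefix.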
Reliability expectations
- Respond with JSON when possible.
- Return non-2xx status codes for true failures.
- Keep latency predictable; leaderboard runs multiply latency by dataset rows and model count.
- Avoid streaming responses; Dr.Gero runtime inference rejects `stream: true`.
- Include stable model IDs in metadata when your endpoint fronts multiple models.
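The latency point can be made concrete with a back-of-the-envelope estimate. The Python sketch below assumes one request per dataset row per model and a hypothetical `concurrency` parameter; how Dr.Gero actually schedules requests is not stated here, so treat the result as an order-of-magnitude guide only.

```python
import math


def estimate_run_seconds(rows, models, avg_latency_s, concurrency=1):
    """Estimate wall-clock time for a leaderboard run.

    Assumes one request per (row, model) pair, processed in batches of
    `concurrency` requests that each take `avg_latency_s` seconds.
    """
    if concurrency < 1:
        raise ValueError("concurrency must be at least 1")
    total_calls = rows * models
    batches = math.ceil(total_calls / concurrency)
    return batches * avg_latency_s
```

Under these assumptions, 500 rows across 4 models at 1.5 seconds per call come to 3000 seconds when run sequentially, which is why a small per-request slowdown becomes noticeable at leaderboard scale.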