Chat completions
Send a list of messages, get back a model-generated reply. The /v1/chat/completions endpoint is the primary way to use any of the 9 catalog LLMs through tokmux.
Virtual Keys
A Virtual Key (prefix sk_app_) is the credential customers use to call any model in the tokmux catalog. One key, every provider, governed by your project's budget and allowlist.
Each Virtual Key:
- Is scoped to a per-model allowlist — a key can target a single model or any subset across capabilities (LLM chat, TTS, STT, image, video, search). Minimum one model per capability the key covers.
- Carries a monthly Credits cap and a customer-side rate limit.
- Is created and revoked from the Virtual Keys page in the dashboard.
Management keys (prefix sk_management_) are a separate concept. An organization has exactly one management key, used for catalog listing and Virtual Key administration; it cannot make inference requests. The two key types are independent — a Virtual Key authenticates inference, a management key authenticates dashboard / API administration.
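Because the two key types are distinguished by prefix, a client can catch a misconfigured credential before it ever sends a request. The following is a minimal sketch, not part of any tokmux SDK: the function names `key_kind` and `require_virtual_key` are hypothetical, and only the two prefixes come from the text above.

```python
VIRTUAL_KEY_PREFIX = "sk_app_"
MANAGEMENT_KEY_PREFIX = "sk_management_"


def key_kind(key: str) -> str:
    """Classify a key by its documented prefix."""
    if key.startswith(VIRTUAL_KEY_PREFIX):
        return "virtual"
    if key.startswith(MANAGEMENT_KEY_PREFIX):
        return "management"
    return "unknown"


def require_virtual_key(key: str) -> str:
    """Return the key unchanged, or raise if it cannot be used for inference."""
    kind = key_kind(key)
    if kind == "management":
        # Management keys administer the catalog and Virtual Keys only.
        raise ValueError("management keys cannot make inference requests")
    if kind == "unknown":
        raise ValueError("expected a Virtual Key starting with sk_app_")
    return key
```

Calling `require_virtual_key` at startup turns a wrong-key mistake into a local error instead of a failed request.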
Schema compatibility
tokmux is drop-in compatible with OpenAI and Anthropic API schemas. Point your existing client at the tokmux base URL, replace the provider key with a Virtual Key, ship.
| Schema | Endpoint |
|---|---|
| OpenAI-compatible | POST /v1/chat/completions |
| Anthropic-compatible | POST /v1/messages |
The same Virtual Key works against both schemas — pick whichever your client already targets.
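As a rough illustration of the table above, the sketch below picks the endpoint path for whichever schema a client already speaks. The base URL `https://tokmux.example` is a placeholder assumption (this page does not state the real one), and `endpoint_url` is a hypothetical helper name.

```python
# Placeholder only: substitute the real tokmux base URL.
BASE_URL = "https://tokmux.example"

# Paths taken from the schema table above.
ENDPOINT_PATHS = {
    "openai": "/v1/chat/completions",
    "anthropic": "/v1/messages",
}


def endpoint_url(schema: str, base_url: str = BASE_URL) -> str:
    """Build the full URL for the given schema ("openai" or "anthropic")."""
    try:
        path = ENDPOINT_PATHS[schema]
    except KeyError:
        raise ValueError(f"unknown schema: {schema!r}") from None
    # Tolerate a trailing slash on the configured base URL.
    return base_url.rstrip("/") + path
```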
Authentication
Pass your Virtual Key in the Authorization header.
Authorization: Bearer sk_app_XnkP9f...
A 403 with scope_denied means the Virtual Key isn't allowlisted for this model.
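A minimal sketch of attaching the header with Python's standard library follows. It only constructs the request and does not send it. The URL is supplied by the caller, the `Content-Type` header is an assumption based on the JSON body, and `build_request` is a hypothetical helper name.

```python
import json
import urllib.request


def build_request(url: str, virtual_key: str, payload: dict) -> urllib.request.Request:
    """Prepare a POST request that carries the Virtual Key as a Bearer token."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={
            "Authorization": f"Bearer {virtual_key}",
            # Assumption: the endpoint expects a JSON request body.
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the result to `urllib.request.urlopen` sends it; a non-2xx reply such as the 403 above surfaces as `urllib.error.HTTPError`, whose `code` attribute holds the HTTP status.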
Request body
| Field | Type | Description |
|---|---|---|
| model (required) | string | Slug from the catalog. Examples: openai/gpt-5-4, anthropic/claude-sonnet-4-6, fireworks-ai/fireworks-deepseek-v4-pro. |
| messages (required) | array | Conversation so far. Each item has a role (system, user, assistant, or tool) and content. |
| temperature | number | 0–2. Default 1.0. Lower for deterministic, higher for creative. |
| max_tokens | integer | Cap on output. Defaults to model maximum minus input length. |
| stream | boolean | If true, response is server-sent events. Default false. |
| tools | array | Function-calling schemas. tokmux normalizes Anthropic and OpenAI tool calls — define once. |
| route | object | tokmux extension. Set { "fallback": ["openai/gpt-5-4"] } for automatic provider failover. Credential failover within a single model uses the per-app priority chain. |
| metadata | object | Up to 10 string-string pairs. Surfaces in audit log and analytics filters. |
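The constraints in the table can be checked on the client before a request is sent. The sketch below is one way to do that. `build_chat_payload` is a hypothetical helper, and the validation is a client-side convenience: this page does not say how the server responds to out-of-range values.

```python
ALLOWED_ROLES = {"system", "user", "assistant", "tool"}
MAX_METADATA_PAIRS = 10


def build_chat_payload(model, messages, *, temperature=None, max_tokens=None,
                       stream=False, fallback=None, metadata=None):
    """Assemble a request body, enforcing the limits listed in the table."""
    for message in messages:
        if message.get("role") not in ALLOWED_ROLES:
            raise ValueError(f"unsupported role: {message.get('role')!r}")
    payload = {"model": model, "messages": list(messages)}

    if temperature is not None:
        if not 0 <= temperature <= 2:
            raise ValueError("temperature must be between 0 and 2")
        payload["temperature"] = temperature
    if max_tokens is not None:
        payload["max_tokens"] = max_tokens
    if stream:
        payload["stream"] = True
    if fallback:
        # tokmux extension: models to fail over to, in order.
        payload["route"] = {"fallback": list(fallback)}
    if metadata:
        if len(metadata) > MAX_METADATA_PAIRS:
            raise ValueError("metadata allows at most 10 pairs")
        if not all(isinstance(k, str) and isinstance(v, str)
                   for k, v in metadata.items()):
            raise ValueError("metadata keys and values must be strings")
        payload["metadata"] = dict(metadata)
    return payload
```

Optional fields are omitted unless set, so the server-side defaults described in the table still apply.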
Example request
{
  "model": "anthropic/claude-sonnet-4-6",
  "messages": [
    { "role": "system", "content": "You are a support engineer." },
    { "role": "user", "content": "Why am I getting 429s on TTS?" }
  ],
  "temperature": 0.7,
  "max_tokens": 2048,
  "route": { "fallback": ["openai/gpt-5-4"] },
  "metadata": { "surface": "support-bot", "env": "prod" }
}
Response shape
{
  "id": "cmpl_01HX9...",
  "object": "chat.completion",
  "model": "anthropic/claude-sonnet-4-6",
  "choices": [{
    "index": 0,
    "message": { "role": "assistant", "content": "429s on TTS while under cap..." },
    "finish_reason": "stop"
  }],
  "usage": { "prompt_tokens": 218, "completion_tokens": 312, "total_tokens": 530 },
  "tokmux": {
    "route": "primary",
    "first_token_ms": 412,
    "credits": "0.01",
    "cached": false
  }
}
Errors
| Status | Code | Meaning |
|---|---|---|
| 401 | invalid_api_key | Key is malformed or revoked. Check the Virtual Keys page. |
| 403 | scope_denied | Key is valid but its allowlist does not include the requested model. |
| 429 | rate_limited | RPM ceiling hit. Inspect x-tokmux-rate-limit-remaining. |
| 402 | insufficient_credits | Organization Credits balance is at or below zero. Top up to continue. |
| 502 | upstream_failed | Provider error. tokmux already retried across the credential priority chain; if you see this, every credential in the chain failed. |
| 503 | budget_exhausted | Credits cap reached on this key. Top up or raise the cap. |
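The response shape and the error table can be handled together on the client. The sketch below parses a successful body into a small record and maps a status/code pair to a short summary of its table entry. `Reply`, `parse_reply`, `describe_error`, and `should_retry` are hypothetical names. The retry policy is an assumption rather than documented behavior, and since this page does not show the error body layout, the error code is passed in explicitly.

```python
from dataclasses import dataclass
from decimal import Decimal

# Status/code pairs from the error table, with a short summary of each meaning.
ERROR_MEANINGS = {
    (401, "invalid_api_key"): "Key is malformed or revoked.",
    (402, "insufficient_credits"): "Organization Credits balance is at or below zero.",
    (403, "scope_denied"): "Key is valid but not scoped to the request.",
    (429, "rate_limited"): "RPM ceiling hit.",
    (502, "upstream_failed"): "Provider error after tokmux retried.",
    (503, "budget_exhausted"): "Credits cap reached on this key.",
}

# Assumption: only these are worth retrying without changing key or budget.
RETRYABLE_CODES = {"rate_limited", "upstream_failed"}


@dataclass
class Reply:
    text: str
    total_tokens: int
    credits: Decimal  # sent as a string in the response, so parse it exactly
    route: str


def parse_reply(body: dict) -> Reply:
    """Extract the commonly used fields from a chat.completion body."""
    return Reply(
        text=body["choices"][0]["message"]["content"],
        total_tokens=body["usage"]["total_tokens"],
        credits=Decimal(body["tokmux"]["credits"]),
        route=body["tokmux"]["route"],
    )


def describe_error(status: int, code: str) -> str:
    """Return the summary for a status/code pair, or a generic fallback."""
    return ERROR_MEANINGS.get((status, code), f"Unrecognized error {status} {code}")


def should_retry(code: str) -> bool:
    return code in RETRYABLE_CODES
```

Parsing `credits` with `Decimal` avoids float rounding when costs are summed across many requests.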