Chat completions

Send a list of messages, get back a model-generated reply. The /v1/chat/completions endpoint is the primary way to use any of the 9 catalog LLMs through tokmux.

POST https://api.tokmux.dev/v1/chat/completions

Virtual Keys

A Virtual Key (prefix sk_app_) is the credential customers use to call any model in the tokmux catalog. One key, every provider, governed by your project's budget and allowlist.

Each Virtual Key:

Is scoped to a per-model allowlist — a key can target a single model or any subset across capabilities (LLM chat, TTS, STT, image, video, search). Minimum one model per capability the key covers.
Carries a monthly Credits cap and a customer-side rate limit.
Is created and revoked from the Virtual Keys page in the dashboard.

Management keys (prefix sk_management_) are a separate concept. An organization has exactly one management key, used for catalog listing and Virtual Key administration; it cannot make inference requests. The two key types are independent — a Virtual Key authenticates inference, a management key authenticates dashboard / API administration.
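
Because the two key types are distinguished by prefix, a client can catch a misconfigured credential before it sends anything. The following is a minimal sketch of that check; the function names are hypothetical, and only the two prefixes come from this page.

```python
def key_kind(key: str) -> str:
    """Classify a tokmux key by its documented prefix."""
    if key.startswith("sk_app_"):
        return "virtual"      # authenticates inference requests
    if key.startswith("sk_management_"):
        return "management"   # administration only, cannot run inference
    return "unknown"


def require_virtual_key(key: str) -> str:
    """Return the key unchanged, or raise if it cannot be used for inference."""
    kind = key_kind(key)
    if kind != "virtual":
        raise ValueError(f"expected a Virtual Key (sk_app_...), got a {kind} key")
    return key
```

Calling require_virtual_key at startup turns a swapped key into a local configuration error rather than a failed API call.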

Schema compatibility

tokmux is drop-in compatible with the OpenAI and Anthropic API schemas. Point your existing client at the tokmux base URL, replace the provider key with a Virtual Key, and ship.

Schema	Endpoint
OpenAI-compatible	POST /v1/chat/completions
Anthropic-compatible	POST /v1/messages

The same Virtual Key works against both schemas — pick whichever your client already targets.

Authentication

Pass your Virtual Key in the Authorization header.

Authorization: Bearer sk_app_XnkP9f...

A 403 with scope_denied means the Virtual Key isn't allowlisted for this model.

Request body

Field	Type	Description
model (required)	string	Slug from the catalog. Examples: openai/gpt-5-4, anthropic/claude-sonnet-4-6, fireworks-ai/fireworks-deepseek-v4-pro.
messages (required)	array	Conversation so far. Each item has role (system, user, assistant, tool) and content.
temperature	number	0–2. Default 1.0. Lower values give more deterministic output, higher values more varied output.
max_tokens	integer	Cap on output tokens. Defaults to the model's maximum minus the input length.
stream	boolean	If true, response is server-sent events. Default false.
tools	array	Function-calling schemas. tokmux normalizes Anthropic and OpenAI tool calls — define once.
route	object	tokmux extension. Set { "fallback": ["openai/gpt-5-4"] } for automatic provider failover. Credential failover within a single model uses the per-app priority chain.
metadata	object	Up to 10 string-string pairs. Surfaces in audit log and analytics filters.
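
The constraints in the table can be checked on the client before a request is sent. The sketch below is one hedged way to do that: validate_chat_request is a hypothetical helper, it covers only the rules stated on this page (required fields, message roles, the temperature range, and the metadata limit), and the server remains the authority on what is accepted.

```python
def validate_chat_request(body: dict) -> list[str]:
    """Return a list of problems found in a chat completions request body."""
    problems = []

    model = body.get("model")
    if not isinstance(model, str) or not model:
        problems.append("model is required and must be a non-empty string")

    messages = body.get("messages")
    if not isinstance(messages, list):
        problems.append("messages is required and must be an array")
    else:
        allowed_roles = {"system", "user", "assistant", "tool"}
        for i, msg in enumerate(messages):
            if not isinstance(msg, dict) or msg.get("role") not in allowed_roles:
                problems.append(f"messages[{i}] has a missing or unknown role")

    temperature = body.get("temperature")
    if temperature is not None:
        # bool is a subclass of int in Python, so reject it explicitly.
        is_number = isinstance(temperature, (int, float)) and not isinstance(temperature, bool)
        if not is_number or not 0 <= temperature <= 2:
            problems.append("temperature must be a number between 0 and 2")

    metadata = body.get("metadata")
    if metadata is not None:
        if not isinstance(metadata, dict) or len(metadata) > 10:
            problems.append("metadata must be an object with at most 10 pairs")
        elif not all(isinstance(k, str) and isinstance(v, str) for k, v in metadata.items()):
            problems.append("metadata keys and values must be strings")

    return problems
```

An empty list means the body passed these local checks.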

Example request

{
  "model": "anthropic/claude-sonnet-4-6",
  "messages": [
    { "role": "system", "content": "You are a support engineer." },
    { "role": "user", "content": "Why am I getting 429s on TTS?" }
  ],
  "temperature": 0.7,
  "max_tokens": 2048,
  "route": { "fallback": ["openai/gpt-5-4"] },
  "metadata": { "surface": "support-bot", "env": "prod" }
}
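
For readers without an OpenAI client at hand, the same request can be assembled with the Python standard library. This is a sketch under assumptions: build_chat_request is a hypothetical helper, the key shown is a placeholder, and the Content-Type header is assumed (it is conventional for JSON APIs but not stated on this page). The request is built but not sent.

```python
import json
import urllib.request

TOKMUX_URL = "https://api.tokmux.dev/v1/chat/completions"


def build_chat_request(virtual_key: str, body: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request for the chat completions endpoint."""
    return urllib.request.Request(
        TOKMUX_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {virtual_key}",
            "Content-Type": "application/json",  # assumed, not documented here
        },
        method="POST",
    )


body = {
    "model": "anthropic/claude-sonnet-4-6",
    "messages": [
        {"role": "system", "content": "You are a support engineer."},
        {"role": "user", "content": "Why am I getting 429s on TTS?"},
    ],
    "temperature": 0.7,
    "max_tokens": 2048,
    "route": {"fallback": ["openai/gpt-5-4"]},
    "metadata": {"surface": "support-bot", "env": "prod"},
}

request = build_chat_request("sk_app_EXAMPLE", body)  # placeholder key

# To send it:
# with urllib.request.urlopen(request) as response:
#     reply = json.load(response)
```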

Response shape

{
  "id": "cmpl_01HX9...",
  "object": "chat.completion",
  "model": "anthropic/claude-sonnet-4-6",
  "choices": [{
    "index": 0,
    "message": { "role": "assistant", "content": "429s on TTS while under cap..." },
    "finish_reason": "stop"
  }],
  "usage": { "prompt_tokens": 218, "completion_tokens": 312, "total_tokens": 530 },
  "tokmux": {
    "route": "primary",
    "first_token_ms": 412,
    "credits": "0.01",
    "cached": false
  }
}
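
Most callers need only a few fields from this response. The sketch below pulls them out, assuming the shape shown above; summarize_completion is a hypothetical helper. Note that credits arrives as a string, so it is parsed with Decimal here to avoid floating-point rounding when costs are summed.

```python
from decimal import Decimal


def summarize_completion(response: dict) -> dict:
    """Extract the reply text, token usage, and tokmux extension fields."""
    choice = response["choices"][0]
    extension = response.get("tokmux", {})
    credits = extension.get("credits")
    return {
        "text": choice["message"]["content"],
        "finish_reason": choice["finish_reason"],
        "total_tokens": response["usage"]["total_tokens"],
        "route": extension.get("route"),
        # credits is a string such as "0.01"; keep it exact.
        "credits": Decimal(credits) if credits is not None else None,
    }
```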

Errors

Status	Code	Meaning
401	invalid_api_key	Key is malformed or revoked. Check the Virtual Keys page.
402	insufficient_credits	Organization Credits balance is at or below zero. Top up to continue.
403	scope_denied	Key is valid but its allowlist does not include the requested model.
429	rate_limited	RPM ceiling hit. Inspect x-tokmux-rate-limit-remaining.
502	upstream_failed	Provider error. tokmux already retried across the credential priority chain, so this means every credential failed.
503	budget_exhausted	Credits cap reached on this key. Top up or raise the cap.
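
The error codes above fall into two groups: those that may succeed on a later attempt and those that need a configuration or billing change first. The sketch below encodes one possible client policy. The retry decisions are an assumption made for illustration, not documented tokmux behavior, and the helper takes the code string directly because the error body format is not described on this page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ErrorAdvice:
    retry: bool   # whether a later attempt may succeed without changes
    action: str   # what the caller should do next


# Retry flags are an assumed client policy, not part of the tokmux API.
ADVICE = {
    "invalid_api_key": ErrorAdvice(False, "check or replace the Virtual Key"),
    "insufficient_credits": ErrorAdvice(False, "top up the organization balance"),
    "scope_denied": ErrorAdvice(False, "add the model to the key's allowlist"),
    "rate_limited": ErrorAdvice(True, "back off, then retry"),
    "upstream_failed": ErrorAdvice(True, "retry later or set a route.fallback model"),
    "budget_exhausted": ErrorAdvice(False, "top up or raise the key's Credits cap"),
}


def advise(code: str) -> ErrorAdvice:
    """Look up advice for an error code; unknown codes are never retried."""
    return ADVICE.get(code, ErrorAdvice(False, "unrecognized error code"))
```

Treating unknown codes as non-retryable keeps a client from looping on an error it does not understand.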

One key, every model. tokmux normalizes the OpenAI chat schema across all 9 LLMs in the catalog — switch model and nothing else has to change in your code. The same key works for image gen, TTS, STT, and web search via separate /v1/* endpoints.