tokmux

Chat completions

Send a list of messages, get back a model-generated reply. The /v1/chat/completions endpoint is the primary way to use any of the 9 catalog LLMs through tokmux.

POST https://api.tokmux.dev/v1/chat/completions

Virtual Keys

A Virtual Key (prefix sk_app_) is the credential customers use to call any model in the tokmux catalog. One key, every provider, governed by your project's budget and allowlist.

Each Virtual Key:

  • Is scoped to a per-model allowlist — a key can target a single model or any subset across capabilities (LLM chat, TTS, STT, image, video, search). Minimum one model per capability the key covers.
  • Carries a monthly Credits cap and a customer-side rate limit.
  • Is created and revoked from the Virtual Keys page in the dashboard.

Management keys (prefix sk_management_) are a separate concept. An organization has exactly one management key, used for catalog listing and Virtual Key administration; it cannot make inference requests. The two key types are independent — a Virtual Key authenticates inference, a management key authenticates dashboard / API administration.
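Because the two key types differ by prefix, a client can catch a misconfigured credential before it sends anything. The following is a minimal sketch; the helper name `key_kind` is hypothetical and not part of any tokmux SDK, and it inspects only the prefix, not whether the key is valid or revoked.

```python
VIRTUAL_PREFIX = "sk_app_"
MANAGEMENT_PREFIX = "sk_management_"


def key_kind(key: str) -> str:
    """Classify a tokmux key by its documented prefix (hypothetical helper)."""
    if key.startswith(VIRTUAL_PREFIX):
        return "virtual"      # authenticates inference requests
    if key.startswith(MANAGEMENT_PREFIX):
        return "management"   # administration only, cannot run inference
    return "unknown"
```

A check like this is useful at startup: failing fast when a management key is configured for an inference client avoids a round trip that the API would reject anyway.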

Schema compatibility

tokmux is drop-in compatible with OpenAI and Anthropic API schemas. Point your existing client at the tokmux base URL, replace the provider key with a Virtual Key, ship.

| Schema | Endpoint |
| --- | --- |
| OpenAI-compatible | POST /v1/chat/completions |
| Anthropic-compatible | POST /v1/messages |

The same Virtual Key works against both schemas — pick whichever your client already targets.
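One way to keep the schema choice in a single place is a small lookup from schema name to URL. This is a sketch only: the labels `"openai"` and `"anthropic"` and the function `endpoint_for` are illustrative names, and it assumes `/v1/messages` is served from the same host as the chat completions endpoint shown on this page.

```python
BASE_URL = "https://api.tokmux.dev"  # host taken from the chat completions endpoint

# Paths from the schema table; the dictionary keys are illustrative labels.
SCHEMA_PATHS = {
    "openai": "/v1/chat/completions",
    "anthropic": "/v1/messages",
}


def endpoint_for(schema: str) -> str:
    """Return the full URL for a schema label (hypothetical helper)."""
    if schema not in SCHEMA_PATHS:
        raise ValueError(f"unknown schema: {schema!r}")
    return BASE_URL + SCHEMA_PATHS[schema]
```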

Authentication

Pass your Virtual Key in the Authorization header.

Authorization: Bearer sk_app_XnkP9f...

A 403 with scope_denied means the Virtual Key isn't allowlisted for this model.
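As a sketch of the header format described above, the helper below builds the request headers from a Virtual Key. The function name `auth_headers` is hypothetical, and the `Content-Type: application/json` header is an assumption based on the JSON request body rather than something this page states.

```python
def auth_headers(virtual_key: str) -> dict[str, str]:
    """Build request headers for a Virtual Key (hypothetical helper)."""
    if not virtual_key.startswith("sk_app_"):
        # Management keys cannot make inference requests.
        raise ValueError("expected a Virtual Key with the sk_app_ prefix")
    return {
        "Authorization": f"Bearer {virtual_key}",
        "Content-Type": "application/json",  # assumed, since the body is JSON
    }
```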

Request body

| Field | Required | Type | Description |
| --- | --- | --- | --- |
| model | yes | string | Slug from the catalog. Examples: openai/gpt-5-4, anthropic/claude-sonnet-4-6, fireworks-ai/fireworks-deepseek-v4-pro. |
| messages | yes | array | Conversation so far. Each item has role (system, user, assistant, tool) and content. |
| temperature | no | number | Range 0–2. Default 1.0. Lower values give more deterministic output; higher values give more varied output. |
| max_tokens | no | integer | Cap on output tokens. Defaults to the model maximum minus the input length. |
| stream | no | boolean | If true, the response is delivered as server-sent events. Default false. |
| tools | no | array | Function-calling schemas. tokmux normalizes Anthropic and OpenAI tool calls, so you define them once. |
| route | no | object | tokmux extension. Set { "fallback": ["openai/gpt-5-4"] } for automatic provider failover. Credential failover within a single model uses the per-app priority chain. |
| metadata | no | object | Up to 10 string-string pairs. Surfaces in the audit log and analytics filters. |
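The constraints in the field table can be checked on the client before a request is sent. The sketch below covers only the rules stated above (required fields, allowed roles, the 0–2 temperature range, and the metadata limits); `validate_request` is a hypothetical helper, and the server may enforce rules that are not listed here.

```python
from typing import Any

ALLOWED_ROLES = {"system", "user", "assistant", "tool"}


def validate_request(body: dict[str, Any]) -> list[str]:
    """Return a list of problems; an empty list means these checks passed."""
    problems: list[str] = []

    model = body.get("model")
    if not isinstance(model, str) or not model:
        problems.append("model is required and must be a string")

    messages = body.get("messages")
    if not isinstance(messages, list):
        problems.append("messages is required and must be an array")
    else:
        for i, msg in enumerate(messages):
            if not isinstance(msg, dict) or msg.get("role") not in ALLOWED_ROLES:
                problems.append(f"messages[{i}].role is not an allowed role")

    temperature = body.get("temperature")
    if temperature is not None:
        if not isinstance(temperature, (int, float)) or not 0 <= temperature <= 2:
            problems.append("temperature must be a number between 0 and 2")

    metadata = body.get("metadata")
    if metadata is not None:
        if not isinstance(metadata, dict):
            problems.append("metadata must be an object")
        else:
            if len(metadata) > 10:
                problems.append("metadata allows at most 10 pairs")
            if any(not isinstance(k, str) or not isinstance(v, str)
                   for k, v in metadata.items()):
                problems.append("metadata keys and values must be strings")

    return problems
```

Returning a list rather than raising on the first failure lets a caller report every problem in one pass.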

Example request

{
  "model": "anthropic/claude-sonnet-4-6",
  "messages": [
    { "role": "system", "content": "You are a support engineer." },
    { "role": "user", "content": "Why am I getting 429s on TTS?" }
  ],
  "temperature": 0.7,
  "max_tokens": 2048,
  "route": { "fallback": ["openai/gpt-5-4"] },
  "metadata": { "surface": "support-bot", "env": "prod" }
}
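A request like the one above could be sent from Python's standard library roughly as follows. This is a sketch under assumptions: `build_request` and `send` are hypothetical names, the `Content-Type` header is assumed, and error handling is left out so the shape of the call stays visible.

```python
import json
import urllib.request

URL = "https://api.tokmux.dev/v1/chat/completions"


def build_request(virtual_key: str, body: dict) -> urllib.request.Request:
    """Prepare a POST request without sending it (hypothetical helper)."""
    return urllib.request.Request(
        URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {virtual_key}",
            "Content-Type": "application/json",  # assumed for a JSON body
        },
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 30.0) -> dict:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.load(resp)
```

Splitting construction from sending keeps the first half testable without network access; `urlopen` raises `urllib.error.HTTPError` on non-2xx statuses, which a real client would catch.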

Response shape

{
  "id": "cmpl_01HX9...",
  "object": "chat.completion",
  "model": "anthropic/claude-sonnet-4-6",
  "choices": [{
    "index": 0,
    "message": { "role": "assistant", "content": "429s on TTS while under cap..." },
    "finish_reason": "stop"
  }],
  "usage": { "prompt_tokens": 218, "completion_tokens": 312, "total_tokens": 530 },
  "tokmux": {
    "route": "primary",
    "first_token_ms": 412,
    "credits": "0.01",
    "cached": false
  }
}
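To show how the fields in this response fit together, here is a minimal parsing sketch. The `Reply` dataclass and `parse_reply` function are illustrative names; the code assumes the response has the shape shown above and treats the `tokmux` block as optional.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Reply:
    text: str
    finish_reason: str
    total_tokens: int
    route: Optional[str]     # from the tokmux block, e.g. "primary"
    credits: Optional[str]   # kept as a string, as in the sample response


def parse_reply(payload: dict) -> Reply:
    """Extract the first choice and usage data from a response payload."""
    choice = payload["choices"][0]
    meta = payload.get("tokmux", {})
    return Reply(
        text=choice["message"]["content"],
        finish_reason=choice["finish_reason"],
        total_tokens=payload["usage"]["total_tokens"],
        route=meta.get("route"),
        credits=meta.get("credits"),
    )
```

Since `credits` arrives as a string in the sample, converting it with `decimal.Decimal` rather than `float` is one way to avoid rounding drift when summing costs.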

Errors

| Status | Code | Meaning |
| --- | --- | --- |
| 401 | invalid_api_key | Key is malformed or revoked. Check the Virtual Keys page. |
| 402 | insufficient_credits | Organization Credits balance is at or below zero. Top up to continue. |
| 403 | scope_denied | Key is valid but not allowlisted for the requested model. |
| 429 | rate_limited | RPM ceiling hit. Inspect x-tokmux-rate-limit-remaining. |
| 502 | upstream_failed | Provider error. tokmux already retried across the credential priority chain, so this means every credential failed. |
| 503 | budget_exhausted | Credits cap reached on this key. Top up or raise the cap. |
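A client might map these errors to a follow-up action along the lines below. The status and code pairs come from the table; the action names and the decision about which errors are worth retrying are assumptions for illustration, not documented tokmux behavior.

```python
# (status, code) pairs from the error table, mapped to a suggested action.
# Action names and retry choices are assumptions, not tokmux guidance.
ACTIONS = {
    (401, "invalid_api_key"): "fix_key",
    (402, "insufficient_credits"): "top_up",
    (403, "scope_denied"): "fix_allowlist",
    (429, "rate_limited"): "retry_later",
    (502, "upstream_failed"): "retry_or_fallback",
    (503, "budget_exhausted"): "raise_cap",
}

RETRYABLE = {"retry_later", "retry_or_fallback"}


def next_action(status: int, code: str) -> str:
    """Look up the suggested action for an error response."""
    return ACTIONS.get((status, code), "unknown")


def should_retry(status: int, code: str) -> bool:
    """True only for errors where resending the same request may succeed."""
    return next_action(status, code) in RETRYABLE
```

Keying on the code as well as the status matters here: a generic HTTP client often retries any 503, but `budget_exhausted` reflects a cap that a retry cannot lift.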
One key, every model. tokmux normalizes the OpenAI chat schema across all 9 LLMs in the catalog — switch model and nothing else has to change in your code. The same key works for image gen, TTS, STT, and web search via separate /v1/* endpoints.