manifest/auto as the model, and routing picks the real model behind the scenes.
Base URL
| Mode | URL |
|---|---|
| Cloud | https://app.manifest.build |
| Self-hosted | http://localhost:2099 (or your custom port) |
Authentication
Every request requires a Manifest agent key:mnfst_.
Endpoints
| Method | Path | Format | Use it for |
|---|---|---|---|
POST | /v1/chat/completions | OpenAI | Most clients (OpenAI SDK, LangChain, Vercel AI SDK, custom HTTP) |
POST | /v1/responses | OpenAI Responses | Codex, *-pro, o1-pro, deep-research models |
POST | /v1/messages | Anthropic | Anthropic SDK, Claude Code, anything that speaks the Messages API |
Chat completions
model rewritten to the actual model ID. All standard OpenAI fields (temperature, max_tokens, tools, tool_choice, response_format, stream, etc.) pass through.
Anthropic messages
Streaming
Set"stream": true to get an SSE stream back. The stream format matches the upstream protocol: OpenAI-style data: {...} chunks for /v1/chat/completions, Anthropic event blocks for /v1/messages.
Routing and fallback both work with streams. If the primary model fails before the first chunk, the request restarts on the fallback. If it fails mid-stream, the connection closes. There’s no silent mid-stream retry.
Errors
The proxy returns a standard JSON error envelope:| Status | Meaning |
|---|---|
401 | Invalid or missing Authorization header |
402 | Provider requires payment / quota exceeded on the upstream |
424 | Fallback chain exhausted (all configured models failed) |
429 | Hard limit hit, or Manifest rate limit (THROTTLE_LIMIT) tripped |
5xx | Upstream provider error (triggers fallback) |
424 is the only one that does not trigger a fallback. Manifest returns it itself when the chain is exhausted, so re-routing it would loop forever.
Rate limits
Self-hosted instances default to 100 requests per 60 seconds per agent. Override withTHROTTLE_TTL and THROTTLE_LIMIT (Environment variables).
Cloud rate limits are tied to your plan and shown in the dashboard.