auto as the model, and routing picks the real model behind the scenes.
Base URL
| Mode | URL |
|---|---|
| Cloud | https://app.manifest.build |
| Self-hosted | http://localhost:2099 (or your custom port) |
Authentication
Every request requires a Manifest agent key:mnfst_.
Endpoints
| Method | Path | Format | Use it for |
|---|---|---|---|
POST | /v1/chat/completions | OpenAI | Most clients (OpenAI SDK, LangChain, Vercel AI SDK, custom HTTP) |
POST | /v1/responses | OpenAI Responses | Codex, *-pro, o1-pro, deep-research models |
POST | /v1/messages | Anthropic | Anthropic SDK, Claude Code, anything that speaks the Messages API |
GET | /v1/models | OpenAI | Listing the models your agent can route to |
Chat completions
model rewritten to the actual model ID. All standard OpenAI fields (temperature, max_tokens, tools, tool_choice, response_format, stream, etc.) pass through.
Anthropic messages
Listing models
GET /v1/models returns the models your agent can reach, in OpenAI format. The first entry is always auto (routing); the rest are the real model IDs from your connected providers.
auto to let Manifest route, or send any listed model ID to skip routing and go straight to that provider. See Routing → Route a specific model.
Streaming
Set"stream": true to get an SSE stream back. The stream format matches the upstream protocol: OpenAI-style data: {...} chunks for /v1/chat/completions, Anthropic event blocks for /v1/messages.
Routing and fallback both work with streams. If the primary model fails before the first chunk, the request restarts on the fallback. If it fails mid-stream, the connection closes. There’s no silent mid-stream retry.
Errors
The proxy returns a standard JSON error envelope:| Status | Meaning |
|---|---|
401 | Invalid or missing Authorization header |
402 | Provider requires payment / quota exceeded on the upstream |
424 | Fallback chain exhausted (all configured models failed) |
429 | Hard limit hit, or Manifest rate limit (THROTTLE_LIMIT) tripped |
5xx | Upstream provider error (triggers fallback) |
424 is the only one that does not trigger a fallback. Manifest returns it itself when the chain is exhausted, so re-routing it would loop forever.
Rate limits
Self-hosted instances default to 100 requests per 60 seconds per agent. Override withTHROTTLE_TTL and THROTTLE_LIMIT (Environment variables).
Cloud rate limits are tied to your plan and shown in the dashboard.