
What is routing?

Instead of sending every request to the same expensive model, Manifest scores each query and routes it to the cheapest model that can handle it.
  • Four tiers: simple, standard, complex, reasoning.
  • Scoring happens in under 2 ms with zero external calls.

The four tiers

Simple

Greetings, definitions, short factual questions. Routed to the cheapest model.

Standard

General coding help, moderate questions. Good quality at low cost.

Complex

Multi-step tasks, large context, code generation. Routed to the highest-quality models.

Reasoning

Formal logic, proofs, math, multi-constraint problems. Reasoning-capable models only.
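The tiers form an ordered scale, and routing picks the cheapest model that is allowed to serve the assigned tier. The Python sketch below illustrates that idea; the model names, prices, and the `max_tier` field are hypothetical placeholders, not Manifest's actual model mappings.

```python
from dataclasses import dataclass
from enum import IntEnum


class Tier(IntEnum):
    # Ordered so that a higher value means a more demanding request.
    SIMPLE = 0
    STANDARD = 1
    COMPLEX = 2
    REASONING = 3


@dataclass(frozen=True)
class Model:
    name: str             # hypothetical identifier, not a real Manifest mapping
    max_tier: Tier        # most demanding tier this model is trusted with
    cost_per_mtok: float  # hypothetical price per million tokens


def cheapest_capable(models: list[Model], tier: Tier) -> Model:
    """Return the lowest-cost model whose max_tier covers the requested tier."""
    capable = [m for m in models if m.max_tier >= tier]
    if not capable:
        raise ValueError(f"no model can handle tier {tier.name}")
    return min(capable, key=lambda m: m.cost_per_mtok)


# Hypothetical catalog for illustration only.
CATALOG = [
    Model("small-model", Tier.SIMPLE, 0.1),
    Model("mid-model", Tier.STANDARD, 0.5),
    Model("large-model", Tier.COMPLEX, 3.0),
    Model("reasoning-model", Tier.REASONING, 10.0),
]

print(cheapest_capable(CATALOG, Tier.STANDARD).name)  # mid-model
```

Modelling the tiers as an `IntEnum` turns "can this model handle this tier?" into a plain integer comparison.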

How scoring works

Scoring uses 23 dimensions, grouped into three categories:
  • Keyword-based (14): scans the prompt for patterns like “prove”, “write function”, and “what is”.
  • Structural (5): analyzes token count, nesting depth, code-to-prose ratio, conditional logic, and constraint density.
  • Contextual (4): considers expected output length, repetition requests, tool count, and conversation depth.

Each dimension has a weight. The weighted sum maps to a tier via threshold boundaries, and a confidence score (0–1) indicates how clearly the request fits its tier.
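A minimal sketch of the weighted-sum-and-threshold step, assuming each dimension yields a signal between 0 and 1 and the weights sum to 1. The dimension names, weights, threshold values, and especially the confidence formula (distance from the nearest boundary) are illustrative assumptions; Manifest's actual values are not published here.

```python
from bisect import bisect_right

TIERS = ["simple", "standard", "complex", "reasoning"]
# Hypothetical boundaries on the weighted score (assumption, not Manifest's values).
THRESHOLDS = [0.25, 0.5, 0.75]
HALF_BAND = 0.125  # half the width of each interior band defined by THRESHOLDS


def weighted_score(signals: dict[str, float], weights: dict[str, float]) -> float:
    """Sum each dimension's signal (0-1) times its weight; missing signals count as 0."""
    return sum(weights[name] * signals.get(name, 0.0) for name in weights)


def classify(score: float) -> tuple[str, float]:
    """Map a score to a tier plus a 0-1 confidence.

    Assumed confidence rule: the further the score sits from the nearest
    boundary, the more clearly it belongs to its tier (capped at 1.0).
    """
    tier = TIERS[bisect_right(THRESHOLDS, score)]
    nearest = min(abs(score - t) for t in THRESHOLDS)
    confidence = min(1.0, nearest / HALF_BAND)
    return tier, confidence


# Hypothetical dimension names and weights.
WEIGHTS = {"keyword_prove": 0.4, "constraint_density": 0.3, "token_count": 0.2, "tool_count": 0.1}
score = weighted_score({"keyword_prove": 1.0, "constraint_density": 0.8}, WEIGHTS)
print(classify(score))  # score is about 0.64, which falls in the "complex" band
```

`bisect_right` places a score that lands exactly on a boundary in the higher tier, with a confidence of 0.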

Session momentum

Manifest remembers the last 5 tier assignments (30-minute TTL). Short follow-up messages (“yes”, “do it”) inherit momentum from the conversation, so they don’t drop to a cheaper tier unnecessarily.
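A sketch of session momentum, assuming history is kept in memory per session ID. The 5-entry window and 30-minute TTL come from the description above; treating a message of three words or fewer as a “short follow-up” and lifting it to the highest tier still in the window are illustrative assumptions, not Manifest's documented rule.

```python
import time
from collections import deque

TIER_ORDER = {"simple": 0, "standard": 1, "complex": 2, "reasoning": 3}


class SessionMomentum:
    """Keeps the last `window` tier assignments per session, each expiring after a TTL."""

    def __init__(self, window: int = 5, ttl_seconds: float = 30 * 60, clock=time.monotonic):
        self.window = window
        self.ttl = ttl_seconds
        self.clock = clock  # injectable so tests can control time
        self.history: dict[str, deque] = {}

    def record(self, session_id: str, tier: str) -> None:
        # deque(maxlen=...) drops the oldest entry once the window is full.
        entries = self.history.setdefault(session_id, deque(maxlen=self.window))
        entries.append((self.clock(), tier))

    def _recent(self, session_id: str) -> list[str]:
        # Only entries younger than the TTL still contribute momentum.
        now = self.clock()
        entries = self.history.get(session_id, deque())
        return [tier for ts, tier in entries if now - ts <= self.ttl]

    def adjust(self, session_id: str, message: str, scored_tier: str, short_words: int = 3) -> str:
        """Assumed rule: lift a short follow-up to the highest recent tier instead of letting it drop."""
        recent = self._recent(session_id)
        if recent and len(message.split()) <= short_words:
            strongest = max(recent, key=TIER_ORDER.__getitem__)
            if TIER_ORDER[strongest] > TIER_ORDER[scored_tier]:
                return strongest
        return scored_tier
```

Longer messages keep their own score, so a genuinely new, simpler question can still drop to a cheaper tier.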

Tier overrides

Some signals force a minimum tier regardless of the score:
Signal                         Minimum tier
Tools detected                 standard
Large context (>50k tokens)    complex
Formal logic keywords          reasoning
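The overrides act as floors: each matching signal raises the minimum tier, and the final tier is the higher of the scored tier and that floor. In this sketch the keyword list and the substring matching are hypothetical, since the docs don't say which terms count as formal logic or how they are detected.

```python
from enum import IntEnum


class Tier(IntEnum):
    SIMPLE = 0
    STANDARD = 1
    COMPLEX = 2
    REASONING = 3


# Hypothetical keyword list (assumption); naive substring matching would also hit "improve".
FORMAL_LOGIC_KEYWORDS = ("prove", "theorem", "lemma")
LARGE_CONTEXT_TOKENS = 50_000  # from the table: >50k tokens forces complex


def apply_overrides(scored: Tier, *, tool_count: int, context_tokens: int, prompt: str) -> Tier:
    """Raise the scored tier to any minimum forced by override signals; never lower it."""
    floor = Tier.SIMPLE
    if tool_count > 0:
        floor = max(floor, Tier.STANDARD)
    if context_tokens > LARGE_CONTEXT_TOKENS:
        floor = max(floor, Tier.COMPLEX)
    text = prompt.lower()
    if any(word in text for word in FORMAL_LOGIC_KEYWORDS):
        floor = max(floor, Tier.REASONING)
    return max(scored, floor)
```

Because overrides only set a minimum, a request that already scored higher than the floor keeps its tier.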

Response headers

Every response includes these headers:
Header                  Description
X-Manifest-Tier         Assigned tier
X-Manifest-Model        Actual model used
X-Manifest-Provider     Provider (anthropic, openai, google, etc.)
X-Manifest-Confidence   Scoring confidence (0–1)
X-Manifest-Reason       Why this tier was selected
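On the client side, the routing decision can be read back from these headers. The sketch below accepts any mapping of header names to values (for example, a dictionary built from an HTTP client's response headers) and matches names case-insensitively, as HTTP header names are. It assumes the confidence header carries a decimal string such as `0.87`; the `RoutingInfo` type is a hypothetical helper, not part of any Manifest SDK.

```python
from collections.abc import Mapping
from dataclasses import dataclass


@dataclass
class RoutingInfo:
    tier: str
    model: str
    provider: str
    confidence: float
    reason: str


def parse_routing_headers(headers: Mapping[str, str]) -> RoutingInfo:
    """Read the X-Manifest-* headers from a response, ignoring header-name case."""
    lower = {name.lower(): value for name, value in headers.items()}
    return RoutingInfo(
        tier=lower["x-manifest-tier"],
        model=lower["x-manifest-model"],
        provider=lower["x-manifest-provider"],
        confidence=float(lower["x-manifest-confidence"]),  # assumed decimal string
        reason=lower["x-manifest-reason"],
    )
```

A missing header raises `KeyError`, which makes it obvious when a response did not pass through the router.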

Cloud vs Local

Routing is performed server-side. Model mappings are managed by the Manifest team and updated regularly.