What is routing?

Instead of sending every request to the same expensive model, Manifest scores each query and routes it to the cheapest model that can handle it.
  • Four tiers: simple, standard, complex, reasoning.
  • Scoring happens in under 2 ms with zero external calls.

The four tiers

Simple

Greetings, definitions, short factual questions. Routed to the cheapest model.

Standard

General coding help, moderate questions. Good quality at low cost.

Complex

Multi-step tasks, large context, code generation. Best quality models.

Reasoning

Formal logic, proofs, math, multi-constraint problems. Reasoning-capable models only.

How scoring works

Scoring uses 23 dimensions grouped into two categories:
  • Keyword-based (13): scans the prompt for patterns like “prove”, “write function”, “what is”, etc.
  • Structural (10): analyzes token count, nesting depth, code-to-prose ratio, tool count, conversation depth, etc.

Each dimension has a weight. The weighted sum maps to a tier via threshold boundaries. A confidence score (0–1) indicates how clearly the request fits its tier.
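
The weighted-sum step can be sketched roughly as follows. This is a minimal illustration, not Manifest's implementation: the dimension names, weights, threshold values, and the confidence formula (distance to the nearest tier boundary) are all assumptions chosen for the example.

```python
from bisect import bisect_right

TIERS = ["simple", "standard", "complex", "reasoning"]

# Hypothetical weights for three of the 23 dimensions (illustration only).
WEIGHTS = {"formal_logic_keywords": 0.5, "code_keywords": 0.3, "token_count": 0.2}

# Hypothetical boundaries between the four tiers.
THRESHOLDS = [0.2, 0.45, 0.7]

# Hypothetical distance from a boundary at which confidence saturates at 1.
CONFIDENCE_SCALE = 0.1


def weighted_score(dimensions: dict[str, float]) -> float:
    """Weighted sum of per-dimension scores, each assumed to lie in [0, 1]."""
    return sum(WEIGHTS[name] * value for name, value in dimensions.items())


def tier_for(total: float) -> str:
    """Map a weighted sum to a tier by counting the boundaries it has reached."""
    return TIERS[bisect_right(THRESHOLDS, total)]


def confidence(total: float) -> float:
    """Assumed definition: the farther from any boundary, the clearer the fit."""
    distance = min(abs(total - boundary) for boundary in THRESHOLDS)
    return min(1.0, distance / CONFIDENCE_SCALE)


total = weighted_score(
    {"formal_logic_keywords": 1.0, "code_keywords": 0.0, "token_count": 0.5}
)
print(tier_for(total), confidence(total))
```

A score that sits exactly on a boundary gets confidence 0 under this definition, which matches the idea that such a request does not clearly belong to either neighbouring tier.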

Session momentum

Manifest remembers the last 5 tier assignments (30-minute TTL). Short follow-up messages (“yes”, “do it”) inherit momentum from the conversation, preventing unnecessary tier drops.
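
One way to picture session momentum is a small bounded history with expiry. The sketch below takes the history size (5) and TTL (30 minutes) from the text above; the word-count cutoff for a "short" message and the rule of inheriting the highest recent tier are assumptions, since the exact momentum rule is not documented here.

```python
import time
from collections import deque

TIERS = ["simple", "standard", "complex", "reasoning"]
HISTORY_SIZE = 5         # from the text: last 5 tier assignments
TTL_SECONDS = 30 * 60    # from the text: 30-minute TTL
SHORT_MESSAGE_WORDS = 3  # assumption: what counts as a short follow-up


class SessionMomentum:
    """Hypothetical helper that keeps recent tier assignments for one session."""

    def __init__(self, now=time.monotonic):
        self._now = now
        # Each entry is (timestamp, tier); old entries fall off automatically.
        self._history = deque(maxlen=HISTORY_SIZE)

    def _recent_tiers(self):
        cutoff = self._now() - TTL_SECONDS
        return [tier for stamp, tier in self._history if stamp >= cutoff]

    def assign(self, message: str, scored_tier: str) -> str:
        tier = scored_tier
        recent = self._recent_tiers()
        if recent and len(message.split()) <= SHORT_MESSAGE_WORDS:
            # Assumed rule: a short follow-up never drops below the
            # highest tier seen in the unexpired history.
            highest = max(recent, key=TIERS.index)
            tier = max(tier, highest, key=TIERS.index)
        self._history.append((self._now(), tier))
        return tier
```

With this sketch, a "do it" sent right after a request routed to the complex tier stays on complex, while the same message sent after the TTL has expired is routed on its own score.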

Tier overrides

Certain signals force a minimum tier:
Signal                       Minimum tier
Tools detected               standard
Large context (>50k tokens)  complex
Formal logic keywords        reasoning
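
The overrides behave like a floor applied after scoring. A minimal sketch, assuming the three signals listed above: the keyword list is hypothetical (only “prove” appears in this page), and the function and parameter names are made up for illustration.

```python
import re

TIERS = ["simple", "standard", "complex", "reasoning"]
LARGE_CONTEXT_TOKENS = 50_000  # from the table: >50k tokens

# Hypothetical keyword list; the real list is not documented here.
FORMAL_LOGIC_KEYWORDS = {"prove", "theorem", "lemma"}


def higher(a: str, b: str) -> str:
    """Return whichever tier ranks higher in TIERS."""
    return max(a, b, key=TIERS.index)


def minimum_tier(prompt: str, token_count: int, tool_count: int) -> str:
    """Compute the lowest tier the request is allowed to use."""
    floor = "simple"
    if tool_count > 0:
        floor = higher(floor, "standard")
    if token_count > LARGE_CONTEXT_TOKENS:
        floor = higher(floor, "complex")
    # Match whole words so that "improve" does not trigger "prove".
    words = set(re.findall(r"[a-z]+", prompt.lower()))
    if words & FORMAL_LOGIC_KEYWORDS:
        floor = higher(floor, "reasoning")
    return floor


def apply_overrides(scored_tier: str, prompt: str, token_count: int, tool_count: int) -> str:
    """Raise the scored tier to the floor if needed; never lower it."""
    return higher(scored_tier, minimum_tier(prompt, token_count, tool_count))
```

Because the floor only raises a tier, a request already scored as reasoning is unaffected by the tools or large-context signals.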

Response headers

Every routed response includes:
Header                 Description
X-Manifest-Tier        Assigned tier
X-Manifest-Model       Actual model used
X-Manifest-Provider    Provider (anthropic, openai, google, etc.)
X-Manifest-Confidence  Scoring confidence (0–1)
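
A client can read these headers off any HTTP response. The sketch below only parses a header mapping, so it works with whichever HTTP library is in use; the `RoutingInfo` type and `parse_routing_headers` function are hypothetical names, and it assumes the confidence value is sent as a plain decimal string.

```python
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class RoutingInfo:
    tier: str
    model: str
    provider: str
    confidence: float


def parse_routing_headers(headers: Mapping[str, str]) -> RoutingInfo:
    """Extract the routing headers from a response header mapping."""
    # HTTP header names are case-insensitive, so normalize before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    return RoutingInfo(
        tier=lowered["x-manifest-tier"],
        model=lowered["x-manifest-model"],
        provider=lowered["x-manifest-provider"],
        # Assumption: confidence is serialized as a decimal string.
        confidence=float(lowered["x-manifest-confidence"]),
    )


# Example with made-up header values.
info = parse_routing_headers(
    {
        "X-Manifest-Tier": "standard",
        "X-Manifest-Model": "example-model",
        "X-Manifest-Provider": "anthropic",
        "X-Manifest-Confidence": "0.82",
    }
)
print(info.tier, info.provider, info.confidence)
```

Logging the tier and confidence alongside each request is a simple way to check whether routing decisions match expectations.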

Cloud vs Local

Routing is performed server-side. Model mappings are managed by the Manifest team and updated regularly.