The errors that actually break LLM agents in production

Team Manifest Jun 30, 2026 5 min read

Your agent ran clean in the demo. Friday it returns 400s. You reread your prompts, swap the model, pray. Nothing moves.

The model is fine. Something in your request stopped matching what the provider expects: a renamed parameter, a rejected schema field, a model that got pulled. These are plumbing failures, and they have become more common in 2026 as providers ship and retire models faster.

We see these all day in Manifest’s logs. Here are the six that show up most, each with a real string you will recognize, and why every provider serves it to you differently.

Rate limit and empty balance

Resource has been exhausted (e.g. check quota).

By far the most common one. Your key hit its rate cap, or your credit ran out. The provider rejects every request until the window resets or you top up.

The catch is that one 429 covers two situations with different fixes. Too many requests per minute means slow down and try again. No balance left means trying again does nothing until you top up. Plenty of apps loop on a retry that will never clear in the second case. The accounting also differs across providers: requests per minute for one, tokens per minute for another, per-project quotas elsewhere. A fallback that hits a second provider already maxed out buys you nothing.
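
A minimal sketch of telling the two 429s apart before deciding to retry. The marker substrings and the helper names (`classify_429`, `backoff_delay`, `should_retry`) are assumptions made for illustration, not an official list from any provider. Providers word these bodies differently, so check the markers against the payloads you actually receive.

```python
import random

# Hypothetical substrings hinting that the account is out of credit.
# These are assumptions for illustration, not a provider-documented list.
BALANCE_MARKERS = ("insufficient_quota", "insufficient balance", "billing", "credit")


def classify_429(body: str) -> str:
    """Return 'balance' when the body looks like an empty account, else 'rate_limit'."""
    text = body.lower()
    if any(marker in text for marker in BALANCE_MARKERS):
        return "balance"
    return "rate_limit"


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 30.0) -> float:
    """Exponential backoff with full jitter, in seconds, for attempt 0, 1, 2, ..."""
    return random.uniform(0, min(cap, base * 2 ** attempt))


def should_retry(status: int, body: str) -> bool:
    """Retry a 429 only when waiting can actually clear it."""
    return status == 429 and classify_429(body) == "rate_limit"
```

With this split, a caller sleeps for `backoff_delay(attempt)` only when `should_retry` returns true, and raises an alert instead of looping when the body points at billing.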

Deprecated or unsupported model

This model has been deprecated. It is recommended to migrate to <newer-model>

Models come and go fast now. You call one that got pulled, and your requests fail until you point at a live one. OpenAI gives generally available models at least 6 months of notice, specialized variants at least 3 months, and preview models as little as 2 weeks, so anything with preview in the name is a moving target.

There’s a quieter version. If you call a -latest alias like gpt-5.5-chat-latest, the underlying model changes whenever the provider ships an update, with no version bump on your side. The call still returns 200, your output drifts, and you debug your own code before you trace it upstream.
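
One way to catch both cases early is to flag moving-target model names in your own configuration. This is a rough sketch under stated assumptions: the `-latest` suffix and the `preview` substring are naming conventions, not guarantees, and `is_moving_target`, `audit_models`, and the config keys in the usage note are hypothetical.

```python
def is_moving_target(model: str) -> bool:
    """Flag names that look like a floating alias or a preview release."""
    name = model.lower()
    return name.endswith("-latest") or "preview" in name


def audit_models(config: dict[str, str]) -> list[str]:
    """Return the config keys whose model name looks like a moving target."""
    return sorted(key for key, model in config.items() if is_moving_target(model))
```

Running `audit_models({"chat": "gpt-5.5-chat-latest", "batch": "some-pinned-model-2026-01-15"})` would report only the `chat` entry, which is the one worth pinning to a dated version.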

Malformed parameter

Unsupported parameter: 'max_tokens' is not supported with this model. Use 'max_completion_tokens' instead.

You send a parameter the model rejects. Each family runs its own vocabulary, and it changes between generations. OpenAI’s reasoning models want max_completion_tokens where older ones took max_tokens. Gemini moved from thinking_budget to thinking_level. Same intent, different key, rejected request.
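
A small sketch of renaming a parameter before the request leaves your app. The `adapt_params` helper and the rename table are illustrative assumptions. Only the `max_tokens` to `max_completion_tokens` rename from the error above is shown, because it is a pure key change. A move like `thinking_budget` to `thinking_level` probably needs the value translated as well, which this sketch does not attempt.

```python
# Illustrative rename table for one model family; extend it from provider docs.
REASONING_RENAMES = {"max_tokens": "max_completion_tokens"}


def adapt_params(params: dict, renames: dict[str, str]) -> dict:
    """Return a copy of params with old keys renamed; an explicit new key wins."""
    adapted = {}
    for key, value in params.items():
        new_key = renames.get(key, key)
        if new_key != key and new_key in params:
            # The caller already set the new name, so drop the old one.
            continue
        adapted[new_key] = value
    return adapted
```

Calling `adapt_params({"max_tokens": 512, "temperature": 0.2}, REASONING_RENAMES)` leaves `temperature` alone and moves the limit under the key the reasoning model expects.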

Malformed message structure

The reasoning_content in the thinking mode must be passed back to the API

The conversation format does not match what the provider expects: a missing field, a misplaced role, a reasoning block you forgot to echo back. Thinking-mode models are strict about this. They hand you a reasoning field on one turn and reject the next turn if you drop it. Your messages look right to the eye, but the provider wants a precise shape your code stopped producing.
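
A sketch of carrying the reasoning field into the next turn instead of rebuilding the assistant message from `role` and `content` alone. The field name `reasoning_content` comes from the error string above. The rest of the message shape, the `PRESERVED_FIELDS` list, and the `append_assistant_turn` helper are assumptions, and some providers or modes may expect the field to be dropped instead, so check the provider's documentation before relying on it.

```python
from copy import deepcopy

# Fields copied verbatim into the history. Apart from "reasoning_content",
# this list assumes a common chat-style message shape.
PRESERVED_FIELDS = ("role", "content", "reasoning_content", "tool_calls")


def append_assistant_turn(history: list[dict], message: dict) -> list[dict]:
    """Return a new history with the assistant message appended, reasoning included."""
    kept = {key: deepcopy(message[key]) for key in PRESERVED_FIELDS if key in message}
    return history + [kept]
```

The helper returns a new list and leaves the original history untouched, which makes it easier to compare what was sent on each turn when a request gets rejected.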

Context length exceeded

input length and max_tokens exceed context limit: 189136 + 20000 > 204648

Your prompt plus the output you asked for runs past the window. This one grows over time on agents that pile up history every turn: messages, tool results, documents. The threshold also moves when you switch families, so an agent that fit before blows up the moment you route it to a model with a shorter window. Truncating the start loses context you needed. Summarizing, pruning old turns, or routing to a bigger window each fit different cases.
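
The arithmetic in the error above can be checked before sending: 189136 + 20000 = 209136, which is over the 204648 limit. The sketch below runs that check and prunes the oldest non-system turns as one possible response. `fits` and `prune_oldest` are hypothetical helpers, and the per-message token counts are assumed to come from the caller, using whatever tokenizer matches the target model.

```python
def fits(input_tokens: int, max_output_tokens: int, context_limit: int) -> bool:
    """True when the prompt plus the requested output stays within the window."""
    return input_tokens + max_output_tokens <= context_limit


def prune_oldest(messages: list[dict], token_counts: list[int], budget: int) -> list[dict]:
    """Drop the oldest non-system messages until the total is within budget."""
    kept = list(zip(messages, token_counts))
    total = sum(token_counts)
    index = 0
    while total > budget and index < len(kept):
        message, count = kept[index]
        if message.get("role") == "system":
            index += 1  # keep system prompts, look at the next message
            continue
        kept.pop(index)
        total -= count
    return [message for message, _ in kept]
```

Pruning this way can separate a tool result from the call that produced it, so agents with tool use may want to drop those turns in pairs.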

Malformed tool schema

Unknown name 'additionalProperties' at 'tools[0].function_declarations[0].parameters'

OpenAI’s strict structured outputs require additionalProperties: false on every object in your tool schema, and reject the request if it’s missing. Send that same schema to another provider and the exact key OpenAI demanded comes back rejected. Your tool code runs on one API and breaks on the next, with no change to your logic. Frameworks that emit standard JSON Schema (Pydantic, Zod) walk straight into it.
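
A sketch of adapting one schema in both directions: adding `additionalProperties: false` to every object for a strict provider, and stripping the key for a provider that rejects it. `close_objects` and `strip_key` are hypothetical helpers. The strip is deliberately naive and would also remove a tool argument that happens to be named `additionalProperties`.

```python
def close_objects(schema):
    """Return a copy with additionalProperties: false set on every object schema."""
    if isinstance(schema, dict):
        out = {key: close_objects(value) for key, value in schema.items()}
        if out.get("type") == "object":
            out["additionalProperties"] = False
        return out
    if isinstance(schema, list):
        return [close_objects(item) for item in schema]
    return schema


def strip_key(schema, key: str = "additionalProperties"):
    """Return a copy with every occurrence of key removed, at any depth."""
    if isinstance(schema, dict):
        return {k: strip_key(v, key) for k, v in schema.items() if k != key}
    if isinstance(schema, list):
        return [strip_key(item, key) for item in schema]
    return schema
```

Both functions return new structures, so one schema emitted by Pydantic or Zod can be kept as the source of truth and adapted per provider at send time.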

Why so many?

Four things stack up. Models appear and disappear on short cycles. Each provider wants a slightly different request format. Requests built on the fly, by agents, carry more mistakes than handwritten ones. And context keeps growing as agents iterate. None of these is the model being wrong. They are the request not matching the contract.

A blind retry won’t save you

Most of these share a trait: the model is fine, and firing the same request again changes nothing. Resend a renamed parameter and it stays rejected. Hit an empty balance and it fails the same way. The one real exception is a per-minute rate limit, where waiting and retrying is the right move. That is the whole point: you have to tell the cases apart before you react. A blind retry treats them all the same and fails on most of them.
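
The distinction above can be written as a small lookup from failure category to reaction. The category labels, the `Action` enum, and the conservative default for unknown categories are all assumptions for illustration. How a real error gets mapped to a category is left out, since that depends on each provider's payloads.

```python
from enum import Enum


class Action(Enum):
    WAIT_AND_RETRY = "wait_and_retry"  # the same request can succeed later
    TOP_UP = "top_up"                  # needs a billing change, not a retry
    FIX_REQUEST = "fix_request"        # the request itself has to change


# Hypothetical category labels, one per failure described in this post.
ACTIONS = {
    "rate_limit": Action.WAIT_AND_RETRY,
    "empty_balance": Action.TOP_UP,
    "deprecated_model": Action.FIX_REQUEST,
    "malformed_parameter": Action.FIX_REQUEST,
    "malformed_messages": Action.FIX_REQUEST,
    "context_length": Action.FIX_REQUEST,
    "malformed_tool_schema": Action.FIX_REQUEST,
}


def next_action(category: str) -> Action:
    """Pick a reaction; unknown categories default to not retrying blindly."""
    return ACTIONS.get(category, Action.FIX_REQUEST)


def is_retryable(category: str) -> bool:
    """True only when resending the unchanged request can help."""
    return next_action(category) is Action.WAIT_AND_RETRY
```

Only one of the seven entries is retryable as is, which is the whole argument against wrapping every call in the same retry loop.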

That is what we’re building with Manifest Auto-fix. It sits between your app and the providers, catches the failure in the request, patches it on the fly, and sends the corrected version through. Your app stays up while you fix the root cause on your side.

Be the first to try Auto-fix

If this speaks to you and you want to be among the first to run Auto-fix, claim your spot. We’re onboarding teams a few at a time.
