What this error means
Groq enforces tight per-minute request and token limits per model; fast agent loops hit them in bursts.
How to fix it
- Wait the number of seconds given in the Retry-After header before retrying
- Throttle concurrency on the client
- Spread load across models or request a limit bump
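The first two fixes can be combined in a small client-side wrapper. The sketch below is a minimal illustration, not part of any Groq SDK: `RateLimited`, `call_with_retry`, `send`, and `MAX_CONCURRENT` are hypothetical names. It assumes your HTTP layer raises an exception on a 429 response and exposes the Retry-After value as a number of seconds. When the header is missing, it falls back to exponential backoff.

```python
import threading
import time


class RateLimited(Exception):
    """Hypothetical exception: raised by your HTTP layer on a 429 response."""

    def __init__(self, retry_after=None):
        super().__init__("rate limit reached")
        # Seconds taken from the Retry-After header, or None if it was absent.
        self.retry_after = retry_after


# Cap in-flight requests so a fast agent loop cannot burst past the limit.
# The value 2 is an arbitrary illustration, not a recommended setting.
MAX_CONCURRENT = 2
_slots = threading.BoundedSemaphore(MAX_CONCURRENT)


def call_with_retry(send, max_attempts=5, default_wait=1.0, sleep=time.sleep):
    """Run send() under the concurrency cap, waiting out rate limits."""
    for attempt in range(1, max_attempts + 1):
        with _slots:
            try:
                return send()
            except RateLimited as err:
                if attempt == max_attempts:
                    raise
                wait = err.retry_after
                if wait is None:
                    # No header: back off exponentially instead.
                    wait = default_wait * 2 ** (attempt - 1)
        # Sleep after releasing the slot so a waiting caller does not block others.
        sleep(wait)
```

Here `send` is any zero-argument callable that performs one request, for example a `lambda` wrapping your client call. Sleeping outside the semaphore is a deliberate choice: a request that is waiting out its window should not hold a concurrency slot that another request could use.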
Example error message
{
  "error": {
    "message": "Rate limit reached for model llama-3.3-70b in organization on tokens per minute (TPM).",
    "code": "rate_limit_exceeded",
    "type": "tokens"
  }
}
Frequently asked
Why does Groq rate limit so aggressively?
Groq's very high throughput comes with tight per-minute caps; pacing requests and falling back to other models keeps runs alive.
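A fallback can be as simple as trying a list of models in order. This is a rough sketch under stated assumptions: `RateLimited`, `complete_with_fallback`, and `send` are hypothetical names, the model names in the usage note are placeholders, and it assumes limits are tracked per model so that a second model may still have headroom when the first is exhausted.

```python
class RateLimited(Exception):
    """Hypothetical exception: raised by your HTTP layer on a 429 response."""


def complete_with_fallback(send, models):
    """Try each model in order, moving on when one is rate limited.

    send(model) is assumed to perform one request against that model.
    Returns a (model, result) pair for the first model that succeeds.
    """
    last_error = None
    for model in models:
        try:
            return model, send(model)
        except RateLimited as err:
            # This model is out of budget for now; try the next one.
            last_error = err
    if last_error is None:
        raise ValueError("models must not be empty")
    # Every model was rate limited: surface the last error to the caller.
    raise last_error
```

A call such as `complete_with_fallback(send, ["primary-model", "backup-model"])` returns which model answered along with its result, so the caller can log how often the fallback is used.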