What this error means
OpenAI throttles each model separately, by requests per minute (RPM) and tokens per minute (TPM). Parallel agent calls tend to trip these limits in bursts, because many requests land in the same window at once.
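To see why bursts trip the cap, it helps to track both budgets locally. The sketch below uses a hypothetical MinuteBudget class (not part of any OpenAI SDK) that keeps a 60-second sliding window of request timestamps and token counts for one model. It assumes you supply your own RPM/TPM caps and your own token estimates. OpenAI's server-side accounting may not match this simple window, so treat it as a conservative local estimate only.

```python
import time
from collections import deque
from typing import Callable, Deque, Tuple


class MinuteBudget:
    """Hypothetical client-side tracker for one model's per-minute limits.

    Keeps a 60-second sliding window of (timestamp, tokens) entries and
    reports whether one more request would exceed the RPM or TPM cap.
    The caps are numbers you pass in, not values read from OpenAI.
    """

    def __init__(self, rpm: int, tpm: int,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rpm = rpm
        self.tpm = tpm
        self.clock = clock
        self.window: Deque[Tuple[float, int]] = deque()

    def _evict(self, now: float) -> None:
        # Drop entries that are 60 seconds old or older.
        while self.window and now - self.window[0][0] >= 60.0:
            self.window.popleft()

    def allows(self, tokens: int) -> bool:
        """True if one more request of `tokens` fits in both budgets."""
        now = self.clock()
        self._evict(now)
        used_tokens = sum(t for _, t in self.window)
        return (len(self.window) + 1 <= self.rpm
                and used_tokens + tokens <= self.tpm)

    def record(self, tokens: int) -> None:
        """Log a request that was actually sent."""
        self.window.append((self.clock(), tokens))
```

With rpm=3, three requests sent at the same moment fill the window, and a fourth is refused until those entries age out 60 seconds later. A single request larger than tpm is refused regardless of request count.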
How to fix it
- Honor the Retry-After header before retrying
- Add client-side concurrency limits
- Request a higher tier or split traffic across models
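The first two fixes can be combined in one wrapper. This is a minimal sketch, assuming an asyncio-based client: send is a hypothetical async callable standing in for your real HTTP or SDK call, and call_with_retry and retry_after_seconds are illustrative names, not OpenAI APIs. It only reads Retry-After in its delay-seconds form and falls back to jittered exponential backoff when the header is missing or not numeric.

```python
import asyncio
import random
from typing import Awaitable, Callable, Mapping, Optional, Tuple

# Hypothetical transport: an async callable returning (status_code, headers, body).
# Wrap whatever HTTP client or SDK call you actually use.
Send = Callable[[], Awaitable[Tuple[int, Mapping[str, str], str]]]


def retry_after_seconds(headers: Mapping[str, str]) -> Optional[float]:
    """Read Retry-After as a number of seconds; None if absent or not numeric."""
    for key, value in headers.items():
        if key.lower() == "retry-after":
            try:
                return max(0.0, float(value))
            except ValueError:
                return None  # e.g. the HTTP-date form, which this sketch skips
    return None


async def call_with_retry(send: Send, sem: asyncio.Semaphore,
                          max_attempts: int = 5,
                          base_delay: float = 1.0) -> Tuple[int, str]:
    """Send a request under a shared concurrency cap, retrying on HTTP 429."""
    for attempt in range(max_attempts):
        async with sem:  # at most N requests in flight across everyone sharing `sem`
            status, headers, body = await send()
        if status != 429:
            return status, body  # success or a non-rate-limit error: hand it back
        if attempt == max_attempts - 1:
            break
        delay = retry_after_seconds(headers)
        if delay is None:
            # No usable Retry-After: fall back to exponential backoff with jitter.
            delay = base_delay * (2 ** attempt) * random.uniform(0.5, 1.0)
        await asyncio.sleep(delay)  # wait outside the semaphore so other calls proceed
    raise RuntimeError(f"still rate limited after {max_attempts} attempts")
```

Create one semaphore and pass the same object to every agent. That shared object is what turns independent parallel calls into a bounded stream. The size (for example asyncio.Semaphore(8)) is a placeholder to tune against your own limits.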
Example error message
{
  "error": {
    "message": "Rate limit reached for requests. Limit 3500 per min.",
    "type": "requests",
    "code": "rate_limit_exceeded"
  }
}
Frequently asked
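RPM or TPM: which limit am I hitting?
The error tells you. In the "type" field, "requests" means you hit the requests-per-minute (RPM) cap and "tokens" means you hit the tokens-per-minute (TPM) cap. The message text names it as well, e.g. "Rate limit reached for requests".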
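If you need to branch on this in code, a minimal sketch is to read error.type from the response body. which_limit is a hypothetical helper, and it assumes the body has the shape of the example error message above; any other shape, or any code other than "rate_limit_exceeded", returns None.

```python
import json
from typing import Optional


def which_limit(body: str) -> Optional[str]:
    """Return "requests" (RPM), "tokens" (TPM), or None.

    Assumes a body shaped like {"error": {"type": ..., "code": ...}}.
    """
    try:
        data = json.loads(body)
    except json.JSONDecodeError:
        return None
    err = data.get("error") if isinstance(data, dict) else None
    if not isinstance(err, dict) or err.get("code") != "rate_limit_exceeded":
        return None
    kind = err.get("type")
    return kind if kind in ("requests", "tokens") else None
```

A "requests" result points toward lowering concurrency; a "tokens" result points toward smaller prompts or outputs per call.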