Understanding API Rate Limits: Patterns, Strategies, and Why They Break Your Integrations

Rate limiting is how APIs protect themselves from overload. It is also one of the most reliable sources of production incidents for teams that integrate enterprise APIs. The typical failure: your code works perfectly in development, passes staging QA, and then starts receiving HTTP 429 (Too Many Requests) responses in production at 11 p.m. when your batch job kicks off.

To see what's actually happening, and why your current retry logic probably makes it worse, you need to understand two things: the rate-limit algorithms upstream APIs actually use, and the failure modes in naive retry implementations.

Why APIs Rate Limit Differently

No two enterprise APIs rate-limit the same way. Some enforce per-second limits. Some enforce per-minute limits. Some use a sliding window; others use a fixed window that resets on the clock minute. ERP systems like Orbis ERP operate at 30 req/sec because their backends are not horizontally scaled for API traffic — the rate limit reflects real database contention. CRM systems like Fieldvault operate at 100 req/sec because they're built for API-first workflows. Logistics TMS APIs often have burst allowances — they'll let you send 500 requests in 5 seconds, then throttle hard for the next 55 seconds.

Your retry logic needs to know which regime it's in. A fixed-window reset means "wait until the next clock boundary." A sliding window means "wait until your oldest request in the window ages out." These are not the same wait time, and treating them as equivalent will either over-wait (wasting throughput) or under-wait (triggering another 429).
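To make the difference concrete, here is a minimal sketch of both wait calculations. The function names and the idea of tracking your own send timestamps client-side are assumptions for illustration; a real API may count requests differently than your client does.

```javascript
// Milliseconds until the next clock boundary of a fixed window.
// Assumes windows are aligned to the epoch (e.g. 60000 for clock minutes).
// Returns a full window if nowMs sits exactly on a boundary.
function fixedWindowWaitMs(nowMs, windowMs) {
  return windowMs - (nowMs % windowMs);
}

// Milliseconds until a slot opens in a sliding window.
// sentAtMs: timestamps (ms) of our own recent requests, oldest first.
function slidingWindowWaitMs(nowMs, windowMs, limit, sentAtMs) {
  const inWindow = sentAtMs.filter((t) => nowMs - t < windowMs);
  if (inWindow.length < limit) return 0; // a slot is already free
  // The request that must age out to bring us back under the limit.
  const blocking = inWindow[inWindow.length - limit];
  return blocking + windowMs - nowMs;
}

// Same moment, same 60-second window, limit of 3 requests:
const now = 90000;
const fixedWait = fixedWindowWaitMs(now, 60000); // 30000
const slidingWait = slidingWindowWaitMs(now, 60000, 3, [40000, 70000, 85000]); // 10000
```

In this example the fixed-window rule says to wait 30 seconds while the sliding-window rule says 10, which is exactly the over-wait or under-wait gap described above.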

Token Bucket vs Leaky Bucket

The two dominant rate-limiting algorithms are token bucket and leaky bucket. Token bucket: each request consumes a token; tokens refill at a constant rate; you can burst up to the bucket capacity. Leaky bucket: requests are processed at a constant rate regardless of arrival rate; excess requests are queued or dropped.
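Both algorithms fit in a few lines. The sketch below is a simplified, single-process model with hypothetical class names, and the leaky bucket shown is the queue-with-drop variant; real servers typically implement these in shared storage and may differ in detail.

```javascript
// Token bucket: tokens refill continuously; a request passes if a token is available.
class TokenBucket {
  constructor(capacity, refillPerSec, nowMs = 0) {
    this.capacity = capacity;
    this.refillPerSec = refillPerSec;
    this.tokens = capacity; // start full, so an initial burst is allowed
    this.lastMs = nowMs;
  }
  tryAcquire(nowMs) {
    const elapsedSec = (nowMs - this.lastMs) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSec * this.refillPerSec);
    this.lastMs = nowMs;
    if (this.tokens < 1) return false; // throttled: the caller would see a 429
    this.tokens -= 1;
    return true;
  }
}

// Leaky bucket: arrivals join a bounded queue that drains at a constant rate.
class LeakyBucket {
  constructor(queueSize, drainPerSec, nowMs = 0) {
    this.queueSize = queueSize;
    this.drainPerSec = drainPerSec;
    this.level = 0; // requests currently queued
    this.lastMs = nowMs;
  }
  tryEnqueue(nowMs) {
    const elapsedSec = (nowMs - this.lastMs) / 1000;
    this.level = Math.max(0, this.level - elapsedSec * this.drainPerSec);
    this.lastMs = nowMs;
    if (this.level + 1 > this.queueSize) return false; // queue full: dropped
    this.level += 1;
    return true;
  }
}
```

Timestamps are passed in rather than read from the clock so the behavior can be checked deterministically. The key difference is what the capacity buys you: in the token bucket it is requests served immediately, in the leaky bucket it is requests allowed to wait.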

Token bucket allows bursting, which is why some APIs accept short bursts before throttling. If you're aware of this, you can design your integration to send requests in controlled bursts followed by deliberate pauses while the bucket refills. A batch then finishes sooner than it would from a naive client that paces every request at exactly the refill rate, although sustained throughput is still capped by that refill rate. The Devloom SDK exposes per-connector rate profiles that encode each connector's actual algorithm and capacity, so no guessing is required.
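A quick back-of-the-envelope calculation shows what bursting buys you. The numbers here are assumptions for illustration: a bucket of 500 tokens that starts full, refilling at 10 tokens per second, and a batch of 800 requests.

```javascript
// Seconds until the last of n requests is accepted when the client sends
// as fast as a full token bucket allows: the first `capacity` requests go
// out immediately, the rest wait for refills.
function burstBatchSeconds(n, capacity, refillPerSec) {
  return Math.max(0, n - capacity) / refillPerSec;
}

// Seconds until the last of n requests is sent when pacing evenly at the
// refill rate, with the first request at t = 0.
function pacedBatchSeconds(n, refillPerSec) {
  return Math.max(0, n - 1) / refillPerSec;
}

const burst = burstBatchSeconds(800, 500, 10); // 30
const paced = pacedBatchSeconds(800, 10);      // 79.9
```

Under these assumptions the bursting client finishes roughly 50 seconds earlier. That head start is a one-time gain from the stored tokens: once the bucket is empty, both clients advance at the refill rate.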

Handling 429 Responses Correctly

A 429 response from an enterprise API typically includes one of two signals in the response headers:

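Retry-After: 30 (wait this many seconds before retrying; the value can also be an HTTP date)
X-RateLimit-Reset: 1714089600 (Unix timestamp when the window resets; this header is non-standard, and some APIs send the seconds remaining instead)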

The correct behavior is to read the header and wait exactly as long as the API tells you to, rather than applying a generic backoff schedule. This is where most hand-rolled retry implementations fail: they apply exponential backoff to 429s the same way they apply it to 503s. A 503 is a transient server error, so exponential backoff makes sense. A 429 is a deterministic signal: the API is telling you exactly when it will accept your next request. Ignoring the Retry-After header and sleeping for 2 seconds is either too short (you get another 429 immediately) or too long (you waste throughput you could have used).
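Reading the two signals might look like the sketch below. It assumes headers arrive as a plain object with lower-cased names and that X-RateLimit-Reset is a Unix timestamp in seconds; the function name is hypothetical, and you should check each API's documentation for what its reset header actually carries.

```javascript
// Returns how long to wait (ms) after a 429, or null if no usable header exists.
// headers: plain object keyed by lower-cased header name (assumption of this sketch).
function waitFromHeadersMs(headers, nowMs = Date.now()) {
  const retryAfter = headers["retry-after"];
  if (retryAfter !== undefined) {
    const value = retryAfter.trim();
    // Retry-After is either a number of seconds or an HTTP date.
    if (/^\d+$/.test(value)) return Number(value) * 1000;
    const dateMs = Date.parse(value);
    if (!Number.isNaN(dateMs)) return Math.max(0, dateMs - nowMs);
  }
  const reset = headers["x-ratelimit-reset"];
  if (reset !== undefined && /^\d+$/.test(reset.trim())) {
    // Assumed here to be a Unix timestamp in seconds.
    return Math.max(0, Number(reset.trim()) * 1000 - nowMs);
  }
  return null; // caller falls back to its own knowledge of the API
}
```

Returning null rather than a default keeps the "no signal" case explicit, so the caller has to decide what to do when the API stays silent.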

Some APIs don't include either header. For those, you need per-connector empirical knowledge of the reset interval. This is one reason the Devloom connector catalog stores rate-limit metadata per connector: you don't have to rediscover it in production.

Exponential Backoff and Jitter

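For transient errors (500, 503, network timeouts), exponential backoff with jitter is correct. A common implementation:

// Cap the exponential term first, then add jitter, so retries stay
// desynchronized even after the cap is reached.
// The wait can exceed maxWaitMs by at most jitterMs.
const wait =
  Math.min(maxWaitMs, (2 ** attempt) * baseMs) + Math.random() * jitterMs;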

The jitter is not optional. Without it, all clients that hit the same error at the same time will retry at the same intervals — amplifying load on an already struggling server in synchronized waves. This is called the thundering herd problem. Adding randomness desynchronizes the retries.

The maxWaitMs cap is also required. Uncapped exponential backoff on attempt 10 with a 1000ms base would wait ~17 minutes per retry — that's not acceptable for a production integration. A reasonable cap is 30–60 seconds.
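One widely used variant, often called full jitter, draws the whole wait uniformly between zero and the capped exponential, so the cap stays a hard ceiling and the randomness never disappears. The sketch below wraps it in a small retry loop; the function names, option names, and defaults are assumptions for illustration.

```javascript
// Full jitter: a uniformly random wait between 0 and the capped exponential.
// rng is injectable so the function can be checked deterministically.
function fullJitterDelayMs(attempt, baseMs, maxWaitMs, rng = Math.random) {
  const ceilingMs = Math.min(maxWaitMs, 2 ** attempt * baseMs);
  return rng() * ceilingMs;
}

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Runs `operation`, retrying on failure up to maxAttempts tries in total.
// isTransient decides which errors are worth retrying (hypothetical predicate).
async function retryTransient(operation, options = {}) {
  const {
    maxAttempts = 5,
    baseMs = 1000,
    maxWaitMs = 30000,
    isTransient = () => true,
    rng = Math.random,
    wait = sleep,
  } = options;
  for (let attempt = 0; ; attempt++) {
    try {
      return await operation();
    } catch (err) {
      // Give up once the attempts are used or the error is not retryable.
      if (attempt + 1 >= maxAttempts || !isTransient(err)) throw err;
      await wait(fullJitterDelayMs(attempt, baseMs, maxWaitMs, rng));
    }
  }
}
```

With a 1000 ms base and a 30-second cap, attempt 10 waits at most 30 seconds instead of the uncapped 1,024,000 ms. The loop also bounds the number of attempts, so a persistent failure surfaces to the caller rather than retrying forever.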

What Devloom Does For You

The Devloom SDK handles all of this at the connector layer. Per-connector rate profiles encode the correct algorithm, window type, and reset signal. When a connector returns 429, the SDK reads Retry-After if present, falls back to the connector profile if not, and queues the retry for the correct time. Exponential backoff with full jitter is applied to transient errors (5xx) separately from quota errors (429). Your application code never sees a 429.
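If you are rolling this yourself, the same separation of quota errors from transient errors can be sketched as a single decision function. This is not the Devloom SDK's implementation or API: the function name and the profile fields (resetMs, baseMs, maxWaitMs) are hypothetical stand-ins for per-connector metadata, and only the seconds form of Retry-After is handled.

```javascript
// Decide how long to wait (ms) before retrying, or return null to give up.
// profile: hypothetical per-connector settings { resetMs, baseMs, maxWaitMs }.
// headers: plain object keyed by lower-cased header name.
function nextRetryDelayMs(status, headers, attempt, profile, rng = Math.random) {
  if (status === 429) {
    // Quota error: trust the server's signal first, then the connector profile.
    const retryAfter = headers["retry-after"];
    if (retryAfter !== undefined && /^\d+$/.test(retryAfter.trim())) {
      return Number(retryAfter.trim()) * 1000;
    }
    return profile.resetMs;
  }
  if (status >= 500 && status <= 599) {
    // Transient error: capped exponential backoff with full jitter.
    const ceilingMs = Math.min(profile.maxWaitMs, 2 ** attempt * profile.baseMs);
    return rng() * ceilingMs;
  }
  return null; // other statuses (for example 400 or 404) are not retried here
}
```

Keeping the two branches apart is the point: the 429 branch never guesses when the server has given an answer, and the 5xx branch never waits a fixed interval that other clients would share.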

We also surface per-connector rate-limit metrics in the dashboard — requests remaining in current window, time to reset — so you can profile your integration's throughput before you hit the ceiling in production.