Pagination Patterns in Enterprise APIs: Cursor vs Offset vs Page Token

Enterprise API pagination is one of those problems that looks solved until you're in production. You wrote a cursor loop in your staging environment, it worked, you shipped it. Then at 3am your batch job silently returns 4,000 records instead of 40,000 — because the upstream dataset was modified between page fetches and your cursor became invalid, and the API returned an empty page instead of an error.

The three dominant pagination patterns in enterprise APIs — cursor-based, offset-based, and page-token — behave differently under mutation, have different performance characteristics, and fail in different ways. If you're building an abstraction layer, or evaluating whether to use one, you need to understand the tradeoffs.

Cursor-Based Pagination

Cursor-based pagination works by returning an opaque pointer — the cursor — alongside each page of results. To get the next page, you pass the cursor back to the API. Cursors are typically stable references to a position in the underlying dataset, often encoded as a record ID or timestamp.

// Page 1
GET /contacts?limit=100
Response: { data: [...], next_cursor: "eyJpZCI6MTAwfQ==" }

// Page 2
GET /contacts?limit=100&cursor=eyJpZCI6MTAwfQ==
Response: { data: [...], next_cursor: "eyJpZCI6MjAwfQ==" }

// Last page
GET /contacts?limit=100&cursor=eyJpZCI6OTkwfQ==
Response: { data: [...], next_cursor: null }

Cursor pagination is generally the best choice for large datasets. It's consistent under concurrent writes (the cursor points to a stable position) and it's efficient (no OFFSET scan). The failure mode is expiry: some APIs expire cursors after 5–10 minutes, and if your batch job is slow, you'll typically get a 400 or 404 on the next page fetch rather than a graceful continuation (or, as in the 3am scenario above, an empty page that looks like the end of the data). Your retry logic needs to handle cursor expiry by restarting from the beginning, a detail most developers discover the hard way.
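The restart-on-expiry behavior can be sketched in a few lines. This is a minimal illustration, not any particular API's client: `fetch_page`, `CursorExpired`, and `max_restarts` are hypothetical names, and the sketch assumes the HTTP layer turns a 400/404 for a stale cursor into that exception.

```python
from typing import Callable, Optional


class CursorExpired(Exception):
    """Hypothetical error raised when the API rejects a stale cursor (e.g. 400/404)."""


def fetch_all(fetch_page: Callable[[Optional[str]], dict], max_restarts: int = 3) -> list:
    """Collect every record, restarting from the first page if the cursor expires."""
    for _attempt in range(max_restarts + 1):
        records: list = []
        cursor: Optional[str] = None
        try:
            while True:
                page = fetch_page(cursor)
                records.extend(page["data"])
                cursor = page.get("next_cursor")
                if cursor is None:
                    return records
        except CursorExpired:
            # Partial results are discarded: the dataset may have changed, so
            # pages fetched before the restart cannot be trusted to line up.
            continue
    raise RuntimeError("cursor expired on every attempt")


# Simulated API: three pages, where the first use of cursor "c2" fails as expired.
PAGES = {None: ([1, 2], "c1"), "c1": ([3, 4], "c2"), "c2": ([5], None)}
expired_once = {"c2"}


def fake_fetch(cursor: Optional[str]) -> dict:
    if cursor in expired_once:
        expired_once.discard(cursor)
        raise CursorExpired(cursor)
    data, next_cursor = PAGES[cursor]
    return {"data": data, "next_cursor": next_cursor}


all_records = fetch_all(fake_fetch)
print(all_records)
```

Bounding the number of restarts matters: if the job is always slower than the cursor lifetime, an unbounded restart loop never finishes.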

Offset Pagination

Offset pagination uses numeric offsets: "give me 100 records starting at record 200." It's the simplest to implement and the most intuitive.

GET /purchase_orders?limit=100&offset=0   // records 1–100
GET /purchase_orders?limit=100&offset=100  // records 101–200
GET /purchase_orders?limit=100&offset=200  // records 201–300

The problem: offset pagination is unstable under concurrent writes. If a record is inserted at a position before your current offset between the page 1 and page 2 fetches, every later record shifts forward by one position and page 2 repeats the last record of page 1. If a record you have already fetched is deleted, the remaining records shift back by one and page 2 skips a record. For read-heavy, rarely-written datasets (most ERP systems at rest), this is acceptable. For datasets with high write concurrency (e.g., order management systems during peak hours), it will produce data inconsistencies.
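The drift is easy to reproduce in memory. In this sketch a Python list stands in for the ordered table and `fetch_offset` is a hypothetical stand-in for the HTTP call; an insert ahead of the offset produces a repeated record, and a delete produces a skipped one.

```python
def fetch_offset(table: list, limit: int, offset: int) -> list:
    """Simulate GET ...?limit=&offset= against an ordered in-memory table."""
    return table[offset:offset + limit]


# Case 1: a row is inserted ahead of the next offset between fetches.
table = ["A", "B", "C", "D", "E", "F"]
page1 = fetch_offset(table, limit=3, offset=0)   # A, B, C
table.insert(0, "NEW")                           # concurrent insert at the front
page2 = fetch_offset(table, limit=3, offset=3)   # C, D, E: C is repeated
seen_after_insert = page1 + page2

# Case 2: an already-fetched row is deleted between fetches.
table = ["A", "B", "C", "D", "E", "F"]
page1 = fetch_offset(table, limit=3, offset=0)   # A, B, C
table.remove("A")                                # concurrent delete
page2 = fetch_offset(table, limit=3, offset=3)   # E, F: D is never returned
seen_after_delete = page1 + page2

print(seen_after_insert)
print(seen_after_delete)
```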

There is also a performance cost: OFFSET N in SQL generally requires reading and discarding N rows before returning the next page. For large datasets, late pages become progressively slower. At offset 100,000 on a 500-record-per-page query, the database reads and discards 100,000 rows before returning your 500.
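One common way cursor-style APIs avoid that scan is keyset pagination: filter on the last key seen instead of counting rows. The sketch below uses Python's built-in sqlite3 module with an assumed `purchase_orders` schema (the `id` and `ref` columns are illustrative) to show that both forms return the same page, while only the second lets the database seek directly via the primary key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE purchase_orders (id INTEGER PRIMARY KEY, ref TEXT)")
conn.executemany(
    "INSERT INTO purchase_orders (id, ref) VALUES (?, ?)",
    [(i, f"PO-{i}") for i in range(1, 1001)],
)

# Offset form: the engine steps past the first 900 rows, then returns 5.
offset_rows = conn.execute(
    "SELECT id FROM purchase_orders ORDER BY id LIMIT 5 OFFSET 900"
).fetchall()

# Keyset form: seek past the last id seen on the previous page.
last_seen_id = 900
keyset_rows = conn.execute(
    "SELECT id FROM purchase_orders WHERE id > ? ORDER BY id LIMIT 5",
    (last_seen_id,),
).fetchall()

print(offset_rows == keyset_rows)
```

The two queries only agree here because the ids are contiguous; once rows are deleted, the offset form counts positions while the keyset form follows keys, which is also why the keyset form stays stable under writes.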

Page Token

Page tokens are a hybrid approach: the API returns an opaque token (like a cursor) but the token is stateless — it encodes all the information needed to compute the next page without maintaining server-side state. Google's APIs use this pattern extensively.

GET /shipments
Response: {
  data: [...],
  nextPageToken: "CiMKGjFmMGMyZTEwLTY5NjMtNGM4ZC05NzA1..."
}

GET /shipments?pageToken=CiMKGjFmMGMyZTEwLTY5NjMtNGM4ZC05NzA1...
Response: { data: [...], nextPageToken: null }

Page tokens give you cursor stability without server-side session state. They're generally safe under concurrent writes. The limitation is opacity — you can't jump to an arbitrary page, only iterate forward sequentially.
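To make "stateless" concrete, here is a minimal sketch of a server-side handler whose token carries the only state it needs: the last id returned. Everything here is an assumption for illustration (`encode_token`, `decode_token`, `list_shipments`, and the base64-JSON encoding); production tokens are usually signed or encrypted, and real APIs may encode more than a single key.

```python
import base64
import json
from typing import Optional


def encode_token(last_id: int) -> str:
    """Pack the resume position into an opaque, URL-safe string."""
    payload = json.dumps({"id": last_id}, separators=(",", ":")).encode()
    return base64.urlsafe_b64encode(payload).decode()


def decode_token(token: str) -> int:
    """Recover the resume position; no server-side session lookup is needed."""
    return json.loads(base64.urlsafe_b64decode(token.encode()))["id"]


def list_shipments(rows: list, page_size: int, page_token: Optional[str] = None) -> dict:
    """Return one page of rows (assumed sorted by id) plus a token for the next."""
    last_id = decode_token(page_token) if page_token else 0
    page = [row for row in rows if row["id"] > last_id][:page_size]
    has_more = bool(page) and any(row["id"] > page[-1]["id"] for row in rows)
    next_token = encode_token(page[-1]["id"]) if has_more else None
    return {"data": page, "nextPageToken": next_token}


rows = [{"id": i} for i in range(1, 6)]
first = list_shipments(rows, page_size=2)
second = list_shipments(rows, 2, first["nextPageToken"])
third = list_shipments(rows, 2, second["nextPageToken"])
print(third)
```

Because the token is derived from the data rather than from a session, any server instance can answer any page request, and the client still cannot jump to an arbitrary page without walking the tokens in order.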

What Actually Matters for Integration Work

For most enterprise API integration work, the critical question is not which pagination pattern is theoretically best, but which one the API actually uses and whether your abstraction layer handles it correctly. The common mistakes:

  • Treating offset pagination as cursor-safe and skipping or duplicating records on write-heavy datasets
  • Not handling cursor expiry: writing a simple do-while loop that assumes the cursor is always valid
  • Stopping at the wrong time: some APIs signal the final page with next_cursor: null; others return a cursor that, when followed, returns an empty data array. Both are valid; both need to be handled
  • Assuming the API honors your requested page size: some APIs silently cap page size below your requested limit; if you're counting records to detect when you've fetched all pages, this breaks your loop
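The last two mistakes come down to the loop's exit condition. A minimal sketch, assuming a hypothetical `fetch_page(cursor)` callable that returns `data` and `next_cursor`, and assuming that an empty page always means end-of-data for the API in question: the loop accepts either end signal and never compares page length to the requested limit.

```python
from typing import Callable, Iterator, Optional


def iterate_records(fetch_page: Callable[[Optional[str]], dict]) -> Iterator:
    """Yield records until the API signals end-of-data in either common way."""
    cursor: Optional[str] = None
    seen_cursors = set()
    while True:
        page = fetch_page(cursor)
        data = page.get("data") or []
        yield from data
        cursor = page.get("next_cursor")
        # End signal 1: no cursor. End signal 2: an empty page.
        # Page length is never compared to the requested limit, because the
        # server may cap it silently.
        if not cursor or not data:
            return
        if cursor in seen_cursors:
            # Guard against an API that hands back the same cursor forever.
            raise RuntimeError(f"API repeated cursor {cursor!r}")
        seen_cursors.add(cursor)


def make_api(pages: dict) -> Callable[[Optional[str]], dict]:
    """Build a fake paged endpoint from {cursor: (data, next_cursor)}."""
    def fetch(cursor: Optional[str]) -> dict:
        data, next_cursor = pages[cursor]
        return {"data": data, "next_cursor": next_cursor}
    return fetch


null_style = make_api({None: ([1, 2], "a"), "a": ([3], None)})
empty_style = make_api({None: ([1, 2], "a"), "a": ([3], "b"), "b": ([], "c")})

result_null = list(iterate_records(null_style))
result_empty = list(iterate_records(empty_style))
print(result_null, result_empty)
```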

How the SDK Handles It

The Devloom connector catalog records the pagination type for each connector. When you call .autopage(), the SDK applies the correct strategy: cursor loop for cursor-based connectors, offset increment for offset-based connectors, page token forwarding for token-based connectors. Cursor expiry triggers a logged warning and a restart from page 1. Empty final pages are handled correctly regardless of whether the API signals end-of-data via null cursor or empty data array.

The result: client.query('contacts').autopage().all() works correctly regardless of the underlying pagination model. You don't write different code per connector.