All NYC OpenData SODA2 endpoints apply a default cap of 1,000 rows per request unless an explicit $limit (or $$app_token for authenticated requests) is supplied. For the NYPD CJ datasets wrapped here that means:

Details

  • morie_datasets_nyc_nypd_arrests_ytd(offline = FALSE) returns only 1,000 rows by default, even though the live feed carries ~69,300 rows.

  • Pass max_features = N to lift the single-request cap to N rows (Socrata enforces a hard server-side cap of 50,000 rows per request).

  • Pagination (wired in 3OO). For full pulls over the cap, pass paginate = TRUE. morie then walks the SODA2 $offset in page_size-row chunks until the server returns a short page (the feed is exhausted) or max_features is reached. Without an app_token the per-request ceiling is 1,000 rows, so page_size defaults to 1000; with an app_token and page_size = 50000 you can pull the full ~69K-row arrests_ytd feed in two requests. max_pages (default 200) is a safety net against runaway pulls.
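
The offset walk described in the last bullet can be sketched roughly as follows. This is a minimal illustration and not morie's actual implementation: walk_offsets(), fake_fetch() and fake_feed are hypothetical names, and the fetcher is an in-memory stand-in for a SODA2 request so that the sketch runs without network access.

```r
# Hypothetical stand-in for one SODA2 request: returns up to `limit` rows
# starting at `offset` from an in-memory table (no network involved).
fake_feed <- data.frame(arrest_key = seq_len(2345L))
fake_fetch <- function(limit, offset) {
  if (offset >= nrow(fake_feed)) return(fake_feed[0, , drop = FALSE])
  last <- min(offset + limit, nrow(fake_feed))
  fake_feed[(offset + 1L):last, , drop = FALSE]
}

# Walk the offset in page_size-row chunks until a short page comes back,
# max_features rows have been collected, or max_pages requests were made.
walk_offsets <- function(fetch, page_size = 1000L, max_features = Inf,
                         max_pages = 200L) {
  pages <- list()
  n_rows <- 0L
  for (i in seq_len(max_pages)) {
    want <- min(page_size, max_features - n_rows)
    if (want <= 0) break               # max_features reached
    page <- fetch(limit = want, offset = n_rows)
    pages[[i]] <- page
    n_rows <- n_rows + nrow(page)
    if (nrow(page) < want) break       # short page: feed exhausted
  }
  do.call(rbind, pages)
}

all_rows <- walk_offsets(fake_fetch)
capped   <- walk_offsets(fake_fetch, max_features = 1500L)
```

With the 2,345-row stand-in feed, the first call makes three requests (1,000 + 1,000 + 345 rows) and stops on the short third page; the second call stops once 1,500 rows have been collected.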

Worked example:

# Full live pull of the YTD arrests feed (~69K rows over ~70 pages).
df <- morie_datasets_nyc_nypd_arrests_ytd(
  offline = FALSE, paginate = TRUE)

# First 5,000 rows only (5 paged requests of 1,000 each).
df <- morie_datasets_nyc_nypd_arrests_ytd(
  offline = FALSE, paginate = TRUE, max_features = 5000L)
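
The page counts quoted in the comments above follow from simple ceiling division. The helper below is a hypothetical illustration (pages_needed() is not part of morie), and it takes the approximate 69,300-row figure at face value.

```r
# Requests needed to cover n_rows when each request returns
# at most page_size rows.
pages_needed <- function(n_rows, page_size) ceiling(n_rows / page_size)

pages_needed(69300, 1000)   # 70: default page_size, no app_token
pages_needed(69300, 50000)  # 2:  page_size = 50000 with an app_token
pages_needed(5000, 1000)    # 5:  max_features = 5000L
```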

The bundled fixtures (offline mode) are unaffected: each ships 5 rows of deterministic sample data, and max_features simply truncates the fixture.