Socrata default-API-cap note + pagination wiring
Source: R/datasets_nyc_nypd.R
morie_nyc_nypd_socrata_cap_note.Rd
All NYC OpenData SODA2 endpoints apply a default cap of 1,000 rows
per request unless an explicit $limit (or $$app_token for
authenticated requests) is supplied. For the NYPD CJ datasets
wrapped here that means:
Details
morie_datasets_nyc_nypd_arrests_ytd(offline = FALSE) returns only 1,000 rows by default, even though the live feed carries ~69,300 rows.
Pass max_features = N to lift the single-request cap to N rows (Socrata enforces a hard server-side cap of 50,000 rows per request).
Pagination (wired in 3OO). For full pulls over the cap, pass paginate = TRUE. morie walks SODA2 $offset in page_size-row chunks until the server returns a short page (exhausted) or max_features is reached. Without an app_token the per-request ceiling is 1,000 rows, so page_size = 1000 is the default; with page_size = 50000 + app_token you can pull the full ~69K-row arrests_ytd feed in two requests. max_pages (default 200) is a safety net against runaway pulls.
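The offset walk described above can be sketched in a few lines of R. This is a minimal illustration under stated assumptions, not morie's actual implementation: paginate_sketch and fetch_page are hypothetical names, and fetch_page simulates a 2,500-row feed locally instead of calling a SODA2 endpoint.

```r
# Hypothetical stand-in for one SODA2 request: returns rows
# (offset + 1)..(offset + limit) of a simulated feed as a data frame.
fetch_page <- function(offset, limit, total_rows = 2500L) {
  if (offset >= total_rows) {
    return(data.frame(row_id = integer(0)))
  }
  last <- min(offset + limit, total_rows)
  data.frame(row_id = seq.int(offset + 1L, last))
}

# Hypothetical pagination loop: advance the offset page by page until a
# short page arrives, max_features rows are collected, or max_pages is hit.
paginate_sketch <- function(fetch, page_size = 1000L,
                            max_features = Inf, max_pages = 200L) {
  pages <- list()
  n_rows <- 0
  offset <- 0L
  for (i in seq_len(max_pages)) {
    # Never ask for more rows than max_features still allows.
    limit <- min(page_size, max_features - n_rows)
    if (limit <= 0) break
    page <- fetch(offset, limit)
    pages[[length(pages) + 1L]] <- page
    n_rows <- n_rows + nrow(page)
    # A short page means the feed is exhausted.
    if (nrow(page) < limit) break
    offset <- offset + nrow(page)
  }
  if (length(pages) == 0L) return(data.frame())
  do.call(rbind, pages)
}

full <- paginate_sketch(fetch_page)                             # all simulated rows
first_1500 <- paginate_sketch(fetch_page, max_features = 1500L) # capped pull
```

In this sketch the final request is shrunk to the remaining row budget, so a capped pull never fetches rows it would then discard. Whether morie trims the last page the same way is an assumption.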
Worked example:
# Full live pull of the YTD arrests feed (~69K rows over ~70 pages).
df <- morie_datasets_nyc_nypd_arrests_ytd(
  offline = FALSE, paginate = TRUE)
# First 5,000 rows only (5 paged requests of 1,000 each).
df <- morie_datasets_nyc_nypd_arrests_ytd(
  offline = FALSE, paginate = TRUE, max_features = 5000L)

The bundled fixtures (offline mode) are unaffected: they ship 5
rows each as deterministic sample data, and max_features simply
truncates the fixture.
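As a rough illustration of the truncation behaviour, assuming it amounts to keeping the first max_features rows: the fixture data frame, its row_id column, and truncate_fixture below are hypothetical stand-ins, not objects exported by morie.

```r
# Hypothetical 5-row fixture standing in for a bundled offline sample.
fixture <- data.frame(row_id = 1:5)

# Hypothetical helper: keep at most max_features rows; NULL means no cap.
truncate_fixture <- function(df, max_features = NULL) {
  if (is.null(max_features)) return(df)
  head(df, max_features)
}

small <- truncate_fixture(fixture, 3L)     # first 3 rows
large <- truncate_fixture(fixture, 5000L)  # a cap above 5 leaves all 5 rows
```

Under this assumption, a max_features value larger than the fixture has no effect in offline mode, since head() returns at most the rows available.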