Wraps the City of Chicago "Crimes – 2001 to Present" open dataset (Socrata resource id ijzp-q8t2; portal landing https://data.cityofchicago.org/Public-Safety/Crimes-2001-to-Present/ijzp-q8t2/about_data). The schema has 22 columns, with one row per reported crime incident (except murders, which have one row per victim). Data are extracted from the Chicago Police Department's CLEAR system and refreshed daily with a seven-day lag; addresses are redacted to block level.

Usage

morie_datasets_chicago_crime(
  year = NULL,
  max_features = NULL,
  offline = TRUE,
  mode = c("soda2", "soda3"),
  paginate = FALSE,
  page_size = 1000L,
  max_pages = 200L,
  app_token = NULL
)

Arguments

year

Integer or NULL; server-side year filter.

max_features

Integer or NULL; cap on returned rows. When paginate = TRUE this is the total cap across all walked pages.

offline

Logical; if TRUE, return the bundled synthetic frame.

mode

One of "soda2" (default) or "soda3". Selects the API path for live mode:

  • "soda2" -> /resource/<id>.json?$where=... via .morie_dataset_socrata_fetch() (URL-param SoQL grammar).

  • "soda3" -> /api/v3/views/<id>/query.json?query=SELECT ... via .morie_dataset_soda3_query() (full SoQL passthrough).

Both modes return the same 22-column schema. SODA3 is required when a derived or map view is involved and when the whole query is passed as a single SoQL string. No derived view is involved here; the mode is offered for parity with morie_datasets_chicago_crime_map().
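The two URL shapes above can be sketched in base R. This is a minimal illustration, not morie's internal code: soda2_url() and soda3_url() are hypothetical helpers, and the year = 2024 clause is an assumed example of a filter, not necessarily the clause morie sends.

```r
# Hypothetical helpers (not part of morie) that build the two URL shapes.
base_url    <- "https://data.cityofchicago.org"
resource_id <- "ijzp-q8t2"

soda2_url <- function(where) {
  # SODA2: the filter travels in the $where URL parameter.
  paste0(base_url, "/resource/", resource_id, ".json?$where=",
         utils::URLencode(where, reserved = TRUE))
}

soda3_url <- function(soql) {
  # SODA3: the whole SoQL statement travels in a single query parameter.
  paste0(base_url, "/api/v3/views/", resource_id, "/query.json?query=",
         utils::URLencode(soql, reserved = TRUE))
}

# Assumed example filter; the clause morie actually sends may differ.
url2 <- soda2_url("year = 2024")
url3 <- soda3_url("SELECT * WHERE year = 2024")
print(url2)
print(url3)
```

Percent-encoding with reserved = TRUE keeps spaces and operators in the filter from being read as URL syntax.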

paginate

Logical; if TRUE and offline = FALSE, walk pagination in page_size chunks. SODA2 uses $offset; SODA3 uses LIMIT page_size OFFSET m baked into the SoQL.

page_size

Integer; per-page row count when paginate = TRUE. Default 1,000 (the unauthenticated SODA2 ceiling).

max_pages

Integer; safety cap on the number of pages walked when paginate = TRUE. The default of 200 allows up to 200,000 rows at the default page_size of 1,000 (the ceiling without an app_token).
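How page_size, max_pages, and max_features interact can be sketched as a plan of (offset, limit) pairs. plan_pages() is a hypothetical helper written for illustration; in particular, trimming the final page to respect max_features is an assumption about a sensible implementation, not a description of morie's internals.

```r
# Hypothetical helper: plan (offset, limit) pairs for a paginated walk.
plan_pages <- function(page_size = 1000L, max_pages = 200L, max_features = NULL) {
  # Row budget: page_size * max_pages, tightened by max_features when given.
  budget <- as.numeric(page_size) * max_pages
  if (!is.null(max_features)) budget <- min(budget, max_features)
  if (budget <= 0) return(data.frame(offset = numeric(0), limit = numeric(0)))
  offsets <- seq(0, budget - 1, by = page_size)
  # Assumption: the last page is trimmed so the total never exceeds the budget.
  limits <- pmin(page_size, budget - offsets)
  data.frame(offset = offsets, limit = limits)
}

# A 2,500-row cap needs three pages: two full ones and a 500-row remainder.
plan <- plan_pages(page_size = 1000L, max_pages = 200L, max_features = 2500L)
print(plan)
```

A real walk would typically also stop as soon as a page comes back with fewer rows than requested, since that signals the end of the result set.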

app_token

Optional Socrata app token. Used under mode = "soda3" only, where it is sent as the X-App-Token header; ignored under mode = "soda2".

Value

A data.frame with the documented Socrata schema.
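A minimal usage sketch, assuming the function is exported from a package named morie (the package name is inferred from this page, not confirmed by it). The call uses offline = TRUE, so it returns the bundled synthetic frame and makes no network request; the guard lets the snippet run even where the package is not installed.

```r
# Assumption: the function is exported from a package called "morie".
crimes <- NULL
if (requireNamespace("morie", quietly = TRUE)) {
  # offline = TRUE returns the bundled synthetic frame (no network access).
  crimes <- morie::morie_datasets_chicago_crime(offline = TRUE)
  str(crimes)
}
```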

Details

Scale warning. As of 2026-05 the live feed carries 8,557,071 rows (about 8.56M; last refreshed 2026-05-23), which is too large for spreadsheet programs and slow even for programmatic pulls without filtering. Prefer narrowing the query first with the server-side year filter, or paginating with paginate = TRUE and a large page_size (ideally with an app_token). A full unfiltered pull at the default page_size = 1000 would issue about 8,560 requests, far beyond the default max_pages = 200; with page_size = 50000 and an app_token it drops to about 172.
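The request counts quoted above follow from dividing the row count by the page size and rounding up. A short check, using the row count stated above and a hypothetical helper requests_needed():

```r
total_rows <- 8557071  # row count quoted above (as of 2026-05)

# Hypothetical helper: number of page requests for a full unfiltered pull.
requests_needed <- function(total_rows, page_size) {
  ceiling(total_rows / page_size)
}

req_default <- requests_needed(total_rows, 1000)   # 8558
req_large   <- requests_needed(total_rows, 50000)  # 172

# Compare against the default max_pages = 200 safety cap.
default_max_pages <- 200
print(c(default = req_default, large = req_large))
print(c(default_fits = req_default <= default_max_pages,
        large_fits   = req_large   <= default_max_pages))
```

At the default page size the walk would stop at the max_pages cap long before reaching the end of the feed, whereas 172 pages of 50,000 rows fit within it.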

Socrata accepts both the resource id (/resource/ijzp-q8t2.json) and the publisher's crimes alias (/resource/crimes.json). SODA3 endpoints are also available (/api/v3/views/crimes/query.json), as are CSV variants (/resource/crimes.csv, /api/v3/views/crimes/query.csv). morie defaults to SODA2 JSON via the resource id for stability.

Cross-referenced datasets (Chicago Open Data). The 22-column schema carries geographic and crime-classification foreign keys that other Chicago datasets resolve:

beat

morie wraps via morie_datasets_chicago_police_beats() (n9it-hstw).

district

morie wraps via morie_datasets_chicago_police_districts() (24zt-jpfn).

ward

morie wraps via morie_datasets_chicago_wards() (sp34-6z76, 3UU).

community_area

morie wraps via morie_datasets_chicago_community_areas() (cauq-8yn6, 3UU).

iucr / fbi_code

morie wraps via morie_datasets_chicago_iucr_codes() (c7ck-438e, 3UU).
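Resolving one of these foreign keys amounts to a left join from the crime frame onto the lookup frame. The sketch below uses toy stand-in data frames rather than real pulls; the district_name column and all values are made up for illustration, and the real lookup datasets may name their key columns differently.

```r
# Toy stand-ins; the district_name column and all values are illustrative only.
crimes <- data.frame(
  id = c(1L, 2L, 3L),
  district = c("001", "002", "001"),
  stringsAsFactors = FALSE
)
districts <- data.frame(
  district = c("001", "002"),
  district_name = c("Central", "Wentworth"),
  stringsAsFactors = FALSE
)

# Left join: keep every crime row, attach the matching lookup columns.
joined <- merge(crimes, districts, by = "district", all.x = TRUE)
# merge() sorts by the key, so restore the original incident order.
joined <- joined[order(joined$id), ]
print(joined)
```

With real data, keys may need normalizing (type, zero padding) before the join so that values on both sides compare equal.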