Lightweight Ontario SIU Director's Reports scraper (R-native)
Source: R/siu_fetch.R
morie_siu_fetch.Rd
On-demand scraper for the Ontario Special Investigations Unit (SIU)
Director's Reports index at https://www.siu.on.ca/en/directors_reports.php.
This is the R port of morie.siu_fetch – the lightweight
httr2/rvest path that complements the C/C++ harvester
in morie_fetch_siu.
Details
Use this when:
you want a tiny R-only dependency footprint (no compiled code);
you only need the header / index fields (case_number, police_service, incident date, decision date) – not the full 64-column schema;
you are running on a host where the C++ parser does not build.
Distribution policy (2026-05): the scraped corpus is NOT shipped with the package. Each user runs the scraper themselves, which is unambiguously fair use of public oversight reports.
The scraper is conservative: a 2-second delay between requests,
retries on 5xx, and a descriptive user-agent. The latest published
year as of release is 2023; years = NULL (the default) scrapes
the unfiltered index, which surfaces the most recent posts.
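The request policy described above can be sketched with httr2 roughly as follows. This is a minimal illustration, not the package's actual implementation: the helper names siu_request() and fetch_politely(), the user-agent string, and the retry count of 3 are all assumptions made for the example.

```r
library(httr2)

# Hypothetical helper (not part of morie): build one request carrying a
# descriptive user-agent and a retry policy for server-side errors.
siu_request <- function(url = "https://www.siu.on.ca/en/directors_reports.php") {
  req <- request(url)
  # Assumed user-agent text; a real one should identify you and a contact.
  req <- req_user_agent(req, "siu-fetch-example (contact: you@example.org)")
  # Treat any 5xx response as transient and retry, up to 3 attempts in total.
  req_retry(
    req,
    max_tries = 3,
    is_transient = function(resp) resp_status(resp) >= 500
  )
}

# Hypothetical helper: perform the requests one at a time, sleeping `delay`
# seconds between consecutive requests (2 seconds, as described above).
fetch_politely <- function(urls, delay = 2) {
  out <- vector("list", length(urls))
  for (i in seq_along(urls)) {
    if (i > 1) Sys.sleep(delay)
    out[[i]] <- req_perform(siu_request(urls[[i]]))
  }
  out
}
```

An explicit Sys.sleep() is used here rather than a rate-limiting helper so that the fixed 2-second gap is visible in the code; the package itself may enforce the delay differently.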
Cache directory
By default this writes SIU.csv under tempdir(), so R cleans it up at
the end of the session. Pass cache_dir = morie_cache_dir("siu")
explicitly to opt into a persistent cross-session cache; see
morie_cache_dir and morie_cache_clear. There are no implicit writes
to ~/.cache.
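The opt-in behaviour can be pictured with a small base-R sketch. The helper siu_csv_path() is hypothetical and exists only to illustrate the rule "temporary unless a cache directory is passed"; it is not a morie function, and the package's internal logic may differ.

```r
# Hypothetical helper (not part of morie): resolve where SIU.csv would be
# written. With no cache_dir the file lives under tempdir(), which R removes
# at the end of the session; an explicit cache_dir is used as given.
siu_csv_path <- function(cache_dir = NULL) {
  dir <- if (is.null(cache_dir)) tempdir() else cache_dir
  # Create the target directory if it does not exist yet.
  dir.create(dir, recursive = TRUE, showWarnings = FALSE)
  file.path(dir, "SIU.csv")
}

# Default: session-scoped location.
siu_csv_path()

# Opt-in: a directory chosen by the caller (an arbitrary example path here).
siu_csv_path(file.path(tempdir(), "siu-persistent-example"))
```

In package terms, the first call corresponds to morie_siu_fetch() with its default arguments, and the second to morie_siu_fetch(cache_dir = morie_cache_dir("siu")).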