
On-demand scraper for the Ontario Special Investigations Unit (SIU) Director's Reports index at https://www.siu.on.ca/en/directors_reports.php. This is the R port of morie.siu_fetch: the lightweight httr2/rvest path that complements the C/C++ harvester in morie_fetch_siu. Use it when any of the conditions listed under Details applies.

Usage

morie_siu_index_url()

Details

  • you want a tiny R-only dependency footprint (no compiled code);

  • you only need the header / index fields (case_number, police_service, incident date, decision date) – not the full 64-column schema;

  • you are running on a host where the C++ parser does not build.

Distribution policy (2026-05): the scraped corpus is NOT shipped with the package. Each user runs the scraper themselves, which is unambiguously fair use of public oversight reports.

The scraper is conservative: a 2-second delay between requests, retries on 5xx, and a descriptive user-agent. The latest published year as of release is 2023; years = NULL (the default) scrapes the unfiltered index, which surfaces the most recent posts.
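The conservative behaviour described above can be sketched with httr2 roughly as follows. This is an illustration under stated assumptions, not the package's actual code: the helper names build_polite_request and fetch_politely, the user-agent string, and the retry count of 3 are all hypothetical. Only the 2-second delay, the retry on 5xx, and the index URL come from the text.

```r
library(httr2)

# Index URL from the description; the delay matches the documented 2 seconds.
index_url <- "https://www.siu.on.ca/en/directors_reports.php"
polite_delay_s <- 2

# Hypothetical helper: build a request with a descriptive user-agent and
# retries on any 5xx response. The user-agent text and max_tries are assumed.
build_polite_request <- function(url) {
  request(url) |>
    req_user_agent("example-siu-scraper (contact: you@example.org)") |>
    req_retry(
      max_tries = 3,
      is_transient = function(resp) resp_status(resp) >= 500
    )
}

# Hypothetical helper: fetch URLs one at a time, sleeping between
# consecutive requests so the server sees at most one request per delay_s.
fetch_politely <- function(urls, delay_s = polite_delay_s) {
  out <- vector("list", length(urls))
  for (i in seq_along(urls)) {
    if (i > 1) Sys.sleep(delay_s)
    out[[i]] <- req_perform(build_polite_request(urls[[i]]))
  }
  out
}

# Building the request does not touch the network; only req_perform() does.
req <- build_polite_request(index_url)
```

Calling fetch_politely(index_url) would perform the request. An explicit Sys.sleep() is used here rather than a throttling helper because it keeps the spacing between requests easy to reason about.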

Cache directory

By default this writes SIU.csv under tempdir(), so R cleans it up at the end of the session. Pass cache_dir = morie_cache_dir("siu") explicitly to opt into a persistent cross-session cache; see morie_cache_dir and morie_cache_clear. The scraper never writes to ~/.cache implicitly.
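The default-versus-opt-in behaviour can be pictured with a small base R sketch. The helper resolve_siu_csv is hypothetical and only mirrors the rule stated above (session tempdir() unless a cache_dir is supplied); it is not the package's implementation, and the demo directory stands in for whatever morie_cache_dir("siu") returns.

```r
# Hypothetical helper: decide where SIU.csv would be written.
# With cache_dir = NULL the file lands under the session's tempdir().
resolve_siu_csv <- function(cache_dir = NULL) {
  if (is.null(cache_dir)) cache_dir <- tempdir()
  dir.create(cache_dir, recursive = TRUE, showWarnings = FALSE)
  file.path(cache_dir, "SIU.csv")
}

# Default: session-scoped location under tempdir().
default_path <- resolve_siu_csv()

# Opt-in: an explicitly supplied directory. A stand-in path is used here so
# the sketch does not write outside tempdir().
demo_dir <- file.path(tempdir(), "persistent-demo", "siu")
persistent_path <- resolve_siu_csv(demo_dir)
```

In real use the explicit directory would be the value of morie_cache_dir("siu"), passed as cache_dir as described above.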