Downloads a Statistics Canada _CSV.zip product from
www150.statcan.gc.ca, extracts a CSV member, and returns
the contents as a base R data.frame. The archive is
streamed to a session-scoped tempfile (PUMF zips can be hundreds
of megabytes), and the tempfile is removed when the function
returns. Nothing is written under ~/.cache unless the
caller explicitly opts in via morie_cache_dir.
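The download, extract, and read flow described above can be sketched in base R. This is an illustration under stated assumptions, not the package's actual implementation: `pick_csv_member()` and `sketch_ingest_zip_csv()` are hypothetical names, the sketch always uses `read.csv`, and it leaves out the custom User-Agent and the optional cache directory.

```r
# Hypothetical helper: choose which archive entry to read.
# With member = NULL it falls back to the first entry ending in ".csv".
pick_csv_member <- function(entries, member = NULL) {
  if (!is.null(member)) {
    if (!member %in% entries) stop("member not found in archive: ", member)
    return(member)
  }
  csvs <- entries[grepl("\\.csv$", entries, ignore.case = TRUE)]
  if (length(csvs) == 0L) stop("archive contains no .csv entry")
  csvs[[1L]]
}

# Hypothetical sketch of the flow; not the real morie_ingest_statcan_csv().
sketch_ingest_zip_csv <- function(url, member = NULL, timeout = 600, ...) {
  zip_path <- tempfile(fileext = ".zip")
  out_dir <- tempfile("statcan_")
  # Remove the archive and the extracted file on exit, including on error.
  on.exit(unlink(c(zip_path, out_dir), recursive = TRUE), add = TRUE)

  # download.file() honours the global "timeout" option; restore it afterwards.
  old <- options(timeout = timeout)
  on.exit(options(old), add = TRUE)

  # The archive goes straight to disk; mode = "wb" keeps it intact on Windows.
  utils::download.file(url, zip_path, mode = "wb", quiet = TRUE)

  entries <- utils::unzip(zip_path, list = TRUE)$Name
  chosen <- pick_csv_member(entries, member)
  utils::unzip(zip_path, files = chosen, exdir = out_dir)
  utils::read.csv(file.path(out_dir, chosen), ...)
}
```

Registering the cleanup with `on.exit()` before the download starts means the temporary files are removed even if the request or the extraction fails partway through.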
Usage
morie_ingest_statcan_csv(
url,
member = NULL,
timeout = 600,
user_agent = "morie/r (+https://github.com/rootcoder007/rmorie)",
...
)
Arguments
- url
Direct URL of the StatCan
.zip product, e.g. https://www150.statcan.gc.ca/n1/pub/82m0013x/2024001/2022_CSV.zip.
- member
Name of the CSV inside the archive; defaults to the first
.csv entry.
- timeout
HTTP timeout in seconds (default 600).
- user_agent
User-Agent string sent with the request.
- ...
Further arguments forwarded to
read_csv (or read.csv if readr is unavailable).
Details
Note that a StatCan catalogue page (e.g.
/n1/en/catalogue/82M0013X) is only an HTML index — the
actual data is linked from the product page
(/n1/pub/82m0013x/82m0013x2024001-eng.htm), which points at
the real ..._CSV.zip.
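One way to catch a catalogue URL before any download starts is a small pre-flight check on the URL itself. The sketch below is illustrative only: `is_zip_url()` is a hypothetical helper rather than part of the package, and the final call assumes the package providing `morie_ingest_statcan_csv()` is already attached.

```r
# Hypothetical pre-flight check: TRUE only for URLs whose path ends in ".zip"
# (ignoring case and an optional query string).
is_zip_url <- function(url) {
  grepl("\\.zip(\\?.*)?$", url, ignore.case = TRUE)
}

catalogue <- "https://www150.statcan.gc.ca/n1/en/catalogue/82M0013X"
product   <- "https://www150.statcan.gc.ca/n1/pub/82m0013x/2024001/2022_CSV.zip"

is_zip_url(catalogue)  # FALSE
is_zip_url(product)    # TRUE

# Runs only when morie_ingest_statcan_csv() is available in the session.
if (is_zip_url(product) &&
    exists("morie_ingest_statcan_csv", mode = "function")) {
  pumf <- morie_ingest_statcan_csv(product, timeout = 1200)
  str(pumf, list.len = 5)
}
```

The check only inspects the URL string, so it cannot confirm that the archive exists or contains a CSV; it simply rejects HTML index pages early.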