
Downloads a Statistics Canada _CSV.zip product from www150.statcan.gc.ca, extracts a CSV member, and returns the contents as a base R data.frame. The archive is streamed to a session-scoped tempfile (PUMF zips can be hundreds of megabytes), and the tempfile is removed when the function returns. Nothing is written under ~/.cache unless the caller explicitly opts in via morie_cache_dir.
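The download, extract, and clean-up flow described above can be pictured with a minimal base R sketch. This is an illustration under stated assumptions, not the package's actual implementation: the function name sketch_ingest_zip_csv and the placeholder User-Agent default are hypothetical, the sketch always reads with utils::read.csv, and it ignores morie_cache_dir entirely.

```r
# Hypothetical sketch; not the code behind morie_ingest_statcan_csv().
sketch_ingest_zip_csv <- function(url, member = NULL, timeout = 600,
                                  user_agent = "example-agent", ...) {
  # Session-scoped tempfile for the archive, removed on exit (even on error).
  zip_path <- tempfile(fileext = ".zip")
  on.exit(unlink(zip_path), add = TRUE)

  # download.file() honours the "timeout" option; restore the old value on exit.
  old_opts <- options(timeout = timeout)
  on.exit(options(old_opts), add = TRUE)

  utils::download.file(
    url, zip_path,
    mode = "wb", quiet = TRUE,
    headers = c("User-Agent" = user_agent)
  )

  # List the archive entries without extracting anything to disk.
  entries <- utils::unzip(zip_path, list = TRUE)$Name
  if (is.null(member)) {
    csv_entries <- entries[grepl("\\.csv$", entries, ignore.case = TRUE)]
    if (length(csv_entries) == 0L) stop("no .csv entry found in archive")
    member <- csv_entries[1]  # default: first .csv entry
  }

  # read.csv() opens the unz() connection and closes it when done.
  utils::read.csv(unz(zip_path, member), ...)
}
```

Calling the sketch requires network access; defining it does not.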

Usage

morie_ingest_statcan_csv(
  url,
  member = NULL,
  timeout = 600,
  user_agent = "morie/r (+https://github.com/rootcoder007/rmorie)",
  ...
)

Arguments

url

Direct URL of the StatCan .zip product, e.g. https://www150.statcan.gc.ca/n1/pub/82m0013x/2024001/2022_CSV.zip.

member

Name of the CSV inside the archive. If NULL (the default), the first .csv entry is used.

timeout

HTTP timeout in seconds (default 600).

user_agent

User-Agent string sent with the request.

...

Further arguments forwarded to readr::read_csv() (or utils::read.csv() if readr is unavailable).

Value

A base R data.frame.

Details

A StatCan catalogue page (e.g. /n1/en/catalogue/82M0013X) is only an HTML index, not a download link. The data is linked from the product page (/n1/pub/82m0013x/82m0013x2024001-eng.htm), which in turn points at the real ..._CSV.zip. Pass that .zip address as url.
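A quick way to catch a catalogue or product page passed by mistake is to check the shape of the URL before downloading. The helper below is a hypothetical sketch, not part of the package; it assumes that direct links are served over https from www150.statcan.gc.ca and end in _CSV.zip, as in the example URL in Arguments.

```r
# Hypothetical helper: TRUE when the URL looks like a direct _CSV.zip link.
looks_like_statcan_csv_zip <- function(url) {
  grepl("^https://www150\\.statcan\\.gc\\.ca/.+_CSV\\.zip$", url)
}

zip_url <- "https://www150.statcan.gc.ca/n1/pub/82m0013x/2024001/2022_CSV.zip"
catalogue_url <- "https://www150.statcan.gc.ca/n1/en/catalogue/82M0013X"

looks_like_statcan_csv_zip(zip_url)        # TRUE
looks_like_statcan_csv_zip(catalogue_url)  # FALSE
```

The check is purely syntactic: it does not confirm that the archive exists or contains a CSV.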

Examples

if (FALSE) { # \dontrun{
# Requires network access.
url <- paste0(
  "https://www150.statcan.gc.ca/n1/pub/82m0013x/",
  "2024001/2022_CSV.zip"
)
df <- morie_ingest_statcan_csv(url)
head(df)
} # }