
For any case_number (or drid), return the parser's 64-column row together with the raw HTML pages it was extracted from – the director's-report page and, when linked, the news-release page. This is the per-row ground truth: every field in the emitted CSV is reproducible from report_html via the parser, and any disagreement with another data source can be adjudicated against the saved HTML.

Usage

morie_siu_audit_case(
  case_number,
  cache_dir = file.path(tempdir(), "morie", "siu"),
  fetch_if_missing = TRUE
)

Arguments

case_number

An SIU case number (e.g. "17-OVI-201"), or an integer drid.

cache_dir

Directory holding the harvester's SIU.csv and the optional html/ subdirectory.

fetch_if_missing

If TRUE (default), fetch the page from SIU on a local cache miss. Set to FALSE to work strictly from the cache.

Value

A list with elements row (the parser's 1-row data frame for this case), drid, nrid, report_html, news_html, report_text (HTML-stripped plain text of the report) and news_text.
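
As a rough illustration of how the returned elements can be used together, the sketch below builds a mock list with the element names documented above and checks that a parsed field value appears verbatim in report_text. The mock values, the column name case_number inside row, the use of NA for an absent news page, and the helper field_in_report() are all assumptions made for this example; none of them is confirmed by the package.

```r
# Mock of the documented return shape. All values are invented for
# illustration; a real result comes from morie_siu_audit_case().
a <- list(
  row = data.frame(case_number = "17-OVI-201", stringsAsFactors = FALSE),
  drid = 1L,
  nrid = NA_integer_,           # assumption: NA when no news release is linked
  report_html = "<html><body><h1>Case # 17-OVI-201</h1></body></html>",
  news_html = NA_character_,    # assumption, as above
  report_text = "Case # 17-OVI-201",
  news_text = NA_character_
)

# Hypothetical helper (not part of morie): TRUE when the value stored in
# one column of the parsed row occurs literally in the stripped report text.
field_in_report <- function(audit, column) {
  value <- as.character(audit$row[[column]][1])
  !is.na(value) && grepl(value, audit$report_text, fixed = TRUE)
}

found <- field_in_report(a, "case_number")
print(found)
```

A literal substring check like this only suits fields that the parser copies unchanged; fields that are normalised or derived during parsing would need a comparison that mirrors the parser's own logic.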

Details

Reads from the local cache at <cache_dir>/html/ (populated by morie_fetch_siu(cache_html = TRUE)) when available, and falls back to a polite live fetch when a page is not cached and fetch_if_missing is TRUE.
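
The cache-first behaviour described here can be sketched in base R as follows. This is a minimal illustration of the pattern, not the package's implementation: read_cached_or_fetch(), the file names, and the stand-in fetch function are hypothetical, and raising an error on a cache miss with fetch_if_missing = FALSE is an assumption about how such a miss might be reported.

```r
# Hypothetical helper (not part of morie): return cached HTML when the file
# exists, otherwise call fetch_fun() if fetching is allowed.
read_cached_or_fetch <- function(cache_file, fetch_fun, fetch_if_missing = TRUE) {
  if (file.exists(cache_file)) {
    return(paste(readLines(cache_file, warn = FALSE), collapse = "\n"))
  }
  if (!fetch_if_missing) {
    # Assumption: a strict-cache miss is reported as an error.
    stop("page not in cache and fetch_if_missing = FALSE: ", cache_file)
  }
  fetch_fun()
}

# Demonstration with a temporary cache directory and a fake fetcher, so no
# network access is involved.
demo_dir <- file.path(tempdir(), "cache-sketch", "html")
dir.create(demo_dir, recursive = TRUE, showWarnings = FALSE)
cached_file <- file.path(demo_dir, "cached.html")
missing_file <- file.path(demo_dir, "missing.html")
writeLines("<p>cached page</p>", cached_file)
fake_fetch <- function() "<p>live page</p>"

from_cache <- read_cached_or_fetch(cached_file, fake_fetch)
from_live <- read_cached_or_fetch(missing_file, fake_fetch)
strict_miss <- try(
  read_cached_or_fetch(missing_file, fake_fetch, fetch_if_missing = FALSE),
  silent = TRUE
)
```

Passing the fetcher as a function keeps the cache logic testable without network access; a real fetcher would additionally need rate limiting to stay polite.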

Examples

if (FALSE) { # \dontrun{
a <- morie_siu_audit_case(
  "17-OVI-201",
  cache_dir = file.path(tempdir(), "morie", "siu")
)
cat(substr(a$report_text, 1, 1000), "\n")
} # }