For any case_number (or drid), return the parser's 64-column row
together with the raw HTML pages it was extracted from – the
director's-report page and, when linked, the news-release page.
This is the per-row ground truth: every field in the emitted CSV
is reproducible from report_html via the parser, and any
disagreement with another data source can be adjudicated against
the saved HTML.
Usage
morie_siu_audit_case(
  case_number,
  cache_dir = file.path(tempdir(), "morie", "siu"),
  fetch_if_missing = TRUE
)
Arguments
- case_number
An SIU case number (e.g. "17-OVI-201"),
or an integer drid.
- cache_dir
Directory holding the harvester's SIU.csv and the
optional html/ subdirectory.
- fetch_if_missing
If TRUE (default), fetch from SIU any page that is
not in the local cache. Set FALSE to work
strictly from the cache.
Value
A list with elements row (the parser's 1-row data
frame for this case), drid, nrid,
report_html, news_html, report_text
(HTML-stripped plain text of the report) and news_text.
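As a rough illustration of how the returned list might be used to adjudicate a disagreement with another data source, the sketch below checks whether a claimed value appears in the plain-text report. The audit object is a hand-built stand-in carrying only a report_text element, its contents are invented, and value_in_report() is a hypothetical helper, not part of the package.

```r
# Hypothetical helper: does a claimed value appear in the report text?
# Whitespace is collapsed and case is ignored so that line wrapping in
# the saved page does not cause false mismatches.
value_in_report <- function(audit, value) {
  normalise <- function(x) tolower(gsub("\\s+", " ", trimws(x)))
  grepl(normalise(value), normalise(audit$report_text), fixed = TRUE)
}

# Stand-in for the list returned by morie_siu_audit_case();
# the text is invented for illustration only.
audit <- list(
  report_text = "Case 17-OVI-201.  The   incident\noccurred in Example Town."
)

in_report     <- value_in_report(audit, "incident occurred in example town")
not_in_report <- value_in_report(audit, "Other Town")
```

A plain substring match is only a first pass: a value that is absent from report_text may still be present in report_html in a different form, so a miss is a prompt to read the saved page rather than proof of an error.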
Details
Reads from the local cache at <cache_dir>/html/ (populated
by morie_fetch_siu(cache_html = TRUE)) when available, and
falls back to a polite live fetch when a page is not cached
and fetch_if_missing = TRUE.
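The cache-first lookup described above can be sketched as follows. This is a minimal illustration, not the package's implementation: read_cached_or_fetch(), the <drid>.html file naming, the fetch_fun argument, and the write-back of fetched pages are all assumptions made for the example, and the fetcher is a stub so that the code runs offline.

```r
# Hypothetical sketch of a cache-first page lookup.
read_cached_or_fetch <- function(drid, cache_dir, fetch_fun,
                                 fetch_if_missing = TRUE) {
  # Assumed layout: one file per page under <cache_dir>/html/.
  path <- file.path(cache_dir, "html", paste0(drid, ".html"))
  if (file.exists(path)) {
    return(paste(readLines(path, warn = FALSE), collapse = "\n"))
  }
  if (!fetch_if_missing) {
    stop("Page ", drid, " is not cached and fetch_if_missing = FALSE")
  }
  html <- fetch_fun(drid)
  # Assumption: save the fetched page so the next call hits the cache.
  dir.create(dirname(path), recursive = TRUE, showWarnings = FALSE)
  writeLines(html, path)
  html
}

# Stub fetcher standing in for a real HTTP request.
fake_fetch <- function(drid) {
  paste0("<html><body>report ", drid, "</body></html>")
}

cache  <- file.path(tempdir(), "morie-sketch")
first  <- read_cached_or_fetch(1L, cache, fake_fetch)  # fetched, then cached
second <- read_cached_or_fetch(1L, cache, fake_fetch,
                               fetch_if_missing = FALSE)  # served from cache
```

With fetch_if_missing = FALSE, the sketch raises an error for an uncached page instead of touching the network, which is the behaviour one would want for a strictly offline audit.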
Examples
if (FALSE) { # \dontrun{
  a <- morie_siu_audit_case(
    "17-OVI-201",
    cache_dir = file.path(tempdir(), "morie", "siu")
  )
  cat(substr(a$report_text, 1, 1000), "\n")
} # }