Field-by-field SIU comparison against a user-supplied external table
Source: R/siu.R
morie_siu_compare.Rd
For one case_number, line up the parser's value against the same field in a user-supplied external data source – and, critically, show the surrounding report HTML so the user can adjudicate any disagreement against the actual source document.
Arguments
- case_number
A case number (e.g. "17-OVI-201").
- external
A data frame of external answers, or a path to an .xlsx file (read with readxl). Must contain a column whose values match SIU case numbers (default external_case_col = "Q1").
- field_map
A named list mapping external-column names to morie field names.
- external_case_col
Name of the external column carrying the case-number key.
- cache_dir
Directory holding the harvester's SIU.csv and optional cached HTML.
Value
A data frame with one row per mapped field: field,
parser_value, external_value, agree, and
html_excerpt (a 240-character window around the first
occurrence of either value in the cleaned report text). When
parser and external disagree, the html_excerpt is the
tie-breaker.
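To make the html_excerpt column concrete, here is a minimal sketch of how a fixed-width window around the first match could be cut from cleaned report text. It is not morie's implementation: the helper name excerpt_window, its arguments, and the choice to start the window half a width before the earliest match are all assumptions made for illustration, and the report text is invented.

```r
# Hypothetical helper (not part of morie): return up to `width` characters
# of `text` around the earliest occurrence of any of `values`.
excerpt_window <- function(text, values, width = 240L) {
  values <- as.character(values)
  values <- values[!is.na(values) & nzchar(values)]
  if (length(values) == 0L) return(NA_character_)

  # First fixed-string match position for each value (-1 when absent).
  hits <- vapply(
    values,
    function(v) as.integer(regexpr(v, text, fixed = TRUE)),
    integer(1)
  )
  hits <- hits[hits > 0L]
  if (length(hits) == 0L) return(NA_character_)

  # Assumed layout: start half a window before the earliest match.
  start <- max(1L, min(hits) - width %/% 2L)
  substr(text, start, start + width - 1L)
}

# Invented report text, padded so the window is not clipped at either end.
report_text <- paste(
  strrep("x", 300),
  "Two subject officers were designated.",
  strrep("y", 300)
)
ex <- excerpt_window(report_text, c("Two", "1"))
nchar(ex)
```

Fixed-string matching is used here so that values containing regex metacharacters (parentheses, dots) are searched literally; whether morie does the same is not stated in this page.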
Details
The ground truth is the SIU director's-report HTML itself. The HTML is what the SIU published; the parser's job is to extract structured fields from it faithfully, and any field's correctness is decidable by reading the cached HTML for that case. Any external reference – a hand-coded survey, an independently-scraped CSV, a colleague's analysis – is just another extraction attempt, possibly with its own errors. This function does not endorse any external source; it only displays both side-by-side with the HTML excerpt so you can decide.
The default field map covers the common SIU-extraction column
layout (Q1 = case_number, Q3 = police_service,
Q4 = number_of_officers_involved, ...). Pass a custom
field_map for any other external schema.
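The role of field_map can be illustrated with a small base-R sketch that lines up one parsed record against one external row and flags disagreements. This is an assumption-laden illustration rather than the package's code: parser_row and external_row are hypothetical stand-ins with invented values, and the string comparison after trimming whitespace is only one plausible way to define agreement.

```r
# Hypothetical stand-ins for one case: values are invented for illustration.
parser_row <- list(
  police_service = "Example Police Service",
  number_of_officers_involved = "2"
)
external_row <- data.frame(
  Q1 = "17-OVI-201",
  Q3 = "Example Police Service",
  Q4 = 1L,
  stringsAsFactors = FALSE
)

# Names are external columns; values are morie field names.
field_map <- list(
  Q3 = "police_service",
  Q4 = "number_of_officers_involved"
)

# Build one comparison row per mapped field.
rows <- lapply(names(field_map), function(ext_col) {
  field <- field_map[[ext_col]]
  p <- as.character(parser_row[[field]])
  e <- as.character(external_row[[ext_col]][1])
  data.frame(
    field = field,
    parser_value = p,
    external_value = e,
    # Assumed agreement rule: equal as strings after trimming whitespace.
    agree = identical(trimws(p), trimws(e)),
    stringsAsFactors = FALSE
  )
})
cmp <- do.call(rbind, rows)
cmp[!cmp$agree, ]
```

A row with agree = FALSE, as for the officer count here, is where the HTML excerpt would need to be read to decide which source is right.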
Examples
if (FALSE) { # \dontrun{
  # Caller supplies their own external table; nothing about the
  # mapping or the file format is canonical to morie.
  external <- data.frame(case_id = "17-OVI-201", officers = 1L)
  cmp <- morie_siu_compare(
    "17-OVI-201",
    external = external,
    field_map = list(officers = "number_of_officers_involved"),
    external_case_col = "case_id"
  )
  subset(cmp, !agree)
} # }