Per-field anomaly check: does the parser's extraction match the HTML?
Source: R/siu.R
morie_siu_anomaly_check.Rd
For each populated field in the parser's row, ask the LLM whether the extracted value is supported by the cached report HTML. Used to surface fields where the regex parser is plausibly wrong – the LLM's verdicts are not authoritative, just an automated way to triage which rows a human should re-read against the HTML.
Arguments
- case_number
An SIU case number (e.g. "17-OVI-201").
- model
One of "ollama" (default; free, runs locally, zero-config when an Ollama daemon is on localhost:11434), "gemini" (paid), or "claude" (paid). A character vector enables fail-over: the first model whose call succeeds wins. The default c("ollama", "gemini") tries the local free model first and only escalates to paid Gemini if Ollama isn't installed or fails – so morie costs $0 to use as long as you have a free Gemma / Qwen / Llama model running locally (e.g. ollama pull gemma3:4b).
- cache_dir
Directory holding the harvester's SIU.csv and the optional html/ subdirectory.
- max_html_chars
Soft cap on the HTML payload sent to the model (default 80,000 – larger than any real SIU report, small enough to stay under typical context budgets).
- mock_response_text
For testing only: if non-NULL, skip the network call and use this string as the model's raw reply.
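The fail-over behaviour described for model can be pictured with a minimal sketch. This is not morie's implementation: first_success, call_fns, and the stand-in backends below are hypothetical names invented for illustration, and the reply strings are placeholders.

```r
# Hypothetical helper (not part of morie): try each backend in order and
# return the first reply that does not raise an error.
# `call_fns` is a named list holding one function per backend.
first_success <- function(models, call_fns, prompt) {
  for (m in models) {
    reply <- tryCatch(call_fns[[m]](prompt), error = function(e) NULL)
    if (!is.null(reply)) {
      return(list(model = m, reply = reply))
    }
  }
  stop("all models failed: ", paste(models, collapse = ", "))
}

# Stand-in backends: the local one fails, the second one answers.
fake_backends <- list(
  ollama = function(prompt) stop("connection refused"),
  gemini = function(prompt) "stub reply"
)

res <- first_success(c("ollama", "gemini"), fake_backends, "check this field")
res$model
```

Because the loop stops at the first success, a paid backend listed later is only reached when every earlier one has errored.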
Value
A data frame with one row per populated parser field, with columns field, parser_value, verdict (one of "agree" / "disagree" / "unclear"), and reason (a short sentence pointing to the report passage).
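A result of this shape lends itself to a quick triage pass. The sketch below builds a hand-made data frame with the documented columns; the field names and values are invented placeholders, not real SIU data or real output.

```r
# Made-up rows in the documented shape (field, parser_value, verdict, reason).
a <- data.frame(
  field = c("field_a", "field_b", "field_c"),
  parser_value = c("value A", "value B", "value C"),
  verdict = c("agree", "disagree", "unclear"),
  reason = c("placeholder reason 1", "placeholder reason 2", "placeholder reason 3"),
  stringsAsFactors = FALSE
)

# Count verdicts, keeping all three levels even if one is absent.
verdict_counts <- table(factor(a$verdict,
                               levels = c("agree", "disagree", "unclear")))

# Everything the model did not confirm is a candidate for a human re-read.
needs_review <- a[a$verdict != "agree", c("field", "parser_value", "reason")]
needs_review
```

Treating "unclear" the same as "disagree" is a conservative choice here, in line with the verdicts being a triage aid, not an authority.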
Details
One API call is made per case (all fields batched into a single prompt with structured-JSON output).
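As a rough illustration of what batching all fields into one prompt could look like, the sketch below assembles a single prompt string from a parser row and a capped HTML payload. The function build_prompt, the instruction wording, and the requested reply keys are assumptions made for this example; the package's actual prompt and JSON schema are not shown on this page.

```r
# Hypothetical sketch (not morie's code): one prompt covering every
# populated field, with the HTML truncated to max_html_chars.
build_prompt <- function(row, html, max_html_chars = 80000) {
  # Keep only fields that actually carry a value.
  populated <- row[!is.na(row) & nzchar(row)]
  # Soft cap: send at most max_html_chars characters of the report.
  html <- substr(html, 1, max_html_chars)
  field_lines <- sprintf("- %s: %s", names(populated), populated)
  paste(c(
    "For each field below, say whether the value is supported by the report.",
    "Reply as a JSON array of objects with keys field, verdict, reason.",
    "Fields:", field_lines,
    "Report HTML:", html
  ), collapse = "\n")
}

# Placeholder row: the empty incident_date is skipped.
row <- c(case_number = "17-OVI-201", incident_date = "", decision = "placeholder")
prompt <- build_prompt(row, "<html><body>placeholder report</body></html>")
cat(prompt)
```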
Examples
if (FALSE) { # \dontrun{
Sys.setenv(GOOGLE_API_KEY = "your-gemini-key")
a <- morie_siu_anomaly_check("17-OVI-201", model = "gemini")
subset(a, verdict == "disagree")
} # }