
For each populated field in the parser's row, ask the LLM whether the extracted value is supported by the cached report HTML. Use this to surface fields where the regex parser is plausibly wrong. The LLM's verdicts are not authoritative; they are an automated way to triage which rows a human should re-read against the HTML.

Usage

morie_siu_anomaly_check(
  case_number,
  model = c("ollama", "gemini"),
  cache_dir = file.path(tempdir(), "morie", "siu"),
  max_html_chars = 80000L,
  mock_response_text = NULL
)

Arguments

case_number

An SIU case number (e.g. "17-OVI-201").

model

One of "ollama" (tried first by default; free, runs locally, zero-config when an Ollama daemon is listening on localhost:11434), "gemini" (paid), or "claude" (paid). A character vector enables fail-over: the models are tried in order and the first one whose call succeeds wins. The default, c("ollama", "gemini"), tries the free local model first and escalates to paid Gemini only if Ollama isn't installed or fails, so morie costs $0 to use as long as you have a free Gemma / Qwen / Llama model running locally (e.g. after ollama pull gemma3:4b).
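
The fail-over order described for model can be illustrated with a small sketch. This is not morie's internal code: first_success() and fake_call() are hypothetical names invented here, and the stand-in backend only simulates an unreachable Ollama daemon. It shows the general pattern of trying each backend in turn and keeping the first reply that does not raise an error.

```r
# Hypothetical helper (not part of morie): try each model in order and
# return the first reply that does not raise an error.
first_success <- function(models, call_model) {
  errors <- character(0)
  for (m in models) {
    reply <- tryCatch(call_model(m), error = function(e) e)
    if (!inherits(reply, "error")) {
      return(list(model = m, reply = reply))
    }
    # Remember why this backend failed, keyed by model name.
    errors[[m]] <- conditionMessage(reply)
  }
  stop(
    "all models failed: ",
    paste(names(errors), errors, sep = ": ", collapse = "; ")
  )
}

# Stand-in backend (assumption): pretends the local Ollama daemon is down.
fake_call <- function(model) {
  if (model == "ollama") stop("connection refused")
  paste("reply from", model)
}

res <- first_success(c("ollama", "gemini"), fake_call)
res$model
```

With the simulated failure, the second entry in the vector supplies the reply; if every backend fails, the sketch stops with a message listing each error.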

cache_dir

Directory holding the harvester's SIU.csv and the optional html/ subdirectory.

max_html_chars

Soft cap on the HTML payload sent to the model (default 80,000 – larger than any real SIU report, small enough to stay under typical context budgets).

mock_response_text

For testing only: if non-NULL, skip the network call and use this string as the model's raw reply.

Value

A data frame with one row per populated parser field: field, parser_value, verdict (one of "agree" / "disagree" / "unclear"), and reason (a short sentence pointing to the report passage).
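
As a rough illustration of how the result might be triaged, the sketch below builds a hand-made data frame with the four documented columns and pulls out the rows a human should re-read. The field names, values, and reasons are placeholders invented for this example, not real parser output.

```r
# Hand-made stand-in for the return value. Column names follow the
# documentation; every row below is an invented placeholder.
checks <- data.frame(
  field        = c("field_a", "field_b", "field_c"),
  parser_value = c("value A", "value B", "value C"),
  verdict      = c("agree", "unclear", "disagree"),
  reason       = c(
    "Placeholder: report states the same value.",
    "Placeholder: report does not mention this field.",
    "Placeholder: report gives a different value."
  ),
  stringsAsFactors = FALSE
)

# Keep everything the model did not confirm, with "disagree" first.
to_review <- checks[checks$verdict != "agree", ]
priority  <- match(to_review$verdict, c("disagree", "unclear"))
to_review <- to_review[order(priority), ]

# Tally verdicts, including levels that happen to be absent.
verdict_counts <- table(
  factor(checks$verdict, levels = c("agree", "disagree", "unclear"))
)

to_review[, c("field", "parser_value", "verdict")]
```

Because the verdicts are only a triage signal, rows marked "unclear" are kept alongside "disagree" rather than dropped.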

Details

One API call is made per case (all fields batched into a single prompt with structured-JSON output).

Examples

if (FALSE) { # \dontrun{
Sys.setenv(GOOGLE_API_KEY = "your-gemini-key")
a <- morie_siu_anomaly_check("17-OVI-201", model = "gemini")
subset(a, verdict == "disagree")
} # }