Sends the cached director's-report HTML for one case to a
large-language-model endpoint and asks it to return the 64-column
morie schema as JSON. The result uses the same row format as the
C++ parser, so it can be passed straight to morie_siu_compare()
as the external argument for an independent diff against
the parser.
Arguments
- case_number
An SIU case number (e.g. "17-OVI-201").
- model
One of "ollama" (default; free, runs locally, zero-config when an Ollama daemon is on localhost:11434), "gemini" (paid), or "claude" (paid). A character vector enables fail-over: the first model whose call succeeds wins. The default c("ollama", "gemini") tries the local free model first and escalates to paid Gemini only if Ollama isn't installed or fails, so morie costs $0 to use as long as you have a free Gemma / Qwen / Llama model running locally (e.g. ollama pull gemma3:4b).
- cache_dir
Directory holding the harvester's SIU.csv and the optional html/ subdirectory.
- max_html_chars
Soft cap on the HTML payload sent to the model (default 80,000 – larger than any real SIU report, small enough to stay under typical context budgets).
- mock_response_text
For testing only: if non-NULL, skip the network call and use this string as the model's raw reply.
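The fail-over behaviour described for model can be pictured with a small base-R sketch. This is not morie's internal code: first_successful() and fake_call() are hypothetical names, and fake_call() merely simulates a local Ollama daemon that is not running.

```r
# Hypothetical helper: try each backend in order and return the first
# reply that does not raise an error. `call_fn` stands in for whatever
# actually performs the request.
first_successful <- function(models, call_fn) {
  errors <- character(0)
  for (m in models) {
    reply <- tryCatch(call_fn(m), error = function(e) e)
    if (!inherits(reply, "error")) {
      return(list(model = m, reply = reply))
    }
    errors[m] <- conditionMessage(reply)  # remember why this backend failed
  }
  stop("all models failed: ",
       paste(names(errors), errors, sep = ": ", collapse = "; "))
}

# Stand-in caller (an assumption for illustration): Ollama is unreachable,
# every other backend answers with a tiny JSON string.
fake_call <- function(model) {
  if (model == "ollama") stop("connection refused")
  "{\"case_number\": \"17-OVI-201\"}"
}

res <- first_successful(c("ollama", "gemini"), fake_call)
res$model
```

With the stand-in caller, the first entry fails and the second one is used, which mirrors the "local first, paid only as a fallback" ordering of the default vector.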
Value
A one-row data frame with the 64 morie SIU columns. Any field the model could not extract is the empty string (matching the C++ parser's convention).
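A minimal sketch of the empty-string convention, assuming the model's reply has already been parsed into a named list. The three column names and to_schema_row() are hypothetical placeholders; the real schema has 64 columns that are not listed here.

```r
# Hypothetical subset of the schema, for illustration only.
schema_cols <- c("case_number", "incident_date", "police_service")

# Build a one-row data frame with exactly `cols`; anything missing,
# NULL, or NA in the parsed reply becomes the empty string.
to_schema_row <- function(parsed, cols) {
  values <- vapply(cols, function(nm) {
    v <- parsed[[nm]]
    if (is.null(v) || length(v) == 0 || is.na(v[1])) "" else as.character(v[1])
  }, character(1))
  as.data.frame(as.list(values), stringsAsFactors = FALSE)
}

# The reply below omits incident_date and police_service entirely.
row <- to_schema_row(list(case_number = "17-OVI-201"), schema_cols)
str(row)
```

Filling gaps with "" rather than NA keeps the row directly comparable with a parser row that follows the same convention.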
Details
The cached HTML remains the ground truth. This function does not claim the LLM is more accurate than the regex parser; it provides a fast second extraction so disagreements between two independent methods (regex vs. LLM) can be flagged for human review against the saved report.
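The idea of flagging disagreements between two extractions can be sketched in base R as below. This is an illustration only and not the implementation of morie_siu_compare(); flag_disagreements(), the column names, and the values in both rows are made up, and the sketch assumes missing fields are "" rather than NA.

```r
# Hypothetical helper: list the shared fields on which two one-row
# data frames disagree, ignoring leading/trailing whitespace.
flag_disagreements <- function(parser_row, llm_row) {
  shared <- intersect(names(parser_row), names(llm_row))
  p <- vapply(shared, function(nm) as.character(parser_row[[nm]][1]), character(1))
  l <- vapply(shared, function(nm) as.character(llm_row[[nm]][1]), character(1))
  differs <- trimws(p) != trimws(l)
  data.frame(field = shared[differs], parser = p[differs], llm = l[differs],
             row.names = NULL, stringsAsFactors = FALSE)
}

# Made-up rows: the two methods agree on everything except one field.
parser_row <- data.frame(case_number = "17-OVI-201", incident_date = "2017-08-01",
                         police_service = "", stringsAsFactors = FALSE)
llm_row    <- data.frame(case_number = "17-OVI-201", incident_date = "2017-08-02",
                         police_service = "", stringsAsFactors = FALSE)

flags <- flag_disagreements(parser_row, llm_row)
flags
```

Each flagged field is a prompt to open the saved report and check which value, if either, is correct; neither column is treated as authoritative.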
Credentials are read from environment variables only – never
hard-coded, never passed as function arguments – so secrets do
not leak into call traces, logs, or scripts. Set
GOOGLE_API_KEY for Gemini, ANTHROPIC_API_KEY for
Claude, or OLLAMA_HOST (e.g.
"http://localhost:11434" or an OllamaFreeAPI base URL) plus
optionally OLLAMA_MODEL (default "llama3.2:3b") for
Ollama-compatible open-weight endpoints.
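A sketch of environment-only credential lookup, using the variable names and the OLLAMA_MODEL default given above. require_env() and resolve_backend_env() are hypothetical names, and falling back to http://localhost:11434 when OLLAMA_HOST is unset is an assumption based on the zero-config note for model.

```r
# Hypothetical helper: read a required variable and fail with a message
# that names the variable but never prints its value.
require_env <- function(name) {
  value <- Sys.getenv(name, unset = "")
  if (!nzchar(value)) {
    stop("environment variable ", name, " is not set", call. = FALSE)
  }
  value
}

# Hypothetical resolver: collect the settings one backend needs.
resolve_backend_env <- function(model) {
  switch(model,
    gemini = list(api_key = require_env("GOOGLE_API_KEY")),
    claude = list(api_key = require_env("ANTHROPIC_API_KEY")),
    ollama = list(
      # Assumed fallback host; the default model name is from the text above.
      host  = Sys.getenv("OLLAMA_HOST",  unset = "http://localhost:11434"),
      model = Sys.getenv("OLLAMA_MODEL", unset = "llama3.2:3b")
    ),
    stop("unknown model: ", model, call. = FALSE)
  )
}

cfg <- resolve_backend_env("ollama")
names(cfg)
```

Because the key is looked up inside the helper rather than passed in, it does not appear in the arguments of any call recorded by traceback() or in a script that calls the function.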
Examples
if (FALSE) { # \dontrun{
Sys.setenv(GOOGLE_API_KEY = "your-gemini-key")
r <- morie_siu_llm_extract("17-OVI-201", model = "gemini")
# Diff parser vs LLM against the HTML:
morie_siu_compare(
  "17-OVI-201",
  external = r,
  field_map = setNames(as.list(names(r)), names(r)),
  external_case_col = "case_number"
)
} # }