Pure-R SIU director's-report parser (port of morie.siu._parser)
Source: R/siu_parser.R
morie_siu_parser.Rd
Parses one SIU director's-report HTML page (or one news-release
page) into a structured row list. The production parser lives in
the Rcpp / C++ backend (.siu_parse_report,
.siu_parse_news); this pure-R port is provided as a
reference implementation and as a fallback for environments
where the compiled libmorie backend is unavailable.
Details
Suggested dependencies. These functions optionally use
rvest + xml2 for DOM walking; without them, a
regex-based fallback over flat tag-stripped text is used. Either
way the parser is pure (no network) – hand it a raw HTML string
and it returns a row list matching SIU_COLUMNS.
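The optional-dependency pattern described above can be sketched roughly as follows. This is a minimal illustration, not the package's code: `html_to_text` is a hypothetical helper name, and the sample HTML and case number are made up. It assumes `rvest::html_text2()` and `xml2::read_html()` are used when both packages are installed, and a base-R tag-stripping regex otherwise.

```r
# Hypothetical helper (not part of the package API): turn raw HTML into
# flat text, using rvest + xml2 when installed and a regex otherwise.
html_to_text <- function(html) {
  has_dom <- requireNamespace("rvest", quietly = TRUE) &&
    requireNamespace("xml2", quietly = TRUE)
  if (has_dom) {
    # DOM path: parse the string and extract the rendered text.
    txt <- rvest::html_text2(xml2::read_html(html))
  } else {
    # Fallback path: replace every tag with a space.
    txt <- gsub("<[^>]+>", " ", html)
  }
  # Collapse whitespace runs so both paths yield comparable flat text.
  trimws(gsub("[[:space:]]+", " ", txt))
}

# Made-up sample input; no network access is involved.
sample_html <- "<html><body><h1>Report</h1><p>Case Number: 00-XXX-000</p></body></html>"
flat <- html_to_text(sample_html)
```

Because the input is a plain string, the same call works on pages that were fetched earlier and cached on disk.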
Hardened against the SIU page markup shifting over time by:
looking for several label variants per field,
falling back to regex on stripped text when DOM structure shifts,
preserving the verbatim narrative_full regardless of parse success.
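The three hardening measures above can be illustrated with a small base-R sketch. Everything here is an assumption for illustration: the label variants, the `extract_field` and `parse_sketch` names, and the sample input are hypothetical and are not taken from the package. The sketch also stores the tag-stripped text as a stand-in for the verbatim narrative.

```r
# Hypothetical label variants for one field; the real lists are not shown here.
label_variants <- list(
  case_number = c("Case Number", "Case No.", "File Number")
)

# Try each label variant in turn; return the first value found, else NA.
extract_field <- function(text, labels) {
  for (label in labels) {
    # \Q...\E quotes the label so characters like "." match literally.
    pattern <- paste0("\\Q", label, "\\E\\s*:\\s*([A-Za-z0-9-]+)")
    m <- regmatches(text, regexec(pattern, text, perl = TRUE))[[1]]
    if (length(m) == 2) return(m[2])
  }
  NA_character_
}

parse_sketch <- function(html) {
  # Regex fallback: strip tags and collapse whitespace into flat text.
  text <- trimws(gsub("[[:space:]]+", " ", gsub("<[^>]+>", " ", html)))
  row <- lapply(label_variants, function(labels) extract_field(text, labels))
  # Keep the narrative even when no field could be extracted.
  row$narrative_full <- text
  row
}

row_ok <- parse_sketch("<p>File Number: 00-XXX-000</p><p>Some narrative.</p>")
row_miss <- parse_sketch("<p>Nothing recognisable</p>")
```

In this sketch a failed field lookup yields `NA` rather than an error, so the narrative text is still returned for pages whose labels have changed.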