Sweeps director's-report ids (drids) from min_drid to max_drid and writes a small gzipped CSV recording, for each id, whether it returns a healthy report page, the parsed case number, and the response body size. The harvester (morie_fetch_siu) then uses this manifest to skip the roughly 30-50 percent of ids that have no report, which saves bandwidth and lowers the risk of triggering the site's WAF on every run.

Usage

morie_siu_refresh_manifest(
  out_path = NULL,
  max_drid = NULL,
  min_drid = 1L,
  concurrency = 4L,
  rate_rps = 4,
  progress = TRUE
)

Arguments

out_path

Path to write the gzipped CSV. The default, NULL, writes to the in-place manifest location (only useful for maintainers building from a source checkout).

max_drid

Highest drid to probe. Default NULL auto-discovers from the SIU index endpoint and adds a margin.

min_drid

Lowest drid to probe (default 1L).

concurrency

Maximum simultaneous transfers (default 4).

rate_rps

Maximum request starts per second (default 4).

progress

Logical; print a per-batch progress line.
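
As a rough guide to how min_drid, max_drid and rate_rps interact, the sketch below estimates a lower bound on sweep time from the rate limit alone. It assumes one request per id and that rate_rps, not network latency or concurrency, is the bottleneck; estimate_sweep_minutes() is a hypothetical helper for illustration and is not part of morie.

```r
# Hypothetical helper (not part of morie): lower-bound sweep duration in
# minutes, assuming exactly one request per drid and that the request
# rate is the only limiting factor.
estimate_sweep_minutes <- function(max_drid, min_drid = 1L, rate_rps = 4) {
  stopifnot(max_drid >= min_drid, rate_rps > 0)
  n_ids <- max_drid - min_drid + 1L   # inclusive range of ids to probe
  n_ids / rate_rps / 60               # seconds -> minutes
}

estimate_sweep_minutes(6000L)
#> [1] 25
```

Real sweeps will usually take longer than this floor, since slow responses and any retries add time on top of the rate limit.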

Value

Invisibly, a data frame of the full sweep (every probed drid, including misses), matching what was written to out_path.

Details

The shipped manifest at inst/extdata/siu_drid_manifest.csv.gz is a snapshot. Users who want an up-to-date manifest can call this function; it is also how morie maintainers regenerate the snapshot.
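
To illustrate how a manifest of this kind can be used to skip ids, here is a minimal sketch using base R only. It builds a toy gzipped CSV rather than reading the real snapshot. The drid column name, the choice of HTTP 200 as the marker of a healthy page, and the rule that ids beyond the manifest are still fetched are all assumptions made for illustration; the actual columns and the logic inside morie_fetch_siu may differ.

```r
# Toy manifest. Only http_code is known from the package example;
# the drid column name is an assumption.
manifest <- data.frame(
  drid      = 1:5,
  http_code = c(200L, 404L, 200L, 404L, 200L)
)

# Write it as a gzipped CSV, like the shipped snapshot.
path <- tempfile(fileext = ".csv.gz")
con <- gzfile(path, "w")
write.csv(manifest, con, row.names = FALSE)
close(con)

# Read it back; gzfile() decompresses on the fly.
loaded <- read.csv(gzfile(path))

# Assumption: a "healthy" report page is one that returned HTTP 200.
healthy <- loaded$drid[loaded$http_code == 200L]

# Skip ids known to be misses, but keep ids the manifest has not seen yet.
candidates <- 1:7
to_fetch <- candidates[candidates %in% healthy |
                       candidates > max(loaded$drid)]
to_fetch
#> [1] 1 3 5 6 7
```

In an installed package, files under inst/extdata are typically located with system.file("extdata", "siu_drid_manifest.csv.gz", package = "morie"), which could replace the temporary path above.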

Examples

if (FALSE) { # \dontrun{
# Network: refreshes the manifest by probing the SIU site
# (~25-40 min at the default polite rate of 4 RPS for ~6000 ids).
df <- morie_siu_refresh_manifest(out_path = tempfile(fileext = ".csv.gz"))
table(df$http_code)
} # }