R API

Reference for every public function exported by the morie R package. Signatures and descriptions come from the Roxygen2 .Rd files in r-package/morie/man/; see Statistical Methods for the methodology behind each function.

Causal estimators

Note

Documentation for R function estimate_aipw() is pending. Run roxygen2 to generate the .Rd file.

Note

Documentation for R function estimate_atc() is pending. Run roxygen2 to generate the .Rd file.

Note

Documentation for R function estimate_ate() is pending. Run roxygen2 to generate the .Rd file.

Note

Documentation for R function estimate_att() is pending. Run roxygen2 to generate the .Rd file.

Note

Documentation for R function estimate_cate() is pending. Run roxygen2 to generate the .Rd file.

Note

Documentation for R function estimate_g_computation() is pending. Run roxygen2 to generate the .Rd file.

Note

Documentation for R function estimate_gate() is pending. Run roxygen2 to generate the .Rd file.

Note

Documentation for R function estimate_late() is pending. Run roxygen2 to generate the .Rd file.

Note

Documentation for R function estimate_propensity_scores() is pending. Run roxygen2 to generate the .Rd file.

Effect sizes + tests

Note

Documentation for R function anova_one_way() is pending. Run roxygen2 to generate the .Rd file.

Note

Documentation for R function chi_square_test() is pending. Run roxygen2 to generate the .Rd file.

Note

Documentation for R function cohens_d() is pending. Run roxygen2 to generate the .Rd file.

Note

Documentation for R function cramers_v() is pending. Run roxygen2 to generate the .Rd file.

Note

Documentation for R function e_value() is pending. Run roxygen2 to generate the .Rd file.

Note

Documentation for R function effective_sample_size() is pending. Run roxygen2 to generate the .Rd file.

Note

Documentation for R function eta_squared() is pending. Run roxygen2 to generate the .Rd file.

Note

Documentation for R function fisher_exact_test() is pending. Run roxygen2 to generate the .Rd file.

Note

Documentation for R function hedges_g() is pending. Run roxygen2 to generate the .Rd file.

Note

Documentation for R function kendall_tau() is pending. Run roxygen2 to generate the .Rd file.

Note

Documentation for R function kruskal_wallis_test() is pending. Run roxygen2 to generate the .Rd file.

Note

Documentation for R function levene_test() is pending. Run roxygen2 to generate the .Rd file.

Note

Documentation for R function mann_whitney_test() is pending. Run roxygen2 to generate the .Rd file.

Note

Documentation for R function odds_ratio_ci() is pending. Run roxygen2 to generate the .Rd file.

Note

Documentation for R function omega_squared() is pending. Run roxygen2 to generate the .Rd file.

Note

Documentation for R function one_sample_t_test() is pending. Run roxygen2 to generate the .Rd file.

Survey + sampling

Note

Documentation for R function bootstrap_sample() is pending. Run roxygen2 to generate the .Rd file.

Note

Documentation for R function calibration_weights() is pending. Run roxygen2 to generate the .Rd file.

Note

Documentation for R function cluster_sample() is pending. Run roxygen2 to generate the .Rd file.

Note

Documentation for R function compute_design_weights() is pending. Run roxygen2 to generate the .Rd file.

Note

Documentation for R function design_effect() is pending. Run roxygen2 to generate the .Rd file.

Note

Documentation for R function generate_synthetic_data() is pending. Run roxygen2 to generate the .Rd file.

Note

Documentation for R function jackknife_estimate() is pending. Run roxygen2 to generate the .Rd file.

Datasets + I/O

Note

Documentation for R function canonicalize_cpads_data() is pending. Run roxygen2 to generate the .Rd file.

Note

Documentation for R function load_cpads_data() is pending. Run roxygen2 to generate the .Rd file.

Query Perseus via Python

Usage

morie_ask_percy(
  question,
  context = NULL,
  python_bin = Sys.getenv("MORIE_PYTHON_BIN", "python3")
)

morie_assistant_query(
  question,
  context = NULL,
  python_bin = Sys.getenv("MORIE_PYTHON_BIN", "python3")
)

Arguments

question

User question.

context

Optional context string.

python_bin

Python executable to use. Defaults to the MORIE_PYTHON_BIN environment variable, or python3 if that variable is unset.
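The python_bin default is an ordinary Sys.getenv() lookup, so you can choose the interpreter once per session instead of passing the argument on every call. The base-R sketch below shows how that default resolves. The interpreter path is a placeholder for illustration, not one that morie requires.

```r
# Remember any existing setting so it can be restored afterwards.
old <- Sys.getenv("MORIE_PYTHON_BIN", unset = NA)

# Unset: the documented default expression falls back to "python3".
Sys.unsetenv("MORIE_PYTHON_BIN")
default_bin <- Sys.getenv("MORIE_PYTHON_BIN", "python3")

# Set: the environment variable takes precedence.
# Placeholder path, for illustration only.
Sys.setenv(MORIE_PYTHON_BIN = "/opt/venvs/morie/bin/python")
custom_bin <- Sys.getenv("MORIE_PYTHON_BIN", "python3")

# Restore the caller's environment.
if (is.na(old)) {
  Sys.unsetenv("MORIE_PYTHON_BIN")
} else {
  Sys.setenv(MORIE_PYTHON_BIN = old)
}
c(default = default_bin, custom = custom_bin)
```

Putting MORIE_PYTHON_BIN in ~/.Renviron makes the choice persist across R sessions.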

Returns

Agent text response.

Examples

# See the package vignettes for usage examples:
#   vignette(package = "morie")

Returns the path to morie.db that ships with the package (inst/extdata/morie.db). This database contains all CPADS, CCS, CSADS, CSUS, HealthInfobase, and CIHI datasets pre-loaded as SQLite tables.

Usage

morie_builtin_db()

Returns

File path string.

Examples

morie_builtin_db()

Reads a local file and writes it to the cache so that CI and Docker environments (which may lack the original files) can still run tests.

Usage

morie_cache_file(path, table_name, db_path = NULL, con = NULL)

Arguments

path

Path to a CSV or RDS file.

table_name

Name for the cached table.

db_path

Optional path to a SQLite file (default backend).

con

Optional pre-opened DBI connection (overrides db_path).
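morie_cache_file() accepts either a CSV or an RDS file. The sketch below shows one way the read step could work, assuming the format is chosen from the file extension. The helper read_cacheable() is hypothetical and not part of morie.

```r
# Hypothetical helper: read a CSV or RDS file into a data.frame before
# caching. Choosing the reader by extension is an assumption.
read_cacheable <- function(path) {
  ext <- tolower(tools::file_ext(path))
  switch(ext,
    csv = utils::read.csv(path, stringsAsFactors = FALSE),
    rds = readRDS(path),
    stop("unsupported file type: ", ext)
  )
}

# Usage with a throwaway CSV file.
csv <- tempfile(fileext = ".csv")
utils::write.csv(data.frame(x = 1:3), csv, row.names = FALSE)
read_cacheable(csv)
```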

Returns

Number of rows cached (invisible).

Examples

tdir <- tempfile("morie-cache-")
dir.create(tdir)
f <- file.path(tdir, "demo.csv")
write.csv(data.frame(x = 1:3, y = 4:6), f, row.names = FALSE)
morie_cache_file(f, "demo", db_path = file.path(tdir, "cache.db"))

List all tables in the MORIE cache

Usage

morie_cache_list(db_path = NULL, con = NULL)

Arguments

db_path

Optional path to a SQLite file (default backend).

con

Optional pre-opened DBI connection (overrides db_path).

Returns

A data.frame with columns table and rows.

Examples

\donttest{
db <- tempfile(fileext = ".db")
morie_cache_store(data.frame(x = 1:3), "demo", db_path = db)
morie_cache_list(db_path = db)
file.remove(db)
}

Load a table from the MORIE cache

Usage

morie_cache_load(table_name, db_path = NULL, con = NULL)

Arguments

table_name

Name of the table.

db_path

Optional path to a SQLite file (default backend).

con

Optional pre-opened DBI connection (overrides db_path).

Returns

A data.frame, or NULL if the table does not exist.

Examples

\donttest{
db <- tempfile(fileext = ".db")
morie_cache_store(
  data = data.frame(x = 1:5),
  table_name = "demo",
  db_path = db
)
morie_cache_load(table_name = "demo", db_path = db)
file.remove(db)
}

Writes (or replaces) a table in the shared SQLite cache.

Usage

morie_cache_store(data, table_name, db_path = NULL, con = NULL)

Arguments

data

A data.frame to cache.

table_name

Name of the destination table.

db_path

Optional path to a SQLite file (default backend).

con

Optional pre-opened DBI connection. When supplied, the table is written through con and db_path is ignored. Use this for non-SQLite backends (PostgreSQL, DuckDB, MariaDB).

Returns

Number of rows written (invisible).

Examples

\donttest{
db <- tempfile(fileext = ".db")
morie_cache_store(
  data = data.frame(x = rnorm(50), y = rnorm(50)),
  table_name = "demo",
  db_path = db
)
file.remove(db)
}

Returns a data.frame describing every dataset available through the MORIE data management system. Each row maps a short catalog key to its source, survey, year, file format, local path, SQLite table name, and CKAN resource ID (if available).

Usage

morie_dataset_catalog()

Returns

A data.frame with 44 rows (one per dataset) and columns: key, name, source, survey, year, format, type, large_file, local_path, table_name, ckan_resource_id, download_url, zip_member. The download_url / zip_member columns are empty for datasets reachable through the SQLite cache or the CKAN datastore.

Details

Keys match the Python DATASET_CATALOG in data.py exactly. Use morie_load_dataset() to load a dataset by key.

Examples

cat <- morie_dataset_catalog()
nrow(cat)
head(cat[, c("key", "name", "source", "year")])
# Find Ontario carceral datasets:
cat[
  grepl("OTIS|Ontario", paste(cat$source, cat$survey)),
  c("key", "year")
]

Get metadata for a single dataset

Usage

morie_dataset_info(key)

Arguments

key

Dataset catalog key (or fuzzy match).

Returns

A named list with dataset metadata.

Examples

# Use a real catalog key (run `morie_dataset_catalog()$key` to list them):
info <- morie_dataset_info("ocp21")
info$source
info$year
# Fuzzy match works for partial / forgiving keys:
morie_dataset_info("cpads")$key

Opens (or creates) the per-user cache database. The default backend is DuckDB. Like SQLite it needs no configuration, but it is vectorised and columnar, so it can run analytical queries on the multi-gigabyte open-data PUMFs that morie ingests (TPS, CPADS bulk) without breaking down. For backward compatibility, an existing SQLite cache at morie.db is reused. If the duckdb package is unavailable, the function falls back to SQLite.

Usage

morie_db_connect(db_path = NULL)

Arguments

db_path

Optional path to a DuckDB (*.duckdb) or SQLite (*.db) file. Defaults to the MORIE_CACHE_DB env var, else morie.duckdb / morie.db in the per-user cache directory.
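The default-path rules above amount to a small decision function. The sketch below is a hypothetical version of that logic, not morie's implementation. In particular, it assumes that MORIE_CACHE_DB takes precedence over an existing morie.db, which is a reading of the description rather than confirmed behaviour.

```r
# Hypothetical sketch of default cache-path selection for morie_db_connect().
choose_cache_db <- function(cache_dir,
                            duckdb_available = requireNamespace("duckdb", quietly = TRUE),
                            env_path = Sys.getenv("MORIE_CACHE_DB", "")) {
  # 1. An explicit MORIE_CACHE_DB setting wins (assumed precedence).
  if (nzchar(env_path)) return(env_path)
  sqlite_path <- file.path(cache_dir, "morie.db")
  # 2. Reuse an existing SQLite cache for backward compatibility.
  if (file.exists(sqlite_path)) return(sqlite_path)
  # 3. Prefer DuckDB when the package is installed, else fall back to SQLite.
  if (duckdb_available) file.path(cache_dir, "morie.duckdb") else sqlite_path
}

# Usage in an empty throwaway directory, with no env override.
cache_dir <- tempfile("morie-cache-")
dir.create(cache_dir)
choose_cache_db(cache_dir, duckdb_available = TRUE, env_path = "")
```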

Returns

A DBI connection object.

Details

For non-default backends (PostgreSQL, MariaDB, MS SQL Server, …), construct your own DBI connection and pass it as con to the morie_cache_* and morie_load_dataset functions:

con <- DBI::dbConnect(RPostgres::Postgres(),
  host = "…", dbname = "morie", user = "…", password = "…")
morie_load_dataset("ocp21", con = con)

Examples

\donttest{
# DuckDB (default when 'duckdb' is installed); pass a '.db' path for SQLite.
if (requireNamespace("duckdb", quietly = TRUE) &&
  requireNamespace("DBI", quietly = TRUE)) {
  tmp <- tempfile(fileext = ".duckdb")
  con <- morie_db_connect(db_path = tmp)
  DBI::dbListTables(con)
  DBI::dbDisconnect(con)
  file.remove(tmp)
}
}

Downloads large bootstrap weight CSVs that are too big to ship with the package. Data is cached in the user cache database for future use.

Usage

morie_download_bootstrap(
  survey = "all",
  limit = 32000L,
  db_path = NULL,
  con = NULL
)

Arguments

survey

One of "csads_2021", "csads_2023", "csus_2019", "csus_2023", or "all" (default).

limit

Max records per CKAN request (default 32000).

db_path

Optional path to a SQLite/DuckDB file (default backend).

con

Optional pre-opened DBI connection (overrides db_path).
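The survey argument takes one of four survey keys, or "all". The sketch below validates it with base R's match.arg() and expands "all" into the individual keys. The helper bootstrap_surveys() is hypothetical, not a morie export.

```r
# Hypothetical helper. The survey keys are the ones documented for
# morie_download_bootstrap(); the expansion logic is an assumption.
bootstrap_surveys <- function(survey = "all") {
  keys <- c("csads_2021", "csads_2023", "csus_2019", "csus_2023")
  # match.arg() errors on anything outside the allowed set.
  survey <- match.arg(survey, c("all", keys))
  if (identical(survey, "all")) keys else survey
}

bootstrap_surveys()             # all four survey keys
bootstrap_surveys("csus_2019")  # a single survey
```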

Returns

Invisibly, the number of CSV files successfully downloaded.

Examples

\donttest{
# See the package vignettes for usage examples:
#   vignette(package = "morie")
}

Fetch data from the CKAN API and cache it

Usage

morie_fetch_ckan(
  dataset_key = "cpads",
  limit = Inf,
  db_path = NULL,
  resource_id = NULL,
  con = NULL
)

Arguments

dataset_key

One of "cpads", "csads", "csus".

limit

Maximum records to fetch. The CKAN datastore caps a single request at 32000 rows, so larger resources are paged through with offset; the default reads the entire resource.

db_path

Optional override for the database path.

resource_id

Optional CKAN datastore resource id. When supplied (e.g. from morie_dataset_catalog()$ckan_resource_id) it is used directly, so any catalogued dataset can be fetched without a built-in database; dataset_key then only labels the cache table.

con

Optional pre-opened DBI connection (overrides db_path).
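A single CKAN datastore request returns at most 32000 rows, so reading a larger resource takes several requests with increasing offset values. CKAN's datastore_search action accepts limit and offset parameters for this. The sketch below plans those requests, assuming the resource's total row count is already known. The helper ckan_pages() is hypothetical and performs no network I/O.

```r
# Hypothetical helper: list the (offset, limit) pairs needed to read
# min(total, limit) rows when each request is capped at page_size rows.
ckan_pages <- function(total, limit = Inf, page_size = 32000L) {
  n <- min(total, limit)  # rows actually wanted
  if (n <= 0) return(data.frame(offset = integer(0), limit = integer(0)))
  offsets <- seq(0, n - 1, by = page_size)
  data.frame(
    offset = as.integer(offsets),
    limit = as.integer(pmin(page_size, n - offsets))
  )
}

ckan_pages(total = 70000)                 # offsets 0, 32000, 64000; last page reads 6000 rows
ckan_pages(total = 70000, limit = 5000L)  # one request for 5000 rows
```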

Returns

A data.frame.

Examples

\dontrun{
# Requires network access. Fetches the first 5000 rows of the
# Canadian Postsecondary Education Alcohol and Drug Use Survey from the
# Government of Canada CKAN datastore:
cpads <- morie_fetch_ckan(dataset_key = "cpads", limit = 5000L)
nrow(cpads)
}

List all datasets with cache status

Usage

morie_list_datasets(db_path = NULL, con = NULL)

Arguments

db_path

Optional path to a SQLite/DuckDB file (default backend).

con

Optional pre-opened DBI connection (overrides db_path).

Returns

A data.frame with columns: key, name, source, survey, year, type, cached (logical), rows (integer or NA).

Examples

morie_list_datasets()
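The cached and rows columns pair each catalog entry with the contents of the cache. The base-R sketch below shows one way to do that join. It assumes the catalog's table_name column is matched against the table column returned by morie_cache_list(). The toy data frames stand in for real catalog and cache listings.

```r
# Toy stand-ins for morie_dataset_catalog() and morie_cache_list() output.
catalog <- data.frame(
  key = c("ds_a", "ds_b", "ds_c"),
  table_name = c("tbl_a", "tbl_b", "tbl_c")
)
cache <- data.frame(table = "tbl_b", rows = 1200L)

# Left join (assumed matching rule): every catalog row is kept,
# and uncached tables get rows = NA.
status <- merge(catalog, cache, by.x = "table_name", by.y = "table", all.x = TRUE)
status$cached <- !is.na(status$rows)
status[, c("key", "cached", "rows")]
```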

Resolution order:

1. Local RDS/CSV files in standard project locations
2. SQLite cache (data/cache/morie.db)
3. CKAN API fetch (requires internet)

Usage

morie_load_cpads(db_path = NULL, use_ckan = TRUE, con = NULL)

Arguments

db_path

Optional path to a SQLite/DuckDB file (default backend).

use_ckan

Logical; if TRUE and data not found locally or in cache, attempt to fetch from the CKAN API.

con

Optional pre-opened DBI connection (overrides db_path).

Returns

A data.frame with canonical CPADS columns.

Examples

\dontrun{
# Needs the CPADS PUMF (local file, cache, or a live CKAN fetch).
cpads <- morie_load_cpads(use_ckan = TRUE)
if (!is.null(cpads)) head(cpads)
}

Resolution tiers, tried in order: built-in DB -> user cache -> local file -> CKAN datastore -> direct download URL -> ArcGIS layer -> error. Supports fuzzy matching: morie_load_dataset("cpads_2021") resolves to ocp21.

Usage

morie_load_dataset(key, db_path = NULL, refresh = FALSE, con = NULL)

Arguments

key

Dataset catalog key (or fuzzy match).

db_path

Optional path to a SQLite/DuckDB file (default backend).

refresh

If TRUE, bypass the built-in database and the user cache (and, for remotely backed datasets, the local file), re-fetch from the remote source, and overwrite the cached copy. Use this to pick up periodic updates to a dataset.

con

Optional pre-opened DBI connection for the user cache (overrides db_path). The built-in DB read is always SQLite-based and is unaffected by con.

Returns

A data.frame.

Examples

\dontrun{
df <- morie_load_dataset("ocp21") # CPADS 2021-2022 (default DuckDB cache)
df <- morie_load_dataset("ocp21", refresh = TRUE) # force re-fetch

# PostgreSQL cache (run a server first):
# con <- DBI::dbConnect(RPostgres::Postgres(),
#   host = "localhost", dbname = "morie", user = "...")
# df <- morie_load_dataset("ocp21", con = con)
}
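The tiered resolution in morie_load_dataset() follows a "first source that answers wins" pattern. The sketch below is a hypothetical version of it. The tier names follow the documented order, but the tier functions are stand-ins, not morie internals. The refresh handling is simplified: it skips only the built-in DB and the user cache.

```r
# Hypothetical sketch: try each tier in order and return the first
# non-NULL result, recording which tier supplied it.
resolve_tiers <- function(key, tiers, refresh = FALSE) {
  # Simplified refresh: skip the locally stored copies.
  if (refresh) tiers <- tiers[setdiff(names(tiers), c("builtin_db", "user_cache"))]
  for (tier in names(tiers)) {
    # An error in one tier counts as a miss; move on to the next tier.
    result <- tryCatch(tiers[[tier]](key), error = function(e) NULL)
    if (!is.null(result)) {
      attr(result, "resolved_from") <- tier
      return(result)
    }
  }
  stop("dataset '", key, "' could not be resolved from any tier")
}

# Stand-in tiers, for illustration only.
tiers <- list(
  builtin_db = function(key) NULL,                  # not bundled
  user_cache = function(key) data.frame(id = 1:2),  # existing cached copy
  local_file = function(key) stop("file missing"),  # error counts as a miss
  ckan       = function(key) data.frame(id = 1:3)   # remote source
)
attr(resolve_tiers("demo", tiers), "resolved_from")                  # "user_cache"
attr(resolve_tiers("demo", tiers, refresh = TRUE), "resolved_from")  # "ckan"
```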

Resolve standard project paths

Usage

morie_paths(project_root = NULL)

Arguments

project_root

Project root directory. If NULL, inferred from the current working directory.

Returns

Named list of key paths.

Examples

tryCatch(morie_paths(),
  error = function(e) message("not inside a morie project tree")
)
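When project_root is NULL, the root is inferred from the working directory. A common way to do this is to walk up the directory tree until a marker file appears. The sketch below shows that technique with a hypothetical marker name, because the marker morie actually looks for is not documented here. The helper find_root_upwards() is also hypothetical.

```r
# Hypothetical sketch: walk upward from `start` until a directory that
# contains `marker` is found. The marker name is an assumption.
find_root_upwards <- function(start = getwd(), marker) {
  dir <- normalizePath(start, winslash = "/", mustWork = TRUE)
  repeat {
    if (file.exists(file.path(dir, marker))) return(dir)
    parent <- dirname(dir)
    # dirname() of the filesystem root is the root itself: stop there.
    if (identical(parent, dir)) {
      stop("no directory containing '", marker, "' above ", start)
    }
    dir <- parent
  }
}

# Usage with a throwaway tree and a made-up marker file.
root <- tempfile("proj-")
dir.create(file.path(root, "analysis", "sub"), recursive = TRUE)
invisible(file.create(file.path(root, "PROJECT_MARKER")))
find_root_upwards(file.path(root, "analysis", "sub"), "PROJECT_MARKER")
```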

Lists or retrieves the bundled user guide PDF files. These are the official PUMF codebooks and user guides from Health Canada / Statistics Canada.

Usage

morie_userguide(name = NULL)

Arguments

name

Filename (e.g., "20212022-cpads-pumf-user-guide.pdf"). If NULL, lists all available user guides.

Returns

A file path string when name is supplied; otherwise (name = NULL), a character vector of the available filenames.

Examples

morie_userguide()

Workflow + audit

Note

Documentation for R function ask_percy() is pending. Run roxygen2 to generate the .Rd file.

Note

Documentation for R function audit_public_outputs() is pending. Run roxygen2 to generate the .Rd file.

Build a Perseus agent prompt

Usage

morie_build_prompt(question, context = NULL)

build_assistant_prompt(question, context = NULL)

Arguments

question

User question.

context

Optional context string.

Returns

Character scalar prompt.

Examples

# See the package vignettes for usage examples:
#   vignette(package = "morie")

Note

Documentation for R function build_outputs_manifest() is pending. Run roxygen2 to generate the .Rd file.

Note

Documentation for R function build_prompt() is pending. Run roxygen2 to generate the .Rd file.

Note

Documentation for R function cpads_contract() is pending. Run roxygen2 to generate the .Rd file.

Note

Documentation for R function default_synthetic_name_map() is pending. Run roxygen2 to generate the .Rd file.

Note

Documentation for R function default_workflow_map() is pending. Run roxygen2 to generate the .Rd file.

Note

Documentation for R function find_project_root() is pending. Run roxygen2 to generate the .Rd file.

Note

Documentation for R function list_morie_modules() is pending. Run roxygen2 to generate the .Rd file.

Other

Note

Documentation for R function paired_t_test() is pending. Run roxygen2 to generate the .Rd file.

Note

Documentation for R function point_biserial_r() is pending. Run roxygen2 to generate the .Rd file.

Note

Documentation for R function power_prop_test() is pending. Run roxygen2 to generate the .Rd file.

Note

Documentation for R function power_t_test() is pending. Run roxygen2 to generate the .Rd file.

Note

Documentation for R function pps_sample() is pending. Run roxygen2 to generate the .Rd file.

Note

Documentation for R function proportion_ci() is pending. Run roxygen2 to generate the .Rd file.

Note

Documentation for R function read_outputs_manifest() is pending. Run roxygen2 to generate the .Rd file.

Note

Documentation for R function risk_difference_ci() is pending. Run roxygen2 to generate the .Rd file.

Note

Documentation for R function risk_ratio_ci() is pending. Run roxygen2 to generate the .Rd file.

Note

Documentation for R function run_ebac_selection_ipw_analysis() is pending. Run roxygen2 to generate the .Rd file.

Note

Documentation for R function run_morie_module() is pending. Run roxygen2 to generate the .Rd file.

Note

Documentation for R function run_morie_modules() is pending. Run roxygen2 to generate the .Rd file.

Note

Documentation for R function run_pipeline() is pending. Run roxygen2 to generate the .Rd file.

Note

Documentation for R function run_propensity_ipw_analysis() is pending. Run roxygen2 to generate the .Rd file.

Note

Documentation for R function run_workflow_step() is pending. Run roxygen2 to generate the .Rd file.

Note

Documentation for R function sample_size_logistic() is pending. Run roxygen2 to generate the .Rd file.

Note

Documentation for R function sensitivity_rosenbaum() is pending. Run roxygen2 to generate the .Rd file.

Note

Documentation for R function shapiro_wilk_test() is pending. Run roxygen2 to generate the .Rd file.

Note

Documentation for R function simple_random_sample() is pending. Run roxygen2 to generate the .Rd file.

Note

Documentation for R function spearman_rho() is pending. Run roxygen2 to generate the .Rd file.

Note

Documentation for R function stratified_sample() is pending. Run roxygen2 to generate the .Rd file.

Note

Documentation for R function summarize_output_audit() is pending. Run roxygen2 to generate the .Rd file.

Note

Documentation for R function two_sample_t_test() is pending. Run roxygen2 to generate the .Rd file.

Note

Documentation for R function validate_cpads_data() is pending. Run roxygen2 to generate the .Rd file.

Note

Documentation for R function validate_outputs_manifest() is pending. Run roxygen2 to generate the .Rd file.

Note

Documentation for R function wilcoxon_signed_rank_test() is pending. Run roxygen2 to generate the .Rd file.

Note

Documentation for R function write_synthetic_data() is pending. Run roxygen2 to generate the .Rd file.