R API
Reference for every public function exported by the
morie R package. Signatures and descriptions come from the
Roxygen2 .Rd files in r-package/morie/man/; see
Statistical Methods for the methodology behind each function.
Causal estimators
Note
Documentation for R function estimate_aipw() is pending. Run roxygen2 to generate the .Rd file.
Note
Documentation for R function estimate_atc() is pending. Run roxygen2 to generate the .Rd file.
Note
Documentation for R function estimate_ate() is pending. Run roxygen2 to generate the .Rd file.
Note
Documentation for R function estimate_att() is pending. Run roxygen2 to generate the .Rd file.
Note
Documentation for R function estimate_cate() is pending. Run roxygen2 to generate the .Rd file.
Note
Documentation for R function estimate_g_computation() is pending. Run roxygen2 to generate the .Rd file.
Note
Documentation for R function estimate_gate() is pending. Run roxygen2 to generate the .Rd file.
Note
Documentation for R function estimate_late() is pending. Run roxygen2 to generate the .Rd file.
Note
Documentation for R function estimate_propensity_scores() is pending. Run roxygen2 to generate the .Rd file.
Effect sizes + tests
Note
Documentation for R function anova_one_way() is pending. Run roxygen2 to generate the .Rd file.
Note
Documentation for R function chi_square_test() is pending. Run roxygen2 to generate the .Rd file.
Note
Documentation for R function cohens_d() is pending. Run roxygen2 to generate the .Rd file.
Note
Documentation for R function cramers_v() is pending. Run roxygen2 to generate the .Rd file.
Note
Documentation for R function e_value() is pending. Run roxygen2 to generate the .Rd file.
Note
Documentation for R function effective_sample_size() is pending. Run roxygen2 to generate the .Rd file.
Note
Documentation for R function eta_squared() is pending. Run roxygen2 to generate the .Rd file.
Note
Documentation for R function fisher_exact_test() is pending. Run roxygen2 to generate the .Rd file.
Note
Documentation for R function hedges_g() is pending. Run roxygen2 to generate the .Rd file.
Note
Documentation for R function kendall_tau() is pending. Run roxygen2 to generate the .Rd file.
Note
Documentation for R function kruskal_wallis_test() is pending. Run roxygen2 to generate the .Rd file.
Note
Documentation for R function levene_test() is pending. Run roxygen2 to generate the .Rd file.
Note
Documentation for R function mann_whitney_test() is pending. Run roxygen2 to generate the .Rd file.
Note
Documentation for R function odds_ratio_ci() is pending. Run roxygen2 to generate the .Rd file.
Note
Documentation for R function omega_squared() is pending. Run roxygen2 to generate the .Rd file.
Note
Documentation for R function one_sample_t_test() is pending. Run roxygen2 to generate the .Rd file.
Survey + sampling
Note
Documentation for R function bootstrap_sample() is pending. Run roxygen2 to generate the .Rd file.
Note
Documentation for R function calibration_weights() is pending. Run roxygen2 to generate the .Rd file.
Note
Documentation for R function cluster_sample() is pending. Run roxygen2 to generate the .Rd file.
Note
Documentation for R function compute_design_weights() is pending. Run roxygen2 to generate the .Rd file.
Note
Documentation for R function design_effect() is pending. Run roxygen2 to generate the .Rd file.
Note
Documentation for R function generate_synthetic_data() is pending. Run roxygen2 to generate the .Rd file.
Note
Documentation for R function jackknife_estimate() is pending. Run roxygen2 to generate the .Rd file.
Datasets + I/O
Note
Documentation for R function canonicalize_cpads_data() is pending. Run roxygen2 to generate the .Rd file.
Note
Documentation for R function load_cpads_data() is pending. Run roxygen2 to generate the .Rd file.
Query Perseus via Python
Usage
morie_ask_percy(
question,
context = NULL,
python_bin = Sys.getenv("MORIE_PYTHON_BIN", "python3")
)
morie_assistant_query(
question,
context = NULL,
python_bin = Sys.getenv("MORIE_PYTHON_BIN", "python3")
)
Arguments
question: User question.
context: Optional context string.
python_bin: Python executable to use. Defaults to the MORIE_PYTHON_BIN environment variable, or python3 if it is unset.
Returns
Agent text response.
Examples
# See the package vignettes for usage examples:
# vignette(package = "morie")
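The python_bin default relies on the two-argument form of Sys.getenv(), which returns the fallback only when the variable is unset. The base-R sketch below shows that behaviour in isolation. It does not call morie, and the interpreter path is a made-up placeholder.

```r
# Remember the current value so the session can be restored afterwards.
old <- Sys.getenv("MORIE_PYTHON_BIN", unset = NA)

# Variable unset: the fallback "python3" is returned.
Sys.unsetenv("MORIE_PYTHON_BIN")
bin_default <- Sys.getenv("MORIE_PYTHON_BIN", "python3")

# Variable set: its value wins over the fallback.
# "/opt/venv/bin/python" is a hypothetical path used only for illustration.
Sys.setenv(MORIE_PYTHON_BIN = "/opt/venv/bin/python")
bin_custom <- Sys.getenv("MORIE_PYTHON_BIN", "python3")

# Restore the original state of the environment variable.
if (is.na(old)) {
  Sys.unsetenv("MORIE_PYTHON_BIN")
} else {
  Sys.setenv(MORIE_PYTHON_BIN = old)
}

print(c(default = bin_default, custom = bin_custom))
```

Setting the variable once (for example in .Renviron) avoids passing python_bin on every call.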
Returns the path to morie.db that ships with the package
(inst/extdata/morie.db). This database contains all CPADS,
CCS, CSADS, CSUS, HealthInfobase, and CIHI datasets pre-loaded as
SQLite tables.
Usage
morie_builtin_db()
Returns
File path string.
Examples
morie_builtin_db()
Reads a local file and writes it to the cache so that CI and Docker environments (which may lack the original files) can still run tests.
Usage
morie_cache_file(path, table_name, db_path = NULL, con = NULL)
Arguments
path: Path to a CSV or RDS file.
table_name: Name for the cached table.
db_path: Optional path to a SQLite file (default backend).
con: Optional pre-opened DBI connection (overrides db_path).
Returns
Number of rows cached (invisible).
Examples
tdir <- tempfile("morie-cache-")
dir.create(tdir)
f <- file.path(tdir, "demo.csv")
write.csv(data.frame(x = 1:3, y = 4:6), f, row.names = FALSE)
morie_cache_file(f, "demo", db_path = file.path(tdir, "cache.db"))
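Because path may point at either a CSV or an RDS file, the reader has to be chosen from the file type. The sketch below shows one way to do that in base R. read_local_table() is a hypothetical helper written for this illustration, not a morie function, and the package may detect file types differently.

```r
# Hypothetical helper: pick a reader based on the file extension.
read_local_table <- function(path) {
  ext <- tolower(tools::file_ext(path))
  switch(ext,
    csv = utils::read.csv(path, stringsAsFactors = FALSE),
    rds = readRDS(path),
    stop("unsupported file type: ", ext)
  )
}

# Write the same small data.frame in both formats.
tdir <- tempfile("read-demo-")
dir.create(tdir)
csv_path <- file.path(tdir, "demo.csv")
rds_path <- file.path(tdir, "demo.rds")
df <- data.frame(x = 1:3, y = 4:6)
write.csv(df, csv_path, row.names = FALSE)
saveRDS(df, rds_path)

# Both calls return a data.frame with the same columns.
from_csv <- read_local_table(csv_path)
from_rds <- read_local_table(rds_path)

unlink(tdir, recursive = TRUE)
```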
List all tables in the MORIE cache
Usage
morie_cache_list(db_path = NULL, con = NULL)
Arguments
db_path: Optional path to a SQLite file (default backend).
con: Optional pre-opened DBI connection (overrides db_path).
Returns
A data.frame with columns table and rows.
Examples
db <- tempfile(fileext = ".db")
morie_cache_store(data.frame(x = 1:3), "demo", db_path = db)
morie_cache_list(db_path = db)
file.remove(db)
Load a table from the MORIE cache
Usage
morie_cache_load(table_name, db_path = NULL, con = NULL)
Arguments
table_name: Name of the table.
db_path: Optional path to a SQLite file (default backend).
con: Optional pre-opened DBI connection (overrides db_path).
Returns
A data.frame, or NULL if the table does not exist.
Examples
db <- tempfile(fileext = ".db")
morie_cache_store(
  data = data.frame(x = 1:5),
  table_name = "demo",
  db_path = db
)
morie_cache_load(table_name = "demo", db_path = db)
file.remove(db)
Writes (or replaces) a table in the shared SQLite cache.
Usage
morie_cache_store(data, table_name, db_path = NULL, con = NULL)
Arguments
data: A data.frame to cache.
table_name: Name of the destination table.
db_path: Optional path to a SQLite file (default backend).
con: Optional pre-opened DBI connection. When supplied, the table is written through con and db_path is ignored. Use this for non-SQLite backends (PostgreSQL, DuckDB, MariaDB).
Returns
Number of rows written (invisible).
Examples
db <- tempfile(fileext = ".db")
morie_cache_store(
  data = data.frame(x = rnorm(50), y = rnorm(50)),
  table_name = "demo",
  db_path = db
)
file.remove(db)
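The "writes (or replaces)" behaviour can be pictured with plain DBI calls. The sketch below assumes the DBI and RSQLite packages are installed and uses an in-memory database. It illustrates replace-on-write semantics in general and is not the morie implementation.

```r
# Sketch only: plain DBI calls against an in-memory SQLite database.
if (requireNamespace("DBI", quietly = TRUE) &&
    requireNamespace("RSQLite", quietly = TRUE)) {
  con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")

  # The first write creates the table.
  DBI::dbWriteTable(con, "demo", data.frame(x = 1:5), overwrite = TRUE)

  # A second write with overwrite = TRUE replaces the table
  # rather than appending to it.
  DBI::dbWriteTable(con, "demo", data.frame(x = 1:2), overwrite = TRUE)

  # Only the rows from the second write remain.
  n_rows <- nrow(DBI::dbReadTable(con, "demo"))
  print(n_rows)

  DBI::dbDisconnect(con)
}
```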
Returns a data.frame describing every dataset available through the MORIE data management system. Each row maps a short catalog key to its source, survey, year, file format, local path, SQLite table name, and CKAN resource ID (if available).
Usage
morie_dataset_catalog()
Returns
A data.frame with 44 rows (one per dataset) and columns: key, name, source, survey, year, format, type, large_file, local_path, table_name, ckan_resource_id, download_url, zip_member. The download_url and zip_member columns are empty for datasets reachable through the SQLite cache or the CKAN datastore.
Details
Keys match the Python DATASET_CATALOG in data.py exactly.
Use morie_load_dataset() to load by key.
Examples
cat <- morie_dataset_catalog()
nrow(cat)
head(cat[, c("key", "name", "source", "year")])
# Find Ontario carceral datasets:
cat[
grepl("OTIS|Ontario", paste(cat$source, cat$survey)),
c("key", "year")
]
Get metadata for a single dataset
Usage
morie_dataset_info(key)
Arguments
key: Dataset catalog key (or fuzzy match).
Returns
A named list with dataset metadata.
Examples
# Use a real catalog key (run `morie_dataset_catalog()$key` to list them):
info <- morie_dataset_info("ocp21")
info$source
info$year
# Fuzzy match works for partial / forgiving keys:
morie_dataset_info("cpads")$key
Opens (or creates) the per-user cache database. The default backend
is DuckDB: zero-config like SQLite, but vectorised and columnar,
so it handles the multi-GB open-data PUMFs (TPS, CPADS bulk)
that morie ingests without breaking down on analytical queries. For
backward compatibility, an existing SQLite cache at morie.db is reused; if
duckdb is unavailable, morie_db_connect() falls back to SQLite.
Usage
morie_db_connect(db_path = NULL)
Arguments
db_path: Optional path to a DuckDB (*.duckdb) or SQLite (*.db) file. Defaults to the MORIE_CACHE_DB environment variable, else morie.duckdb / morie.db in the per-user cache directory.
Returns
A DBI connection object.
Details
For non-default backends (PostgreSQL, MariaDB, MS SQL Server, …),
construct your own DBI connection and pass it as con to the
morie_cache_* and morie_load_dataset functions:
con <- DBI::dbConnect(RPostgres::Postgres(),
  host = "…", dbname = "morie", user = "…", password = "…")
morie_load_dataset("ocp21", con = con)
Examples
# DuckDB (default when 'duckdb' is installed); pass a '.db' path for SQLite.
if (requireNamespace("duckdb", quietly = TRUE) &&
    requireNamespace("DBI", quietly = TRUE)) {
  tmp <- tempfile(fileext = ".duckdb")
  con <- morie_db_connect(db_path = tmp)
  DBI::dbListTables(con)
  DBI::dbDisconnect(con)
  file.remove(tmp)
}
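The path-selection rules described for morie_db_connect() can be sketched in base R. resolve_cache_path() is a hypothetical helper written for this illustration, and using tools::R_user_dir() for the per-user cache directory is an assumption. The actual package may locate the directory differently.

```r
# Hypothetical helper mirroring the documented rules:
# explicit db_path, then MORIE_CACHE_DB, then a file in the cache directory.
resolve_cache_path <- function(db_path = NULL,
                               cache_dir = tools::R_user_dir("morie", which = "cache"),
                               have_duckdb = requireNamespace("duckdb", quietly = TRUE)) {
  if (!is.null(db_path)) return(db_path)
  env_path <- Sys.getenv("MORIE_CACHE_DB", unset = "")
  if (nzchar(env_path)) return(env_path)
  sqlite_path <- file.path(cache_dir, "morie.db")
  # Reuse an existing SQLite cache, or fall back to SQLite without duckdb.
  if (file.exists(sqlite_path) || !have_duckdb) return(sqlite_path)
  file.path(cache_dir, "morie.duckdb")
}

# Demonstrate with a throwaway directory and the env var cleared.
old <- Sys.getenv("MORIE_CACHE_DB", unset = NA)
Sys.unsetenv("MORIE_CACHE_DB")
demo_dir <- tempfile("morie-cache-")
dir.create(demo_dir)

# Empty cache directory, duckdb assumed available: DuckDB file is chosen.
p_fresh <- resolve_cache_path(cache_dir = demo_dir, have_duckdb = TRUE)

# An existing morie.db takes precedence over creating a DuckDB file.
file.create(file.path(demo_dir, "morie.db"))
p_legacy <- resolve_cache_path(cache_dir = demo_dir, have_duckdb = TRUE)

# An explicit db_path always wins.
p_explicit <- resolve_cache_path(db_path = "custom.duckdb", cache_dir = demo_dir)

unlink(demo_dir, recursive = TRUE)
if (!is.na(old)) Sys.setenv(MORIE_CACHE_DB = old)
```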
Downloads large bootstrap weight CSVs that are too big to ship with the package. Data is cached in the user cache database for future use.
Usage
morie_download_bootstrap(
survey = "all",
limit = 32000L,
db_path = NULL,
con = NULL
)
Arguments
survey: One of "csads_2021", "csads_2023", "csus_2019", "csus_2023", or "all" (default).
limit: Max records per CKAN request (default 32000).
db_path: Optional path to a SQLite/DuckDB file (default backend).
con: Optional pre-opened DBI connection (overrides db_path).
Returns
Invisibly, the number of CSV files successfully downloaded.
Examples
# See the package vignettes for usage examples:
# vignette(package = "morie")
Fetch data from the CKAN API and cache it
Usage
morie_fetch_ckan(
dataset_key = "cpads",
limit = Inf,
db_path = NULL,
resource_id = NULL,
con = NULL
)
Arguments
dataset_key: One of "cpads", "csads", "csus".
limit: Maximum records to fetch. The CKAN datastore caps a single request at 32000 rows, so larger resources are paged through with offset; the default reads the entire resource.
db_path: Optional override for the database path.
resource_id: Optional CKAN datastore resource id. When supplied (e.g. from morie_dataset_catalog()$ckan_resource_id) it is used directly, so any catalogued dataset can be fetched without a built-in database; dataset_key then only labels the cache table.
con: Optional pre-opened DBI connection (overrides db_path).
Returns
A data.frame.
Examples
# Not run: requires network access. Fetches the first 5000 rows of the
# Canadian Postsecondary Alcohol and Drug Use Survey from the
# Government of Canada CKAN datastore:
cpads <- morie_fetch_ckan(dataset_key = "cpads", limit = 5000L)
nrow(cpads)
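The paging described for limit can be sketched as a plan of (offset, limit) pairs, one per request. ckan_page_plan() is a hypothetical helper written for this illustration. It assumes the total row count is known up front, and the 70000-row total is a made-up figure, not the size of any real resource.

```r
# Hypothetical helper: split a fetch into requests of at most page_size rows.
ckan_page_plan <- function(total_rows, limit = Inf, page_size = 32000L) {
  n <- min(total_rows, limit)
  if (n <= 0) {
    return(data.frame(offset = numeric(0), limit = numeric(0)))
  }
  offsets <- seq(0, n - 1, by = page_size)
  # The last page may be shorter than page_size.
  data.frame(offset = offsets, limit = pmin(page_size, n - offsets))
}

# Reading an entire (hypothetical) 70000-row resource takes three requests.
plan_full <- ckan_page_plan(total_rows = 70000)
print(plan_full)

# Capping the fetch at 5000 rows needs a single request.
plan_capped <- ckan_page_plan(total_rows = 70000, limit = 5000)
print(plan_capped)
```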
List all datasets with cache status
Usage
morie_list_datasets(db_path = NULL, con = NULL)
Arguments
db_path: Optional path to a SQLite/DuckDB file (default backend).
con: Optional pre-opened DBI connection (overrides db_path).
Returns
A data.frame with columns: key, name, source, survey, year, type, cached (logical), rows (integer or NA).
Examples
morie_list_datasets()
Resolution order:
1. Local RDS/CSV files in standard project locations
2. SQLite cache (data/cache/morie.db)
3. CKAN API fetch (requires internet)
Usage
morie_load_cpads(db_path = NULL, use_ckan = TRUE, con = NULL)
Arguments
db_path: Optional path to a SQLite/DuckDB file (default backend).
use_ckan: Logical; if TRUE and the data is not found locally or in the cache, attempt to fetch it from the CKAN API.
con: Optional pre-opened DBI connection (overrides db_path).
Returns
A data.frame with canonical CPADS columns.
Examples
# Not run: needs the CPADS PUMF (local file, cache, or a live CKAN fetch).
cpads <- morie_load_cpads(use_ckan = TRUE)
if (!is.null(cpads)) head(cpads)
Resolution tiers, tried in order: built-in DB -> user cache -> local
file -> CKAN datastore -> direct download URL -> ArcGIS layer ->
error. Supports fuzzy matching: morie_load_dataset("cpads_2021")
resolves to ocp21.
Usage
morie_load_dataset(key, db_path = NULL, refresh = FALSE, con = NULL)
Arguments
key: Dataset catalog key (or fuzzy match).
db_path: Optional path to a SQLite/DuckDB file (default backend).
refresh: If TRUE, bypass the built-in database and the user cache (and, for remotely-backed datasets, the local file) and re-fetch from the remote source, overwriting the cached copy. Use this to pick up periodic updates to a dataset.
con: Optional pre-opened DBI connection for the user cache (overrides db_path). The built-in DB read is always SQLite-based and is unaffected by con.
Returns
A data.frame.
Examples
# Not run:
df <- morie_load_dataset("ocp21") # CPADS 2021-2022 (default DuckDB cache)
df <- morie_load_dataset("ocp21", refresh = TRUE) # force re-fetch
# PostgreSQL cache (run a server first):
# con <- DBI::dbConnect(RPostgres::Postgres(),
#   host = "localhost", dbname = "morie", user = "...")
# df <- morie_load_dataset("ocp21", con = con)
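The tiered lookup described for morie_load_dataset() follows a "first tier that returns data wins" pattern. The sketch below shows that pattern with stand-in loaders. resolve_dataset() and the tier functions are hypothetical, only four of the documented tiers are modelled, and refresh is simplified to skip just the built-in DB and the user cache.

```r
# Hypothetical resolver: try each tier in order and return the first hit.
resolve_dataset <- function(key, tiers, refresh = FALSE,
                            skip_on_refresh = c("builtin_db", "user_cache")) {
  if (refresh) {
    tiers <- tiers[!names(tiers) %in% skip_on_refresh]
  }
  for (tier in names(tiers)) {
    result <- tiers[[tier]](key)
    if (!is.null(result)) {
      attr(result, "tier") <- tier  # record where the data came from
      return(result)
    }
  }
  stop("dataset not found in any tier: ", key)
}

# Stand-in loaders: each returns a data.frame on a hit, NULL on a miss.
tiers <- list(
  builtin_db = function(key) NULL,
  user_cache = function(key) data.frame(x = 1:2),
  local_file = function(key) NULL,
  ckan       = function(key) data.frame(x = 1:3)
)

hit <- resolve_dataset("ocp21", tiers)
refetch <- resolve_dataset("ocp21", tiers, refresh = TRUE)

print(attr(hit, "tier"))
print(attr(refetch, "tier"))
```

Keeping the tiers in a named list makes the order explicit and lets refresh drop entries by name instead of branching inside each loader.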
Resolve standard project paths
Usage
morie_paths(project_root = NULL)
Arguments
project_root: Project root directory. If NULL, inferred from the current working directory.
Returns
Named list of key paths.
Examples
tryCatch(morie_paths(),
error = function(e) message("not inside a morie project tree")
)
Lists or retrieves bundled userguide PDF files. These are the official PUMF codebooks and user guides from Health Canada / Statistics Canada.
Usage
morie_userguide(name = NULL)
Arguments
name: Filename (e.g., "20212022-cpads-pumf-user-guide.pdf"). If NULL, lists all available userguides.
Returns
File path string, or character vector of filenames.
Examples
morie_userguide()
Workflow + audit
Note
Documentation for R function ask_percy() is pending. Run roxygen2 to generate the .Rd file.
Note
Documentation for R function audit_public_outputs() is pending. Run roxygen2 to generate the .Rd file.
Build a Perseus agent prompt
Usage
morie_build_prompt(question, context = NULL)
build_assistant_prompt(question, context = NULL)
Arguments
question: User question.
context: Optional context string.
Returns
Character scalar prompt.
Examples
# See the package vignettes for usage examples:
# vignette(package = "morie")
Note
Documentation for R function build_outputs_manifest() is pending. Run roxygen2 to generate the .Rd file.
Note
Documentation for R function build_prompt() is pending. Run roxygen2 to generate the .Rd file.
Note
Documentation for R function cpads_contract() is pending. Run roxygen2 to generate the .Rd file.
Note
Documentation for R function default_synthetic_name_map() is pending. Run roxygen2 to generate the .Rd file.
Note
Documentation for R function default_workflow_map() is pending. Run roxygen2 to generate the .Rd file.
Note
Documentation for R function find_project_root() is pending. Run roxygen2 to generate the .Rd file.
Note
Documentation for R function list_morie_modules() is pending. Run roxygen2 to generate the .Rd file.
Other
Note
Documentation for R function paired_t_test() is pending. Run roxygen2 to generate the .Rd file.
Note
Documentation for R function point_biserial_r() is pending. Run roxygen2 to generate the .Rd file.
Note
Documentation for R function power_prop_test() is pending. Run roxygen2 to generate the .Rd file.
Note
Documentation for R function power_t_test() is pending. Run roxygen2 to generate the .Rd file.
Note
Documentation for R function pps_sample() is pending. Run roxygen2 to generate the .Rd file.
Note
Documentation for R function proportion_ci() is pending. Run roxygen2 to generate the .Rd file.
Note
Documentation for R function read_outputs_manifest() is pending. Run roxygen2 to generate the .Rd file.
Note
Documentation for R function risk_difference_ci() is pending. Run roxygen2 to generate the .Rd file.
Note
Documentation for R function risk_ratio_ci() is pending. Run roxygen2 to generate the .Rd file.
Note
Documentation for R function run_ebac_selection_ipw_analysis() is pending. Run roxygen2 to generate the .Rd file.
Note
Documentation for R function run_morie_module() is pending. Run roxygen2 to generate the .Rd file.
Note
Documentation for R function run_morie_modules() is pending. Run roxygen2 to generate the .Rd file.
Note
Documentation for R function run_pipeline() is pending. Run roxygen2 to generate the .Rd file.
Note
Documentation for R function run_propensity_ipw_analysis() is pending. Run roxygen2 to generate the .Rd file.
Note
Documentation for R function run_workflow_step() is pending. Run roxygen2 to generate the .Rd file.
Note
Documentation for R function sample_size_logistic() is pending. Run roxygen2 to generate the .Rd file.
Note
Documentation for R function sensitivity_rosenbaum() is pending. Run roxygen2 to generate the .Rd file.
Note
Documentation for R function shapiro_wilk_test() is pending. Run roxygen2 to generate the .Rd file.
Note
Documentation for R function simple_random_sample() is pending. Run roxygen2 to generate the .Rd file.
Note
Documentation for R function spearman_rho() is pending. Run roxygen2 to generate the .Rd file.
Note
Documentation for R function stratified_sample() is pending. Run roxygen2 to generate the .Rd file.
Note
Documentation for R function summarize_output_audit() is pending. Run roxygen2 to generate the .Rd file.
Note
Documentation for R function two_sample_t_test() is pending. Run roxygen2 to generate the .Rd file.
Note
Documentation for R function validate_cpads_data() is pending. Run roxygen2 to generate the .Rd file.
Note
Documentation for R function validate_outputs_manifest() is pending. Run roxygen2 to generate the .Rd file.
Note
Documentation for R function wilcoxon_signed_rank_test() is pending. Run roxygen2 to generate the .Rd file.
Note
Documentation for R function write_synthetic_data() is pending. Run roxygen2 to generate the .Rd file.