Returns a data.frame describing every dataset available through the MORIE data management system. Each row maps a short catalog key to its source, survey, year, file format, local path, SQLite table name, and CKAN resource ID (if available).

Usage

morie_dataset_catalog()

Value

A data.frame with 44 rows (one per dataset) and columns: key, name, source, survey, year, format, type, large_file, local_path, table_name, ckan_resource_id, download_url, zip_member. The download_url / zip_member columns are empty for datasets reachable through the SQLite cache or the CKAN datastore.
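As a rough sketch of how the download_url / zip_member columns might be used, the snippet below picks out the rows that carry a direct download URL. It runs on a small stand-in data.frame rather than the real catalog: the "demo_zip" row, its URL, and its zip member are hypothetical, and treating both NA and "" as "empty" is an assumption, since the page does not say which representation the catalog uses.

```r
# Stand-in for morie_dataset_catalog() output, reduced to three columns.
# The "demo_zip" row and its values are hypothetical, for illustration only.
catalog <- data.frame(
  key          = c("ocp21", "occ22", "demo_zip"),
  download_url = c("", NA, "https://example.org/demo.zip"),
  zip_member   = c("", NA, "demo.csv"),
  stringsAsFactors = FALSE
)

# Assumption: an "empty" cell may be either NA or a zero-length string,
# so both cases are excluded.
has_url <- !is.na(catalog$download_url) & nzchar(catalog$download_url)

# Keep only the datasets that would need a direct download.
direct <- catalog[has_url, c("key", "download_url", "zip_member")]
direct$key
#> [1] "demo_zip"
```

With the package loaded, the same filter should apply to the real catalog by replacing the stand-in with morie_dataset_catalog().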

Details

Keys match the Python DATASET_CATALOG in data.py exactly. Use morie_load_dataset() to load a dataset by its key.
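Because datasets are addressed by key, it can help to confirm that a key exists in the catalog before trying to load it. The sketch below shows one way to do that; lookup_key() is a hypothetical helper, not part of the package, and it runs on a two-row stand-in data.frame built from keys and names shown in the Examples output.

```r
# Stand-in for morie_dataset_catalog() output, reduced to two columns.
catalog <- data.frame(
  key  = c("ocp21", "occ22"),
  name = c("CPADS 2021-2022 PUMF", "CCS 2018-2022 PUMF"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: return the catalog row for `key`,
# or stop with a clear error if the key is not in the catalog.
lookup_key <- function(catalog, key) {
  row <- catalog[catalog$key == key, , drop = FALSE]
  if (nrow(row) == 0L) {
    stop("Unknown dataset key: ", key, call. = FALSE)
  }
  row
}

entry <- lookup_key(catalog, "ocp21")
entry$name
#> [1] "CPADS 2021-2022 PUMF"
```

Once a key has been validated this way, it could be passed on to morie_load_dataset(); the exact arguments that function accepts are not documented on this page, so check its own reference entry.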

Examples

cat <- morie_dataset_catalog()
nrow(cat)
#> [1] 44
head(cat[, c("key", "name", "source", "year")])
#>       key                      name source      year
#> 1   ocp21      CPADS 2021-2022 PUMF     oc 2021-2022
#> 2   occ22        CCS 2018-2022 PUMF     oc 2018-2022
#> 3   occ23             CCS 2023 PUMF     oc      2023
#> 4   occ24             CCS 2024 PUMF     oc      2024
#> 5 ocs22mf      CSADS 2021-2022 PUMF     oc 2021-2022
#> 6 ocs22bt CSADS 2021-2022 Bootstrap     oc 2021-2022
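# Find Ontario carceral datasets (case-sensitive match on source and survey;
# no row matches in this output, so the result is empty):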
cat[
  grepl("OTIS|Ontario", paste(cat$source, cat$survey)),
  c("key", "year")
]
#> [1] key  year
#> <0 rows> (or 0-length row.names)