Generates non-identifying synthetic data suitable for development, testing, and demos. The generator uses a canonical variable set and allows output column renaming through name_map so it can be adapted to multiple studies. Synthetic data should not be used for final inferential reporting.

Usage

morie_generate_synthetic_data(
  n = 5000L,
  seed = 42L,
  special_code_rate = 0.02,
  profile = c("generic", "morie_legacy"),
  name_map = NULL
)

Arguments

n

Number of rows.

seed

Random seed for reproducibility.

special_code_rate

Proportion of values replaced with survey-style special missing codes (97/98/99/997/998/999) in discrete fields.

profile

Convenience profile for output column naming, one of "generic" or "morie_legacy"; ignored when name_map is supplied.

name_map

Optional named character vector mapping canonical keys to output column names. Use morie_default_synthetic_name_map() as a template.
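To make the shape of name_map concrete, here is a minimal base-R sketch of how a named character vector can map canonical keys to output column names. The keys (age, sex) and output names (AGE_YRS, SEX_CD) are hypothetical and chosen only for illustration; the real canonical keys are those returned by morie_default_synthetic_name_map(). The renaming logic is an assumption about the general technique, not the package's internal code.

```r
# Hypothetical canonical keys and output names, for illustration only.
# The real keys come from morie_default_synthetic_name_map().
name_map <- c(age = "AGE_YRS", sex = "SEX_CD")

# Stand-in for generator output that still uses canonical column names.
df <- data.frame(age = c(34L, 51L), sex = c(1L, 2L), weight = c(1.2, 0.8))

# Rename only the columns that have an entry in name_map;
# columns without an entry (here: weight) keep their names.
hits <- names(df) %in% names(name_map)
names(df)[hits] <- unname(name_map[names(df)[hits]])

names(df)
```

Columns without an entry in the map are left untouched in this sketch; whether the generator itself requires a complete map is not stated in this page.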

Value

A data.frame with synthetic records.
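As a rough illustration of what special_code_rate means for a discrete column in the returned data.frame, the base-R sketch below overwrites a small share of values with special missing codes. It is an assumption about the general idea, not the package's implementation: the field, its valid range 1 to 5, and the restriction to the two-digit codes are all made up for the example.

```r
set.seed(1)

n <- 1000L
special_code_rate <- 0.02
special_codes <- c(97L, 98L, 99L)  # two-digit codes only, for a small-range field

# Hypothetical discrete field with valid values 1..5.
x <- sample(1:5, n, replace = TRUE)

# Flag each value independently with probability special_code_rate ...
flag <- runif(n) < special_code_rate

# ... and overwrite the flagged values with a randomly chosen special code.
x[flag] <- sample(special_codes, sum(flag), replace = TRUE)

# Observed share of special codes; close to, but not exactly, the target rate.
mean(x %in% special_codes)
```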

Examples

df <- morie_generate_synthetic_data(n = 200, seed = 1)
nrow(df)
#> [1] 200
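Because discrete fields can contain the special missing codes listed under special_code_rate, downstream test code often recodes them to NA first. The sketch below shows one way to do that in base R on a small stand-in data frame; the column names q1 and q2 are hypothetical and do not come from the generator. Note the assumption that none of the codes is a legitimate value in the affected columns (for example, 97 could be a valid age), so in practice the recode should be limited to the fields where the codes apply.

```r
# Special missing codes as listed in the special_code_rate argument.
special_codes <- c(97L, 98L, 99L, 997L, 998L, 999L)

# Stand-in for a few discrete columns of generator output (hypothetical names).
df <- data.frame(q1 = c(1L, 97L, 3L), q2 = c(998L, 2L, 5L))

# Replace special codes with NA, column by column, keeping the data.frame shape.
df[] <- lapply(df, function(col) {
  col[col %in% special_codes] <- NA
  col
})

# Count missing values per column after recoding.
colSums(is.na(df))
```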