Walks every column of a data frame, infers its NOIR (nominal, ordinal, interval, ratio) measurement level and epidemiological role, computes summary statistics, and resolves a best-guess treatment, outcome, and survey-weight column. User-supplied hints override heuristic detection.

Usage

morie_dataset_profile(
  df,
  hint_treatment = NULL,
  hint_outcome = NULL,
  hint_weights = NULL,
  ordinal_threshold = 10L,
  binary_threshold = 2L
)

Arguments

df

A data.frame.

hint_treatment

Optional character; name of the column to force as the treatment. The default, NULL, leaves the choice to heuristic detection.

hint_outcome

Optional character; name of the column to force as the outcome. The default, NULL, leaves the choice to heuristic detection.

hint_weights

Optional character; name of the column to force as the survey weight. The default, NULL, leaves the choice to heuristic detection.

ordinal_threshold

Integer; maximum number of unique values for a categorical column to be classified as ordinal (default 10).

binary_threshold

Integer; maximum number of unique values for a column to be classified as binary (default 2).
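
To make the two thresholds concrete, the base R sketch below classifies columns by their number of distinct non-missing values. It is only an illustration of a cardinality cutoff: classify_by_cardinality is a hypothetical helper, not part of the package, and the real heuristics behind morie_dataset_profile may weigh other signals (column type, names, NOIR level) that this page does not describe.

```r
# Hypothetical helper (an assumption for illustration, not the package's code):
# label a column by how many distinct non-missing values it holds.
classify_by_cardinality <- function(x,
                                    ordinal_threshold = 10L,
                                    binary_threshold = 2L) {
  n_unique <- length(unique(x[!is.na(x)]))
  if (n_unique <= binary_threshold) {
    "binary"
  } else if (n_unique <= ordinal_threshold) {
    "ordinal"
  } else {
    "other"
  }
}

# Toy data; the column names are made up.
df <- data.frame(
  smoker   = c(0, 1, 1, 0, 1, 0),
  severity = c(1, 2, 3, 1, 2, 3),
  age      = c(34, 51, 47, 29, 62, 45)
)

# Lower the ordinal cutoff to 5 so that age (6 distinct values) falls outside it.
labels <- vapply(df, classify_by_cardinality, character(1),
                 ordinal_threshold = 5L)
print(labels)
```

With these inputs, smoker has two distinct values, severity three, and age six, so the sketch labels them "binary", "ordinal", and "other" respectively. Note that under this simple rule a constant column would also be labelled "binary", since it has no more than two distinct values.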

Value

A named list (the dataset profile) with fields n_rows, n_cols, columns (a named list of per-column profiles), suggested_treatment, suggested_outcome, and suggested_weights.
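
As a rough sketch of how the arguments and return value fit together, the snippet below builds a toy data frame, passes all three hints, and reads back the documented fields. The data and column names are invented for illustration, and the call is guarded with exists() so the snippet still runs when the package providing morie_dataset_profile is not attached. No output is shown because the contents of each column profile are not specified on this page.

```r
# Toy data; the column names are made up for illustration.
df <- data.frame(
  treated   = c(0L, 1L, 1L, 0L),
  recovered = c(1L, 1L, 0L, 0L),
  svy_wt    = c(1.2, 0.8, 1.5, 1.0)
)

# Only call the function if it is available in the current session.
if (exists("morie_dataset_profile", mode = "function")) {
  profile <- morie_dataset_profile(
    df,
    hint_treatment = "treated",
    hint_outcome   = "recovered",
    hint_weights   = "svy_wt"
  )

  # Top-level fields named in the Value section.
  str(profile[c("n_rows", "n_cols",
                "suggested_treatment",
                "suggested_outcome",
                "suggested_weights")])

  # Names of the per-column profiles.
  print(names(profile$columns))
}
```

Because hints override heuristic detection, the suggested_* fields would be expected to echo the hinted column names here; dropping a hint leaves that choice to the heuristics.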