  1. Fit a logistic regression for \(P(\text{trait} \mid \text{covariates})\) on the survey microdata.

  2. Apply the fitted coefficients to the area-level marginals in area_df to get a predicted rate for each area.

  3. Multiply each area's predicted rate by its population to obtain the synthetic "population at risk" exposure offset.
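
The three steps above can be sketched in base R. This is a minimal illustration rather than the package's implementation: glm() stands in for the internal fitter, and the data frames survey and area, with their columns x1, x2, trait, and pop, are made up for the example.

```r
set.seed(1)

# Hypothetical survey microdata: one row per respondent.
survey <- data.frame(x1 = rnorm(300), x2 = rnorm(300))
survey$trait <- rbinom(300, 1, plogis(-1 + 0.5 * survey$x1))

# Hypothetical area table: covariate means plus a population count.
area <- data.frame(
  x1  = c(-0.5, 0, 0.5),
  x2  = c(0.2, 0, -0.2),
  pop = c(1000, 1200, 900)
)

# Step 1: logistic fit on the microdata (glm() is a stand-in fitter).
fit <- glm(trait ~ x1 + x2, data = survey, family = binomial())

# Step 2: apply the coefficients to the area-level covariates.
# The design matrix gets a leading column of 1s for the intercept.
X_area <- cbind(1, as.matrix(area[, c("x1", "x2")]))
rate   <- plogis(drop(X_area %*% coef(fit)))

# Step 3: scale the predicted rate by the area population.
exposure <- rate * area$pop
```

Because the inverse logit is nonlinear, evaluating it at area-level means approximates, rather than equals, the average of individual-level probabilities within an area.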

Usage

mrm_synthetic_area_exposure(
  survey_df,
  survey_trait_col,
  survey_covariate_cols,
  area_df,
  area_population_col,
  fit_callable = NULL,
  return_per_area_rate = FALSE
)

Arguments

survey_df

A data.frame of survey microdata (one row per respondent), carrying survey_trait_col (0/1 or logical) and survey_covariate_cols.

survey_trait_col

Character. Name of the binary trait column.

survey_covariate_cols

Character vector of covariates that are present in BOTH the survey and the area dataset.

area_df

A data.frame with one row per area (tract, precinct, etc.). It must carry the same covariate columns as the survey, expressed as area-level proportions or means, plus area_population_col.

area_population_col

Character. Name of the adult-population column in area_df.

fit_callable

Optional function with signature function(X, y) -> coef, returning a coefficient vector of length length(survey_covariate_cols) + 1L (intercept first). Defaults to a base-R Newton-IRLS logistic fit.

return_per_area_rate

Logical; default FALSE. If TRUE the result list also carries predicted_rate.
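
As an illustration of the fit_callable contract described above, the following sketch wraps stats::glm.fit() so that it returns an unnamed coefficient vector with the intercept first. The name glm_fit_callable is hypothetical. This page does not state whether X already includes an intercept column; the sketch assumes it does not and prepends one, so drop the cbind() if that assumption is wrong.

```r
# Hypothetical custom fitter matching function(X, y) -> coef.
glm_fit_callable <- function(X, y) {
  X <- as.matrix(X)
  # Assumption: X holds covariates only, so an intercept column is added.
  design <- cbind(1, X)
  fit <- stats::glm.fit(
    x = design,
    y = as.numeric(y),
    family = stats::binomial()
  )
  # Intercept first, then one coefficient per covariate column.
  unname(fit$coefficients)
}

# Quick check on simulated data (not package output).
set.seed(4)
X <- cbind(x1 = rnorm(200), x2 = rnorm(200))
y <- rbinom(200, 1, plogis(-1 + 0.8 * X[, "x1"]))
coefs <- glm_fit_callable(X, y)
length(coefs)  # ncol(X) + 1
```

A function of this shape would be passed as fit_callable = glm_fit_callable.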

Value

A named list with classes morie_mrm_result, morie_rich_result, and list. It carries exposure (a named numeric vector with one entry per area row), predicted_rate (when requested), coef (the fitted logistic coefficient vector), and the interpretation and warnings entries.

Examples

set.seed(2)
n_survey <- 500
x1 <- rnorm(n_survey); x2 <- rnorm(n_survey)
p  <- 1 / (1 + exp(-(-2 + 0.6 * x1 - 0.4 * x2)))
y  <- rbinom(n_survey, 1, p)
survey <- data.frame(trait = y, x1 = x1, x2 = x2)

area <- data.frame(
  x1 = rnorm(20), x2 = rnorm(20),
  pop = sample(800:1500, 20, replace = TRUE)
)
rownames(area) <- paste0("area_", seq_len(20))
res <- mrm_synthetic_area_exposure(
  survey_df = survey,
  survey_trait_col = "trait",
  survey_covariate_cols = c("x1", "x2"),
  area_df = area,
  area_population_col = "pop"
)
head(res$exposure)
#>    area_1    area_2    area_3    area_4    area_5    area_6 
#>  81.02290  45.79443  32.31382 513.26164  59.79157  51.19841
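
One possible downstream use of the exposure vector, sketched under assumptions: it can enter a Poisson count model as a log offset, so that coefficients describe events per person at risk. The counts below are simulated for illustration only, the exposure values are the rounded figures printed above, and nothing on this page confirms that this is the package's intended workflow.

```r
set.seed(3)

# Rounded exposure values taken from the printed output above.
exposure <- c(
  area_1 = 81.0, area_2 = 45.8, area_3 = 32.3,
  area_4 = 513.3, area_5 = 59.8, area_6 = 51.2
)

# Hypothetical event counts, simulated at 0.05 events per person at risk.
counts <- rpois(length(exposure), 0.05 * exposure)

# Intercept-only Poisson model with log(exposure) as the offset.
fit <- glm(counts ~ 1, family = poisson(), offset = log(exposure))

# Estimated events per unit of exposure.
rate_hat <- exp(coef(fit))[[1]]
```

For an intercept-only model, rate_hat equals sum(counts) / sum(exposure), which makes the role of the offset easy to verify by hand.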