Performs nested K-fold cross-validation (CV): the outer loop estimates generalisation performance, while an inner CV grid search selects the best hyperparameter configuration on each outer training fold. Two calling conventions are supported for backward compatibility, a legacy stub form and a full form; both are described under Details.
Usage
nested_cross_validate(
  fit_fn = NULL,
  predict_fn = NULL,
  X = NULL,
  y = NULL,
  score_fn = NULL,
  hyperparam_grid = NULL,
  outer_k = 5L,
  inner_k = 3L,
  scoring = "roc_auc",
  random_state = 42L,
  tune_fn = NULL,
  outer_folds = NULL
)
Arguments
- fit_fn
Function with signature (X, y, hyperparams) -> model, accepting a single hyperparameter list (full form only).
- predict_fn
Function with signature (model, X) -> y_pred.
- X
Numeric predictor matrix (or coercible).
- y
Response vector.
- score_fn
Optional custom scoring function (y_true, y_pred) -> numeric(1). Higher is better. If NULL, the named scoring rule given by scoring is used.
- hyperparam_grid
Named list of candidate vectors (one per hyperparameter). The Cartesian product defines the search grid.
- outer_k
Number of outer folds (default 5).
- inner_k
Number of inner folds (default 3).
- scoring
Named scoring rule passed to the internal scorer ("roc_auc", "accuracy", "brier"). Used only if score_fn is NULL.
- random_state
Integer seed for fold construction (default 42).
- tune_fn
Deprecated legacy positional argument; see Details.
- outer_folds
Deprecated alias for outer_k (legacy stub form).
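The hyperparam_grid argument is described as a named list whose Cartesian product defines the search grid. A minimal sketch of how such a list could be expanded into individual configurations is shown below. The hyperparameter names lambda and depth are hypothetical, and whether the function uses expand.grid() internally is an assumption, not something this page states.

```r
# Hypothetical grid; lambda and depth are illustrative names only.
hyperparam_grid <- list(lambda = c(0.01, 0.1, 1), depth = c(2L, 4L))

# expand.grid() forms the Cartesian product, one row per configuration.
grid <- expand.grid(hyperparam_grid, stringsAsFactors = FALSE,
                    KEEP.OUT.ATTRS = FALSE)

# Turn each row into a named list, the shape that fit_fn receives.
configs <- lapply(seq_len(nrow(grid)),
                  function(i) as.list(grid[i, , drop = FALSE]))

nrow(grid)  # 3 x 2 = 6 configurations
```

Each element of configs is a named list with one value per hyperparameter, which matches the "single hyperparameter list" that fit_fn is documented to accept.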
Value
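Named list with outer_scores (numeric vector, length outer_k), best_hyperparams_per_fold (list of named lists, one per outer fold), mean_score (mean of outer_scores), se_score (standard error of outer_scores), and n_configs (number of configurations in the search grid).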
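The summary fields can be reproduced from outer_scores. The sketch below uses the three outer scores printed in the Examples section and assumes se_score is the sample standard deviation divided by the square root of the number of folds; that formula is inferred from the example output rather than stated on this page.

```r
# Outer-fold scores copied from the example output on this page.
outer_scores <- c(0.7192982, 0.7775000, 0.6428571)

# Mean outer score.
mean_score <- mean(outer_scores)

# Assumed formula: standard error of the mean across outer folds.
se_score <- stats::sd(outer_scores) / sqrt(length(outer_scores))

c(mean_score = mean_score, se_score = se_score)
```

Both values agree with the printed mean_score and se_score to the displayed precision.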
Details
Legacy stub form: nested_cross_validate(tune_fn, predict_fn, X, y, outer_folds, scoring, random_state), where tune_fn(X, y) returns a fitted model (no grid argument). In this mode no inner search is run.
Full form: pass fit_fn, predict_fn, hyperparam_grid (a named list of candidate vectors), and optionally score_fn. The function enumerates the Cartesian product, runs inner K-fold CV on each outer training fold, picks the best configuration, refits on the full outer-train fold, and scores on the held-out outer fold.
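The full-form procedure can be sketched in a few lines of base R. This is an illustration of the steps listed above under simplifying assumptions (unstratified random folds, ties broken by first maximum), not the package's implementation. The names make_folds, nested_cv_sketch, ridge_fit, ridge_predict, and neg_mse are hypothetical.

```r
# Hypothetical helper: random fold labels 1..k for n observations.
make_folds <- function(n, k) sample(rep_len(seq_len(k), n))

nested_cv_sketch <- function(fit_fn, predict_fn, score_fn, X, y,
                             hyperparam_grid, outer_k = 5L, inner_k = 3L) {
  # Enumerate the Cartesian product of candidate values.
  grid <- expand.grid(hyperparam_grid, stringsAsFactors = FALSE,
                      KEEP.OUT.ATTRS = FALSE)
  configs <- lapply(seq_len(nrow(grid)),
                    function(i) as.list(grid[i, , drop = FALSE]))
  outer_id <- make_folds(nrow(X), outer_k)
  outer_scores <- numeric(outer_k)
  best <- vector("list", outer_k)

  for (o in seq_len(outer_k)) {
    tr <- outer_id != o
    X_tr <- X[tr, , drop = FALSE]
    y_tr <- y[tr]
    inner_id <- make_folds(nrow(X_tr), inner_k)

    # Mean inner-CV score of every configuration on the outer training fold.
    inner_mean <- vapply(configs, function(hp) {
      mean(vapply(seq_len(inner_k), function(i) {
        fit <- fit_fn(X_tr[inner_id != i, , drop = FALSE],
                      y_tr[inner_id != i], hp)
        pred <- predict_fn(fit, X_tr[inner_id == i, , drop = FALSE])
        score_fn(y_tr[inner_id == i], pred)
      }, numeric(1)))
    }, numeric(1))

    # Pick the best configuration, refit on the full outer training fold,
    # then score on the held-out outer fold.
    best[[o]] <- configs[[which.max(inner_mean)]]
    model <- fit_fn(X_tr, y_tr, best[[o]])
    outer_scores[o] <- score_fn(y[!tr],
                                predict_fn(model, X[!tr, , drop = FALSE]))
  }
  list(outer_scores = outer_scores, best_hyperparams_per_fold = best)
}

# Toy usage: closed-form ridge regression scored by negated MSE.
set.seed(1)
X <- matrix(rnorm(60 * 2), 60, 2)
y <- X[, 1] + rnorm(60, sd = 0.5)
ridge_fit <- function(X, y, hp) {
  solve(crossprod(X) + hp$lambda * diag(ncol(X)), crossprod(X, y))
}
ridge_predict <- function(model, X) drop(X %*% model)
neg_mse <- function(y_true, y_pred) -mean((y_true - y_pred)^2)

res <- nested_cv_sketch(ridge_fit, ridge_predict, neg_mse, X, y,
                        hyperparam_grid = list(lambda = c(0.1, 1, 10)),
                        outer_k = 3L, inner_k = 2L)
```

Because the held-out outer fold is never seen during the inner search, the outer scores are not biased by hyperparameter selection, which is the reason for nesting the two loops.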
Examples
set.seed(1)
n <- 120
X <- matrix(rnorm(n * 3), n, 3)
y <- as.integer(plogis(X[, 1]) > runif(n))
# Logistic regression; hp is accepted to match the fit_fn signature but is unused here.
fit_fn <- function(X, y, hp) {
  df <- data.frame(y = y, X)
  suppressWarnings(stats::glm(y ~ ., data = df, family = stats::binomial()))
}
# Return predicted probabilities for the positive class.
predict_fn <- function(model, X) {
  stats::predict(model, newdata = data.frame(X), type = "response")
}
# Single-configuration grid, so the inner search is trivial in this example.
nested_cross_validate(fit_fn = fit_fn, predict_fn = predict_fn,
                      X = X, y = y,
                      hyperparam_grid = list(dummy = c(1)),
                      outer_k = 3L, inner_k = 2L)
#> $outer_scores
#> [1] 0.7192982 0.7775000 0.6428571
#>
#> $best_hyperparams_per_fold
#> $best_hyperparams_per_fold[[1]]
#> $best_hyperparams_per_fold[[1]]$dummy
#> [1] 1
#>
#>
#> $best_hyperparams_per_fold[[2]]
#> $best_hyperparams_per_fold[[2]]$dummy
#> [1] 1
#>
#>
#> $best_hyperparams_per_fold[[3]]
#> $best_hyperparams_per_fold[[3]]$dummy
#> [1] 1
#>
#>
#>
#> $mean_score
#> [1] 0.7132185
#>
#> $se_score
#> [1] 0.03898674
#>
#> $n_configs
#> [1] 1
#>
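The example above relies on the default scoring = "roc_auc". A custom scorer can be supplied through score_fn instead, provided it has the documented shape (y_true, y_pred) -> numeric(1) and returns larger values for better predictions. The sketch below defines a hypothetical scorer, neg_brier, that negates the Brier score so that it satisfies the higher-is-better convention.

```r
# Hypothetical custom scorer: negated Brier score, so that higher is better.
neg_brier <- function(y_true, y_pred) {
  -mean((y_pred - y_true)^2)
}

# Quick check on three predictions:
# squared errors are 0.01, 0.04 and 0.16, so the mean is 0.07.
neg_brier(c(1, 0, 1), c(0.9, 0.2, 0.6))
#> [1] -0.07
```

Such a function could then be passed as score_fn = neg_brier in the call shown above, in which case the scoring argument is ignored.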