Cross-Validation for Genomic Prediction Models

Description

Performs k-fold or leave-one-out (LOO) cross-validation to estimate prediction accuracy of genomic models fitted with masreml(). Individuals are split into training and validation sets; the model is trained on each training set and used to predict the held-out validation individuals.
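
The train-and-predict cycle described above can be sketched in base R. This is only an illustration under stated assumptions: lm() stands in for masreml(), the data are simulated, and every object name (dat, fold_id, pred) is hypothetical rather than part of the package.

```r
# Sketch of k-fold cross-validation with lm() as a stand-in model.
# Simulated data; all names are illustrative, not masreml internals.
set.seed(1)
n <- 100
k <- 5
x <- rnorm(n)
y <- 2 * x + rnorm(n)
dat <- data.frame(y = y, x = x)

# Balanced random fold labels: each of 1..k appears n / k times
fold_id <- sample(rep_len(seq_len(k), n))

# Out-of-fold predictions: each individual is predicted by a model
# that was fitted without its phenotype
pred <- numeric(n)
for (f in seq_len(k)) {
  val <- fold_id == f                       # validation individuals
  fit <- lm(y ~ x, data = dat[!val, ])      # train on the other folds
  pred[val] <- predict(fit, newdata = dat[val, ])
}

# Correlation between held-out predictions and observed phenotypes
cor(pred, y)
```

Because every prediction comes from a model that never saw the corresponding phenotype, the resulting correlation reflects out-of-sample performance rather than goodness of fit.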

Usage

cv_masreml(
  y,
  X = NULL,
  markers = NULL,
  G = NULL,
  folds = 5L,
  scheme = "random",
  method = "auto",
  solver = "auto",
  max_iter = 100L,
  tol = 1e-08,
  n_threads = NULL,
  seed = NULL
)

Arguments

y numeric vector of phenotypes (length n)
X fixed effects matrix (n x c). NULL (default) fits an intercept only
markers list of raw marker inputs (see masreml)
G list of pre-built G matrices (see masreml)
folds integer, number of CV folds (default 5). Use folds = length(y) for leave-one-out (LOO). Larger values give less biased but more variable estimates.
scheme character, fold assignment scheme:

  • “random” (default): individuals are assigned to folds at random; recommended for most cases

  • “systematic”: every k-th individual (k = folds) is assigned to the same fold; useful for structured populations

method character, REML method (default "auto"; see masreml)
solver character, EBV solver (default "auto"; see masreml)
max_iter integer, maximum number of REML iterations (default 100)
tol numeric, convergence tolerance (default 1e-08)
n_threads integer, number of threads (default NULL)
seed integer, random seed for reproducible fold assignment (default NULL)
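
The two values of scheme can be illustrated with a small base R helper. This is a sketch of one plausible reading of the descriptions above, not the package's actual code; assign_folds() is a hypothetical name introduced only for this example.

```r
# Hypothetical helper illustrating the two fold assignment schemes.
assign_folds <- function(n, k, scheme = c("random", "systematic"), seed = NULL) {
  scheme <- match.arg(scheme)
  # Systematic labels 1, 2, ..., k, 1, 2, ...: individuals i, i + k,
  # i + 2k, ... share a fold
  ids <- rep_len(seq_len(k), n)
  if (scheme == "random") {
    if (!is.null(seed)) set.seed(seed)
    # Shuffle the balanced labels so fold sizes stay (nearly) equal
    ids <- ids[sample.int(n)]
  }
  ids
}

assign_folds(10, 3, "systematic")
# 1 2 3 1 2 3 1 2 3 1

assign_folds(10, 3, "random", seed = 42)
```

With k equal to n, every individual receives its own fold, which corresponds to the leave-one-out case obtained with folds = length(y).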

Value

Object of class “masreml_cv” with elements:

  • accuracy: mean prediction accuracy (r_MG)

  • accuracy_fold: accuracy per fold

  • bias: mean regression slope (bias check)

  • bias_fold: slope per fold

  • gebv_all: GEBV for all individuals

  • fold_assignments: fold ID per individual

  • folds: number of folds used

  • scheme: fold scheme used

  • call: matched call
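
How the per-fold and mean summaries relate can be sketched as follows. The definitions here are assumptions, since this page does not spell them out: accuracy is taken as the Pearson correlation between observed phenotypes and predictions within a fold, and bias as the slope from regressing observed phenotypes on predictions. cv_metrics() is a hypothetical helper, not a masreml function.

```r
# Hypothetical helper: per-fold accuracy and bias from held-out predictions.
# Assumed definitions (not confirmed by this page):
#   accuracy = cor(prediction, phenotype) within a fold
#   bias     = slope of lm(phenotype ~ prediction) within a fold
cv_metrics <- function(y, pred, fold_id) {
  folds <- sort(unique(fold_id))
  acc <- vapply(folds, function(f) {
    idx <- fold_id == f
    cor(pred[idx], y[idx])
  }, numeric(1))
  slope <- vapply(folds, function(f) {
    idx <- fold_id == f
    unname(coef(lm(y[idx] ~ pred[idx]))[2])
  }, numeric(1))
  list(accuracy = mean(acc), accuracy_fold = acc,
       bias = mean(slope), bias_fold = slope)
}

# Toy check: predictions equal to half the phenotype are perfectly
# correlated with it (accuracy 1) but shrunken (slope 2)
y_toy    <- c(1, 3, 2, 5, 4, 6, 8, 7, 9, 12, 10, 11)
fold_toy <- rep(1:3, 4)
m <- cv_metrics(y_toy, y_toy / 2, fold_toy)
m$accuracy
m$bias
```

Under these assumed definitions a slope near 1 indicates predictions on the same scale as the phenotype. Per-fold correlations and slopes are undefined when a fold holds a single individual, so a leave-one-out run would need the metrics computed on pooled predictions instead.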

See Also

masreml, compute_accuracy

Examples

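library("masreml")

# Assumes `y` (numeric phenotype vector of length n) and `W` (raw marker
# input in the format accepted by masreml()) already exist in the workspace.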

# 5-fold cross-validation
cv <- cv_masreml(
  y       = y,
  markers = list(snp_add = W),
  folds   = 5,
  seed    = 42
)
summary(cv)

# Leave-one-out cross-validation
cv_loo <- cv_masreml(
  y       = y,
  markers = list(snp_add = W),
  folds   = length(y)
)
summary(cv_loo)