Evaluate Out-of-Sample Genomic Prediction

Description

Computes prediction accuracy metrics for test individuals. Designed to complement predict.masreml() for out-of-sample evaluation. Works with any GEBV source (masreml, masbayes, or other models).

Usage

evaluate_prediction(gebv, y, h2 = NULL, tbv = NULL, fitted_prob = NULL)

Arguments

gebv numeric vector of predicted GEBV for test individuals
y numeric vector of observed phenotypes for test individuals
h2 numeric, heritability from the fitted model, used to compute r_MG. Typically fit$varcomp$h2 when the model has a single component, or one element of it (e.g. fit$varcomp$h2["g"]) when it has several. If NULL, r_MG is returned as NA.
tbv numeric vector of true breeding values (simulation only). If provided, r_test_g = cor(gebv, tbv) is computed directly without requiring h2. If NULL, r_test_g is NA.
fitted_prob numeric vector of fitted probabilities P(y=1) for binary trait, typically pred$fitted from predict.masreml(). If NULL, AUC is returned as NA.

Value

data.frame with columns:

  • r_test_y: predictive ability. Continuous: cor(gebv, y). Binary: cor(fitted_prob, y) on the observed (probability) scale.

  • r_test_g: cor(gebv, tbv), the accuracy against the true breeding values on the genetic-value scale (gebv is used for both continuous and binary traits). NA if tbv is NULL.

  • bias: slope of the regression of observed on predicted values. Continuous: slope of lm(y ~ gebv); binary: slope of lm(y ~ fitted_prob), i.e. the calibration slope. In both cases 1.0 indicates unbiased / well-calibrated predictions, <1 indicates over-dispersion and >1 under-dispersion.

  • r_MG: cor(gebv, y) / sqrt(h2), the heritability-adjusted accuracy on the GEBV scale. gebv is used rather than fitted_prob even for binary traits, so that h2 is on the same scale. NA if h2 is NULL.

  • AUC: area under the ROC curve, computed from fitted_prob. AUC depends only on the ranks of the predictions, so it is unaffected by the inverse-link transformation. NA if fitted_prob is NULL.

  • RMSE: root mean squared error. Continuous: sqrt(mean((gebv - y)^2)); binary: sqrt(mean((fitted_prob - y)^2)), i.e. the square root of the Brier score.
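
As an illustration of the continuous-trait formulas listed above, the following base R sketch recomputes each column by hand on simulated data. It is not the package's internal code: the simulated vectors, the assumed heritability of 0.4 and the object name `metrics` are all hypothetical and exist only for this example.

```r
# Hypothetical simulated test set (assumption: not taken from the package).
set.seed(1)
n    <- 500
h2   <- 0.4                                  # assumed heritability
tbv  <- rnorm(n, sd = sqrt(h2))              # true breeding values
y    <- tbv + rnorm(n, sd = sqrt(1 - h2))    # phenotypes, total variance about 1
gebv <- 0.6 * tbv + rnorm(n, sd = 0.2)       # imperfect predictions of tbv

# Hand computation of the columns described under Value (continuous case).
metrics <- data.frame(
  r_test_y = cor(gebv, y),                    # predictive ability
  r_test_g = cor(gebv, tbv),                  # accuracy vs true breeding value
  bias     = unname(coef(lm(y ~ gebv))[2]),   # regression slope of y on gebv
  r_MG     = cor(gebv, y) / sqrt(h2),         # heritability-adjusted accuracy
  RMSE     = sqrt(mean((gebv - y)^2))         # root mean squared error
)
print(metrics)
```

In a simulation such as this one, r_MG can be compared with r_test_g to see how closely the heritability adjustment tracks the accuracy against the true breeding values.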

See Also

predict.masreml, compute_accuracy

Examples

library("masreml")

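# Assumes `fit` is a fitted masreml model and that G_full, train_ids,
# test_ids, y_test and tbv_test already exist in the workspace.
pred <- predict(fit, G_full = list(g = G_full),
                train_ids = train_ids, test_ids = test_ids)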

# Continuous trait
evaluate_prediction(
  gebv = pred$total_gebv,
  y    = y_test,
  h2   = fit$varcomp$h2["g"],
  tbv  = tbv_test
)

# Binary trait
evaluate_prediction(
  gebv        = pred$total_gebv,
  y           = y_test,
  h2          = fit$varcomp$h2["g"],
  fitted_prob = pred$fitted
)
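
# Binary-trait metrics by hand (illustrative sketch)

The binary-trait columns described under Value can also be reproduced in base R. The sketch below is only an illustration, not the package's implementation: the simulated data, the probit inverse link (pnorm) and the helper `auc_rank` are assumptions introduced here. It computes AUC from the Mann-Whitney rank-sum statistic, which shows why AUC is unchanged by a monotone inverse-link transformation.

```r
# Hypothetical simulated binary test set (assumption: probit inverse link).
set.seed(2)
n           <- 400
gebv        <- rnorm(n)               # predictions on the liability scale
fitted_prob <- pnorm(gebv)            # fitted P(y = 1) under the assumed link
y           <- rbinom(n, 1, fitted_prob)

# Hypothetical helper: AUC from the Mann-Whitney rank-sum statistic.
auc_rank <- function(score, y) {
  r  <- rank(score)                   # midranks, so ties are handled
  n1 <- sum(y == 1)
  n0 <- sum(y == 0)
  (sum(r[y == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

auc_prob <- auc_rank(fitted_prob, y)  # AUC on the probability scale
auc_gebv <- auc_rank(gebv, y)         # same ranks, hence the same AUC

brier       <- mean((fitted_prob - y)^2)              # Brier score
rmse        <- sqrt(brier)                            # RMSE for a binary trait
calib_slope <- unname(coef(lm(y ~ fitted_prob))[2])   # calibration slope

print(c(AUC = auc_prob, RMSE = rmse, bias = calib_slope))
```

Because pnorm is strictly increasing, `auc_prob` and `auc_gebv` agree; any other monotone link would behave the same way.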