library("masreml")
# ── In-sample mode: shortcut to training metrics ──────────
fit <- masreml(y_train, markers = list(snp_add = W_train))
pred <- predict(fit) # no test data → returns training info
pred$metrics # = fit$training_metrics
# ── Mode A: G_full pre-built, with auto-metrics ───────────
G_full <- build_G_snp(W_all) # n_total x n_total
fit <- masreml(y_train, G = list(snp_add = G_train))
pred <- predict(fit,
G_full = list(snp_add = G_full),
train_ids = rownames(W_train),
test_ids = rownames(W_test),
y_new = y_test) # auto-compute test metrics
pred$GEBV
pred$metrics$accuracy
# ── Mode B: markers_new + auto-metrics ────────────────────
fit <- masreml(y_train, markers = list(snp_add = W_train))
pred <- predict(fit,
markers_new = list(snp_add = W_test),
markers_train = list(snp_add = W_train),
y_new = y_test)
pred$metrics$bias # calibration slope (binary) / dispersion
# ── Forecast only (no y_new) ──────────────────────────────
fc <- predict(fit, markers_new = list(snp_add = W_test),
markers_train = list(snp_add = W_train))
fc$GEBV # metrics is NULLPredict GEBV for New Individuals Using a Fitted masreml Model
Predict GEBV for New Individuals Using a Fitted masreml Model
Description
Computes genomic estimated breeding values (GEBV) for test individuals using the relationship between test and training sets. Two modes are supported: pre-built G matrix (full or cross-block) and raw marker data.
Usage
## S3 method for class 'masreml'
predict(
object,
G_full = NULL,
markers_new = NULL,
markers_train = NULL,
train_ids = NULL,
test_ids = NULL,
y_new = NULL,
X_new = NULL,
link = NULL,
jitter = 1e-06,
...
)
Arguments
object
|
fitted masreml object from masreml()
|
G_full
|
numeric matrix (n_total x n_total), pre-built genomic relationship matrix including both training and test individuals. Row/column names must include all training IDs and test IDs. Either G_full or markers_new must be provided. For multi-component models, provide a named list of G matrices, one per component (names must match names(object$gebv)).
|
markers_new
|
list of raw marker data for test individuals only. Used to compute cross-G matrix
markers_train argument.
|
markers_train
|
list of raw marker data for training individuals, required when markers_new is provided. Same format as markers_new. Used to compute consistent allele frequencies and scaling (k) for cross-G computation.
|
train_ids
|
character vector of training individual IDs, in the same order as used when fitting object. Required when using G_full mode to subset the training block.
|
test_ids
|
character vector of test individual IDs. Required for G_full mode. Inferred from markers_new row names if not provided.
|
y_new
|
optional numeric vector of observed phenotypes for test individuals. When supplied, prediction metrics (R2, RMSE, accuracy or AUC, bias) are computed automatically and stored in $metrics. For binary, metrics are on the observed (probability) scale so bias is the calibration slope. Length must match the number of test individuals.
|
X_new
|
optional numeric matrix of fixed-effects design for test individuals (n_test x c). Required only if the original fit used non-intercept fixed effects. Defaults to an intercept column of 1s. |
link
|
character, link function for binary trait prediction: “logit” (default) or “probit”. Inherited from object$binary$link if available.
|
jitter
|
numeric, small value added to diagonal of G[train,train] for numerical stability (default 1e-6).
|
…
|
ignored |
Details
The prediction formula per component k is:
\(\hat{u}_{test,k} = G_{te,tr,k} \cdot (G_{tr,tr,k} + \lambda_k I)^{-1} \cdot \hat{u}_{train,k}\)
where \(\lambda_k = \sigma^2_e / \sigma^2_{g,k}\).
For markers_new mode (SNP), the cross-G is:
\(G_{te,tr} = \frac{W_{te} W_{tr}'}{k}\)
where \(W\) matrices are centered using training allele frequencies and \(k = \sum 2p_j(1-p_j)\) computed from training data.
For binary trait, prediction is on the liability scale. Fitted probabilities are obtained by applying the inverse link function to \(\eta_{test} = \mu_{fixed} + \hat{u}_{test}\).
Value
list of class “masreml_pred”. Field names are aligned with masbayes::predict.masbayes_*() for cross-package workflows. Backward-compatibility aliases (total_gebv, fitted) are retained.
-
GEBV: named numeric vector of total GEBV for test individuals (sum across components). Liability scale for binary. -
total_gebv: alias forGEBV(back-compat). -
gebv: named list of per-component GEBV vectors. -
prob: fitted probabilitiesP(y=1)for binary trait via inverse-link,NULLfor continuous. -
fitted: alias forprob(back-compat). -
metrics: list of test-set metrics ify_newsupplied (R2, RMSE, accuracy/AUC, bias), elseNULL. -
h2, sigma2: heritability and variance components carried over from the fit. -
response_type:“continuous”or“binary”. -
algorithm: REML algorithm from the fit. -
eval_scope:“in-sample (training)”,“test set (G_full)”, or“test set (markers_new)”. -
has_truth:TRUEif metrics were computed. -
train_ids, test_ids: ID vectors. -
n_train, n_test: sample sizes.
See Also
masreml, build_G_snp, build_G_mh