GWAS-Assisted Best Linear Unbiased Prediction (GWABLUP)

GWAS-Assisted Best Linear Unbiased Prediction (GWABLUP)

Description

Genomic prediction using GWAS-weighted relationship matrix. Implements Meuwissen et al. (2024) GWABLUP: weights all markers by their posterior probability of having non-zero effects, derived from EMMAX GWAS results.

Usage

gwablup(
  y,
  markers,
  gwas_result,
  X = NULL,
  method = "auto",
  solver = "auto",
  max_iter = 100L,
  tol = 1e-06,
  n_threads = NULL,
  min_weight = 1e-04,
  ref_markers = NULL,
  trait = "continuous",
  link = "logit"
)

Arguments

y numeric named vector of phenotypes (length n)
markers

list of raw marker inputs:

  • snp_add: matrix (n x m), raw genotype 0/1/2

  • mh_add: list of chr data.frames

gwas_result object of class “gwas_result” from run_gwas(). Must be run with the same markers and individuals.
X fixed effects design matrix (n x c). If NULL, intercept only
method character, REML algorithm (default "auto"). Passed to masreml()
solver character, EBV solver (default "auto"). Passed to masreml()
max_iter integer, maximum REML iterations (default 100)
tol numeric, convergence tolerance (default 1e-6)
n_threads integer, number of threads (default: all cores)
ref_markers list of raw marker data for training individuals only, same format as markers. If provided, allele frequencies for building G_wa are computed from ref_markers instead of markers. Use in train/test context to avoid data leakage. If NULL (default), allele frequencies computed from markers.
trait character, trait distribution: “continuous” (default) or “binary”. Passed to masreml().
link character, link function for binary trait: “logit” (default) or “probit”. Passed to masreml().

Details

User workflow:

  1. Fit standard GBLUP: fit <- masreml(y, markers)

  2. Run GWAS: gwas <- run_gwas(markers, y, fit)

  3. Inspect GWAS results (optional): summary(gwas)

  4. Run GWABLUP: fit_wa <- gwablup(y, markers, gwas)

Value

Object of class c(“gwablup”, “masreml”) — identical structure to masreml() output, with additional $gwas slot containing the gwas_result. All S3 methods from masreml (summary, print, plot) work directly.

References

Meuwissen et al. (2024) GWABLUP: genome-wide association assisted BLUP. Genet Sel Evol. 56:17.

See Also

masreml, run_gwas, compute_accuracy

Examples

library("masreml")

# ── SNP GWABLUP ───────────────────────────────────────────
fit    <- masreml(y, markers = list(snp_add = W))
gwas   <- run_gwas(markers = list(snp_add = W), y = y, masreml_fit = fit)
fit_wa <- gwablup(y, markers = list(snp_add = W), gwas_result = gwas)
summary(fit_wa)

# ── MH GWABLUP ────────────────────────────────────────────
fit    <- masreml(y, markers = list(mh_add = mh_list))
gwas   <- run_gwas(markers = list(mh_add = mh_list), y = y, masreml_fit = fit)
fit_wa <- gwablup(y, markers = list(mh_add = mh_list), gwas_result = gwas)
summary(fit_wa)

# ── Compare GBLUP vs GWABLUP ──────────────────────────────
fit_gblup   <- masreml(y, markers = list(snp_add = W))
gwas        <- run_gwas(list(snp_add = W), y, fit_gblup)
fit_gwablup <- gwablup(y, list(snp_add = W), gwas)
cor(fit_gblup$total_gebv, fit_gwablup$total_gebv)

# ── Train/test workflow (no data leakage) ─────────────────
fit_tr  <- masreml(y_train, markers = list(snp_add = W_train))
gwas_tr <- run_gwas(
  markers     = list(snp_add = W_train),
  y           = y_train,
  masreml_fit = fit_tr,
  ref_markers = list(snp_add = W_train)
)
fit_wa_tr <- gwablup(
  y           = y_train,
  markers     = list(snp_add = W_train),
  gwas_result = gwas_tr,
  ref_markers = list(snp_add = W_train)
)

# ── Binary trait GWABLUP ──────────────────────────────────
fit_tr_bin <- masreml(y_bin_train, markers = list(snp_add = W_train),
                      trait = "binary")
gwas_bin   <- run_gwas(list(snp_add = W_train), y_bin_train, fit_tr_bin,
                       ref_markers = list(snp_add = W_train))
fit_wa_bin <- gwablup(y_bin_train, list(snp_add = W_train), gwas_bin,
                      trait = "binary", ref_markers = list(snp_add = W_train))
G_full <- build_G_snp(W_all, ref_W = W_train)
pred   <- predict(fit_wa_tr,
                  G_full    = list(snp_add = G_full),
                  train_ids = rownames(W_train),
                  test_ids  = rownames(W_test))