GWAS-Assisted Best Linear Unbiased Prediction (GWABLUP)

Description

Genomic prediction using GWAS-weighted relationship matrix. Implements Meuwissen et al. (2024) GWABLUP: weights all markers by their posterior probability of having non-zero effects, derived from EMMAX GWAS results.

Usage

gwablup(
  y,
  markers,
  gwas_result,
  X = NULL,
  method = "auto",
  solver = "auto",
  max_iter = 100L,
  tol = 1e-06,
  n_threads = NULL,
  min_weight = 1e-04,
  ref_markers = NULL,
  trait = "continuous",
  link = "logit"
)

Arguments

`y`	numeric named vector of phenotypes (length n)
`markers`	list of raw marker inputs: `snp_add`: matrix (n x m), raw genotype 0/1/2 `mh_add`: list of chr data.frames
`gwas_result`	object of class `“gwas_result”` from `run_gwas()`. Must be run with the same markers and individuals.
`X`	fixed effects design matrix (n x c). If NULL, intercept only
`method`	character, REML algorithm (default "auto"). Passed to `masreml()`
`solver`	character, EBV solver (default "auto"). Passed to `masreml()`
`max_iter`	integer, maximum REML iterations (default 100)
`tol`	numeric, convergence tolerance (default 1e-6)
`n_threads`	integer, number of threads (default: all cores)
`ref_markers`	list of raw marker data for training individuals only, same format as `markers`. If provided, allele frequencies for building G_wa are computed from `ref_markers` instead of `markers`. Use in train/test context to avoid data leakage. If NULL (default), allele frequencies computed from `markers`.
`trait`	character, trait distribution: `“continuous”` (default) or `“binary”`. Passed to `masreml()`.
`link`	character, link function for binary trait: `“logit”` (default) or `“probit”`. Passed to `masreml()`.

Details

User workflow:

Fit standard GBLUP: fit <- masreml(y, markers)
Run GWAS: gwas <- run_gwas(markers, y, fit)
Inspect GWAS results (optional): summary(gwas)
Run GWABLUP: fit_wa <- gwablup(y, markers, gwas)

Value

Object of class c(“gwablup”, “masreml”) — identical structure to masreml() output, with additional $gwas slot containing the gwas_result. All S3 methods from masreml (summary, print, plot) work directly.

References

Meuwissen et al. (2024) GWABLUP: genome-wide association assisted BLUP. Genet Sel Evol. 56:17.

Examples

library("masreml")

# ── SNP GWABLUP ───────────────────────────────────────────
fit    <- masreml(y, markers = list(snp_add = W))
gwas   <- run_gwas(markers = list(snp_add = W), y = y, masreml_fit = fit)
fit_wa <- gwablup(y, markers = list(snp_add = W), gwas_result = gwas)
summary(fit_wa)

# ── MH GWABLUP ────────────────────────────────────────────
fit    <- masreml(y, markers = list(mh_add = mh_list))
gwas   <- run_gwas(markers = list(mh_add = mh_list), y = y, masreml_fit = fit)
fit_wa <- gwablup(y, markers = list(mh_add = mh_list), gwas_result = gwas)
summary(fit_wa)

# ── Compare GBLUP vs GWABLUP ──────────────────────────────
fit_gblup   <- masreml(y, markers = list(snp_add = W))
gwas        <- run_gwas(list(snp_add = W), y, fit_gblup)
fit_gwablup <- gwablup(y, list(snp_add = W), gwas)
cor(fit_gblup$total_gebv, fit_gwablup$total_gebv)

# ── Train/test workflow (no data leakage) ─────────────────
fit_tr  <- masreml(y_train, markers = list(snp_add = W_train))
gwas_tr <- run_gwas(
  markers     = list(snp_add = W_train),
  y           = y_train,
  masreml_fit = fit_tr,
  ref_markers = list(snp_add = W_train)
)
fit_wa_tr <- gwablup(
  y           = y_train,
  markers     = list(snp_add = W_train),
  gwas_result = gwas_tr,
  ref_markers = list(snp_add = W_train)
)

# ── Binary trait GWABLUP ──────────────────────────────────
fit_tr_bin <- masreml(y_bin_train, markers = list(snp_add = W_train),
                      trait = "binary")
gwas_bin   <- run_gwas(list(snp_add = W_train), y_bin_train, fit_tr_bin,
                       ref_markers = list(snp_add = W_train))
fit_wa_bin <- gwablup(y_bin_train, list(snp_add = W_train), gwas_bin,
                      trait = "binary", ref_markers = list(snp_add = W_train))
G_full <- build_G_snp(W_all, ref_W = W_train)
pred   <- predict(fit_wa_tr,
                  G_full    = list(snp_add = G_full),
                  train_ids = rownames(W_train),
                  test_ids  = rownames(W_test))