Run GWAS for SNP or Microhaplotype Markers

Description

Performs genome-wide association study (GWAS) for SNP or microhaplotype (MH) markers. Uses EMMAX (Efficient Mixed-Model Association eXpedited) to control for population structure and relatedness, preventing spurious associations. Results can be inspected for QTL detection or passed directly to gwablup() for GWAS-assisted genomic prediction.

Usage

run_gwas(
  markers,
  y,
  masreml_fit,
  X = NULL,
  pi = 0.001,
  window = 5L,
  ref_markers = NULL
)

Arguments

`markers`	list of raw marker inputs. Provide one of: `snp_add`: numeric matrix (n x m), genotype coded 0/1/2 `mh_add`: list of data.frames, one per chromosome, with paired haplotype columns per block
`y`	numeric named vector of phenotypes (length n). Names must match row names of marker matrix.
`masreml_fit`	fitted `masreml` object from `masreml()`. Used to account for population structure in the GWAS model. Must be fitted with the same markers and individuals.
`X`	fixed effects design matrix (n x c). If NULL, intercept only.
`pi`	numeric, prior probability that a marker has a non-zero effect (default 0.001). Lower values make posterior probabilities more conservative.
`window`	integer, number of adjacent markers used in moving average smoothing of likelihood ratios (default 5). Larger values produce smoother posterior probabilities.
`ref_markers`	list of raw marker data for training individuals only, in the same format as `markers`. If provided, allele frequencies for building G_u (used in EMMAX) are computed from `ref_markers` instead of `markers`. Use this when running GWAS in a train/test context to avoid data leakage from test individuals into the population structure correction matrix. Supported elements: `snp_add` (matrix) or `mh_add` (list). If NULL (default), allele frequencies are computed from `markers`.

Value

Object of class “gwas_result” with elements:

lr: log-likelihood ratio per marker/block
beta: effect estimate per marker/block
se: standard error per marker/block
pval: p-value per marker/block
smoothed_lr: smoothed LR after moving average
pp: posterior probability of non-zero effect per marker/block — use this as input to gwablup()
marker_type: “snp” or “mh”
n_markers: number of markers or MH blocks
pi: prior probability used
window: smoothing window size used

References

Kang et al. (2010) Variance component model to account for sample structure in GWAS. Nat Genet. 42:348-354.

Meuwissen et al. (2024) GWABLUP: genome-wide association assisted BLUP. Genet Sel Evol. 56:17.

Examples

library("masreml")

# ── Standard workflow (all individuals) ───────────────────
# Step 1: fit standard GBLUP
fit <- masreml(y, markers = list(snp_add = W))

# Step 2: run GWAS — inspect QTL signals
gwas <- run_gwas(
  markers     = list(snp_add = W),
  y           = y,
  masreml_fit = fit
)
summary(gwas)

# Step 3: run GWABLUP using GWAS weights
fit_wa <- gwablup(y, markers = list(snp_add = W), gwas_result = gwas)
summary(fit_wa)

# ── Train/test workflow (no data leakage) ─────────────────
# Step 1: fit GBLUP on training only
fit_tr <- masreml(y_train, markers = list(snp_add = W_train))

# Step 2: run GWAS on training — ref_markers = training markers
gwas_tr <- run_gwas(
  markers     = list(snp_add = W_train),
  y           = y_train,
  masreml_fit = fit_tr,
  ref_markers = list(snp_add = W_train)
)

# Step 3: GWABLUP on training with training-based freq
fit_wa_tr <- gwablup(
  y           = y_train,
  markers     = list(snp_add = W_train),
  gwas_result = gwas_tr,
  ref_markers = list(snp_add = W_train)
)

# Step 4: predict test set
G_full <- build_G_snp(W_all, ref_W = W_train)
pred   <- predict(fit_wa_tr,
                  G_full    = list(snp_add = G_full),
                  train_ids = rownames(W_train),
                  test_ids  = rownames(W_test))