Run GWAS for SNP or Microhaplotype Markers

Run GWAS for SNP or Microhaplotype Markers

Description

Performs genome-wide association study (GWAS) for SNP or microhaplotype (MH) markers. Uses EMMAX (Efficient Mixed-Model Association eXpedited) to control for population structure and relatedness, preventing spurious associations. Results can be inspected for QTL detection or passed directly to gwablup() for GWAS-assisted genomic prediction.

Usage

run_gwas(
  markers,
  y,
  masreml_fit,
  X = NULL,
  pi = 0.001,
  window = 5L,
  ref_markers = NULL
)

Arguments

markers

list of raw marker inputs. Provide one of:

  • snp_add: numeric matrix (n x m), genotype coded 0/1/2

  • mh_add: list of data.frames, one per chromosome, with paired haplotype columns per block

y numeric named vector of phenotypes (length n). Names must match row names of marker matrix.
masreml_fit fitted masreml object from masreml(). Used to account for population structure in the GWAS model. Must be fitted with the same markers and individuals.
X fixed effects design matrix (n x c). If NULL, intercept only.
pi numeric, prior probability that a marker has a non-zero effect (default 0.001). Lower values make posterior probabilities more conservative.
window integer, number of adjacent markers used in moving average smoothing of likelihood ratios (default 5). Larger values produce smoother posterior probabilities.
ref_markers list of raw marker data for training individuals only, in the same format as markers. If provided, allele frequencies for building G_u (used in EMMAX) are computed from ref_markers instead of markers. Use this when running GWAS in a train/test context to avoid data leakage from test individuals into the population structure correction matrix. Supported elements: snp_add (matrix) or mh_add (list). If NULL (default), allele frequencies are computed from markers.

Value

Object of class “gwas_result” with elements:

  • lr: log-likelihood ratio per marker/block

  • beta: effect estimate per marker/block

  • se: standard error per marker/block

  • pval: p-value per marker/block

  • smoothed_lr: smoothed LR after moving average

  • pp: posterior probability of non-zero effect per marker/block — use this as input to gwablup()

  • marker_type: “snp” or “mh”

  • n_markers: number of markers or MH blocks

  • pi: prior probability used

  • window: smoothing window size used

References

Kang et al. (2010) Variance component model to account for sample structure in GWAS. Nat Genet. 42:348-354.

Meuwissen et al. (2024) GWABLUP: genome-wide association assisted BLUP. Genet Sel Evol. 56:17.

See Also

masreml, gwablup

Examples

library("masreml")

# ── Standard workflow (all individuals) ───────────────────
# Step 1: fit standard GBLUP
fit <- masreml(y, markers = list(snp_add = W))

# Step 2: run GWAS — inspect QTL signals
gwas <- run_gwas(
  markers     = list(snp_add = W),
  y           = y,
  masreml_fit = fit
)
summary(gwas)

# Step 3: run GWABLUP using GWAS weights
fit_wa <- gwablup(y, markers = list(snp_add = W), gwas_result = gwas)
summary(fit_wa)

# ── Train/test workflow (no data leakage) ─────────────────
# Step 1: fit GBLUP on training only
fit_tr <- masreml(y_train, markers = list(snp_add = W_train))

# Step 2: run GWAS on training — ref_markers = training markers
gwas_tr <- run_gwas(
  markers     = list(snp_add = W_train),
  y           = y_train,
  masreml_fit = fit_tr,
  ref_markers = list(snp_add = W_train)
)

# Step 3: GWABLUP on training with training-based freq
fit_wa_tr <- gwablup(
  y           = y_train,
  markers     = list(snp_add = W_train),
  gwas_result = gwas_tr,
  ref_markers = list(snp_add = W_train)
)

# Step 4: predict test set
G_full <- build_G_snp(W_all, ref_W = W_train)
pred   <- predict(fit_wa_tr,
                  G_full    = list(snp_add = G_full),
                  train_ids = rownames(W_train),
                  test_ids  = rownames(W_test))