Build Multi-allelic Additive Genomic Relationship Matrix

Description

Constructs the multi-allelic haplotype (MH) additive genomic relationship matrix (Agh) following Da (2015). The matrix is built from the multi-allelic haplotype block coding (W_ah), which captures haplotype diversity that bi-allelic SNPs cannot represent. The resulting matrix can be used in masreml() or gwablup() via the G argument.

Usage

build_G_mh(mh_list, ref_mh = NULL, ids = NULL)

Arguments

mh_list

list of data.frames (one per chromosome) or integer matrix (n x n_blocks*2). Two input modes are supported:

List of data.frames: First column = individual IDs, remaining columns = paired haplotype allele codes per block (strand1_block1, strand2_block1, …). Allele codes must be 0-based sequential integers.
Haplotype matrix: integer matrix (n x n_blocks*2), columns alternate strand1/strand2 per block. Allele codes can be any integers (sparse, non-sequential); they are re-encoded internally to sequential 0-based codes using the training reference (ref_mh). Example matrix format (50 blocks → 100 columns):

  hap_block_all  # n x (n_blocks*2), cols: s1_b1, s2_b1, s1_b2, s2_b2, ...
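
The re-encoding described for the haplotype matrix mode can be illustrated with a minimal sketch. recode_block() is a hypothetical helper written for this illustration, not a masreml function, and returning NA for alleles absent from the reference is an assumption of the sketch; build_G_mh() may treat such alleles differently.

```r
# Hypothetical helper (not part of masreml): re-encode one block's allele
# codes to 0-based sequential integers using the alleles seen in a reference.
recode_block <- function(block, ref_block = block) {
  # block, ref_block: integer matrices with two columns (strand1, strand2)
  levels_ref <- sort(unique(as.vector(ref_block)))
  # NA marks an allele that does not occur in the reference (sketch's choice)
  codes <- match(as.vector(block), levels_ref) - 1L
  matrix(codes, nrow = nrow(block), ncol = 2)
}

# Toy haplotype matrix: 4 individuals, 2 blocks -> 4 columns
# (s1_b1, s2_b1, s1_b2, s2_b2) with sparse, non-sequential codes
hap <- matrix(as.integer(c(
  101, 250,  7,  7,
  250, 250,  9,  7,
  101, 999,  7, 30,
  999, 101, 30,  9
)), nrow = 4, byrow = TRUE)

# Re-encode block by block; columns 2b-1 and 2b hold the two strands of block b
n_blocks <- ncol(hap) / 2
recoded <- do.call(cbind, lapply(seq_len(n_blocks), function(b) {
  recode_block(hap[, c(2 * b - 1, 2 * b), drop = FALSE])
}))
```

Here the codes 101, 250, 999 in block 1 and 7, 9, 30 in block 2 each map to 0, 1, 2, so every block ends up with its own 0-based sequential coding.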

ref_mh

reference haplotype data for training-based allele frequencies, in the same format as mh_list. If provided, the allele frequencies and the dropped allele (the most frequent one) are estimated from ref_mh only, which avoids data leakage from test individuals. If NULL, frequencies are computed from all individuals in mh_list.
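
As a rough illustration of estimating frequencies from the reference only, the sketch below computes allele frequencies for a single block from the training rows and picks the most frequent allele. block_allele_freq() and the toy objects are hypothetical names introduced here; this is not the package's internal code.

```r
# Hypothetical helper (not part of masreml): allele frequencies for one block,
# estimated from reference (training) haplotypes only.
block_allele_freq <- function(ref_block) {
  counts <- table(as.vector(ref_block))   # count alleles over both strands
  freq <- as.numeric(counts) / sum(counts)
  names(freq) <- names(counts)
  freq
}

# One block of a toy data set: 5 individuals, columns = strand 1, strand 2
block_all <- cbind(s1 = c(0L, 1L, 0L, 2L, 1L),
                   s2 = c(1L, 1L, 2L, 0L, 0L))
idx_train <- 1:3   # rows treated as the training reference

freq_train <- block_allele_freq(block_all[idx_train, , drop = FALSE])

# Most frequent allele in the reference; test individuals play no role here
dropped <- names(freq_train)[which.max(freq_train)]
```

Because only the rows in idx_train enter the calculation, adding or removing test individuals leaves freq_train and dropped unchanged.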

ids

character vector of individual IDs for alignment. Required when mh_list is a haplotype matrix without rownames.

Value

numeric matrix (n x n) of microhaplotype-based genomic relationships. It has the same interpretation as a SNP-based additive G matrix, but is based on haplotype block diversity.
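
To give a feel for how an n x n relationship matrix can arise from haplotype allele codes, here is a minimal single-block sketch. It counts allele copies per individual, drops the most frequent allele, centers by twice the allele frequency, and scales the cross-product so that the average diagonal is 1. The centering and scaling are assumptions of this sketch; the exact coding in Da (2015) and in build_G_mh() may differ.

```r
# Hypothetical sketch (not the package's code): additive relationship matrix
# from one haplotype block with 0-based allele codes.
block <- cbind(s1 = c(0L, 1L, 0L, 2L, 1L),
               s2 = c(1L, 1L, 2L, 0L, 0L))
alleles <- sort(unique(as.vector(block)))

# Copies (0, 1 or 2) of each allele carried by each individual
counts <- sapply(alleles, function(a) (block[, "s1"] == a) + (block[, "s2"] == a))
freq <- colSums(counts) / (2 * nrow(block))

# Drop the most frequent allele, then center each column by twice its frequency
keep <- setdiff(seq_along(alleles), which.max(freq))
W <- sweep(counts[, keep, drop = FALSE], 2, 2 * freq[keep])

# Cross-product scaled so the average diagonal is 1 (assumption of this sketch)
G <- tcrossprod(W)
G <- G / mean(diag(G))
```

The result is a symmetric 5 x 5 matrix with one row and column per individual, which is the shape expected by the G argument.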

References

Da, Y. (2015) Multi-allelic haplotype model based on genetic partition for genomic prediction and variance component estimation using SNP markers. BMC Genet. 16:144.

Examples

library("masreml")

# mh_list: one data.frame per chromosome
# Each row = one individual
# Columns: ID, then paired strand columns per haplotype block
chr1 <- data.frame(
  ID    = paste0("ind", 1:5),
  B1_s1 = c(0, 1, 0, 2, 1),   # block 1 strand 1
  B1_s2 = c(1, 1, 2, 0, 0),   # block 1 strand 2
  B2_s1 = c(2, 0, 1, 1, 0),   # block 2 strand 1
  B2_s2 = c(0, 1, 2, 0, 1)    # block 2 strand 2
)
mh_list <- list(chr1 = chr1)

# Build Agh matrix from mh_list
G_mh <- build_G_mh(mh_list)

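# Build Agh matrix from haplotype matrix (train/test split)
# hap_block_all: n x (n_blocks*2), any integer allele codes
# Simulated here for illustration only: 20 individuals, 50 blocks, sparse codes
set.seed(1)
n_total       <- 20
hap_block_all <- matrix(sample(c(3L, 7L, 12L, 40L), n_total * 100, replace = TRUE),
                        nrow = n_total)
idx_train     <- 1:15   # rows used as the training reference
G_mh_full <- build_G_mh(
  mh_list = hap_block_all,
  ref_mh  = hap_block_all[idx_train, ],
  ids     = as.character(1:n_total)
)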

# Use in masreml
# y: numeric phenotype vector, one value per individual in G_mh (not defined here)
fit <- masreml(y, G = list(mh_add = G_mh))

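# Use in run_gwas + gwablup (hap_matrix mode)
# hap_block_train: training rows of the haplotype matrix
# y_train: phenotypes of the training individuals (not defined here)
hap_block_train <- hap_block_all[idx_train, ]
fit_tr  <- masreml(y_train, markers = list(mh_add = hap_block_train))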
gwas_tr <- run_gwas(
  markers     = list(mh_add = hap_block_train),
  y           = y_train,
  masreml_fit = fit_tr,
  ref_markers = list(mh_add = hap_block_train)
)
fit_wa <- gwablup(
  y           = y_train,
  markers     = list(mh_add = hap_block_train),
  gwas_result = gwas_tr,
  ref_markers = list(mh_add = hap_block_train)
)