Bayesian Alphabet (Probabilistic)

TL;DR


The Bayesian alphabet places marker-specific shrinkage priors on the additive-effects vector \(\boldsymbol{\beta}\). masbayes ships two members:

  • BayesA (Meuwissen, Hayes & Goddard 2001): each \(\beta_j\) has its own variance with a scaled inverse-\(\chi^2\) prior; shrinkage is Student-\(t\) tailed, individual, never to exact zero.
  • BayesR (Erbe et al. 2012): \(\beta_j\) comes from a 4-component mixture with variances \(\{0, 0.001, 0.01, 0.1\} \cdot \sigma_g^2\); posterior \(\boldsymbol{\pi}\) tells you how sparse the architecture is.

Both relax the equal-marker-variance assumption that GBLUP makes implicitly.

1. Assumptions

  1. Standardised design matrix. Effects are interpretable on a per-standardised-marker scale; raw dosage matrices lead to scale confounding between MAF and \(\beta_j\). Use masbayes::construct_snp_matrix(encoding = "vanRaden") or the \(W_{\alpha h}\) output of construct_wah_matrix().
  2. Conditional independence of marker effects given variance hyperparameters — the Gibbs sampler updates each \(\beta_j\) conditionally on all others.
  3. Effect distribution matches the prior: BayesA-style \(t\)-tailed priors fit oligogenic architectures; BayesR’s mixture fits explicitly sparse architectures.
  4. MCMC convergence under the default sampler — trace plots and ESS in summary() should be inspected before trusting posterior intervals.
  5. No missingness in \(\mathbf{W}\) or \(\mathbf{y}\) at fit time (impute upstream).
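Assumption 1's per-standardised-marker scale can be illustrated outside the package with the usual vanRaden-style centring and scaling of a 0/1/2 dosage matrix (a language-agnostic sketch in Python; in masbayes itself, construct_snp_matrix() performs this step):

```python
import numpy as np

def standardise_dosages(X):
    """Centre and scale a 0/1/2 dosage matrix column-wise.

    Column j becomes (x_ij - 2*p_j) / sqrt(2*p_j*(1 - p_j)), where p_j is
    the allele frequency estimated from the column mean.
    """
    X = np.asarray(X, dtype=float)
    p = X.mean(axis=0) / 2.0                      # allele frequency per marker
    return (X - 2.0 * p) / np.sqrt(2.0 * p * (1.0 - p))

rng = np.random.default_rng(0)
p_true = rng.uniform(0.1, 0.5, size=50)
X = rng.binomial(2, p_true, size=(200, 50))       # simulated dosages
W = standardise_dosages(X)                        # columns are mean-zero
```

After this transform the columns of \(\mathbf{W}\) have mean zero and variance near one, so a common prior variance on each \(\beta_j\) is comparable across markers regardless of MAF.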

2. Linear model (shared)

Both BayesA and BayesR specialise the same marker-effects regression:

\[ \mathbf{y} = \mathbf{X}\boldsymbol{\alpha} + \mathbf{W}\boldsymbol{\beta} + \boldsymbol{\epsilon}, \qquad \boldsymbol{\epsilon} \sim N(\mathbf{0}, \mathbf{I}\sigma_e^2), \tag{1}\]

with \(\mathbf{X}\boldsymbol{\alpha}\) the fixed effects and \(\mathbf{W}\) either the standardised SNP design matrix (construct_snp_matrix()) or the multi-allelic \(W_{\alpha h}\) (construct_wah_matrix()). The random additive genetic value is then \(\mathbf{g} = \mathbf{W}\boldsymbol{\beta}\), and GEBV = posterior mean of \(\mathbf{g}\) for any individual whose \(\mathbf{W}\) row is known.

The shared priors are \(\sigma_e^2 \sim \chi^{-2}(a_{0,e}, b_{0,e})\) (weak or improper by default) and \(\boldsymbol{\alpha} \sim N(\mathbf{0}, \kappa \mathbf{I})\) with \(\kappa\) a large constant (a vague fixed-effect prior). What differs between BayesA and BayesR is the prior on \(\boldsymbol{\beta}\).
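As a numerical illustration of model (1) (hypothetical dimensions, not package code), data can be simulated and the genetic value \(\mathbf{g} = \mathbf{W}\boldsymbol{\beta}\) formed directly:

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 300, 100

X = np.column_stack([np.ones(n), rng.integers(0, 2, n)])  # intercept + a binary covariate
alpha = np.array([10.0, 1.5])                             # fixed effects
W = rng.standard_normal((n, m))                           # stand-in standardised design matrix
beta = rng.normal(0.0, 0.1, m)                            # marker effects
sigma2_e = 1.0

g = W @ beta                                              # additive genetic values
y = X @ alpha + g + rng.normal(0.0, np.sqrt(sigma2_e), n) # model (1)
```

In a fitted model, \(\boldsymbol{\beta}\) is replaced by posterior draws, and the GEBV for individual \(i\) is the posterior mean of \(\mathbf{w}_i^\top \boldsymbol{\beta}\).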

3. GEBV calculation

BayesA — scaled inverse-\(\chi^2\) per-marker variance

\[ \beta_j \mid \sigma_{\beta_j}^2 \sim N(0,\, \sigma_{\beta_j}^2), \qquad \sigma_{\beta_j}^2 \sim \chi^{-2}(\nu,\, S), \tag{2}\]

with default \(\nu = 4.5\) and the scale derived to make the prior expectation of each per-marker variance equal the total genetic variance hyperparameter divided by total design variance:

\[ S \;=\; \frac{\sigma_g^2\,(\nu - 2)}{\nu \cdot \sum_{j} \mathrm{Var}(w_j)}, \tag{3}\]

so that \(E[\sigma_{\beta_j}^2] = \sigma_g^2 / \sum_j \mathrm{Var}(w_j)\). Integrating \(\sigma_{\beta_j}^2\) out gives a Student-\(t\) marginal on \(\beta_j\) — heavier tails than a normal, hence resistance to over-shrinking large effects.
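Explicitly, by the standard normal-scale-mixture identity,

\[ p(\beta_j) \;=\; \int_0^\infty N\!\left(\beta_j \mid 0, \sigma_{\beta_j}^2\right)\, \chi^{-2}\!\left(\sigma_{\beta_j}^2 \mid \nu, S\right)\, d\sigma_{\beta_j}^2 \;\propto\; \left(1 + \frac{\beta_j^2}{\nu S}\right)^{-(\nu+1)/2}, \]

i.e. a Student-\(t\) with \(\nu\) degrees of freedom, location 0, and scale \(\sqrt{S}\).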

The Gibbs sampler in masbayes::run_bayesa(method = "mcmc") cycles through: \(\sigma_{\beta_j}^2 \mid \beta_j\) (inverse-\(\chi^2\)), \(\beta_j \mid \mathbf{y}, \boldsymbol{\beta}_{-j}, \sigma_{\beta_j}^2\) (normal, marginalising the residual), \(\sigma_e^2 \mid \mathbf{y}, \boldsymbol{\beta}\) (inverse-\(\chi^2\)). GEBV per individual \(i\) is the posterior mean of \(\mathbf{w}_i^\top \boldsymbol{\beta}\).
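The three conditional updates can be sketched as a minimal single-site Gibbs sampler (a Python illustration with no fixed effects and no burn-in handling; run_bayesa() is the real implementation):

```python
import numpy as np

def bayesa_gibbs(y, W, nu=4.5, S=0.01, n_iter=500, seed=0):
    """Minimal single-site Gibbs sampler for BayesA (sketch only)."""
    rng = np.random.default_rng(seed)
    n, m = W.shape
    wtw = (W ** 2).sum(axis=0)
    beta = np.zeros(m)
    sigma2_e = y.var()
    e = y - W @ beta                              # current residual
    draws = np.zeros((n_iter, m))
    for it in range(n_iter):
        for j in range(m):
            # per-marker variance: scaled inverse-chi^2 posterior given beta_j
            sigma2_bj = (nu * S + beta[j] ** 2) / rng.chisquare(nu + 1)
            # marker effect: normal full conditional given all other effects
            e += W[:, j] * beta[j]                # remove marker j from residual
            c = wtw[j] + sigma2_e / sigma2_bj
            beta[j] = rng.normal(W[:, j] @ e / c, np.sqrt(sigma2_e / c))
            e -= W[:, j] * beta[j]                # add it back
        # residual variance: inverse-chi^2 with a weak prior
        sigma2_e = (e @ e) / rng.chisquare(n)
        draws[it] = beta
    return draws.mean(axis=0)                     # posterior-mean effects

rng = np.random.default_rng(42)
n, m = 200, 50
W = rng.standard_normal((n, m))
beta_true = np.zeros(m)
beta_true[:5] = 1.0                               # 5 large QTL, rest null
y = W @ beta_true + rng.normal(0, 1, n)
beta_hat = bayesa_gibbs(y - y.mean(), W)
```

The GEBV for individual \(i\) is then `W[i] @ beta_hat`; the package version additionally handles fixed effects, burn-in, and storage of the full chains.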

BayesR — 4-component mixture

\[ \beta_j \mid \boldsymbol{\pi}, \sigma_g^2 \sim \pi_0\,\delta_0 \;+\; \pi_1\,N(0,\, 0.001\,\sigma_g^2) \;+\; \pi_2\,N(0,\, 0.01\,\sigma_g^2) \;+\; \pi_3\,N(0,\, 0.1\,\sigma_g^2), \tag{4}\]

with \(\sum_k \pi_k = 1\) and a Dirichlet prior on \(\boldsymbol{\pi}\). The default initial mixture proportions in masbayes::run_bayesr(pi_vec = c(0.95, 0.02, 0.02, 0.01)) follow Erbe et al. (2012): a priori, 95% of SNPs sit in the zero atom and 1% in the largest variance class.

The \(\delta_0\) atom permits exact zeros. masbayes reports each marker’s posterior inclusion probability \(1 - P(\beta_j = 0 \mid \mathbf{y})\) as a built-in variable-selection diagnostic; users typically threshold at PIP \(> 0.5\) for “associated” or PIP \(> 0.9\) for “high-confidence”.
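A short sketch of the prior in (4) and of how a PIP is computed from class-indicator samples (the arrays below are stand-ins for illustration, not masbayes output):

```python
import numpy as np

rng = np.random.default_rng(3)
pi_vec = np.array([0.95, 0.02, 0.02, 0.01])
var_mult = np.array([0.0, 0.001, 0.01, 0.1])
sigma2_g, m = 1.0, 10_000

# One draw of all marker effects from the mixture prior in (4)
classes = rng.choice(4, size=m, p=pi_vec)
beta = rng.normal(0.0, np.sqrt(var_mult[classes] * sigma2_g))
# (beta == 0).mean() is close to pi_0 = 0.95: the delta_0 atom gives exact zeros

# PIP from indicator samples (here prior draws stand in for MCMC output)
indicator_draws = rng.choice(4, size=(200, m), p=pi_vec)
pip = (indicator_draws != 0).mean(axis=0)         # P(beta_j != 0 | y) per marker
associated = pip > 0.5                            # or pip > 0.9 for high confidence
```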

The Gibbs sampler in run_bayesr(method = "mcmc") cycles through the class indicators (categorical given \(\beta_j\) and current variances), the effects within each class (normal), \(\boldsymbol{\pi}\) (Dirichlet), and \(\sigma_e^2\) (inverse-\(\chi^2\)). GEBV is the posterior mean of \(\mathbf{w}_i^\top \boldsymbol{\beta}\).
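The class-indicator step can be sketched with \(\beta_j\) integrated out, so each class probability is \(\pi_k\) times a marginal likelihood: under class \(k\), \(\mathbf{w}_j^\top \mathbf{r} \sim N\!\left(0,\, (\mathbf{w}_j^\top \mathbf{w}_j)^2 v_k + \mathbf{w}_j^\top \mathbf{w}_j\, \sigma_e^2\right)\), where \(\mathbf{r}\) is the residual with marker \(j\)'s contribution added back. A Python illustration under these simplified assumptions (not the masbayes implementation):

```python
import numpy as np

def sample_class(wj, r, pi_vec, var_k, sigma2_e, rng):
    """Collapsed class-indicator draw for one marker (beta_j integrated out)."""
    wtw = wj @ wj
    s = wtw ** 2 * var_k + wtw * sigma2_e        # Var(w_j' r) under each class
    wr = wj @ r
    logp = np.log(pi_vec) - 0.5 * np.log(s) - 0.5 * wr ** 2 / s
    p = np.exp(logp - logp.max())
    return rng.choice(len(pi_vec), p=p / p.sum())

rng = np.random.default_rng(7)
n = 500
wj = rng.standard_normal(n)
sigma2_g, sigma2_e = 1.0, 1.0
var_k = np.array([0.0, 0.001, 0.01, 0.1]) * sigma2_g
pi_vec = np.array([0.95, 0.02, 0.02, 0.01])

r_null = rng.normal(0, 1, n)                     # marker has no effect
r_qtl = wj * 0.3 + rng.normal(0, 1, n)           # marker is a QTL

k_null = [sample_class(wj, r_null, pi_vec, var_k, sigma2_e, rng) for _ in range(200)]
k_qtl = [sample_class(wj, r_qtl, pi_vec, var_k, sigma2_e, rng) for _ in range(200)]
```

The null marker's indicator lands in the zero atom almost every draw, while the QTL's indicator concentrates in the nonzero classes; averaging the nonzero indicator over the chain is exactly the PIP.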

Both methods also expose an EM variant (run_bayesa(method = "em"), run_bayesr(method = "em")) returning the MAP point estimates without posterior intervals — faster, useful when predictive accuracy is the only endpoint.

4. Variance component & h²

For BayesA, the per-marker variances \(\sigma_{\beta_j}^2\) are free-floating; the total genetic variance is reported as \[ \sigma_g^2 = \mathrm{Var}(\mathbf{W}\hat{\boldsymbol{\beta}}), \tag{5}\] and heritability as \[ h^2 = \frac{\sigma_g^2}{\sigma_g^2 + \sigma_e^2}. \tag{6}\]
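Equations (5) and (6) amount to two lines given a design matrix and posterior-mean effects (stand-in arrays below; in practice the chains come from the fit object and \(h^2\) is computed per draw to get a credible interval):

```python
import numpy as np

rng = np.random.default_rng(5)
W = rng.standard_normal((300, 80))          # stand-in standardised design matrix
beta_hat = rng.normal(0.0, 0.1, 80)         # stand-in posterior-mean effects
sigma2_e = 1.0                              # stand-in residual variance estimate

sigma2_g = (W @ beta_hat).var()             # eq. (5): Var(W beta_hat)
h2 = sigma2_g / (sigma2_g + sigma2_e)       # eq. (6)
```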

For BayesR, \(\sigma_g^2\) is a sampled parameter (it scales the mixture variances directly). The class-membership distribution \(\boldsymbol{\pi}\) is itself a key diagnostic — concentration of mass in the high-variance classes signals a sparse architecture, while diffuse \(\boldsymbol{\pi}\) approaches a GBLUP-like infinitesimal fit.

Posterior chains for \(\sigma_g^2\), \(\sigma_e^2\), \(h^2\), and (for BayesR) \(\boldsymbol{\pi}\) are returned in the fit object alongside the \(\boldsymbol{\beta}\) samples; summary() reports posterior means and 95% credible intervals.

5. When to use

Architecture → Choose

  • Truly infinitesimal, hundreds of small-effect loci → GBLUP (Bayesian methods offer no gain)
  • Mixed: most loci small, a handful large → BayesA
  • Sparse: a small number of moderate-to-large QTL on a polygenic background → BayesR (and inspect PIPs for QTL recovery)
  • Need cross-population transfer → BayesR (the mixture prior is more transferable than an infinitesimal one)
  • Need credible intervals on per-marker effects → either, with method = "mcmc"
  • Need a fast point estimate only → either, with method = "em"

6. See it in code

library(masbayes)

d <- masbayes::load_data("large")
y <- d$pheno$y_cont_qtl_snp
names(y) <- d$pheno$id

# Build the standardised design matrix once
W <- masbayes::construct_snp_matrix(d$snp, encoding = "vanRaden")
wtw_diag <- colSums(W ^ 2)

# Optional fixed-effect design matrix
X_fixed <- stats::model.matrix(~ sex, data = d$pheno)

# --- BayesA (MCMC) -----------------------------------------------
fit_a <- masbayes::run_bayesa(
  w             = W,
  y             = y,
  wtw_diag      = wtw_diag,
  X             = X_fixed,
  marker_type   = "snp",
  nu            = 4.5,
  sigma2_g      = var(y) * 0.5,
  sigma2_e_init = var(y) * 0.5,
  method        = "mcmc",
  response_type = "gaussian"
)
summary(fit_a)

# --- BayesR (MCMC) -----------------------------------------------
fit_r <- masbayes::run_bayesr(
  w             = W,
  y             = y,
  wtw_diag      = wtw_diag,
  X             = X_fixed,
  pi_vec        = c(0.95, 0.02, 0.02, 0.01),  # default
  sigma2_e_init = var(y) * 0.5,
  method        = "mcmc",
  response_type = "gaussian"
)
summary(fit_r)

# --- Multi-allelic: BayesR on the W_alpha-h design matrix --------
wah <- masbayes::construct_wah_matrix(
  hap_matrix           = d$mh,
  colnames             = attr(d$mh, "block_id"),
  allele_freq_filtered = d$allele_freq
)
fit_r_mh <- masbayes::run_bayesr(
  w             = wah$W_ah,
  y             = y,
  wtw_diag      = colSums(wah$W_ah ^ 2),
  X             = X_fixed,
  sigma2_e_init = var(y) * 0.5,
  method        = "mcmc",
  marker_type   = "multiallelic"
)

7. References

  • Meuwissen, T. H. E., Hayes, B. J. & Goddard, M. E. (2001). Prediction of total genetic value using genome-wide dense marker maps. Genetics, 157(4), 1819–1829. doi:10.1093/genetics/157.4.1819.

  • Erbe, M., Hayes, B. J., Matukumalli, L. K., Goswami, S., Bowman, P. J., Reich, C. M., Mason, B. A. & Goddard, M. E. (2012). Improving accuracy of genomic predictions within and between dairy cattle breeds with imputed high-density single nucleotide polymorphism panels. Journal of Dairy Science, 95(7), 4114–4129. doi:10.3168/jds.2011-5019.

  • Habier, D., Fernando, R. L., Kizilkaya, K. & Garrick, D. J. (2011). Extension of the Bayesian alphabet for genomic selection. BMC Bioinformatics, 12, 186. doi:10.1186/1471-2105-12-186.