masbayes: Bayesian Genomic Prediction Models for SNPs and Microhaplotypes

Description

Bayesian methods for genomic prediction using SNP or microhaplotype markers. Implements BayesR (four-class mixture) and BayesA (marker-specific variance), each available as full MCMC or stochastic EM. Continuous (Gaussian) and binary (Albert-Chib data augmentation) traits are supported.
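
For binary traits, Albert-Chib data augmentation replaces the 0/1 response with a latent Gaussian liability, redrawn at every iteration from a normal distribution truncated according to the observed class. The base-R sketch below shows one such draw under a probit link with unit residual variance. The function name draw_liability() and the linear predictor eta are illustrative assumptions, not part of masbayes, and the package's internal sampler may differ.

```r
# One Albert-Chib latent-liability draw (probit link, residual variance 1).
# `draw_liability` and `eta` are hypothetical names used for illustration.
draw_liability <- function(y, eta) {
  stopifnot(length(y) == length(eta), all(y %in% c(0, 1)))
  # P(z <= 0) for z ~ N(eta, 1)
  p0 <- pnorm(-eta)
  # Inverse-CDF sampling: uniform on (p0, 1) if y == 1, on (0, p0) if y == 0
  u <- runif(length(y))
  u <- ifelse(y == 1, p0 + u * (1 - p0), u * p0)
  eta + qnorm(u)
}

set.seed(1)
eta <- rnorm(8)                      # toy linear predictor
y   <- rbinom(8, 1, pnorm(eta))      # toy binary phenotypes
z   <- draw_liability(y, eta)        # positive where y == 1, else non-positive
```

In a full sampler the liabilities z would then take the place of y in the Gaussian updates for the marker effects.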

Workflow

The standard pipeline has four steps:

  1. construct_wah_matrix() — build the \(W_{\alpha h}\) design matrix from phased haplotypes.

  2. run_bayesr() or run_bayesa() — fit the Bayesian model. The fit object is auto-saved to results_bayesr.Rds / results_bayesa.Rds by default.

  3. summary(fit) — full report (heritability, MCMC ESS / Geweke, variance components, training metrics, top alleles).

  4. predict(fit, newdata, y_new) — evaluate the model on any data: in-sample, test split, or k-fold CV held-out fold.

Three usage scenarios

summary() and predict() are scheme-agnostic — the same methods cover all three of:

  • Full-data fit: train on everything; predict(fit) returns in-sample metrics.

  • Train/test split: train on a subset; evaluate via predict(fit, W_test, y_test).

  • k-fold CV: loop over folds (typically with save_rds = FALSE, verbose = FALSE), fit on the remaining folds, and call predict() on each held-out fold.

Cross-validation orchestration is intentionally left to the caller — the package does not ship a built-in cv_*() helper.
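
A caller-side loop might look like the sketch below. The helper kfold_cv() and its fit_fun / eval_fun arguments are hypothetical names introduced here, not package functions. The demo uses ordinary least squares as a stand-in so that it runs without masbayes; the comments indicate where run_bayesr() and predict() would go, with argument names taken from the Quick start.

```r
# Hypothetical helper: generic k-fold loop, model supplied by the caller.
kfold_cv <- function(W, y, k, fit_fun, eval_fun, seed = 1) {
  set.seed(seed)
  fold <- sample(rep_len(seq_len(k), length(y)))  # balanced random folds
  vapply(seq_len(k), function(f) {
    test <- fold == f
    fit  <- fit_fun(W[!test, , drop = FALSE], y[!test])
    eval_fun(fit, W[test, , drop = FALSE], y[test])
  }, numeric(1))
}

# Stand-in model (ordinary least squares) so the sketch is self-contained.
# With masbayes, fit_fun would instead call something like
#   run_bayesr(W_tr, y_tr, colSums(W_tr^2), sigma2_e_init = var(y_tr) * 0.5,
#              sigma2_ah = var(y_tr) * 0.5, method = "mcmc",
#              save_rds = FALSE, verbose = FALSE)
# and eval_fun would return predict(fit, W_te, y_te)$metrics$accuracy.
ols_fit  <- function(W_tr, y_tr) lm.fit(cbind(1, W_tr), y_tr)$coefficients
ols_eval <- function(b, W_te, y_te) cor(drop(cbind(1, W_te) %*% b), y_te)

set.seed(42)
W <- matrix(rnorm(200 * 5), 200, 5)
y <- drop(W %*% c(1, -1, 0.5, 0, 0)) + rnorm(200)
acc <- kfold_cv(W, y, k = 5, ols_fit, ols_eval)
mean(acc)   # average held-out accuracy across folds
```

Note that the sufficient statistics are recomputed from the training rows inside each fold, so nothing from the held-out fold enters the fit.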

Quick start

library(masbayes)

# 1. Design matrix
train <- construct_wah_matrix(hap, colnames(hap), allele_freq)
W     <- train$W_ah

# 2. Sufficient statistics
wtw <- colSums(W^2)

# 3. Fit
fit <- run_bayesr(W, y, wtw,
                  sigma2_e_init = var(y) * 0.5,
                  sigma2_ah     = var(y) * 0.5,
                  method        = "mcmc")

# 4. Report and evaluate
summary(fit)
pred <- predict(fit, W_test, y_test)   # train/test mode
pred$metrics$accuracy
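
The wtw vector in step 2 holds the squared column norms of W, that is, the diagonal of W'W, computed without forming the full cross-product. The toy check below uses a random matrix in place of a real design matrix. Why the samplers take it precomputed is not stated here, though a term of this kind typically appears in per-marker conditional updates.

```r
# Toy stand-in for a design matrix: 6 individuals x 4 allele columns.
set.seed(1)
W <- matrix(rnorm(24), nrow = 6, ncol = 4)

wtw      <- colSums(W^2)          # O(n * p), no p x p matrix formed
wtw_full <- diag(crossprod(W))    # same values via the full W'W

all.equal(wtw, wtw_full)
```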

Main functions

construct_wah_matrix
Build the \(W_{\alpha h}\) design matrix.
run_bayesr
Fit BayesR (4-class mixture).
run_bayesa
Fit BayesA (marker-specific variance).
summary.masbayes_bayesr, summary.masbayes_bayesa
Full-fit reports (S3 methods).
predict.masbayes_bayesr, predict.masbayes_bayesa
GEBV + metrics, scheme-agnostic (S3 methods).
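
The labels "4-class mixture" and "marker-specific variance" describe the priors placed on marker effects. The sketch below draws effects from each prior using the parameterisations in Erbe et al. (2012) and Meuwissen et al. (2001): BayesR class variances of 0, 1e-4, 1e-3, and 1e-2 times the genetic variance, and BayesA variances from a scaled inverse chi-square. The mixing proportions, degrees of freedom, and scale are illustrative assumptions; the values masbayes uses are not documented here and may differ.

```r
set.seed(1)
p        <- 1000    # number of marker effects (illustrative)
sigma2_g <- 1       # genetic variance (illustrative)

# BayesR: each effect comes from one of four zero-mean normal classes.
class_var <- c(0, 1e-4, 1e-3, 1e-2) * sigma2_g
pi_mix    <- c(0.90, 0.06, 0.03, 0.01)            # assumed proportions
cls       <- sample(4, p, replace = TRUE, prob = pi_mix)
beta_r    <- rnorm(p, mean = 0, sd = sqrt(class_var[cls]))

# BayesA: each effect has its own variance, drawn from a scaled
# inverse chi-square with nu degrees of freedom and scale S2.
nu       <- 4       # assumed degrees of freedom
S2       <- 0.01    # assumed scale
sigma2_j <- nu * S2 / rchisq(p, df = nu)
beta_a   <- rnorm(p, mean = 0, sd = sqrt(sigma2_j))

table(cls)          # class counts under the BayesR prior
mean(beta_r == 0)   # share of exactly-zero effects
```

The practical difference is that BayesR can set an effect exactly to zero, while BayesA shrinks every effect by an amount governed by its own variance.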

Output of run_bayesr() / run_bayesa()

The returned S3 object (class masbayes_bayesr or masbayes_bayesa) contains:

  • GEBV, beta_hat, beta_samples (and sigma2_j_samples for BayesA)

  • sigma2_e_samples, mu_samples, runtime (seconds)

  • h2, sigma2_g, sigma2_e

  • training_metrics: R2, RMSE, accuracy/AUC, bias

  • diagnostics: ESS, Geweke Z (MCMC only)

  • variance_components: small / medium / large + pi (BayesR), or tertile bins of marker variances (BayesA)

  • rds_path: where the fit was auto-saved

Disable auto-save with save_rds = FALSE — recommended for CV loops.
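
Several of these fields can be cross-checked by hand. The sketch below uses simulated stand-ins for GEBV, sigma2_g, and sigma2_e and applies the usual definitions: heritability as the genetic share of total variance, accuracy as the correlation between GEBV and phenotype, and bias as the slope of phenotype on GEBV. These formulas are assumptions about what the reported fields mean, not taken from the package source.

```r
set.seed(1)
n    <- 500
gebv <- rnorm(n, sd = 1)            # stand-in for fit$GEBV
y    <- gebv + rnorm(n, sd = 1)     # stand-in for observed phenotypes

sigma2_g <- 1                       # stand-in for fit$sigma2_g
sigma2_e <- 1                       # stand-in for fit$sigma2_e

h2       <- sigma2_g / (sigma2_g + sigma2_e)
rmse     <- sqrt(mean((y - gebv)^2))
r2       <- 1 - sum((y - gebv)^2) / sum((y - mean(y))^2)
accuracy <- cor(gebv, y)
slope    <- unname(coef(lm(y ~ gebv))[2])   # near 1 means little inflation

round(c(h2 = h2, RMSE = rmse, R2 = r2, accuracy = accuracy, slope = slope), 3)
```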

Low-level bindings

run_bayesr_mcmc(), run_bayesr_em(), run_bayesa_mcmc(), and run_bayesa_em() are the direct Rust bindings. They are exported but undocumented; prefer run_bayesr() / run_bayesa() for routine use.

References

  • Meuwissen, T. H. E., Hayes, B. J., & Goddard, M. E. (2001). Prediction of total genetic value using genome-wide dense marker maps. Genetics, 157(4), 1819-1829. doi:10.1093/genetics/157.4.1819

  • Erbe, M., Hayes, B. J., Matukumalli, L. K., Goswami, S., Bowman, P. J., Reich, C. M., Mason, B. A., & Goddard, M. E. (2012). Improving accuracy of genomic predictions within and between dairy cattle breeds with imputed high-density single nucleotide polymorphism panels. Journal of Dairy Science, 95(7), 4114-4129. doi:10.3168/jds.2011-5019

  • Da, Y. (2015). Multi-allelic haplotype model based on genetic partition for genomic prediction and variance component estimation using SNP markers. BMC Genetics, 16(1), 144. doi:10.1186/s12863-015-0301-1

Author(s)

Maintainer: Wibowo Agus aguswibowo1698@gmail.com
