masbayes: Bayesian Genomic Prediction Models for SNPs and Microhaplotypes
Description
Bayesian methods for genomic prediction using SNP or microhaplotype markers. Implements BayesR (four-class mixture) and BayesA (marker-specific variance), each available as full MCMC or stochastic EM. Continuous (Gaussian) and binary (Albert-Chib data augmentation) traits are supported.
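For orientation, the priors behind these model names can be written out as below. This is only a sketch of the textbook formulations from the references (Erbe et al., 2012 for BayesR; Meuwissen et al., 2001 for BayesA). The mixture scaling constants, the hyperparameters \(\nu\) and \(S^2\), and the linear predictor \(\eta_i\) are assumptions taken from that literature, not confirmed details of the masbayes implementation.

```latex
\begin{align*}
% BayesR: point mass at zero plus three normal classes (scaling as in Erbe et al., 2012; assumed)
\beta_j \mid \pi, \sigma_g^2 &\sim \pi_1\,\delta_0
  + \pi_2\,N(0,\,10^{-4}\sigma_g^2)
  + \pi_3\,N(0,\,10^{-3}\sigma_g^2)
  + \pi_4\,N(0,\,10^{-2}\sigma_g^2) \\
% BayesA: each marker effect has its own variance
\beta_j \mid \sigma_j^2 &\sim N(0,\,\sigma_j^2), \qquad
  \sigma_j^2 \sim \text{Scale-inv-}\chi^2(\nu,\,S^2) \\
% Binary traits: Albert-Chib latent liability with unit residual variance
y_i &= \mathbf{1}\{l_i > 0\}, \qquad l_i \mid \eta_i \sim N(\eta_i,\,1)
\end{align*}
```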
Workflow
The standard pipeline has four steps:
- construct_wah_matrix(): build the \(W_{\alpha h}\) design matrix from phased haplotypes.
- run_bayesr() or run_bayesa(): fit the Bayesian model. The fit object is auto-saved to results_bayesr.Rds / results_bayesa.Rds by default.
- summary(fit): full report (heritability, MCMC ESS / Geweke, variance components, training metrics, top alleles).
- predict(fit, newdata, y_new): evaluate the model on any data: in-sample, test split, or k-fold CV held-out fold.
Three usage scenarios
summary() and predict() are scheme-agnostic — the same methods cover all three of:
- Full-data fit: train on everything; predict(fit) returns in-sample metrics.
- Train/test split: train on a subset; evaluate via predict(fit, W_test, y_test).
- k-fold CV: the user loops over folds (typically with save_rds = FALSE, verbose = FALSE), then calls predict() on each held-out fold.
Cross-validation orchestration is intentionally left to the caller — the package does not ship a built-in cv_*() helper.
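A caller-side loop could look roughly like the sketch below. `make_folds()` and `run_cv()` are hypothetical helpers written for this illustration, not part of masbayes. The masbayes calls reuse only the arguments shown elsewhere on this page, and recomputing `wtw` from the training rows of each fold is an assumption.

```r
# Hypothetical helper: randomly assign n row indices to k folds.
# Note: set.seed() changes the global RNG state.
make_folds <- function(n, k, seed = 1) {
  set.seed(seed)
  split(sample(seq_len(n)), rep_len(seq_len(k), n))
}

# Hypothetical helper: generic CV loop.
# fit_fun(W_train, y_train) returns a fitted model;
# eval_fun(fit, W_test, y_test) returns a single numeric score.
run_cv <- function(W, y, k, fit_fun, eval_fun, seed = 1) {
  folds <- make_folds(nrow(W), k, seed)
  vapply(folds, function(test_idx) {
    fit <- fit_fun(W[-test_idx, , drop = FALSE], y[-test_idx])
    eval_fun(fit, W[test_idx, , drop = FALSE], y[test_idx])
  }, numeric(1))
}

# Wiring in masbayes (arguments as in the Quick start; auto-save and
# console output switched off inside the loop).
fit_bayesr <- function(W_train, y_train) {
  masbayes::run_bayesr(W_train, y_train, colSums(W_train^2),
                       sigma2_e_init = var(y_train) * 0.5,
                       sigma2_ah = var(y_train) * 0.5,
                       method = "mcmc",
                       save_rds = FALSE, verbose = FALSE)
}
eval_bayesr <- function(fit, W_test, y_test) {
  predict(fit, W_test, y_test)$metrics$accuracy
}

# With W and y in memory (not run here):
# cv_acc <- run_cv(W, y, k = 5, fit_fun = fit_bayesr, eval_fun = eval_bayesr)
# mean(cv_acc)
```

Keeping the fold logic separate from the model call means the same loop can be reused with run_bayesa() by swapping `fit_fun`.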
Quick start
library(masbayes)
# hap (phased haplotypes), allele_freq, y, W_test and y_test are supplied by the user.
# 1. Design matrix
train <- construct_wah_matrix(hap, colnames(hap), allele_freq)
W <- train$W_ah
# 2. Sufficient statistics
wtw <- colSums(W^2)
# 3. Fit
fit <- run_bayesr(W, y, wtw,
                  sigma2_e_init = var(y) * 0.5,
                  sigma2_ah = var(y) * 0.5,
                  method = "mcmc")
# 4. Report and evaluate
summary(fit)
pred <- predict(fit, W_test, y_test) # train/test mode
pred$metrics$accuracy
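The "sufficient statistics" vector passed as `wtw` is the per-column sum of squares of the design matrix, which is the diagonal of \(W^\top W\). The toy check below uses a made-up matrix (`W_demo` is purely illustrative) to show that `colSums(W^2)` gives that diagonal without forming the full cross-product. How masbayes uses the vector internally is not stated here.

```r
set.seed(1)
W_demo <- matrix(rnorm(6 * 4), nrow = 6, ncol = 4)  # toy 6 x 4 design matrix

wtw_fast <- colSums(W_demo^2)          # O(n * p), no p x p matrix needed
wtw_full <- diag(crossprod(W_demo))    # diagonal of the full t(W) %*% W

# Both routes give the same per-column sums of squares
stopifnot(isTRUE(all.equal(wtw_fast, wtw_full)))
```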
Main functions
- construct_wah_matrix: Build the \(W_{\alpha h}\) design matrix.
- run_bayesr: Fit BayesR (4-class mixture).
- run_bayesa: Fit BayesA (marker-specific variance).
- summary.masbayes_bayesr, summary.masbayes_bayesa: Full-fit reports (S3 methods).
- predict.masbayes_bayesr, predict.masbayes_bayesa: GEBV + metrics, scheme-agnostic (S3 methods).
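To make the metric names concrete, the sketch below shows how R2, RMSE, accuracy, bias, and AUC are commonly computed in genomic prediction. These definitions are assumptions for illustration (for example, bias taken as the slope of observed on predicted values). The exact formulas used by masbayes may differ, and `prediction_metrics()` and `auc_rank()` are hypothetical names.

```r
# Assumed, commonly used definitions; not taken from the masbayes source.
prediction_metrics <- function(y, gebv) {
  c(R2       = cor(y, gebv)^2,                   # squared Pearson correlation
    RMSE     = sqrt(mean((y - gebv)^2)),         # root mean squared error
    accuracy = cor(y, gebv),                     # predictive correlation
    bias     = unname(coef(lm(y ~ gebv))[2]))    # slope of observed on predicted
}

# AUC for a binary trait via the rank-sum (Mann-Whitney) identity.
auc_rank <- function(y01, score) {
  r  <- rank(score)                  # average ranks handle ties
  n1 <- sum(y01 == 1)
  n0 <- sum(y01 == 0)
  (sum(r[y01 == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

# Toy usage with made-up numbers
g <- c(-1.2, -0.3, 0.1, 0.8, 1.5)
prediction_metrics(y = 2 * g + 1, gebv = g)
auc_rank(c(0, 1, 0, 1), c(0.1, 0.2, 0.3, 0.4))
```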
Output of run_bayesr() / run_bayesa()
The returned S3 object (class masbayes_bayesr or masbayes_bayesa) contains:
- GEBV, beta_hat, beta_samples (and sigma2_j_samples for BayesA)
- sigma2_e_samples, mu_samples, runtime (seconds)
- h2, sigma2_g, sigma2_e
- training_metrics: R2, RMSE, accuracy / AUC, bias
- diagnostics: ESS, Geweke Z (MCMC only)
- variance_components: small / medium / large + pi (BayesR), or tertile bins of marker variances (BayesA)
- rds_path: where the fit was auto-saved
Disable auto-save with save_rds = FALSE — recommended for CV loops.
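When auto-save is left on, a saved fit can presumably be restored in a later session with base R's readRDS(). The round trip below uses a mock list in place of a real fit so that it runs without the package. The field values are made up, and accessing `fit$rds_path` with `$` assumes the fit object is list-like.

```r
# Mock object standing in for a real fit (field names from the list above,
# values invented for illustration).
mock_fit <- structure(
  list(h2 = 0.4, sigma2_g = 2, sigma2_e = 3),
  class = "masbayes_bayesr"
)

path <- tempfile(fileext = ".Rds")
saveRDS(mock_fit, path)        # stand-in for what the auto-save is described as doing
reloaded <- readRDS(path)      # restore the object, class attribute included

stopifnot(identical(reloaded, mock_fit))

# With a real fit, the equivalent call would be along these lines:
# fit <- readRDS("results_bayesr.Rds")   # default file name given above
# fit <- readRDS(fit$rds_path)           # assumes rds_path is accessible via $
```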
Low-level bindings
run_bayesr_mcmc(), run_bayesr_em(), run_bayesa_mcmc(), and run_bayesa_em() are the direct Rust bindings. They are exported but undocumented; prefer run_bayesr() / run_bayesa() for routine use.
References
- Meuwissen, T. H. E., Hayes, B. J., & Goddard, M. E. (2001). Prediction of total genetic value using genome-wide dense marker maps. Genetics, 157(4), 1819-1829. doi:10.1093/genetics/157.4.1819
- Erbe, M., Hayes, B. J., Matukumalli, L. K., Goswami, S., Bowman, P. J., Reich, C. M., Mason, B. A., & Goddard, M. E. (2012). Improving accuracy of genomic predictions within and between dairy cattle breeds with imputed high-density single nucleotide polymorphism panels. Journal of Dairy Science, 95(7), 4114-4129. doi:10.3168/jds.2011-5019
- Da, Y. (2015). Multi-allelic haplotype model based on genetic partition for genomic prediction and variance component estimation using SNP markers. BMC Genetics, 16(1), 144. doi:10.1186/s12863-015-0301-1
See Also
Useful links: