library("masbayes")
set.seed(42)
n <- 200
mcmc <- list(n_iter = 2000L, n_burn = 1000L, n_thin = 5L, seed = 123L)
X_cov <- cbind(sex = rbinom(n, 1, 0.5), batch = rnorm(n)) # optional
# ---- (A) SNP path ------------------------------------------------------
n_snp <- 100
X <- matrix(rbinom(n * n_snp, 2, prob = runif(n_snp, 0.1, 0.5)),
n, n_snp)
W_snp <- construct_snp_matrix(X)$W
y <- W_snp[, 1:5] %*% rnorm(5, 0, 0.5) + rnorm(n, 0, 1)
fit_snp <- run_bayesr(
w = W_snp,
y = y,
wtw_diag = colSums(W_snp^2),
X = X_cov, # optional
sigma2_e_init = var(y) * 0.5,
sigma2_ah = var(y) * 0.5,
mcmc_params = mcmc
)
summary(fit_snp)
# ---- (B) Microhaplotype path ------------------------------------------
n_block <- 50
hap <- matrix(sample.int(3, n * n_block * 2, replace = TRUE), nrow = n)
colnames(hap) <- paste0("hap_", rep(seq_len(n_block), each = 2))
freq <- data.frame(
haplotype = paste0("hap_", rep(seq_len(n_block), each = 3)),
allele = rep(1:3, n_block),
freq = rep(c(0.5, 0.3, 0.2), n_block)
)
W_mh <- construct_wah_matrix(hap, colnames(hap), freq)$W_ah
y_mh <- W_mh[, 1:5] %*% rnorm(5, 0, 0.5) + rnorm(n, 0, 1)
fit_mh <- run_bayesr(
w = W_mh,
y = y_mh,
wtw_diag = colSums(W_mh^2),
X = X_cov, # optional
sigma2_e_init = var(y_mh) * 0.5,
sigma2_ah = var(y_mh) * 0.5,
mcmc_params = mcmc
)
summary(fit_mh)
# ---- Train/test split (scheme-agnostic) -------------------------------
idx <- sample(n, 0.8 * n)
W_tr <- W_snp[idx, ]
y_tr <- y[idx]
fit <- run_bayesr(
w = W_tr,
y = y_tr,
wtw_diag = colSums(W_tr^2),
sigma2_e_init = var(y_tr) * 0.5,
sigma2_ah = var(y_tr) * 0.5,
mcmc_params = mcmc
)
pred <- predict(fit, W_snp[-idx, ], y[-idx])
pred$metrics$accuracyFit a BayesR Mixture Model (MCMC or stochastic EM)
Fit a BayesR Mixture Model (MCMC or stochastic EM)
Description
BayesR (Erbe et al., 2012) places a four-component mixture prior on allele effects: zero, small, medium, and large effect classes with proportions pi_vec and class variances proportional to sigma2_ah. Use method = “mcmc” for full Bayesian posterior inference, or method = “em” for fast stochastic-EM point estimates (Gaussian only).
Usage
run_bayesr(
w,
y,
wtw_diag,
X = NULL,
pi_vec = c(0.95, 0.02, 0.02, 0.01),
sigma2_e_init,
sigma2_ah = NULL,
sigma2_vec = NULL,
prior_params = NULL,
mcmc_params = NULL,
em_params = NULL,
method = c("mcmc", "em"),
response_type = c("gaussian", "binary"),
fold_id = 0L,
save_rds = FALSE,
save_path = NULL,
verbose = FALSE
)
Arguments
w
|
Numeric design matrix (n x p). Typically the $W_ah element returned by construct_wah_matrix.
|
y
|
Phenotype vector (length n). For binary traits use 0/1 coding.
|
wtw_diag
|
Pre-computed colSums(w^2), length p.
|
X
|
Optional fixed-effects design matrix (n x q). When supplied, the model is \(y = X\alpha + W\beta + \mu + \epsilon\) with a flat prior on \(\alpha\). Do not include a column of ones for the intercept; \(\mu\) is sampled separately.
|
pi_vec
|
Initial mixture proportions for the four classes (zero, small, medium, large). Must sum to 1. Default c(0.95, 0.02, 0.02, 0.01).
|
sigma2_e_init
|
Initial residual variance. Typically var(y) * 0.5; for binary traits fix at 1.0.
|
sigma2_ah
|
Prior total genetic variance (required). Typically var(y) * 0.5; for binary traits use 1.0.
|
sigma2_vec
|
Optional explicit variance vector for EM; if NULL it is derived from sigma2_ah, pi_vec, and prior_params$variance_class.
|
prior_params
|
Optional named list overriding defaults: a0_e (10), a0_g (10), variance_class (c(0, 0.001, 0.01, 0.1)). b0_e and b0_g are computed internally.
|
mcmc_params
|
Optional named list: n_iter (40000), n_burn (20000), n_thin (10), seed (123).
|
em_params
|
Optional named list: max_iter (500), tol (1e-6).
|
method
|
Either “mcmc” (default) or “em”.
|
response_type
|
“gaussian” (default) or “binary”. Binary requires method = “mcmc”.
|
fold_id
|
Integer label printed in progress messages; useful when running CV loops. |
save_rds
|
If TRUE (default) the fit object is written to disk as an RDS file. Set FALSE (default) for CV loops.
|
save_path
|
Optional explicit RDS path. If NULL (default), defaults to “results_bayesr.Rds” in the current working directory.
|
verbose
|
If TRUE, the Rust engine streams per-iteration progress (start banner, Iter X/Y diagnostics, ESS / Geweke, completion) to stderr. Default FALSE keeps the Rust side silent. The brief R post-fit summary (model, runtime, \(h^2\), ESS, Geweke) prints regardless.
|
Details
Algorithm choice. MCMC uses marginalised Gibbs sampling and returns full posterior chains; recommended when posterior uncertainty, ESS, or Geweke diagnostics are needed. EM is much faster but yields no uncertainty quantification and does not support response_type = “binary”.
Auto-save. By default the returned fit is also saved to results_bayesr.Rds in getwd(). Set save_rds = FALSE (or pass an explicit save_path) when running cross-validation loops to avoid overwriting earlier folds.
Three usage scenarios. The same run_bayesr() + summary() + predict() pipeline supports:
-
Full-data fit: train on all data;
predict(fit)returns in-sample metrics. -
Train-test split: train on a subset; evaluate via
predict(fit, W_test, y_test). -
k-fold CV: the user loops over folds, calling
run_bayesr()withsave_rds = FALSEandfold_id = k, thenpredict()on each held-out fold.
Derived hyperparameters. The wrapper computes b0_e = sigma2_e_init * (a0_e - 1) and b0_g = sigma2_ah * (a0_g - 2) / (a0_g * (1 - pi_vec[1])) before calling the Rust engine. For EM, a sigma2_vec is derived as variance_class * sigma2_ah / ((1 - pi_vec[1]) * p).
Value
An object of class c(“masbayes_bayesr”, “masbayes”) — a list with the following key fields:
-
beta_hat, mu_hat, sigma2_e_hat - Posterior point estimates.
-
beta_samples, gamma_samples, pi_samples, sigma2_e_samples, sigma2_small_samples, sigma2_medium_samples, sigma2_large_samples, mu_samples - Posterior chains (single-row matrices for EM).
-
GEBV / pred_train - Training genomic estimated breeding values. For binary traits this is on the liability scale.
-
prob_train -
Binary only: training-set predicted probabilities
P(y = 1) = pnorm(pred_train)(probit inverse link from Albert-Chib augmentation). -
h2, sigma2_g, sigma2_e - Heritability and variance components.
-
runtime - Elapsed seconds.
-
training_metrics -
R2,RMSE,accuracy(orAUCfor binary),bias. For binary, all metrics are on the observed (probability) scale —biasis the calibration slope (1.0 = perfectly calibrated) andRMSE^2approximates the Brier score. AUC is rank-invariant so unaffected. -
diagnostics - ESS and Geweke Z for key parameters (MCMC only).
-
variance_components -
Posterior mean and 95% CI for each class plus the mixture proportions
pi. -
rds_path -
Path of the saved RDS file, or
NULLwhensave_rds = FALSE.
References
Erbe, M., Hayes, B. J., Matukumalli, L. K., Goswami, S., Bowman, P. J., Reich, C. M., Mason, B. A., & Goddard, M. E. (2012). Improving accuracy of genomic predictions within and between dairy cattle breeds with imputed high-density single nucleotide polymorphism panels. Journal of Dairy Science, 95(7), 4114-4129. doi:10.3168/jds.2011-5019
Meuwissen, T. H. E., Hayes, B. J., & Goddard, M. E. (2001). Prediction of total genetic value using genome-wide dense marker maps. Genetics, 157(4), 1819-1829. doi:10.1093/genetics/157.4.1819
See Also
run_bayesa, construct_wah_matrix, summary.masbayes_bayesr, predict.masbayes_bayesr