library("masbayes")
set.seed(42)
n <- 200
mcmc <- list(n_iter = 2000L, n_burn = 1000L, n_thin = 5L, seed = 123L)
X_cov <- cbind(sex = rbinom(n, 1, 0.5), batch = rnorm(n)) # optional
# ---- (A) SNP path ------------------------------------------------------
n_snp <- 100
X <- matrix(rbinom(n * n_snp, 2, prob = runif(n_snp, 0.1, 0.5)),
n, n_snp)
W_snp <- construct_snp_matrix(X)$W
y <- W_snp[, 1:5] %*% rnorm(5, 0, 0.5) + rnorm(n, 0, 1)
fit_snp <- run_bayesa(
w = W_snp,
y = y,
wtw_diag = colSums(W_snp^2),
X = X_cov, # optional
nu = 4.5,
sigma2_g = var(y) * 0.5,
sigma2_e_init = var(y) * 0.5,
mcmc_params = mcmc
)
summary(fit_snp)
# ---- (B) Microhaplotype path ------------------------------------------
n_block <- 50
hap <- matrix(sample.int(3, n * n_block * 2, replace = TRUE), nrow = n)
colnames(hap) <- paste0("hap_", rep(seq_len(n_block), each = 2))
freq <- data.frame(
haplotype = paste0("hap_", rep(seq_len(n_block), each = 3)),
allele = rep(1:3, n_block),
freq = rep(c(0.5, 0.3, 0.2), n_block)
)
W_mh <- construct_wah_matrix(hap, colnames(hap), freq)$W_ah
y_mh <- W_mh[, 1:5] %*% rnorm(5, 0, 0.5) + rnorm(n, 0, 1)
fit_mh <- run_bayesa(
w = W_mh,
y = y_mh,
wtw_diag = colSums(W_mh^2),
X = X_cov, # optional
nu = 4.5,
sigma2_g = var(y_mh) * 0.5,
sigma2_e_init = var(y_mh) * 0.5,
mcmc_params = mcmc
)
summary(fit_mh)
# ---- Predict on a held-out test set -----------------------------------
idx <- sample(n, 0.8 * n)
W_tr <- W_snp[idx, ]
y_tr <- y[idx]
fit <- run_bayesa(
w = W_tr,
y = y_tr,
wtw_diag = colSums(W_tr^2),
nu = 4.5,
sigma2_g = var(y_tr) * 0.5,
sigma2_e_init = var(y_tr) * 0.5,
mcmc_params = mcmc
)
pred <- predict(fit, W_snp[-idx, ], y[-idx])
pred$metrics$accuracyFit a BayesA Marker-Specific Variance Model (MCMC or stochastic EM)
Fit a BayesA Marker-Specific Variance Model (MCMC or stochastic EM)
Description
BayesA (Meuwissen et al., 2001) places a scaled inverse chi-squared prior on each allele’s effect variance, allowing heavier-tailed effect-size distributions than ridge regression. Use method = “mcmc” for full posterior inference or method = “em” for fast stochastic-EM point estimates (Gaussian only).
Usage
run_bayesa(
w,
y,
wtw_diag,
X = NULL,
nu = 4.5,
sigma2_g,
sigma2_e_init,
prior_params = NULL,
mcmc_params = NULL,
em_params = NULL,
method = c("mcmc", "em"),
response_type = c("gaussian", "binary"),
fold_id = 0L,
save_rds = FALSE,
save_path = NULL,
verbose = FALSE
)
Arguments
w
|
Numeric design matrix (n x p). Typically the $W_ah element returned by construct_wah_matrix.
|
y
|
Phenotype vector (length n). Use 0/1 for binary traits.
|
wtw_diag
|
Pre-computed colSums(w^2).
|
X
|
Optional fixed-effects design matrix (n x q). When supplied, the model is \(y = X\alpha + W\beta + \mu + \epsilon\) with a flat prior on \(\alpha\). Do not include a column of ones for the intercept; \(\mu\) is sampled separately.
|
nu
|
Degrees of freedom for the scaled inverse chi-squared prior on marker variances. Smaller values allow heavier tails. Must be > 2. Default 4.5.
|
sigma2_g
|
Prior total genetic variance. Typically var(y) * 0.5; for binary traits use 1.0.
|
sigma2_e_init
|
Initial residual variance. For binary traits fix at 1.0.
|
prior_params
|
Optional named list. Only a0_e (default 10) is used; b0_e is derived as sigma2_e_init * (a0_e - 1).
|
mcmc_params
|
Optional named list: n_iter (40000), n_burn (20000), n_thin (10), seed (123).
|
em_params
|
Optional named list: max_iter (500), tol (1e-6).
|
method
|
Either “mcmc” (default) or “em”.
|
response_type
|
“gaussian” (default) or “binary”. Binary requires method = “mcmc”.
|
fold_id
|
Integer label for progress messages. |
save_rds
|
If TRUE, the fit object is saved to disk as an RDS file. Set FALSE (default) for CV loops.
|
save_path
|
Optional explicit RDS path. If NULL (default), defaults to “results_bayesa.Rds” in the current working directory.
|
verbose
|
If TRUE, the Rust engine streams per-iteration progress (start banner, Iter X/Y diagnostics, ESS / Geweke, completion) to stderr. Default FALSE keeps the Rust side silent. The brief R post-fit summary prints regardless.
|
Details
Algorithm choice. MCMC uses marginalised Gibbs sampling and returns full posterior chains. EM is much faster but provides no uncertainty quantification and does not support response_type = “binary”.
Auto-save. By default the returned fit is saved to results_bayesa.Rds in getwd(). Set save_rds = FALSE for CV loops.
Three usage scenarios. run_bayesa() + summary() + predict() support full-data fits, train-test splits, and k-fold CV uniformly. See run_bayesr for example loops.
Derived parameter. The wrapper computes s_squared = sigma2_g * (nu - 2) / (nu * sum(apply(w, 2, var))) so the prior expectation of each marker variance equals sigma2_g / sum_2pq.
Value
An object of class c(“masbayes_bayesa”, “masbayes”) — a list with the following key fields:
-
beta_hat, mu_hat, sigma2_e_hat, sigma2_j_hat - Posterior point estimates (intercept, allele effects, residual variance, per-marker variances).
-
beta_samples, sigma2_j_samples, sigma2_e_samples, mu_samples - Posterior chains (single-row matrices for EM).
-
GEBV / pred_train, h2, sigma2_g, sigma2_e - Training GEBVs (liability scale for binary), heritability, and total variance components.
-
prob_train -
Binary only: training-set predicted probabilities
P(y = 1) = pnorm(pred_train)(probit inverse link from Albert-Chib augmentation). -
runtime - Elapsed seconds.
-
training_metrics -
R2,RMSE,accuracy(orAUCfor binary),bias. For binary, all metrics are on the observed (probability) scale —biasis the calibration slope (1.0 = perfectly calibrated) andRMSE^2approximates the Brier score. AUC is rank-invariant so unaffected. -
diagnostics - ESS / Geweke Z (MCMC only).
-
variance_components - Per-marker variances binned into tertiles (small / medium / large) reporting mean, range, and marker count.
-
rds_path -
Path of the saved RDS file, or
NULL.
References
Meuwissen, T. H. E., Hayes, B. J., & Goddard, M. E. (2001). Prediction of total genetic value using genome-wide dense marker maps. Genetics, 157(4), 1819-1829. doi:10.1093/genetics/157.4.1819
See Also
run_bayesr, construct_wah_matrix, summary.masbayes_bayesa, predict.masbayes_bayesa