library("masbayes")
set.seed(1)
n_ind <- 30
n_snp <- 50
maf <- runif(n_snp, 0.1, 0.5)
# Simulate dosage matrix (0/1/2) for training and test sets
X_train <- matrix(rbinom(n_ind * n_snp, 2, prob = maf),
nrow = n_ind)
X_test <- matrix(rbinom(10 * n_snp, 2, prob = maf),
nrow = 10)
# Training: compute frequencies from X_train
train <- construct_snp_matrix(X_train)
W_train <- train$W
# Test: reuse training frequencies (do NOT recompute from X_test)
test <- construct_snp_matrix(X_test, ref_freq = train$freq)
W_test <- test$W
stopifnot(ncol(W_train) == ncol(W_test))
# The returned W can be passed directly to run_bayesr() or run_bayesa()
# together with the response y_train.Construct a Centered SNP Design Matrix
Construct a Centered SNP Design Matrix
Description
Build a centered SNP design matrix \(W_{ij} = X_{ij} - 2 p_j\) from an allele-dosage matrix \(X\) (entries 0, 1, 2). The output is suitable for direct use with run_bayesr and run_bayesa, which are agnostic to whether the design matrix encodes biallelic SNPs or multi-allelic haplotypes.
Usage
construct_snp_matrix(X, ref_freq = NULL)
Arguments
X
|
Numeric or integer matrix of allele dosages (typically 0, 1, or 2) with dimensions n x p_snp. Coerced to double internally.
|
ref_freq
|
Optional numeric vector of length ncol(X) giving the reference allele frequencies \(p_j\) from the training set. If NULL (default) the frequencies are computed from X itself (training-set use).
|
Details
Training / test workflow. Always center the test set with training allele frequencies, never with frequencies recomputed from the test set itself:
-
Training: call
construct_snp_matrix(X_train)(leaveref_freq = NULL). The returned$freqcontains the per-SNP reference frequencies \(p_j = mean(X_{train,j}) / 2\). -
Test: call
construct_snp_matrix(X_test, ref_freq = train$freq)so the columns ofW_testare aligned withW_trainon the same baseline frequencies.
For phased multi-allelic haplotype data, use construct_wah_matrix instead. For other marker representations, build the design matrix manually and pass it to the Bayesian fitters.
Value
A list with elements:
-
W -
Numeric matrix
n x p_snpof centered dosages \(W_{ij} = X_{ij} - 2 p_j\). -
freq -
Numeric vector of length
p_snpwith the allele frequencies used for centering. -
n,p - Number of individuals and SNPs.
See Also
construct_wah_matrix, run_bayesr, run_bayesa