Universal REML-BLUP for SNP and Microhaplotype Genomic Prediction

Description

BLUP-based model for genomic prediction using SNP or microhaplotype (MH) markers. Estimates variance components via REML and predicts genomic estimated breeding values (GEBV) via BLUP. Supports SNP additive, SNP dominance, and MH additive relationship matrices, for both continuous and binary traits. For GWAS-assisted prediction, see run_gwas and gwablup.

Usage

masreml(
  y,
  X = NULL,
  markers = NULL,
  G = NULL,
  method = "auto",
  solver = "auto",
  max_iter = 100L,
  tol = 1e-06,
  n_threads = NULL,
  trait = "continuous",
  link = "logit",
  verbose = TRUE,
  save_rds = FALSE,
  save_path = NULL
)

Arguments

`y`	numeric vector of phenotypes (length n). Named vector recommended for ID alignment. For binary trait, must contain only 0 and 1.
`X`	fixed effects design matrix (n x c). If NULL, intercept only is used.
`markers`	list of raw marker inputs (optional): `snp_add`: matrix (n x m), raw genotype 0/1/2 `snp_dom`: matrix (n x m), raw genotype 0/1/2 `mh_add`: list of data.frames, one per chromosome. First column = individual ID, remaining columns = paired haplotype allele codes per block (strand1, strand2, strand1, …)
`G`	list of pre-built relationship matrices (optional): `snp_add`: numeric matrix (n x n), SNP additive G `snp_dom`: numeric matrix (n x n), SNP dominance D `mh_add`: numeric matrix (n x n), MH additive Agh `pedigree`: numeric matrix (n x n), pedigree A
`method`	character, REML algorithm for variance component estimation: `“auto”` (default): AI-REML for continuous, HE regression for binary `“HE”`: Haseman-Elston regression (fast, one-step) `“AI”`: Average Information REML (accurate, iterative) `“EM”`: Expectation-Maximization REML (stable, slower) `“HI”`: HE-initialized AI-REML For binary trait, `“HE”` is recommended for speed. Any method can be used but `“AI”` may be slow on working response.
`solver`	character, EBV solver: `“auto”` (default): Cholesky for n < 10,000, PCG otherwise `“cholesky”`: direct Cholesky factorization `“pcg”`: preconditioned conjugate gradient (large n)
`max_iter`	integer, maximum REML iterations (default 100)
`tol`	numeric, convergence tolerance (default 1e-6)
`n_threads`	integer, number of threads (default: all physical cores)
`trait`	character, trait distribution: `“continuous”` (default): Gaussian REML-BLUP `“binary”`: 0/1 phenotype, single-step Laplace approximation on liability scale
`link`	character, link function for binary trait: `“logit”` (default): logistic link, sigma2_e = pi^2/3 `“probit”`: probit link, sigma2_e = 1
`verbose`	logical, if `TRUE` (default) prints a brief post-fit summary banner (algorithm, runtime, convergence, variance components, h2, GEBV summary) when `masreml()` returns. The existing progress messages (`“Building…”`, `“Running REML…”`, `“Solving EBV…”`) are unaffected by this flag.
`save_rds`	logical, if `TRUE` the fit is also written to disk as an RDS file. Default `FALSE` (differs from `masbayes::run_bayesr()`) because `masreml()` is often called inside `cv_masreml` loops where auto-saving would clobber per-fold output. Set to `TRUE` for one-off full-data fits you want to persist.
`save_path`	optional explicit RDS path. If `NULL` (default), defaults to `“results_masreml.Rds”` in the current working directory. Ignored unless `save_rds = TRUE`.

Details

Auto-summary. On completion, masreml() prints a brief banner mirroring masbayes::run_bayesr()/ masbayes::run_bayesa(). Set verbose = FALSE to suppress. The full summary.masreml method is unchanged and is still the canonical detailed report.

Runtime field. The returned object now contains a runtime field (elapsed seconds for REML + BLUP), populated even when verbose = FALSE.

Value

Object of class “masreml” with elements:

gebv: named list of GEBV vectors per component
total_gebv: total GEBV (sum across components)
fixed_effects: fixed effect estimates
varcomp: list with sigma2 and h2
loglik: restricted log-likelihood
algorithm: REML algorithm used
solver: EBV solver used
converged: logical, convergence status
n_iter: number of REML iterations
n: number of individuals
runtime: elapsed seconds for REML + BLUP
rds_path: path of saved RDS file, or NULL when save_rds = FALSE
training_metrics: list of training-set fit metrics. Continuous: R2, RMSE, accuracy (Pearson r), bias (slope of \(y \sim \hat{y}\)). Binary: same fields plus AUC, computed on the observed (probability) scale (y_01 vs mu_hat) so that bias is the calibration slope (1.0 = perfectly calibrated) and RMSE^2 approximates the Brier score. AUC is rank-invariant under inverse-link transformation.
call: matched call

For binary trait (trait = “binary”), additional binary slot contains:

link: link function used
prevalence: proportion of y = 1
h2_liability: h2 on liability scale
h2_observed: h2 on observed (0/1) scale via Dempster-Falconer transformation
auc: Area Under ROC Curve (Wilcoxon-MWW)
fitted: fitted probabilities P(y = 1)
converged: always TRUE for single-step

References

VanRaden (2008) Efficient methods to compute genomic predictions. J. Dairy Sci. 91:4414-4423.

Da (2015) Multi-allelic haplotype model for genomic prediction. BMC Genet. 16:144.

Dempster & Falconer (1950) Interpretation of high and low liability. Ann. Hum. Genet. 31:195-203.

Johnson & Thompson (1995) Restricted maximum likelihood estimation of variance components. J. Dairy Sci. 78:449-456.

Examples

library("masreml")

# ── Continuous trait ─────────────────────────────────────
# SNP additive GBLUP
fit <- masreml(y = y, markers = list(snp_add = W))
summary(fit)

# MH additive only
fit <- masreml(y = y, markers = list(mh_add = mh_list))

# Combined SNP additive + dominance
fit <- masreml(
  y       = y,
  markers = list(snp_add = W, snp_dom = W),
  method  = "AI"
)

# Pre-built G matrix
fit <- masreml(y = y, G = list(snp_add = G))

# ── Binary trait ─────────────────────────────────────────
# Binary GBLUP (logit link, default HE)
fit <- masreml(
  y     = y_binary,
  markers = list(snp_add = W),
  trait = "binary"
)
summary(fit)

# Binary with probit link
fit <- masreml(
  y     = y_binary,
  markers = list(snp_add = W),
  trait = "binary",
  link  = "probit"
)

# Binary with EM-REML for variance components
fit <- masreml(
  y      = y_binary,
  markers = list(snp_add = W),
  trait  = "binary",
  method = "EM"
)