Pedigree
Schema
Pedigree is consumed by masreml only — masbayes is a marker-based package and does not use a numerator-relationship matrix.
The data frame has one row per individual including founders:
| Column | Type | Description |
|---|---|---|
| id | character | Individual identifier; joins to rownames(snp) for offspring |
| sire | character or NA | Sire identifier, NA for founders without recorded sire |
| dam | character or NA | Dam identifier, NA for founders without recorded dam |
The demo dataset has 220 rows (large) / 120 rows (small): 10 sire founders + 10 dam founders (all with NA sire/dam) plus the offspring with their two-parent ancestry recorded.
Example (from the demo dataset)
Full pedigree, with founders surfaced first:
Founder vs. offspring count:
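The two views named above can be sketched on a toy pedigree with the same three columns. The IDs below are invented for illustration and are not the demo dataset; on the demo data the same steps would presumably apply to d$pedigree as returned by load_data().

```r
# Toy pedigree with the schema's three columns (invented IDs, not the demo data).
ped <- data.frame(
  id   = c("OFF1", "SIRE1", "OFF2", "DAM1", "OFF3"),
  sire = c("SIRE1", NA, "SIRE1", NA, "SIRE1"),
  dam  = c("DAM1", NA, "DAM1", NA, "DAM1"),
  stringsAsFactors = FALSE
)

# A founder is a row with both parents unrecorded.
is_founder <- is.na(ped$sire) & is.na(ped$dam)

# Full pedigree with founders surfaced first; order() keeps ties in
# their original order, so offspring stay in input order.
ped_sorted <- ped[order(!is_founder), ]
print(ped_sorted)

# Founder vs. offspring count.
counts <- table(ifelse(is_founder, "founder", "offspring"))
print(counts)
```

Here the toy pedigree yields 2 founders and 3 offspring.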
Loading
masreml::build_A_ped() expects the sire/dam columns as integer row-indices (with 0 meaning “unknown”), not as character IDs. The demo pedigree stores them as character strings (e.g. "SIRE05"), so a small conversion step is required before calling the builder:
```r
library(masreml)
d <- masreml::load_data("large")

# Map every id to its 1-based row index in the pedigree frame.
id_to_row <- setNames(seq_len(nrow(d$pedigree)), d$pedigree$id)

ped_int <- data.frame(
  id   = d$pedigree$id,
  sire = ifelse(is.na(d$pedigree$sire), 0L, id_to_row[d$pedigree$sire]),
  dam  = ifelse(is.na(d$pedigree$dam),  0L, id_to_row[d$pedigree$dam]),
  stringsAsFactors = FALSE
)

A <- masreml::build_A_ped(ped_int)
str(A) # 220 x 220 numeric, rownames = ids
```

Using A inside the mixed model
masreml::masreml() does not accept a pedigree argument directly. Build \(\mathbf{A}\) first with build_A_ped(), then pass it through the G slot under one of the recognised keys (snp_add, snp_dom, or mh_add — the key is the random-effect label, not a content check):
```r
y <- d$pheno$y_cont_qtl_snp
names(y) <- d$pheno$id
X_fixed <- stats::model.matrix(~ sex, data = d$pheno)

# Build A on the full pedigree (220 individuals = founders + offspring)
A_full <- masreml::build_A_ped(ped_int)

# Restrict to the phenotyped offspring before fitting.
A_pheno <- A_full[d$pheno$id, d$pheno$id]

# A-BLUP: the pedigree-derived relationship matrix as the single
# random-effect kernel. The slot key here is `snp_add` simply because
# that is the recognised label for an additive-relationship slot in
# masreml's G list; its contents are A, not a SNP-derived G.
fit_a <- masreml::masreml(
  y = y,
  X = X_fixed,
  G = list(snp_add = A_pheno),
  trait = "continuous"
)
```

Single-step GBLUP (combining \(\mathbf{A}\) and \(\mathbf{G}\)). masreml’s G slot recognises only the three keys above; it does not provide a dedicated pedigree-plus-genomic kernel. For single-step prediction, build the combined \(\mathbf{H}\) matrix (Misztal et al. 2009) externally and pass it as G = list(snp_add = H).
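As a rough illustration of building \(\mathbf{H}\) externally, the sketch below uses the usual block form, with subscript 1 for un-genotyped and 2 for genotyped individuals:

\[
\mathbf{H} = \begin{bmatrix}
\mathbf{A}_{11} + \mathbf{A}_{12}\mathbf{A}_{22}^{-1}(\mathbf{G} - \mathbf{A}_{22})\mathbf{A}_{22}^{-1}\mathbf{A}_{21} & \mathbf{A}_{12}\mathbf{A}_{22}^{-1}\mathbf{G} \\
\mathbf{G}\mathbf{A}_{22}^{-1}\mathbf{A}_{21} & \mathbf{G}
\end{bmatrix}
\]

The helper build_H() is hypothetical (it is not a masreml function), and the toy \(\mathbf{A}\) and \(\mathbf{G}\) values are invented. Scaling or blending of \(\mathbf{G}\) against \(\mathbf{A}_{22}\), which real analyses usually need, is left out.

```r
# Hypothetical helper, not part of masreml: combine a pedigree A with a
# genomic G (defined only for the genotyped ids) into the block matrix H.
build_H <- function(A, G, geno_ids) {
  stopifnot(all(geno_ids %in% rownames(A)),
            identical(rownames(G), geno_ids))
  ung <- setdiff(rownames(A), geno_ids)     # un-genotyped individuals
  A12 <- A[ung, geno_ids, drop = FALSE]
  A22 <- A[geno_ids, geno_ids, drop = FALSE]
  B   <- A12 %*% solve(A22)                 # A12 %*% A22^{-1}
  H <- A
  H[ung, ung]           <- A[ung, ung] + B %*% (G - A22) %*% t(B)
  H[ung, geno_ids]      <- B %*% G
  H[geno_ids, ung]      <- t(B %*% G)
  H[geno_ids, geno_ids] <- G
  H
}

# Toy data: two unrelated founders (S, D) and two full-sib offspring.
ids <- c("S", "D", "O1", "O2")
A <- matrix(c(1,   0,   0.5, 0.5,
              0,   1,   0.5, 0.5,
              0.5, 0.5, 1,   0.5,
              0.5, 0.5, 0.5, 1),
            nrow = 4, byrow = TRUE, dimnames = list(ids, ids))

# Only the offspring are genotyped; these G values are invented.
geno <- c("O1", "O2")
G <- matrix(c(1.02, 0.45,
              0.45, 0.98),
            nrow = 2, dimnames = list(geno, geno))

H <- build_H(A, G, geno)
print(round(H, 3))
```

H keeps the row and column names of A, so it can be restricted to the phenotyped individuals in the same way as A_full before being passed through the G slot. When \(\mathbf{G}\) equals \(\mathbf{A}_{22}\), the construction returns \(\mathbf{A}\) unchanged, which is a convenient sanity check.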
Common pitfalls
Founders must appear in the pedigree. A common mistake is including only offspring rows. build_A_ped() walks the matrix bottom-up; if a sire row is missing, its descendants lose their genetic-relationship columns and inbreeding estimates collapse to 0. The demo pedigree’s 220 rows include 20 founder rows for this reason.
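To see why dropped founder rows matter, the sketch below applies the textbook tabular method to a toy pedigree, once with founders and once with offspring rows only. It is an illustration of the general recursion, not masreml's implementation; tabular_A() is a hypothetical name, and it assumes integer parent indices with 0 for unknown and parents listed before offspring.

```r
# Textbook tabular method for the numerator relationship matrix.
# sire/dam are integer row indices, 0 = unknown; parents precede offspring.
# Illustrative only; this is not masreml's build_A_ped().
tabular_A <- function(sire, dam) {
  n <- length(sire)
  A <- matrix(0, n, n)
  for (i in seq_len(n)) {
    s <- sire[i]
    d <- dam[i]
    # Diagonal: 1 + inbreeding, i.e. half the relationship between parents.
    f_i <- if (s > 0 && d > 0) 0.5 * A[s, d] else 0
    A[i, i] <- 1 + f_i
    for (j in seq_len(i - 1)) {
      # Off-diagonal: average of j's relationship with i's known parents.
      a_js <- if (s > 0) A[j, s] else 0
      a_jd <- if (d > 0) A[j, d] else 0
      A[i, j] <- A[j, i] <- 0.5 * (a_js + a_jd)
    }
  }
  A
}

# With founders: rows 1-3 are SIRE, DAM_A, DAM_B; rows 4-5 are half-sibs
# that share the sire in row 1.
A_with <- tabular_A(sire = c(0L, 0L, 0L, 1L, 1L),
                    dam  = c(0L, 0L, 0L, 2L, 3L))
A_with[4, 5]     # 0.25: paternal half-sibs

# Offspring rows only: both parents become "unknown".
A_without <- tabular_A(sire = c(0L, 0L), dam = c(0L, 0L))
A_without[1, 2]  # 0: the shared sire is no longer visible
```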
Missing parents must be NA (or 0 after conversion), not the empty string or "NA". is.na() is the only check that triggers founder handling — the literal text "NA" is not the same as NA_character_.
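A small cleaning step before the integer conversion can catch these encodings. The helper below is a hypothetical sketch; the set of strings treated as missing is an assumption and should be adjusted to whatever your source files actually use.

```r
# Hypothetical helper: turn common text encodings of "missing" into NA.
# The default codes ("" and the literal text "NA") are an assumption.
normalise_missing <- function(x, missing_codes = c("", "NA")) {
  x <- trimws(as.character(x))        # also maps whitespace-only to ""
  x[x %in% missing_codes] <- NA_character_
  x
}

ped <- data.frame(
  id   = c("SIRE1", "DAM1", "OFF1"),
  sire = c("", "NA", "SIRE1"),
  dam  = c(NA, " ", "DAM1"),
  stringsAsFactors = FALSE
)

ped$sire <- normalise_missing(ped$sire)
ped$dam  <- normalise_missing(ped$dam)

is.na(ped$sire)  # TRUE TRUE FALSE
is.na(ped$dam)   # TRUE TRUE FALSE
```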
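Pedigree row order matters for the integer-index conversion shown above. Each parent must appear in the data frame before (i.e. at a lower row index than) any of its offspring. The demo pedigree satisfies this: sires and dams come first, then offspring. If your real pedigree is unordered, sort it topologically before passing it to build_A_ped(). A pedigree with cycles (an individual recorded as its own ancestor) is a data error and cannot be sorted; the toy function below stops with an error in that case, and also when a referenced parent has no row of its own: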
```r
# Toy topological sort: repeatedly emits rows whose parents are unknown or
# already emitted. Stops if it detects a cycle or a missing parent row.
sort_pedigree <- function(ped) {
  done <- character()
  out <- ped[0, ]
  while (nrow(ped) > 0) {
    ready <- is.na(ped$sire) | (ped$sire %in% done)
    ready <- ready & (is.na(ped$dam) | (ped$dam %in% done))
    if (!any(ready)) stop("Pedigree has cycles or missing parent rows.")
    out <- rbind(out, ped[ready, ])
    done <- c(done, ped$id[ready])
    ped <- ped[!ready, ]
  }
  out
}
```

Pedigree is optional for marker-based prediction. If you only need GBLUP / BayesA / BayesR on markers, skip the pedigree entirely and use build_G_snp() or build_G_mh() on the genotype matrix. The pedigree matters when you have un-genotyped individuals (then \(\mathbf{A}\) covers them) or when you want to compare \(\mathbf{A}\) vs \(\mathbf{G}\) partitioning.
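A quick way to decide whether the pedigree is needed is to check which individuals lack genotypes. The sketch below uses invented toy IDs and a toy genotype matrix, and relies only on the schema's convention that id joins to rownames(snp).

```r
# Toy pedigree ids and a toy genotype matrix (invented values, 0/1/2 coding).
ped_ids <- c("SIRE1", "DAM1", "OFF1", "OFF2", "OFF3")
snp <- matrix(c(0, 1, 2,
                2, 1, 0),
              nrow = 2, byrow = TRUE,
              dimnames = list(c("OFF1", "OFF3"), paste0("snp", 1:3)))

# Individuals in the pedigree that have no row in the genotype matrix.
ungenotyped <- setdiff(ped_ids, rownames(snp))
print(ungenotyped)

# If nothing is un-genotyped, a marker-only kernel covers everyone;
# otherwise only a pedigree-based A spans the un-genotyped individuals.
needs_pedigree <- length(ungenotyped) > 0
print(needs_pedigree)
```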
📐 Related theory: Mixed Model & BLUP Family
🛠 Used in: Genomic Prediction