Pedigree

Schema

Pedigree is consumed by masreml only; masbayes is a marker-based package and does not use a numerator-relationship matrix.

The data frame has one row per individual, including founders:

| Column | Type | Description |
|--------|------|-------------|
| id | character | Individual identifier; joins to rownames(snp) for offspring |
| sire | character or NA | Sire identifier, NA for founders without recorded sire |
| dam | character or NA | Dam identifier, NA for founders without recorded dam |

The demo dataset has 220 rows (large) / 120 rows (small): 10 sire founders + 10 dam founders (all with NA sire/dam) plus the offspring with their two-parent ancestry recorded.

Example (from the demo dataset)

Full pedigree, with founders surfaced first:
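
The demo output is not reproduced here. As a minimal sketch, assuming a hypothetical toy pedigree (`toy_ped`, not the demo data) that follows the schema above, founders can be surfaced first by ordering on whether both parents are NA:

```r
# Hypothetical toy pedigree following the schema (not the demo dataset).
toy_ped <- data.frame(
  id   = c("OFF01", "SIRE01", "OFF02", "DAM01"),
  sire = c("SIRE01", NA, "SIRE01", NA),
  dam  = c("DAM01",  NA, "DAM01",  NA),
  stringsAsFactors = FALSE
)

# A founder here is a row with both parents unrecorded.
is_founder <- is.na(toy_ped$sire) & is.na(toy_ped$dam)

# order() on !is_founder puts FALSE (founders) before TRUE (offspring);
# ties keep their original relative order.
founders_first <- toy_ped[order(!is_founder), ]
print(founders_first)
```

The same two lines should work on `d$pedigree` from the demo data, since it uses the same column names.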

Founder vs. offspring count:
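
The demo counts are not reproduced here. A minimal sketch of the tabulation, again on a hypothetical toy pedigree rather than the demo data, could look like this:

```r
# Hypothetical toy pedigree: 2 founders and 3 full-sib offspring.
toy_ped <- data.frame(
  id   = c("SIRE01", "DAM01", "OFF01", "OFF02", "OFF03"),
  sire = c(NA, NA, "SIRE01", "SIRE01", "SIRE01"),
  dam  = c(NA, NA, "DAM01",  "DAM01",  "DAM01"),
  stringsAsFactors = FALSE
)

# Label each row, then tabulate the labels.
is_founder <- is.na(toy_ped$sire) & is.na(toy_ped$dam)
counts <- table(ifelse(is_founder, "founder", "offspring"))
print(counts)
```

Applied to the demo pedigree, the founder count should match the 20 founder rows described above.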

Loading

masreml::build_A_ped() expects the sire/dam columns as integer row-indices (with 0 meaning “unknown”), not as character IDs. The demo pedigree stores them as character strings (e.g. "SIRE05"), so a small conversion step is required before calling the builder:

library(masreml)
d <- masreml::load_data("large")

# Map every id to its 1-based row index in the pedigree frame.
id_to_row <- setNames(seq_len(nrow(d$pedigree)), d$pedigree$id)

ped_int <- data.frame(
  id   = d$pedigree$id,
  sire = ifelse(is.na(d$pedigree$sire), 0L, id_to_row[d$pedigree$sire]),
  dam  = ifelse(is.na(d$pedigree$dam),  0L, id_to_row[d$pedigree$dam]),
  stringsAsFactors = FALSE
)

A <- masreml::build_A_ped(ped_int)
str(A)                          # 220 x 220 numeric, rownames = ids
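
To make concrete what a numerator-relationship matrix contains, here is a minimal base-R sketch of the classical tabular method on integer-coded parents (0 = unknown). The function `tabular_A()` is hypothetical and purely illustrative; it is not masreml's implementation, and build_A_ped() may compute \(\mathbf{A}\) differently. It assumes every parent sits at a lower row index than its offspring.

```r
# Illustrative tabular method (hypothetical helper, not masreml code).
# sire/dam are integer row indices, 0 = unknown parent.
tabular_A <- function(sire, dam) {
  n <- length(sire)
  stopifnot(all(sire < seq_len(n)), all(dam < seq_len(n)))  # parents first
  A <- matrix(0, n, n)
  for (i in seq_len(n)) {
    s <- sire[i]
    d <- dam[i]
    # Diagonal: 1 + inbreeding, i.e. half the relationship between parents.
    f_i <- if (s > 0 && d > 0) 0.5 * A[s, d] else 0
    A[i, i] <- 1 + f_i
    # Off-diagonal: average of relationships of j with the parents of i.
    for (j in seq_len(i - 1)) {
      a_js <- if (s > 0) A[j, s] else 0
      a_jd <- if (d > 0) A[j, d] else 0
      A[i, j] <- A[j, i] <- 0.5 * (a_js + a_jd)
    }
  }
  A
}

# Rows 1-2: founders; rows 3-4: full sibs; row 5: offspring of the sibs.
sire <- c(0L, 0L, 1L, 1L, 3L)
dam  <- c(0L, 0L, 2L, 2L, 4L)
A_toy <- tabular_A(sire, dam)
print(A_toy)
```

In this toy case the full sibs (rows 3 and 4) are related by 0.5, and their offspring (row 5) has a diagonal above 1 because its parents are related.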

Using A inside the mixed model

masreml::masreml() does not accept a pedigree argument directly. Build \(\mathbf{A}\) first with build_A_ped(), then pass it through the G slot under one of the recognised keys (snp_add, snp_dom, or mh_add — the key is the random-effect label, not a content check):

y <- d$pheno$y_cont_qtl_snp
names(y) <- d$pheno$id
X_fixed <- stats::model.matrix(~ sex, data = d$pheno)

# Build A on the full pedigree (220 individuals = founders + offspring)
A_full <- masreml::build_A_ped(ped_int)

# Restrict to the phenotyped offspring before fitting.
A_pheno <- A_full[d$pheno$id, d$pheno$id]

# A-BLUP: the pedigree-derived relationship matrix as the single
# random-effect kernel. The slot key here is `snp_add` simply because
# that is the recognised label for an additive-relationship slot in
# masreml's G list — its contents are A, not a SNP-derived G.
fit_a <- masreml::masreml(
  y     = y,
  X     = X_fixed,
  G     = list(snp_add = A_pheno),
  trait = "continuous"
)
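
Whether masreml() matches y and the relationship matrix by name or by position is not stated here, so aligning them explicitly is the cautious choice. The sketch below uses hypothetical stand-ins (`A_toy`, `y_toy`) for the full matrix and the phenotype vector:

```r
# Hypothetical stand-ins for A_full and the named phenotype vector y.
ids_all <- c("SIRE01", "DAM01", "OFF01", "OFF02")
A_toy <- matrix(
  c(1.0, 0.0, 0.5, 0.5,
    0.0, 1.0, 0.5, 0.5,
    0.5, 0.5, 1.0, 0.5,
    0.5, 0.5, 0.5, 1.0),
  nrow = 4, dimnames = list(ids_all, ids_all)
)
y_toy <- c(OFF02 = 1.3, OFF01 = 0.7)  # phenotyped offspring, arbitrary order

# Fail early if a phenotyped id has no row in the relationship matrix.
missing_ids <- setdiff(names(y_toy), rownames(A_toy))
if (length(missing_ids) > 0) {
  stop("Phenotyped ids absent from A: ", paste(missing_ids, collapse = ", "))
}

# Subset and reorder A so its rows/columns follow the order of y.
A_sub <- A_toy[names(y_toy), names(y_toy), drop = FALSE]
stopifnot(identical(rownames(A_sub), names(y_toy)))
```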

Note

Single-step GBLUP (combining \(\mathbf{A}\) and \(\mathbf{G}\)). masreml’s G slot recognises only the three keys above; it does not provide a dedicated pedigree-plus-genomic kernel. For single-step prediction, build the combined \(\mathbf{H}\) matrix (Misztal et al. 2009) externally and pass it as G = list(snp_add = H).
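
As a rough sketch of what "externally" could mean, the block below assembles \(\mathbf{H}\) from \(\mathbf{A}\) and \(\mathbf{G}\) using the standard single-step partition (index 1 = ungenotyped, 2 = genotyped). `build_H()` is a hypothetical helper, not part of masreml, and the toy matrices are invented. Practical refinements such as blending or rescaling \(\mathbf{G}\) towards \(\mathbf{A}_{22}\) are omitted. The final check uses the known identity \(\mathbf{H}^{-1} = \mathbf{A}^{-1} + \begin{bmatrix} 0 & 0 \\ 0 & \mathbf{G}^{-1} - \mathbf{A}_{22}^{-1} \end{bmatrix}\).

```r
# Hypothetical helper (not masreml code): combine A and G into H.
# geno_ids names the genotyped individuals in the row/column order of G.
build_H <- function(A, G, geno_ids) {
  g <- match(geno_ids, rownames(A))         # genotyped positions in A
  stopifnot(!anyNA(g))
  u <- setdiff(seq_len(nrow(A)), g)         # ungenotyped positions
  A22 <- A[g, g, drop = FALSE]
  B   <- A[u, g, drop = FALSE] %*% solve(A22)   # A12 %*% A22^{-1}
  H <- A
  H[g, g] <- G
  H[u, g] <- B %*% G
  H[g, u] <- t(H[u, g, drop = FALSE])
  H[u, u] <- A[u, u, drop = FALSE] + B %*% (G - A22) %*% t(B)
  H
}

# Toy inputs: two founders (ungenotyped) and two genotyped full sibs.
ids <- c("SIRE01", "DAM01", "OFF01", "OFF02")
A_toy <- matrix(
  c(1.0, 0.0, 0.5, 0.5,
    0.0, 1.0, 0.5, 0.5,
    0.5, 0.5, 1.0, 0.5,
    0.5, 0.5, 0.5, 1.0),
  nrow = 4, dimnames = list(ids, ids)
)
geno_ids <- c("OFF01", "OFF02")
G_toy <- matrix(c(1.02, 0.45, 0.45, 0.98), nrow = 2,
                dimnames = list(geno_ids, geno_ids))

H_toy <- build_H(A_toy, G_toy, geno_ids)

# Cross-check against the inverse identity stated above.
H_inv_expected <- solve(A_toy)
H_inv_expected[geno_ids, geno_ids] <- H_inv_expected[geno_ids, geno_ids] +
  solve(G_toy) - solve(A_toy[geno_ids, geno_ids])
stopifnot(isTRUE(all.equal(solve(H_toy), H_inv_expected,
                           check.attributes = FALSE)))
```

The resulting matrix would then be passed as `G = list(snp_add = H_toy)`, as described above.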

Common pitfalls

Warning

Founders must appear in the pedigree. A common mistake is including only offspring rows. build_A_ped() walks the matrix bottom-up; if a sire or dam row is missing, its descendants lose their genetic-relationship columns and inbreeding estimates collapse to 0. The large demo pedigree’s 220 rows include 20 founder rows for this reason.
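
A quick pre-flight check for this pitfall, as a minimal sketch: `find_missing_parents()` is a hypothetical helper, and adding the missing parents as founders with unknown ancestry is an assumption that only holds if nothing more is known about them.

```r
# Hypothetical helper: parent ids that never appear in the id column.
find_missing_parents <- function(ped) {
  parents <- unique(c(ped$sire, ped$dam))
  parents <- parents[!is.na(parents)]
  setdiff(parents, ped$id)
}

# Toy pedigree that lists offspring only (the mistake described above).
offspring_only <- data.frame(
  id   = c("OFF01", "OFF02"),
  sire = c("SIRE01", "SIRE01"),
  dam  = c("DAM01", "DAM02"),
  stringsAsFactors = FALSE
)
missing_parents <- find_missing_parents(offspring_only)
print(missing_parents)

# Assumption: unlisted parents are founders with unrecorded ancestry.
founder_rows <- data.frame(
  id   = missing_parents,
  sire = NA_character_,
  dam  = NA_character_,
  stringsAsFactors = FALSE
)
full_ped <- rbind(founder_rows, offspring_only)  # founders first
```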

Warning

Missing parents must be NA (or 0 after conversion), not the empty string or "NA". is.na() is the only check that triggers founder handling — the literal text "NA" is not the same as NA_character_.
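
A minimal sketch of normalising such codes before the integer conversion. `normalise_parent()` is a hypothetical helper, and the set of strings treated as missing (empty string and the literal text "NA") is taken from the warning above; extend it if your data uses other codes.

```r
# Hypothetical helper: turn textual missing codes into real NA values.
normalise_parent <- function(x) {
  x <- trimws(as.character(x))
  x[x %in% c("", "NA")] <- NA_character_
  x
}

raw_sire   <- c("SIRE01", "", "NA", NA, " SIRE02 ")
clean_sire <- normalise_parent(raw_sire)

# Only one entry is a true NA before cleaning; three are afterwards.
sum(is.na(raw_sire))
sum(is.na(clean_sire))
```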

Warning

Pedigree row order matters for the integer-index conversion shown above. Each parent must appear in the data frame before (i.e. at a lower row index than) any of its offspring. The demo pedigree satisfies this: sires and dams come first, then offspring. If your real pedigree is unordered, sort it topologically before passing it to build_A_ped(). A pedigree with cycles (an individual recorded as its own ancestor) is invalid and cannot be ordered; the helper below stops with an error in that case:

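# Toy topological sort: emits parents before their offspring and stops
# with an error if the pedigree has cycles or a parent without its own row.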
sort_pedigree <- function(ped) {
  done <- character()
  out  <- ped[0, ]
  while (nrow(ped) > 0) {
    ready <- is.na(ped$sire) | (ped$sire %in% done)
    ready <- ready & (is.na(ped$dam) | (ped$dam %in% done))
    if (!any(ready)) stop("Pedigree has cycles or missing parent rows.")
    out  <- rbind(out, ped[ready, ])
    done <- c(done, ped$id[ready])
    ped  <- ped[!ready, ]
  }
  out
}
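
Before sorting, it can be cheaper to test whether the ordering requirement already holds. The sketch below uses a hypothetical helper, `parents_precede_offspring()`, which treats a parent id with no row of its own as a failure:

```r
# Hypothetical helper: TRUE if every recorded parent sits at a lower
# row index than its offspring.
parents_precede_offspring <- function(ped) {
  row_of   <- seq_len(nrow(ped))
  sire_row <- match(ped$sire, ped$id)  # NA if unknown or not listed
  dam_row  <- match(ped$dam,  ped$id)
  ok_sire <- is.na(ped$sire) | (!is.na(sire_row) & sire_row < row_of)
  ok_dam  <- is.na(ped$dam)  | (!is.na(dam_row)  & dam_row  < row_of)
  all(ok_sire & ok_dam)
}

# Toy pedigree with the offspring listed before its parents.
unordered <- data.frame(
  id   = c("OFF01", "SIRE01", "DAM01"),
  sire = c("SIRE01", NA, NA),
  dam  = c("DAM01",  NA, NA),
  stringsAsFactors = FALSE
)
ordered <- unordered[c(2, 3, 1), ]  # parents moved to the top

parents_precede_offspring(unordered)
parents_precede_offspring(ordered)
```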

Tip

Pedigree is optional for marker-based prediction. If you only need GBLUP / BayesA / BayesR on markers, skip the pedigree entirely and use build_G_snp() or build_G_mh() on the genotype matrix. The pedigree matters when you have ungenotyped individuals (then \(\mathbf{A}\) covers them) or when you want to compare \(\mathbf{A}\) vs \(\mathbf{G}\) partitioning.

📐 Related theory: Mixed Model & BLUP Family

🛠 Used in: Genomic Prediction