Pedigree

Schema

Pedigree is consumed by masreml only — masbayes is a marker-based package and does not use a numerator-relationship matrix.

The data frame has one row per individual, including founders:

Column	Type	Description
`id`	character	Individual identifier; joins to `rownames(snp)` for offspring
`sire`	character or `NA`	Sire identifier, `NA` for founders without recorded sire
`dam`	character or `NA`	Dam identifier, `NA` for founders without recorded dam

The demo dataset has 220 rows (large) / 120 rows (small): 10 sire founders + 10 dam founders (all with NA sire/dam), plus 200 (large) or 100 (small) offspring with both parents recorded.

Example (from the demo dataset)

Full pedigree, with founders surfaced first:
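A minimal sketch of that ordering, using a made-up toy pedigree that follows the schema above rather than the demo data itself. The IDs and the names ped, is_founder and ped_sorted are illustrative assumptions; on the demo data the same expressions would be applied to d$pedigree once it is loaded as in the Loading section.

```r
# Toy pedigree following the schema above (IDs are invented for illustration).
ped <- data.frame(
  id   = c("OFF001", "SIRE01", "OFF002", "DAM01"),
  sire = c("SIRE01", NA,       "SIRE01", NA),
  dam  = c("DAM01",  NA,       "DAM01",  NA),
  stringsAsFactors = FALSE
)

# A founder has neither a recorded sire nor a recorded dam.
is_founder <- is.na(ped$sire) & is.na(ped$dam)

# FALSE sorts before TRUE, so ordering on !is_founder puts founders first.
# order() leaves ties in their original order, so each group keeps its row order.
ped_sorted <- ped[order(!is_founder), ]
print(ped_sorted, row.names = FALSE)
```
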

Founder vs. offspring count:
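A minimal sketch of the count, again on a made-up toy pedigree with the same three columns (the IDs and variable names are illustrative assumptions). If the row counts quoted above hold, the same expression on the large demo pedigree should report 20 founders and 200 offspring.

```r
# Toy pedigree: 2 founders and 3 offspring (IDs are invented for illustration).
ped <- data.frame(
  id   = c("SIRE01", "DAM01", "OFF001", "OFF002", "OFF003"),
  sire = c(NA,       NA,      "SIRE01", "SIRE01", "SIRE01"),
  dam  = c(NA,       NA,      "DAM01",  "DAM01",  "DAM01"),
  stringsAsFactors = FALSE
)

# Founders are the rows with both parents missing.
is_founder <- is.na(ped$sire) & is.na(ped$dam)

counts <- table(ifelse(is_founder, "founder", "offspring"))
print(counts)
```
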

Loading

masreml::build_A_ped() expects the sire/dam columns as integer row-indices (with 0 meaning “unknown”), not as character IDs. The demo pedigree stores them as character strings (e.g. "SIRE05"), so a small conversion step is required before calling the builder:

library(masreml)
d <- masreml::load_data("large")

# Map every id to its 1-based row index in the pedigree frame.
id_to_row <- setNames(seq_len(nrow(d$pedigree)), d$pedigree$id)

ped_int <- data.frame(
  id   = d$pedigree$id,
  sire = ifelse(is.na(d$pedigree$sire), 0L, id_to_row[d$pedigree$sire]),
  dam  = ifelse(is.na(d$pedigree$dam),  0L, id_to_row[d$pedigree$dam]),
  stringsAsFactors = FALSE
)

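# A parent ID with no matching row in the pedigree maps to NA above; fail early.
stopifnot(!anyNA(ped_int$sire), !anyNA(ped_int$dam))

A <- masreml::build_A_ped(ped_int)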
str(A)                          # 220 x 220 numeric, rownames = ids
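To make the integer-index convention concrete, here is a sketch of the textbook tabular method for the numerator relationship matrix, written in base R. It is not masreml's implementation, and the function name tabular_A and its arguments are hypothetical; it only assumes the same input convention as above (1-based row indices, 0 for an unknown parent, parents listed before their offspring).

```r
# Tabular method for the numerator relationship matrix A.
# sire, dam: integer row indices of each individual's parents, 0 = unknown.
tabular_A <- function(sire, dam, ids = NULL) {
  n <- length(sire)
  idx <- seq_len(n)
  # Parents must sit at a lower row index than their offspring.
  stopifnot(length(dam) == n,
            all(sire >= 0L & sire < idx),
            all(dam  >= 0L & dam  < idx))
  A <- matrix(0, n, n, dimnames = list(ids, ids))
  for (i in idx) {
    s <- sire[i]
    d <- dam[i]
    for (j in seq_len(i - 1L)) {
      # Relationship with an earlier animal j: mean of j's relationship
      # with the two parents; an unknown parent contributes 0.
      a_js <- if (s > 0L) A[j, s] else 0
      a_jd <- if (d > 0L) A[j, d] else 0
      A[i, j] <- A[j, i] <- 0.5 * (a_js + a_jd)
    }
    # Diagonal is 1 + inbreeding, where inbreeding is half the parents' relationship.
    f_i <- if (s > 0L && d > 0L) 0.5 * A[s, d] else 0
    A[i, i] <- 1 + f_i
  }
  A
}

# Toy pedigree: two founders, two full sibs, and one offspring of the sibs.
ids  <- c("SIRE01", "DAM01", "OFF001", "OFF002", "OFF003")
sire <- c(0L, 0L, 1L, 1L, 3L)
dam  <- c(0L, 0L, 2L, 2L, 4L)
A_toy <- tabular_A(sire, dam, ids)
print(A_toy)
```

In this toy case the full sibs share a relationship of 0.5, and the offspring of the full-sib mating has a diagonal of 1.25, i.e. an inbreeding coefficient of 0.25.
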

Using A inside the mixed model

masreml::masreml() does not accept a pedigree argument directly. Build \(\mathbf{A}\) first with build_A_ped(), then pass it through the G slot under one of the recognised keys (snp_add, snp_dom, or mh_add — the key is the random-effect label, not a content check):

y <- d$pheno$y_cont_qtl_snp
names(y) <- d$pheno$id
X_fixed <- stats::model.matrix(~ sex, data = d$pheno)

# Build A on the full pedigree (220 individuals = founders + offspring)
A_full <- masreml::build_A_ped(ped_int)

# Restrict to the phenotyped offspring before fitting.
A_pheno <- A_full[d$pheno$id, d$pheno$id]

# A-BLUP: the pedigree-derived relationship matrix as the single
# random-effect kernel. The slot key here is `snp_add` simply because
# that is the recognised label for an additive-relationship slot in
# masreml's G list — its contents are A, not a SNP-derived G.
fit_a <- masreml::masreml(
  y     = y,
  X     = X_fixed,
  G     = list(snp_add = A_pheno),
  trait = "continuous"
)
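This section does not say whether masreml matches y and the kernel by name or by position, so it seems safest to make the two agree explicitly before fitting. The sketch below shows that alignment on a made-up 4 x 4 matrix in base R; all values and IDs are illustrative assumptions.

```r
# Toy relationship matrix: two founders and two full-sib offspring.
ids_all <- c("SIRE01", "DAM01", "OFF001", "OFF002")
A_full <- matrix(
  c(1.0, 0.0, 0.5, 0.5,
    0.0, 1.0, 0.5, 0.5,
    0.5, 0.5, 1.0, 0.5,
    0.5, 0.5, 0.5, 1.0),
  nrow = 4, byrow = TRUE, dimnames = list(ids_all, ids_all)
)

# Toy phenotypes, deliberately not in pedigree order.
y <- c(OFF002 = 10.1, OFF001 = 9.4)

# Every phenotyped id needs a row in A; indexing by an unknown name
# would otherwise stop with "subscript out of bounds".
stopifnot(all(names(y) %in% rownames(A_full)))

# Subset by name so rows and columns follow the order of y.
A_pheno <- A_full[names(y), names(y), drop = FALSE]

stopifnot(identical(rownames(A_pheno), names(y)),
          identical(colnames(A_pheno), names(y)))
```
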

Note

Single-step GBLUP (combining \(\mathbf{A}\) and \(\mathbf{G}\)). masreml’s G slot recognises only the three keys above; it does not provide a dedicated pedigree-plus-genomic kernel. For single-step prediction, build the combined \(\mathbf{H}\) matrix (Misztal et al. 2009) externally and pass it as G = list(snp_add = H).
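A minimal base-R sketch of assembling \(\mathbf{H}\) from its standard block form: the genotyped block is \(\mathbf{G}\), and the remaining blocks are \(\mathbf{A}\) adjusted by regressing the non-genotyped animals on the genotyped ones. The function build_H and its arguments are hypothetical and not part of masreml. In practice \(\mathbf{G}\) is usually scaled or blended with \(\mathbf{A}_{22}\) first, which this sketch omits.

```r
# A: full pedigree relationship matrix with ids as dimnames.
# G: genomic relationship matrix for the genotyped ids only.
build_H <- function(A, G, genotyped) {
  stopifnot(identical(rownames(G), genotyped),
            identical(colnames(G), genotyped),
            all(genotyped %in% rownames(A)))
  non <- setdiff(rownames(A), genotyped)
  A11 <- A[non, non, drop = FALSE]
  A12 <- A[non, genotyped, drop = FALSE]
  A22 <- A[genotyped, genotyped, drop = FALSE]

  # Regression of non-genotyped on genotyped animals: A12 %*% inverse(A22).
  P   <- A12 %*% solve(A22)
  H11 <- A11 + P %*% (G - A22) %*% t(P)
  H12 <- P %*% G

  H <- rbind(cbind(H11, H12), cbind(t(H12), G))
  ids <- c(non, genotyped)
  dimnames(H) <- list(ids, ids)
  H[rownames(A), rownames(A)]   # restore the row order of A
}

# Toy example: two ungenotyped founders, two genotyped full sibs.
ids <- c("SIRE01", "DAM01", "OFF001", "OFF002")
A <- matrix(
  c(1.0, 0.0, 0.5, 0.5,
    0.0, 1.0, 0.5, 0.5,
    0.5, 0.5, 1.0, 0.5,
    0.5, 0.5, 0.5, 1.0),
  nrow = 4, byrow = TRUE, dimnames = list(ids, ids)
)
genotyped <- c("OFF001", "OFF002")
G <- matrix(c(1.02, 0.45, 0.45, 0.98), nrow = 2,
            dimnames = list(genotyped, genotyped))  # made-up values

H <- build_H(A, G, genotyped)
print(round(H, 3))
```

A useful sanity check on any such construction: when \(\mathbf{G}\) equals the genotyped block of \(\mathbf{A}\), the result should reduce to \(\mathbf{A}\) itself.
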

Common pitfalls

Warning

Founders must appear in the pedigree. A common mistake is to include only the offspring rows. build_A_ped() walks the matrix bottom-up; if a sire or dam row is missing, its descendants lose their genetic-relationship columns and inbreeding estimates collapse to 0. The demo pedigree’s 220 rows include 20 founder rows for this reason.
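A small base-R check for this situation, sketched on a made-up offspring-only pedigree. The helper missing_parents is hypothetical. Adding the absent parents as founder rows assumes they are unrelated to each other, which may not be true of real data.

```r
# Offspring-only pedigree: parents are named but have no rows of their own.
ped <- data.frame(
  id   = c("OFF001", "OFF002", "OFF003"),
  sire = c("SIRE01", "SIRE01", "SIRE02"),
  dam  = c("DAM01",  "DAM02",  "DAM02"),
  stringsAsFactors = FALSE
)

# Parent IDs that are referenced but never appear in the id column.
missing_parents <- function(ped) {
  parents <- unique(c(ped$sire, ped$dam))
  setdiff(parents[!is.na(parents)], ped$id)
}

absent <- missing_parents(ped)
print(absent)

# Prepend one founder row (NA sire, NA dam) per absent parent.
founders <- data.frame(
  id   = absent,
  sire = NA_character_,
  dam  = NA_character_,
  stringsAsFactors = FALSE
)
ped_fixed <- rbind(founders, ped)

stopifnot(length(missing_parents(ped_fixed)) == 0)
```
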

Warning

Missing parents must be NA (or 0 after conversion), not the empty string or "NA". is.na() is the only check that triggers founder handling — the literal text "NA" is not the same as NA_character_.
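One way to normalise such codes before any is.na() test is sketched below. The helper clean_parent is hypothetical, and the set of strings treated as missing is an assumption to adapt to your own data.

```r
# Convert common textual stand-ins for "missing" into a true NA.
clean_parent <- function(x, missing_codes = c("", "NA")) {
  x <- trimws(as.character(x))          # drop stray whitespace
  x[x %in% missing_codes] <- NA_character_
  x
}

raw_sire <- c("SIRE01", "", "NA", NA, " ")
sire <- clean_parent(raw_sire)

print(is.na(raw_sire))  # only the true NA is detected
print(is.na(sire))      # all four missing entries are detected
```
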

Warning

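Pedigree row order matters for the integer-index conversion shown above. Each parent must appear in the data frame before (i.e. at a lower row index than) any of its offspring. The demo pedigree satisfies this: sires and dams come first, then offspring. If your real pedigree is unordered, sort it topologically before running the integer-index conversion and calling build_A_ped(), because the indices are row positions and change when rows move. A pedigree that contains a cycle (an individual recorded as its own ancestor) is a recording error and cannot be ordered; the toy sort below stops with an error in that case: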

# Toy topological sort; stops if the pedigree has a cycle or a parent without its own row
sort_pedigree <- function(ped) {
  done <- character()
  out  <- ped[0, ]
  while (nrow(ped) > 0) {
    ready <- is.na(ped$sire) | (ped$sire %in% done)
    ready <- ready & (is.na(ped$dam) | (ped$dam %in% done))
    if (!any(ready)) stop("Pedigree has cycles or missing parent rows.")
    out  <- rbind(out, ped[ready, ])
    done <- c(done, ped$id[ready])
    ped  <- ped[!ready, ]
  }
  out
}
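Whether a pedigree already meets the ordering requirement can be checked without sorting it. The sketch below is one way to do that in base R; the function name parents_precede_offspring is hypothetical, and it treats a parent with no row of its own as a failure.

```r
# TRUE if every recorded parent sits at a lower row index than its offspring.
parents_precede_offspring <- function(ped) {
  i        <- seq_len(nrow(ped))
  sire_row <- match(ped$sire, ped$id)   # NA if unknown or not in the pedigree
  dam_row  <- match(ped$dam,  ped$id)
  ok_sire  <- is.na(ped$sire) | (!is.na(sire_row) & sire_row < i)
  ok_dam   <- is.na(ped$dam)  | (!is.na(dam_row)  & dam_row  < i)
  all(ok_sire & ok_dam)
}

# Toy pedigree with the offspring listed before its parents.
ped_unordered <- data.frame(
  id   = c("OFF001", "SIRE01", "DAM01"),
  sire = c("SIRE01", NA,       NA),
  dam  = c("DAM01",  NA,       NA),
  stringsAsFactors = FALSE
)
ped_ordered <- ped_unordered[c(2, 3, 1), ]

print(parents_precede_offspring(ped_unordered))
print(parents_precede_offspring(ped_ordered))
```
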

Tip

Pedigree is optional for marker-based prediction. If you only need GBLUP / BayesA / BayesR on markers, skip the pedigree entirely and use build_G_snp() or build_G_mh() on the genotype matrix. The pedigree matters when you have un-genotyped individuals (then \(\mathbf{A}\) covers them) or when you want to compare \(\mathbf{A}\) vs \(\mathbf{G}\) partitioning.

📐 Related theory: Mixed Model & BLUP Family

🛠 Used in: Genomic Prediction