Input Data

Both packages consume the same four kinds of input. This section documents the schema of each, shows worked examples drawn from the bundled demo dataset, and offers every artefact as a download, so you can preview the shape, grab the standalone CSV/TSV files, or load the canonical .rds directly with load_data().

| Input | In-memory shape | Page |
|---|---|---|
| SNP genotypes | integer matrix \(n \times p\), dosage 0/1/2 | SNP |
| Microhaplotype calls | integer matrix \(n \times (2 \cdot n_{blocks})\) (compact) | Microhaplotypes |
| Phenotypes (continuous + binary, plus QTL ground truth) | data frame: id, fixed effects, response(s) | Phenotype |
| Pedigree | data frame: id, sire, dam | Pedigree |
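
As a rough illustration of these shapes, the sketch below builds a tiny stand-in for each input in base R. The dimensions, the column names sex, y and y_bin, and the two-columns-per-block layout of the microhaplotype matrix are assumptions made for the example; the subpages give the actual schemas.

```r
# Toy dimensions (hypothetical; far smaller than the demo bundle)
n        <- 6   # individuals
p        <- 4   # SNPs
n_blocks <- 2   # microhaplotype blocks

set.seed(1)

# SNP genotypes: n x p integer matrix of 0/1/2 dosages
snp <- matrix(sample(0:2, n * p, replace = TRUE), nrow = n, ncol = p)

# Microhaplotype calls, compact form: n x (2 * n_blocks) integer matrix.
# Assumption: two columns per block, one allele code per haplotype copy.
mh <- matrix(sample(1:3, n * 2 * n_blocks, replace = TRUE), nrow = n)

# Phenotypes: id, one fixed effect, a continuous and a binary response
# (column names other than id are hypothetical)
pheno <- data.frame(
  id    = seq_len(n),
  sex   = factor(rep(c("F", "M"), length.out = n)),
  y     = rnorm(n),
  y_bin = rbinom(n, 1, 0.5)
)

# Pedigree: id, sire, dam (parent ids are made up)
pedigree <- data.frame(
  id   = seq_len(n),
  sire = rep(101L, n),
  dam  = rep(201:203, each = 2)
)

# One-line summary of each object
str(list(snp = snp, mh = mh, pheno = pheno, pedigree = pedigree),
    max.level = 1)
```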

The demo bundle

Each subpage uses the same canonical dataset, shipped byte-identically by both sibling packages. Two scales are bundled:

| Size | Individuals | SNPs | MH blocks | \(W_{ah}\) columns | QTL | Use |
|---|---|---|---|---|---|---|
| large (main) | 200 (10 sires × 20 offspring) | 100 | 50 | ~149 | 10 | All standard examples |
| small (toy) | 100 (10 sires × 10 offspring) | 50 | 25 | ~75 | 5 | Quick-start (runs in seconds) |

Both are simulated with set.seed(42) for the genetic stream and set.seed(41) for the within-family train/test split; re-running demo-data/generate.R produces bit-identical output.

One .rds per size, byte-identical to the sibling packages

demo-data/main/demo_data.rds and demo-data/toy/demo_data_small.rds are byte-identical to the files shipped under masbayes/inst/extdata/ and masreml/inst/extdata/. The canonical generator lives in the workspace as tools/make_demo_data.R; the local mirror is demo-data/generate.R.
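
If you want to confirm the byte-identity claim on your own machine, one option is to compare MD5 digests with tools::md5sum() from base R. The helper name same_bytes() is hypothetical, and the demonstration uses temporary files so that it runs without either package installed.

```r
# Return TRUE when two files have identical contents (compared by MD5 digest).
same_bytes <- function(path_a, path_b) {
  stopifnot(file.exists(path_a), file.exists(path_b))
  digests <- unname(tools::md5sum(c(path_a, path_b)))
  identical(digests[1], digests[2])
}

# Self-contained demonstration: a file and its copy must match
a <- tempfile(fileext = ".rds")
b <- tempfile(fileext = ".rds")
saveRDS(list(x = 1:3), a)
file.copy(a, b)
same_bytes(a, b)

# Against the site repo and an installed package, the call would look like:
# same_bytes("demo-data/main/demo_data.rds",
#            system.file("extdata", "demo_data.rds", package = "masreml"))
```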

Loading

# Either package's load_data() returns the canonical list.
library(masreml)   # or masbayes
d <- masreml::load_data("large")    # or "small"

# Equivalent via system.file() (returns the bundled path):
path <- system.file("extdata", "demo_data.rds", package = "masreml")
d    <- readRDS(path)

# Or load the standalone .rds directly from the site repo:
d    <- readRDS("demo-data/main/demo_data.rds")

d is a named list with 12 slots: snp, mh, allele_freq, pheno, pedigree, qtl, meta, family_id, train_idx, test_idx, map_snp, map_mh. The four subpages describe each slot’s schema and common ways to construct it from your own data.
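
A quick way to check that a loaded object has the slots listed above is to compare its names against the expected set. This is a minimal sketch: check_slots() is a hypothetical helper, not part of either package, and a mock list stands in for the real bundle so the example runs on its own.

```r
# Slot names as listed in the text above
expected_slots <- c("snp", "mh", "allele_freq", "pheno", "pedigree", "qtl",
                    "meta", "family_id", "train_idx", "test_idx",
                    "map_snp", "map_mh")

# Report slots that are absent from, or unexpected in, a bundle-like list.
check_slots <- function(d, expected = expected_slots) {
  stopifnot(is.list(d))
  absent <- setdiff(expected, names(d))
  extra  <- setdiff(names(d), expected)
  list(ok = length(absent) == 0L && length(extra) == 0L,
       absent = absent, extra = extra)
}

# Mock list standing in for the object returned by load_data()
mock <- setNames(vector("list", length(expected_slots)), expected_slots)

check_slots(mock)$ok           # all 12 slots present
check_slots(mock[-1])$absent   # reports the dropped "snp" slot
```

With the real bundle loaded as d, the call would simply be check_slots(d).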

Within-family train/test split

The bundle ships a stratified split: each of the 10 families contributes 16 offspring to training and 4 to test in large (8/2 in small). This keeps full-sibs out of the test partition’s predictor pool without losing family-level relatedness.

| Size | n_train | n_test | Train / test per family |
|---|---|---|---|
| large | 160 | 40 | 16 / 4 |
| small | 80 | 20 | 8 / 2 |

Indices live in d$train_idx and d$test_idx (row indices into d$snp, d$mh, d$pheno).
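
To make the stratification concrete, here is a sketch of how a within-family split of this shape could be drawn in base R. It is not the code from demo-data/generate.R, so the indices it produces will not match d$train_idx and d$test_idx even with the same seed; only the per-family counts are reproduced.

```r
n_families     <- 10
n_offspring    <- 20   # 10 in the small bundle
n_test_per_fam <- 4    # 2 in the small bundle

# One family label per individual, families stored in contiguous rows
family_id <- rep(seq_len(n_families), each = n_offspring)

set.seed(41)

# Within each family, draw the rows held out for testing
test_idx <- sort(unlist(lapply(
  split(seq_along(family_id), family_id),
  function(rows) rows[sample.int(length(rows), n_test_per_fam)]
), use.names = FALSE))

# Everything not held out goes to training
train_idx <- setdiff(seq_along(family_id), test_idx)

table(family_id[test_idx])   # 4 test individuals in every family
length(train_idx)            # 160
length(test_idx)             # 40
```

With the real bundle, the stored indices subset rows directly, for example d$snp[d$train_idx, , drop = FALSE] and d$pheno[d$test_idx, ].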

Download the bundle

large dataset (~55 KB zipped)

📦 demo_data.rds   📦 main.zip

small dataset (~23 KB zipped)

📦 demo_data_small.rds   📦 toy.zip

The ZIPs bundle the canonical .rds plus every derived CSV/TSV used on the subpages.

Regenerating locally

demo-data/generate.R requires the masbayes package to be installed, because it calls construct_wah_matrix():

Rscript demo-data/generate.R main toy   # both sizes
Rscript demo-data/generate.R main       # large only
Rscript demo-data/generate.R toy        # small only

Re-running emits the same .rds byte for byte, because every config calls set.seed() at the top.