# Either package's load_data() returns the canonical list.
library(masreml) # or masbayes
d <- masreml::load_data("large") # or "small"
# Equivalent via system.file() (returns the bundled path):
path <- system.file("extdata", "demo_data.rds", package = "masreml")
d <- readRDS(path)
# Or load the standalone .rds directly from the site repo:
d <- readRDS("demo-data/main/demo_data.rds")
Input Data
Both packages consume the same four kinds of input. This section documents each one’s schema, shows worked examples drawn from the bundled demo dataset, and exposes every artefact as a download — so you can preview the shape, grab the standalone CSV/TSV files, or load the canonical .rds directly with load_data().
| Input | In-memory shape | Page |
|---|---|---|
| SNP genotypes | integer matrix \(n \times p\), dosage 0/1/2 | SNP |
| Microhaplotype calls | integer matrix \(n \times (2 \cdot n_{blocks})\) (compact) | Microhaplotypes |
| Phenotypes (continuous + binary, plus QTL ground truth) | data frame: id, fixed effects, response(s) | Phenotype |
| Pedigree | data frame: id, sire, dam | Pedigree |
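To make the shapes in the table concrete, here is a minimal base-R sketch that builds toy stand-ins for all four inputs. Every value is made up. The column names other than id, sire, and dam, the integer allele codes, the reading of the compact layout as two columns per block, and the use of NA for unknown parents are all assumptions for illustration, not the packages' documented schema.

```r
# Toy stand-ins for the four inputs; all values are invented for illustration.
set.seed(1)
n <- 6L         # individuals
p <- 4L         # SNPs
n_blocks <- 2L  # microhaplotype blocks

ids <- sprintf("ind%02d", seq_len(n))

# SNP genotypes: n x p integer matrix of 0/1/2 dosages
snp <- matrix(sample(0:2, n * p, replace = TRUE), nrow = n, ncol = p,
              dimnames = list(ids, paste0("snp", seq_len(p))))

# Microhaplotype calls (compact): n x (2 * n_blocks) integer matrix.
# Assumption: two columns per block, holding integer allele codes 1..3.
mh_cols <- paste0("block", rep(seq_len(n_blocks), each = 2L), c("_a", "_b"))
mh <- matrix(sample(1:3, n * 2L * n_blocks, replace = TRUE), nrow = n,
             dimnames = list(ids, mh_cols))

# Phenotypes: id, a fixed effect, and continuous + binary responses
# (sex, y, and y_bin are hypothetical column names)
pheno <- data.frame(
  id    = ids,
  sex   = factor(sample(c("F", "M"), n, replace = TRUE)),
  y     = rnorm(n),
  y_bin = rbinom(n, size = 1, prob = 0.5)
)

# Pedigree: id, sire, dam; the first two rows are founders with unknown parents
pedigree <- data.frame(
  id   = ids,
  sire = c(NA, NA, rep(ids[1], n - 2L)),
  dam  = c(NA, NA, rep(ids[2], n - 2L))
)

dim(snp)  # n rows, p columns
dim(mh)   # n rows, 2 * n_blocks columns
```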
The demo bundle
Each subpage uses the same canonical dataset, shipped byte-identically by both sibling packages. Two scales are bundled:
| Size | Individuals | SNPs | MH blocks | \(W_{ah}\) columns | QTL | Use |
|---|---|---|---|---|---|---|
| large (main) | 200 (10 sires × 20 offspring) | 100 | 50 | ~149 | 10 | All standard examples |
| small (toy) | 100 (10 sires × 10 offspring) | 50 | 25 | ~75 | 5 | Quick-start (runs in seconds) |
Both are simulated with set.seed(42) for the genetic stream and set.seed(41) for the within-family train/test split; re-running demo-data/generate.R produces bit-identical output.
.rds per size, byte-identical to the sibling packages
demo-data/main/demo_data.rds and demo-data/toy/demo_data_small.rds are byte-identical to the files shipped under masbayes/inst/extdata/ and masreml/inst/extdata/. The canonical generator lives in the workspace as tools/make_demo_data.R; the local mirror is demo-data/generate.R.
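One way to check a byte-identity claim like this yourself is to compare file checksums. The sketch below uses tools::md5sum() on temporary stand-in files, so it runs without either package installed; same_bytes() is a hypothetical helper, not part of masreml or masbayes.

```r
# Hypothetical helper: TRUE when two files have the same MD5 checksum.
same_bytes <- function(a, b) {
  h <- unname(tools::md5sum(c(a, b)))  # NA for files that cannot be read
  !anyNA(h) && h[1] == h[2]
}

# Stand-in files: write a small object, copy it, then write a different one
f1 <- tempfile(fileext = ".rds")
f2 <- tempfile(fileext = ".rds")
f3 <- tempfile(fileext = ".rds")
saveRDS(list(a = 1:3), f1)
invisible(file.copy(f1, f2))
saveRDS(list(a = 4:6), f3)

same_bytes(f1, f2)  # TRUE: f2 is a straight copy of f1
same_bytes(f1, f3)  # FALSE: different contents
```

With a package installed, the same helper could be pointed at demo-data/main/demo_data.rds and at the path returned by system.file("extdata", "demo_data.rds", package = "masreml").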
Loading
d is a named list with 12 slots: snp, mh, allele_freq, pheno, pedigree, qtl, meta, family_id, train_idx, test_idx, map_snp, map_mh. The four subpages describe each slot’s schema and common ways to construct it from your own data.
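A quick sanity check after loading is to confirm that all 12 slot names are present. The sketch below assumes only the slot names listed above; missing_slots() is a hypothetical helper, and it is exercised on a mock list so it runs without the packages.

```r
expected_slots <- c("snp", "mh", "allele_freq", "pheno", "pedigree", "qtl",
                    "meta", "family_id", "train_idx", "test_idx",
                    "map_snp", "map_mh")

# Hypothetical helper: names that are expected but absent from the list.
missing_slots <- function(d) setdiff(expected_slots, names(d))

# Mock bundle with the right names and empty contents
mock <- setNames(vector("list", length(expected_slots)), expected_slots)

missing_slots(mock)       # character(0)
missing_slots(mock[-1])   # "snp"
```

On a real bundle, missing_slots(d) returning character(0) means every documented slot is there.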
Within-family train/test split
The bundle ships a stratified split: each of the 10 families contributes 16 offspring to training and 4 to test in large (8/2 in small). This keeps full-sibs out of the test partition’s predictor pool without losing family-level relatedness.
| Size | n_train | n_test | per family |
|---|---|---|---|
| large | 160 | 40 | 16 / 4 |
| small | 80 | 20 | 8 / 2 |
Indices live in d$train_idx and d$test_idx (row indices into d$snp, d$mh, d$pheno).
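For your own data, a within-family split of this kind can be sketched in base R by sampling a fixed number of test rows inside each family. This illustrates the idea only and is not the code in demo-data/generate.R, so it will not reproduce the bundled indices; the family layout below (10 families of 20 consecutive rows) is an assumption matching the large dataset's dimensions.

```r
set.seed(41)  # illustrative only; does not reproduce the bundled split

family_id <- rep(seq_len(10), each = 20)  # 10 families x 20 offspring
n_test_per_family <- 4L

# Group row indices by family, then draw 4 test rows from each group
rows_by_family <- split(seq_along(family_id), family_id)
test_idx <- sort(unlist(
  lapply(rows_by_family, function(rows) sample(rows, n_test_per_family)),
  use.names = FALSE
))
train_idx <- setdiff(seq_along(family_id), test_idx)

length(train_idx)             # 160
length(test_idx)              # 40
table(family_id[test_idx])    # 4 test rows in every family
```

Index vectors built this way are used like the bundled ones, for example d$snp[train_idx, ] and d$pheno[test_idx, ].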
Download the bundle
large dataset (~55 KB zipped)
small dataset (~23 KB zipped)
The ZIPs bundle the canonical .rds plus every derived CSV/TSV used on the subpages.
Regenerating locally
demo-data/generate.R requires the masbayes package to be installed, because it calls construct_wah_matrix():
Rscript demo-data/generate.R main toy # both sizes
Rscript demo-data/generate.R main # large only
Rscript demo-data/generate.R toy # small only
Re-running emits the same .rds byte-for-byte (set.seed() at the top of every config).