Input Data

Both packages consume the same four kinds of input. This section documents each one’s schema, shows worked examples drawn from the bundled demo dataset, and offers every artefact as a download. You can preview the shape, grab the standalone CSV/TSV files, or load the canonical .rds directly with load_data().

Input	In-memory shape	Page
SNP genotypes	integer matrix $n \times p$, dosage 0/1/2	SNP
Microhaplotype calls	integer matrix $n \times (2 \cdot n_{blocks})$ (compact)	Microhaplotypes
Phenotypes (continuous + binary, plus QTL ground truth)	data frame: `id`, fixed effects, response(s)	Phenotype
Pedigree	data frame: `id`, `sire`, `dam`	Pedigree
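As a rough illustration of the four shapes in the table, the sketch below builds tiny synthetic stand-ins in base R. The values are random and the sizes (`n`, `p`, `n_blocks`) are arbitrary. The column names `sex` and `y`, the ID format, and the reading of the compact microhaplotype layout as two allele-code columns per block are assumptions made for the example, not the packages' documented schema; the subpages are the reference for that.

```r
set.seed(1)                     # arbitrary seed, for this illustration only
n <- 6; p <- 5; n_blocks <- 2   # toy sizes, not the demo bundle's

ids <- sprintf("id%02d", seq_len(n))

# SNP genotypes: integer n x p matrix of allele dosages (0, 1 or 2).
snp <- matrix(sample(0:2, n * p, replace = TRUE), nrow = n, ncol = p,
              dimnames = list(ids, paste0("snp", seq_len(p))))

# Microhaplotype calls, compact layout: integer n x (2 * n_blocks) matrix.
# Assumption: two columns per block, one allele code per haplotype.
mh_cols <- paste0("blk", rep(seq_len(n_blocks), each = 2), c("_a", "_b"))
mh <- matrix(sample(1:4, n * 2 * n_blocks, replace = TRUE), nrow = n,
             dimnames = list(ids, mh_cols))

# Phenotypes: id, fixed effects, response(s). `sex` and `y` are placeholders.
pheno <- data.frame(id  = ids,
                    sex = factor(sample(c("F", "M"), n, replace = TRUE)),
                    y   = rnorm(n))

# Pedigree: id, sire, dam.
pedigree <- data.frame(id   = ids,
                       sire = rep(c("s1", "s2"), each = 3),
                       dam  = paste0("d", seq_len(n)))

str(list(snp = snp, mh = mh, pheno = pheno, pedigree = pedigree), max.level = 1)
```

All four objects share the same row order of individuals, which is what lets a single set of row indices address genotypes and phenotypes together.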

The demo bundle

Each subpage uses the same canonical dataset, shipped byte-identically by both sibling packages. Two scales are bundled:

Size	Individuals	SNPs	MH blocks	$W_{ah}$ columns	QTL	Use
`large` (`main`)	200 (10 sires × 20 offspring)	100	50	~149	10	All standard examples
`small` (`toy`)	100 (10 sires × 10 offspring)	50	25	~75	5	Quick-start (runs in seconds)

Both are simulated with set.seed(42) for the genetic stream and set.seed(41) for the within-family train/test split; re-running demo-data/generate.R produces byte-identical output.
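The seeding pattern can be sketched in a few lines. The function `simulate_demo()` below is a hypothetical stand-in, far simpler than the real generator, and its arguments and the binomial genotype draw are assumptions for illustration. It only shows how seeding the genetic draw and the split separately makes two runs agree exactly.

```r
# Hypothetical stand-in for the generator: one fixed seed per random stream.
simulate_demo <- function(n = 20, p = 10, n_test = 4) {
  set.seed(42)                                   # genetic stream
  snp <- matrix(rbinom(n * p, size = 2, prob = 0.3), nrow = n, ncol = p)

  set.seed(41)                                   # train/test split stream
  test_idx <- sort(sample.int(n, n_test))

  list(snp       = snp,
       test_idx  = test_idx,
       train_idx = setdiff(seq_len(n), test_idx))
}

run1 <- simulate_demo()
run2 <- simulate_demo()
identical(run1, run2)   # TRUE when both runs draw exactly the same values
```

Because the split is re-seeded on its own, it depends only on its seed and on `n`, so changing how the genotypes are drawn would not shift which individuals land in the test set.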

One .rds per size, byte-identical to the sibling packages

demo-data/main/demo_data.rds and demo-data/toy/demo_data_small.rds are byte-identical to the files shipped under masbayes/inst/extdata/ and masreml/inst/extdata/. The canonical generator lives in the workspace as tools/make_demo_data.R; the local mirror is demo-data/generate.R.
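One way to check a byte-identity claim like this is to compare file checksums. The sketch below uses `tools::md5sum()` on two temporary files that stand in for the site copy and a package copy, so it runs without either package installed; the object it saves is a made-up placeholder.

```r
# Two temporary files stand in for the site copy and a package copy.
obj       <- list(snp = matrix(0:5, nrow = 2), meta = list(size = "toy"))
site_copy <- tempfile(fileext = ".rds")
pkg_copy  <- tempfile(fileext = ".rds")

saveRDS(obj, site_copy)
file.copy(site_copy, pkg_copy)

# Equal MD5 digests indicate that the two files hold the same bytes.
sums       <- tools::md5sum(c(site_copy, pkg_copy))
same_bytes <- unname(sums[1] == sums[2])
same_bytes
```

With masreml installed, the same comparison would take `"demo-data/main/demo_data.rds"` and `system.file("extdata", "demo_data.rds", package = "masreml")` as the two paths.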

Loading

# Either package's load_data() returns the canonical list.
library(masreml)   # or masbayes
d <- masreml::load_data("large")    # or "small"

# Equivalent via system.file() (returns the bundled path):
path <- system.file("extdata", "demo_data.rds", package = "masreml")
d    <- readRDS(path)

# Or load the standalone .rds directly from the site repo:
d    <- readRDS("demo-data/main/demo_data.rds")

d is a named list with 12 slots: snp, mh, allele_freq, pheno, pedigree, qtl, meta, family_id, train_idx, test_idx, map_snp, map_mh. The four subpages describe each slot’s schema and common ways to construct it from your own data.
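A quick sanity check on a loaded object is to compare its names against the 12 slots listed above. The helper `check_slots()` is hypothetical and not part of either package, and `d_mock` is an empty placeholder list used only so the sketch runs without the packages installed.

```r
expected_slots <- c("snp", "mh", "allele_freq", "pheno", "pedigree", "qtl",
                    "meta", "family_id", "train_idx", "test_idx",
                    "map_snp", "map_mh")

# Hypothetical helper: report slots that are missing or unexpected.
check_slots <- function(d, expected = expected_slots) {
  list(missing = setdiff(expected, names(d)),
       extra   = setdiff(names(d), expected))
}

# Mock object with empty placeholders in place of the real data.
d_mock <- setNames(vector("list", length(expected_slots)), expected_slots)

res_ok  <- check_slots(d_mock)       # nothing missing, nothing extra
res_bad <- check_slots(d_mock[-1])   # drop "snp" to show a failed check
res_bad$missing
```

On real data the call would be `check_slots(d)` with `d` as returned by load_data().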

Within-family train/test split

The bundle ships a stratified split: each of the 10 families contributes 16 offspring to training and 4 to test in large (8/2 in small). This keeps full-sibs out of the test partition’s predictor pool without losing family-level relatedness.

Size	n_train	n_test	Train / test per family
`large`	160	40	16 / 4
`small`	80	20	8 / 2

Indices live in d$train_idx and d$test_idx (row indices into d$snp, d$mh, d$pheno).
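For your own data, a within-family split of this kind can be sketched in base R as below. This is a minimal illustration under stated assumptions, not the bundle's generator: the family layout mirrors `large` (10 families of 20), the seed is arbitrary and is not claimed to reproduce the shipped indices, and the `snp` matrix is a zero-filled placeholder.

```r
set.seed(41)   # arbitrary here; not claimed to reproduce the bundled indices

# Toy layout mirroring `large`: 10 families of 20 offspring each.
family_id         <- rep(seq_len(10), each = 20)
n_test_per_family <- 4

# Draw the test rows separately inside every family.
rows_by_family <- split(seq_along(family_id), family_id)
test_idx <- sort(unlist(lapply(rows_by_family, function(rows) {
  rows[sample.int(length(rows), n_test_per_family)]
}), use.names = FALSE))
train_idx <- setdiff(seq_along(family_id), test_idx)

# The same row indices subset every per-individual object.
snp       <- matrix(0L, nrow = length(family_id), ncol = 3)  # placeholder
snp_train <- snp[train_idx, , drop = FALSE]
snp_test  <- snp[test_idx,  , drop = FALSE]

table(family_id[test_idx])   # test animals per family
```

Indexing positions with `sample.int()` rather than calling `sample()` on the rows directly avoids the base R behaviour where a length-one vector is treated as a range to sample from.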

Download the bundle

large dataset (~55 KB zipped)

📦 demo_data.rds 📦 main.zip

small dataset (~23 KB zipped)

📦 demo_data_small.rds 📦 toy.zip

The ZIPs bundle the canonical .rds plus every derived CSV/TSV used on the subpages.

Regenerating locally

demo-data/generate.R needs the masbayes package installed, because it calls construct_wah_matrix():

Rscript demo-data/generate.R main toy   # both sizes
Rscript demo-data/generate.R main       # large only
Rscript demo-data/generate.R toy        # small only

Re-running emits the same .rds byte-for-byte, because every config calls set.seed() at the top.