Controlling for sources of variability within a targeted experiment is standard practice in gene expression studies, where transcriptional noise is inherent and not fully understood. A common control is to condition on genetic background, usually through inbred model organisms, strains, clones or cell lines. Although this decreases variability within the experimental system, generalizing to another genetic background or strain is perilous. This is rarely more true than in translational or comparative biology, where genetic background changes profoundly (i.e., across species).
To assess the impact of genetic background on mammalian transcriptomic findings, we exploited the polyembryony of the wild nine-banded armadillo (Dasypus novemcinctus). The species is an ideal model system for our purposes because it is outbred yet produces monozygotic quadruplets in every litter, which serve as natural biological replicates. Surprisingly, this unique reproductive mode has yet to be exploited in transcriptional studies, even though its discovery well over a hundred years ago was critical to the field of developmental biology.
First, we sequenced the blood transcriptomes of five litters of armadillo quadruplets to assess transcriptional variation broadly. We found 2982 genes with human and mouse homologs exhibiting statistically significant differences (FDR<0.001) between quadruplet sets, indicating variability sensitive to genetic background. Immune function and cell cycle homologs were particularly prominent. To determine whether these genes, sensitive to genetic background, were generating spurious results within mouse and human experiments, we performed a meta-analysis across 3275 pre-existing gene expression studies. We found that highly variable genes are often called differentially expressed in mouse (rho=0.28, p<2.2e-16) and human (rho=0.27, p<2.2e-16).
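The two steps above can be illustrated with a minimal sketch. It is not the pipeline used in this study: the abstract does not name the test behind the FDR threshold, so the one-way ANOVA F statistic used here as a between-litter variability score is an assumption, as are the function names, the toy expression values and the hypothetical differential expression frequencies. The rank correlation is written out by hand as a standard Spearman coefficient.

```python
from statistics import mean

def f_statistic(litters):
    """One-way ANOVA F statistic for one gene.

    `litters` is a list of lists: one inner list of expression values per
    litter (siblings are the replicates). A large F means expression differs
    more between litters than within them. Assumes within-litter variance > 0.
    """
    k = len(litters)
    n = sum(len(g) for g in litters)
    grand = mean(x for g in litters for x in g)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in litters)
    ss_within = sum((x - mean(g)) ** 2 for g in litters for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

def ranks(values):
    """1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for pos in range(i, j + 1):
            result[order[pos]] = average_rank
        i = j + 1
    return result

def spearman(x, y):
    """Spearman rho: Pearson correlation computed on the ranks."""
    rx, ry = ranks(x), ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

# Toy data (made up): two litters of two siblings for three hypothetical genes.
expression = {
    "gene_a": [[1.0, 1.2], [3.0, 3.2]],  # differs strongly between litters
    "gene_b": [[2.0, 3.0], [2.5, 3.5]],  # mostly within-litter noise
    "gene_c": [[1.0, 2.0], [2.0, 3.0]],
}
# Hypothetical fraction of prior studies calling each gene differentially expressed.
de_frequency = {"gene_a": 0.30, "gene_b": 0.05, "gene_c": 0.10}

genes = sorted(expression)
variability = [f_statistic(expression[g]) for g in genes]
rho = spearman(variability, [de_frequency[g] for g in genes])
print(dict(zip(genes, variability)), rho)
```

In a real analysis the variability score would be converted to a p-value and corrected for multiple testing before thresholding, and the correlation would be computed over thousands of homologous genes rather than three.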
Our findings suggest that genes sensitive to genetic background can be easily identified and are a potentially useful probe for results that will not generalize across species, helping to address the replicability “crisis” in transcriptomics, functional genomics and beyond.