Poster Presentation 39th Annual Lorne Genome Conference 2018

Modelling transcriptional variability in single-cell RNA-seq data during human embryogenesis captures changes in the regulation of critical developmental genes (#213)

Elizabeth A Mason 1, Shila Ghazanfar 2, Fredrik Lanner 3, Jean Yang 2, Christine A Wells 1
  1. University of Melbourne Centre for Stem Cell Systems, Parkville, VIC, Australia
  2. School of Mathematics and Statistics, University of Sydney, Sydney, NSW, Australia
  3. Department of Clinical Science, Intervention and Technology, Karolinska Institute, Stockholm, Sweden

Human development is a temporally and spatially ordered series of events that occur with remarkable precision. Embryogenesis appears predictable because we observe the average behaviour of many individual cells, even as the number of cells and the transcriptional complexity increase during development. When we evaluate single molecules and transcripts, the stochastic nature of gene expression is revealed, particularly in single-cell RNA-seq (scRNA-seq) experiments. Current methods reduce scRNA-seq data to a trajectory based on the abundance of key regulators of phenotype, and differential abundance is used to identify sub-populations. We present an alternative approach: measuring the transcriptional variability of a gene informs the level of regulation imposed on that gene, reflecting an intrinsic property of development that is often overlooked. While linear models have successfully characterized the differences between phenotypes on average, they cannot account for the stochastic differences captured by scRNA-seq experiments. Accurately determining abundance is further complicated by the sparseness of non-zero expression values. To address these challenges and evaluate gene expression during human pre-implantation embryogenesis, we applied a statistical mixture model to scRNA-seq data. Fitting the model on a gene-by-gene basis allowed us to evaluate shifts in the proportion of cells expressing a given gene (λ), as well as the mean (μ) and standard deviation (σ) of expression. A correlation-based analysis evaluated whether abundance (μ) and variability (σ) capture different aspects of transcriptional regulation. While each metric largely identified the same genes, the number and nature of relationships between them differed. Indeed, genes sharing correlated patterns of variability during development were enriched for motifs associated with developmental transcription factors.
Variability was more effective than abundance at specifically detecting regulatory relationships during development, and with less redundancy. Our approach provides a gene-centric platform to evaluate population-based parameters of gene expression, while preserving the complexity of scRNA-seq data.
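
As an illustration of the per-gene parameters described above, the following is a minimal sketch only. The abstract does not specify the form of the mixture model, so the code assumes a simplified stand-in: cells with zero expression form the non-expressing component, and the non-zero (log-scale) values are summarised by their mean and standard deviation. The function names `fit_gene` and `profile_correlation`, the `zero_tol` threshold, and the gene-by-stage layout of the input are hypothetical choices for this example, not the authors' implementation.

```python
import numpy as np


def fit_gene(log_expr, zero_tol=0.0):
    """Estimate (lambda, mu, sigma) for one gene across cells.

    Simplifying assumption: a cell "expresses" the gene when its value
    exceeds zero_tol; the expressing component is summarised by the
    sample mean and standard deviation of those values.
    """
    x = np.asarray(log_expr, dtype=float)
    expressed = x > zero_tol
    lam = float(expressed.mean())          # proportion of expressing cells
    nonzero = x[expressed]
    if nonzero.size < 2:                   # too few cells to estimate spread
        return lam, float("nan"), float("nan")
    mu = float(nonzero.mean())             # abundance among expressing cells
    sigma = float(nonzero.std(ddof=1))     # variability among expressing cells
    return lam, mu, sigma


def profile_correlation(profiles):
    """Pearson correlation between genes' per-stage parameter profiles.

    profiles: array of shape (genes, stages), e.g. sigma per gene and stage.
    Returns a (genes, genes) correlation matrix.
    """
    return np.corrcoef(np.asarray(profiles, dtype=float))


if __name__ == "__main__":
    # Toy values for one gene in four cells (not real data).
    print(fit_gene([0.0, 0.0, 2.0, 4.0]))
    # Toy sigma profiles for three genes over three stages (not real data).
    print(profile_correlation([[1, 2, 3], [2, 4, 6], [3, 2, 1]]))
```

Applying `fit_gene` to each gene within each developmental stage would give one matrix of μ values and one of σ values; passing each matrix to `profile_correlation` is one way to compare the relationships recovered from abundance with those recovered from variability.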