Oral Presentation 39th Annual Lorne Genome Conference 2018

Deep-Mpute: Imputation of missing methylation values using deep convolutional neural networks (#252)

Akanksha Srivastava 1, Akshay Asthana 2, Justin Borevitz 3, Ryan Lister 1, 4
  1. ARC Centre of Excellence in Plant Energy Biology, University of Western Australia (UWA), Perth, WA, Australia
  2. Seeing Machines, Canberra, ACT, Australia
  3. Research School of Biology (RSB), Australian National University, Canberra, ACT, Australia
  4. Harry Perkins Institute of Medical Research, 6 Verdun St, Nedlands, WA, Australia

The development of whole-genome sequencing techniques has made it possible to quantify DNA methylation levels at single-base resolution across an entire genome. However, because sequencing remains costly, many individual CpG sites have insufficient or no read coverage, and this remains a major problem. To address it, we developed Deep-Mpute, a user-friendly tool that imputes methylation values at single CpG sites using a deep convolutional neural network (CNN). We demonstrate that the tool overcomes biases in current DNA methylation data that arise from variable sequencing depth across samples, and can therefore be used to impute low-coverage or missing methylation values before downstream analyses such as the identification of differentially methylated regions. The CNN is trained on features that include the methylation values of neighboring CpGs, the genomic distances between CpG sites, and methylation levels from available replicate data. We show that Deep-Mpute reliably predicts missing methylation values in both plant and mammalian genomes and outperforms current state-of-the-art imputation tools.