Mitochondria are critical to cell survival, serving as the host cell’s energy source and regulating cell metabolism. Their role in cancers, degenerative diseases and ageing is increasingly prominent, and better analysis tools are required to identify their contributions to such conditions. We are developing a clustering mechanism that uses a novel deep learning (DL) system and unsupervised learning to extract multi-dimensional features from mitochondrial genome data. The system can then be quickly and easily re-trained to analyse mitochondria in multiple ways, requiring only minimal sample data for specialised classification of any condition or trait. We use a convolutional autoencoder to reduce the dimensionality of the data and use the encoder (the dimensionality-reducing half of the autoencoder) as the basis of a trained DL system. We will then demonstrate that the resulting encoder represents mitochondria and can serve as a knowledge source for identifying certain mitochondrial traits and conditions with minimal supervised training. The technique we use for the mitochondrial genome is general and is applicable to the whole genome or any selected portion of it.
MitoWisdom currently uses sequenced DNA as its main input, but can also use RNA data. It extracts information from BAM or SAM files, then uses the resulting ~16600x4 input matrix to extract multi-dimensional features from the DNA and/or RNA and shape the network. It is designed to be input agnostic: as long as the input can be converted to an NxM matrix, it can be fed into the system. Once trained, the encoder part of the system can be used directly by itself or included as part of a larger network that is trained to learn more complex inter-chromosome interactions.
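As an illustration of how a sequence might become the ~16600x4 input, the sketch below one-hot encodes a nucleotide string into an Nx4 matrix. This is one plausible reading of the input format, not a description of MitoWisdom's actual preprocessing: the function name `one_hot_encode`, the A/C/G/T channel order, the mapping of U to T for RNA, and the all-zero row for ambiguous bases are all assumptions made for the example. It also assumes a sequence string has already been derived from the BAM or SAM file.

```python
# Hypothetical channel order; the source does not specify one.
BASES = "ACGT"


def one_hot_encode(sequence):
    """Return an N x 4 matrix (a list of rows) for a nucleotide string.

    Assumptions for this sketch: RNA 'U' is treated as 'T', and any
    symbol outside A/C/G/T (for example 'N') becomes an all-zero row.
    """
    index = {base: i for i, base in enumerate(BASES)}
    matrix = []
    for base in sequence.upper().replace("U", "T"):
        row = [0.0] * len(BASES)
        if base in index:
            row[index[base]] = 1.0
        matrix.append(row)
    return matrix


# Usage: a short made-up fragment; a full mitochondrial sequence would
# produce a matrix with roughly 16600 rows and 4 columns.
encoded = one_hot_encode("GATCAN")
rows, cols = len(encoded), len(encoded[0])
```

Because the result is a plain NxM matrix, any other data source that can be put into the same shape could be substituted without changing the downstream network, which is the sense in which the system is input agnostic.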