Poster Presentation 39th Annual Lorne Genome Conference 2018

Pre-miRNA Folding Through Context-Free Grammar Parsing and the Identification of miRNA Using a Feedforward Neural Network (#263)

Viktor Prypoten 1 , Sean Gribben 1 , Xavier Pellow 1 , Reena Zelenkova 1 , David Helmerson 1 , Joshua Nibbs 1 , Boris Deletic 1 , Nathan Di Pierro 1 , Andrew Harrison 1 , Linda McIver 1 , Sonika Tyagi 2
  1. John Monash Science School, Melbourne, VIC, Australia
  2. Monash Bioinformatics Platform, Monash University, Melbourne, VIC, Australia

There is currently no efficient way to accurately identify miRNA or predict its structure ab initio. The purpose of this work is to provide a framework for efficient identification of mature miRNA and folding of pre-miRNA using a feedforward neural network (FFNN) and probabilistic context-free grammar (PCFG) parsing, respectively. The FFNN outputs a probability that the input sequence is a miRNA. It was trained on a positive set of known human mature miRNAs from the miRBase database and a negative set of randomly selected sequences from human chromosome 1. After training, the FFNN achieved an accuracy of 84% when tested on separate high-confidence potential miRNA sequences on which it had not been trained. The probability of a false negative was found to be 16%, while the probability of a false positive when tested on the negative data was found to be 6 × 10⁻⁴%, indicating high specificity in predicting mature miRNA. The PCFG was created to predict the structures of pre-miRNA; it was trained on a set of 1800 human pre-miRNAs and tested on 400. Across all control cases using high-confidence miRNA, the program returned folded structures that matched the canonical structures with an accuracy of 81%. Further refinement using free energy models could increase this accuracy, but it would significantly increase the runtime of the program because free energy models are computationally intensive. The results of this work indicate definite patterns in miRNA folding and sequences, which could facilitate the development and discovery of new strands and their characteristics. Although the current study used human data, the probabilistic models are generic and can be trained to work with different organisms.
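To illustrate how PCFG parsing can yield a folded structure, the following is a minimal sketch, not the grammar or training procedure used in this work (the abstract does not specify them). It assumes a hypothetical three-rule toy grammar (S -> x S, S -> x S y S, S -> epsilon), hand-picked rule probabilities, uniform emission probabilities, and a minimum hairpin loop of three bases. A CYK-style Viterbi recursion finds the most probable parse, and the traceback writes it out in dot-bracket notation.

```python
import math

# Hypothetical toy grammar (an assumption, not the authors' trained PCFG):
#   S -> x S        x is an unpaired base
#   S -> x S y S    x pairs with y
#   S -> epsilon
# Rule probabilities are hand-picked for illustration and sum to 1.
RULE_LOGP = {"unpaired": math.log(0.3), "pair": math.log(0.6), "end": math.log(0.1)}
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}
EMIT_UNPAIRED = math.log(1 / 4)        # uniform over A, C, G, U
EMIT_PAIR = math.log(1 / len(PAIRS))   # uniform over the allowed pairs
MIN_LOOP = 3                           # assumed minimum hairpin loop length


def fold(seq):
    """Return (dot-bracket structure, log-probability) of the most probable parse."""
    n = len(seq)
    # best[i][j]: maximum log-probability that S derives seq[i:j]
    best = [[-math.inf] * (n + 1) for _ in range(n + 1)]
    # back[i][j]: pairing partner of position i in the best parse, or None if unpaired
    back = [[None] * (n + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        best[i][i] = RULE_LOGP["end"]  # empty span: S -> epsilon
    for length in range(1, n + 1):
        for i in range(n - length + 1):
            j = i + length
            # S -> x S : leave seq[i] unpaired
            best[i][j] = RULE_LOGP["unpaired"] + EMIT_UNPAIRED + best[i + 1][j]
            # S -> x S y S : pair seq[i] with seq[k]
            for k in range(i + MIN_LOOP + 1, j):
                if (seq[i], seq[k]) in PAIRS:
                    score = (RULE_LOGP["pair"] + EMIT_PAIR
                             + best[i + 1][k] + best[k + 1][j])
                    if score > best[i][j]:
                        best[i][j], back[i][j] = score, k
    # Trace back the best parse into dot-bracket notation.
    structure = ["."] * n
    stack = [(0, n)]
    while stack:
        i, j = stack.pop()
        if i >= j:
            continue
        k = back[i][j]
        if k is None:
            stack.append((i + 1, j))
        else:
            structure[i], structure[k] = "(", ")"
            stack.append((i + 1, k))
            stack.append((k + 1, j))
    return "".join(structure), best[0][n]


if __name__ == "__main__":
    structure, logp = fold("GGGGAAAACCCC")
    print(structure, round(logp, 3))
```

In a trained PCFG the rule and emission probabilities would be estimated from known structures (here, pre-miRNA hairpins) rather than fixed by hand. The cubic-time recursion uses no thermodynamic parameters, which is consistent with the runtime trade-off against free energy models noted above.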