Poster Presentation 39th Annual Lorne Genome Conference 2018

Confident effect sizes controlling FDR provide an ideal ranking of differentially expressed genes (#163)

Paul F Harrison 1, Andrew D Pattison 2, David R Powell 1, Traude H Beilharz 2
  1. Monash Bioinformatics Platform, Monash University, Clayton, VIC, Australia
  2. Biomedicine Discovery Institute, Monash University, Clayton, VIC, Australia

A method is described for giving a confidence bound on the magnitude of Log Fold Change (LFC) in gene expression while controlling the False Discovery Rate (FDR). We propose this confidence bound as an ideal quantity by which to rank differentially expressed genes when presenting the results of an RNA-seq experiment. The method builds on the TREAT method of McCarthy and Smyth (2009). Unlike TREAT, a minimum LFC of interest does not need to be specified. The only parameter is the desired FDR, for which a reasonable default value can be given.
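The idea can be illustrated with a minimal sketch. This is not the topconfects implementation; it is a simplified grid search written under stated assumptions: each gene has an estimated LFC with a known standard error, the estimate is approximately normal, and a TREAT-style p-value for the null hypothesis |LFC| <= delta is taken as the sum of two normal tail probabilities. All function names, the grid step, and the three example genes are hypothetical. For each candidate threshold delta, the genes passing Benjamini-Hochberg at the chosen FDR are recorded, and a gene's confidence bound is the largest delta at which it still passes.

```python
import math

def normal_upper_tail(z):
    """P(Z > z) for a standard normal variable."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

def treat_style_p(lfc, se, delta):
    """Assumed TREAT-style p-value for H0: |true LFC| <= delta (normal approximation)."""
    a = abs(lfc)
    p = normal_upper_tail((a - delta) / se) + normal_upper_tail((a + delta) / se)
    return min(1.0, p)

def bh_rejections(pvals, fdr):
    """Indices rejected by the Benjamini-Hochberg step-up procedure."""
    n = len(pvals)
    order = sorted(range(n), key=lambda i: pvals[i])
    k = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= fdr * rank / n:
            k = rank  # largest rank whose p-value is under its threshold
    return set(order[:k])

def confident_bounds(lfcs, ses, fdr=0.05, step=0.05, max_delta=10.0):
    """Signed bound per gene: largest grid delta still significant, or None."""
    bounds = [None] * len(lfcs)
    for k in range(int(max_delta / step) + 1):
        delta = k * step
        pvals = [treat_style_p(b, s, delta) for b, s in zip(lfcs, ses)]
        rejected = bh_rejections(pvals, fdr)
        if not rejected:
            break  # p-values only grow with delta, so no gene can pass later
        for i in rejected:
            bounds[i] = math.copysign(delta, lfcs[i])
    return bounds

# Hypothetical genes: A is small but precise, B is large but variable, C is noise.
names = ["A", "B", "C"]
lfcs = [1.0, 4.0, 0.2]
ses = [0.1, 1.0, 0.5]

bounds = confident_bounds(lfcs, ses)
p_zero = [treat_style_p(b, s, 0.0) for b, s in zip(lfcs, ses)]

rank_by_p = [names[i] for i in sorted(range(3), key=lambda i: p_zero[i])]
found = [i for i in range(3) if bounds[i] is not None]
rank_by_bound = [names[i] for i in sorted(found, key=lambda i: -abs(bounds[i]))]
```

In this toy example, ranking by ordinary p-value puts the precise, small-effect gene A first, while ranking by confidence bound puts the large-effect gene B first and leaves the noise gene C without a bound.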

Sorting by p-value is a common default in the output of differential expression software. We compare this to our method of ranking genes, using a breast cancer RNA-seq data-set consisting of matched tumor-normal pairs. The top genes as ranked by p-value have small but consistent differential expression, whereas the top genes as ranked by confidence bound have a much larger magnitude of differential expression but also higher variability. This leads to a difference in biological interpretation, with our confidence bound method placing greater emphasis on genes related to the extracellular matrix.

The confidence bound method degrades gracefully on subsets of samples in this data-set. For experiments with low statistical power, the ranking is similar to the p-value ranking, but as the power of an experiment increases, the ranking is increasingly determined by the true effect size. Comparing the confidence bound with the estimated LFC of the top genes provides immediate feedback on whether an experiment was under-powered. We therefore propose our method as a better default for ranking differentially expressed genes.

An R package implementing the method is available at https://github.com/pfh/topconfects

  1. McCarthy, D. J., and Smyth, G. K. (2009). Testing significance relative to a fold-change threshold is a TREAT. Bioinformatics 25, 765-771. http://bioinformatics.oxfordjournals.org/content/25/6/765