Abstract

Visual Mining Methods for RNA-Seq Data: Data Structure, Dispersion Estimation and Significance Testing

Tengfei Yin, Mahbubul Majumder, Niladri Roy Chowdhury, Dianne Cook, Randy Shoemaker and Michelle Graham

In an analysis of RNA-Seq data from soybeans, initial significance testing with one software package produced very different gene lists from those produced by another. How can this happen? This paper shows how the disparities between the results were investigated and how they can be explained. Contradictions of this type can occur more generally in high-throughput analyses. To explore the model fitting and hypothesis testing, we implemented an interactive graphic for exploring how dispersion estimation affects both the overall variance estimates and the differential expression tests. In addition, we propose a new procedure to test for the presence of any structure in biological data.