On Multiple Hypothesis Testing: Maximizing the Average Power


John Nixon

Abstract

A general theory is described for deciding which of two modelled hypotheses (each possibly depending on unknown parameters) best fits each of a collection of data sets, such that the average power is maximized. Statistical independence between the large number of data sets of the same type is assumed, so error rates can be expressed as proportions, and a continuous approximation to the data model is used. The framework of decision theory is adopted, and the equivalence between different criteria for optimization is demonstrated. General procedures are shown to satisfy this criterion both when each hypothesis has a finite number of unknown parameters and when the alternative hypothesis is vacuous. If the null hypothesis is determined by a known distribution of a test statistic, the procedure reduces to using the density of the $p$-values of this test statistic as the final test statistic to rank the data sets in order of significance. For two scenarios, one of three density estimation methods based on the kernel density estimate gave results almost equivalent in power to the likelihood ratio test, which uses full knowledge of the null and alternative models, and compared favourably with the optimal discovery procedure (ODP) and its iterated variant. For gene expression data from microarrays and, more recently, RNA-Seq experiments, where the data for different genes are not generally independent, it is suggested to apply this technique to the $p$-values produced by methods such as Surrogate Variable Analysis, which remove much of the effect of dependence.
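To make the central idea concrete, the following minimal sketch ranks a collection of tests by the estimated density of their $p$-values, using a reflection-corrected Gaussian kernel density estimate. This is an illustration under our own assumptions, not the paper's implementation: the function name `rank_by_p_value_density`, the boundary-reflection correction, and the simulated example are all hypothetical choices. Under the null, $p$-values are uniform on $[0,1]$; under alternatives they concentrate near 0, so a higher estimated mixture density at a test's $p$-value suggests greater significance.

```python
import numpy as np
from scipy.stats import gaussian_kde, norm

def rank_by_p_value_density(p_values):
    """Rank tests by the estimated density of their p-values (illustrative sketch).

    Returns the indices of the tests in decreasing order of estimated
    density, together with the density values themselves.
    """
    p = np.asarray(p_values, dtype=float)
    # Reflect the sample about both boundaries (0 and 1) to reduce the
    # bias of the kernel density estimate on the bounded support [0, 1].
    reflected = np.concatenate([p, -p, 2.0 - p])
    kde = gaussian_kde(reflected)
    # Renormalize by 3: only a third of the reflected sample lies in [0, 1].
    density = 3.0 * kde(p)
    # Highest estimated density first, i.e. most significant first.
    order = np.argsort(-density)
    return order, density

# Hypothetical example: 900 null tests mixed with 100 shifted alternatives.
rng = np.random.default_rng(0)
z = np.concatenate([rng.normal(0.0, 1.0, 900), rng.normal(2.5, 1.0, 100)])
p_vals = 1.0 - norm.cdf(z)          # one-sided p-values
order, dens = rank_by_p_value_density(p_vals)
```

When the mixture density of $p$-values is monotonically decreasing, this ordering coincides with ranking by the $p$-values themselves; the density-based statistic is intended for the more general case where that monotonicity need not hold.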


This work is licensed under a Creative Commons Attribution 4.0 License.