Correlation and variance stabilization in the two group comparison case in high dimensional data under dependencies



Journal Title

Journal ISSN

Volume Title


Kansas State University


Multiple testing research has undergone renewed focus in recent years as advances in high throughput technologies have produced data on unprecedented scales. Much of the focus has been on false discovery rates (FDR) and related quantities that are estimated (or controlled for) in large scale multiple testing situations. Recent papers by Efron have directly addressed this issue and incorporated measures to account for high-dimensional correlation structure when estimating false discovery rates and when estimating a density. Other authors also have proposed methods to control or estimate FDR under dependencies with certain assumptions. However, not much focus is given to the stability of the results obtained under dependencies in the literature. This work begins by demonstrating the effect of dependence structure on the variance of the number of discoveries and the false discovery proportion (FDP). A variance of the number of discoveries is shown and the density of a test statistic, conditioned on the status (reject or failure to reject) of a different correlated test, is derived. A closed form solution to the correlation between test statistics is also derived. This correlation is a combination of correlations and variances of the data within groups being compared. It is shown that these correlations among the test statistics affect the conditional density and alters the threshold for significance of a correlated test, causing instability in the results. The concept of performing tests within networks, Conditional Network Testing (CNT) is introduced. This method is based on the conditional density mentioned above and uses the correlation between test statistics to construct networks. A method to simulate realistic data with preserved dependence structures is also presented. CNT is evaluated using simple simulations and the proposed simulation method. In addition, existing methods that controls false discovery rates are used on t-tests and CNT for comparing performance. It was shown that the false discovery proportion and type I error proportions are smaller when using CNT versus using t-tests and, in general, results are more stable when applied to CNT. Finally, applications and steps to further improve CNT are discussed.



FDR, Correlation, Test, Hypothesis, Significance, Covariance

Graduation Month



Doctor of Philosophy


Department of Statistics

Major Professor

Gary L. Gadbury