Correlation and variance stabilization in the two group comparison case in high dimensional data under dependencies

Date

2011-11-29

Publisher

Kansas State University

Abstract

Multiple testing research has received renewed attention in recent years as advances in high-throughput technologies have produced data on unprecedented scales. Much of the focus has been on false discovery rates (FDR) and related quantities that are estimated (or controlled) in large-scale multiple testing situations. Recent papers by Efron have directly addressed this issue and incorporated measures to account for high-dimensional correlation structure when estimating false discovery rates and when estimating a density. Other authors have also proposed methods to control or estimate FDR under dependencies with certain assumptions. However, the literature has given little attention to the stability of the results obtained under dependencies. This work begins by demonstrating the effect of dependence structure on the variance of the number of discoveries and of the false discovery proportion (FDP). The variance of the number of discoveries is presented, and the density of a test statistic, conditioned on the status (reject or fail to reject) of a different, correlated test, is derived. A closed-form expression for the correlation between test statistics is also derived. This correlation is a combination of the correlations and variances of the data within the groups being compared. It is shown that these correlations among the test statistics affect the conditional density and alter the threshold for significance of a correlated test, causing instability in the results. The concept of performing tests within networks, Conditional Network Testing (CNT), is introduced. This method is based on the conditional density mentioned above and uses the correlation between test statistics to construct networks. A method to simulate realistic data with preserved dependence structures is also presented. CNT is evaluated using simple simulations and the proposed simulation method.
In addition, existing methods that control false discovery rates are applied to t-tests and to CNT to compare performance. The false discovery proportion and type I error proportions are shown to be smaller when using CNT than when using t-tests and, in general, results are more stable under CNT. Finally, applications and steps to further improve CNT are discussed.
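
The abstract's opening claim, that dependence inflates the variance of the number of discoveries, can be illustrated with a small simulation. The sketch below is not the dissertation's method or its simulation procedure. It assumes an equicorrelated one-factor model for null z-statistics, a fixed two-sided cutoff of 1.96, and hypothetical names (`discovery_counts`, `rho`, `m`, `reps`) chosen only for illustration.

```python
import numpy as np

def discovery_counts(rho, m=200, reps=2000, z_crit=1.96, seed=0):
    """Count rejections per replicate among m null z-statistics.

    Assumption (not from the source): every pair of statistics has the same
    correlation rho, generated with one shared factor per replicate.
    """
    rng = np.random.default_rng(seed)
    shared = rng.standard_normal((reps, 1))   # factor common to all m tests
    own = rng.standard_normal((reps, m))      # test-specific noise
    # Each z is standard normal; any two z's in a replicate have correlation rho.
    z = np.sqrt(rho) * shared + np.sqrt(1.0 - rho) * own
    # All hypotheses are true nulls, so every rejection is a false discovery.
    return (np.abs(z) > z_crit).sum(axis=1)

indep = discovery_counts(rho=0.0)
dep = discovery_counts(rho=0.5)

print("independent: mean", indep.mean(), "variance", indep.var(ddof=1))
print("correlated:  mean", dep.mean(), "variance", dep.var(ddof=1))
```

Both settings have the same expected count (about 5% of the 200 tests), so any difference between them shows up in the spread of the count across replicates rather than in its average.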

Keywords

FDR, Correlation, Test, Hypothesis, Significance, Covariance

Graduation Month

December

Degree

Doctor of Philosophy

Department

Department of Statistics

Major Professor

Gary L. Gadbury

Date

2011

Type

Dissertation
