Correlation and variance stabilization in the two group comparison case in high dimensional data under dependencies

dc.contributor.author: Paranagama, Dilan C.
dc.date.accessioned: 2011-11-29T14:20:11Z
dc.date.available: 2011-11-29T14:20:11Z
dc.date.graduationmonth: December [en_US]
dc.date.issued: 2011-11-29
dc.date.published: 2011 [en_US]
dc.description.abstract: Multiple testing research has undergone renewed focus in recent years as advances in high-throughput technologies have produced data on unprecedented scales. Much of the focus has been on false discovery rates (FDR) and related quantities that are estimated (or controlled for) in large-scale multiple testing situations. Recent papers by Efron have directly addressed this issue and incorporated measures to account for high-dimensional correlation structure when estimating false discovery rates and when estimating a density. Other authors have also proposed methods to control or estimate FDR under dependencies, subject to certain assumptions. However, the literature gives little attention to the stability of the results obtained under dependencies. This work begins by demonstrating the effect of dependence structure on the variance of the number of discoveries and the false discovery proportion (FDP). The variance of the number of discoveries is presented, and the density of a test statistic, conditioned on the status (reject or failure to reject) of a different correlated test, is derived. A closed-form solution for the correlation between test statistics is also derived. This correlation is a combination of correlations and variances of the data within the groups being compared. It is shown that these correlations among the test statistics affect the conditional density and alter the threshold for significance of a correlated test, causing instability in the results. The concept of performing tests within networks, Conditional Network Testing (CNT), is introduced. This method is based on the conditional density mentioned above and uses the correlation between test statistics to construct networks. A method to simulate realistic data with preserved dependence structures is also presented. CNT is evaluated using simple simulations and the proposed simulation method.
In addition, existing methods that control false discovery rates are applied to t-tests and to CNT to compare performance. It is shown that the false discovery proportion and type I error proportions are smaller when using CNT than when using t-tests and that, in general, results are more stable when these methods are applied to CNT. Finally, applications and steps to further improve CNT are discussed. [en_US]
dc.description.advisor: Gary L. Gadbury [en_US]
dc.description.degree: Doctor of Philosophy [en_US]
dc.description.department: Department of Statistics [en_US]
dc.description.level: Doctoral [en_US]
dc.identifier.uri: http://hdl.handle.net/2097/13132
dc.language.iso: en_US [en_US]
dc.publisher: Kansas State University [en]
dc.subject: FDR [en_US]
dc.subject: Correlation [en_US]
dc.subject: Test [en_US]
dc.subject: Hypothesis [en_US]
dc.subject: Significance [en_US]
dc.subject: Covariance [en_US]
dc.subject.umi: Statistics (0463) [en_US]
dc.title: Correlation and variance stabilization in the two group comparison case in high dimensional data under dependencies [en_US]
dc.type: Dissertation [en_US]

Files

Original bundle

Name:
DilanParanagama2011.pdf
Size:
1.13 MB
Format:
Adobe Portable Document Format

License bundle

Name:
license.txt
Size:
1.61 KB
Format:
Item-specific license agreed upon to submission