More powerful two-sample tests for univariate and high-dimensional data

K-REx Repository

Zhang, Huaiyu 2019-11-13T19:50:12Z 2019-11-13T19:50:12Z 2019-12-01
dc.description.abstract Comparing the means of two populations is a common task in scientific studies. In this dissertation, we consider more powerful tests of the equality of means in univariate and high-dimensional settings. In the univariate case, the classical two-sample t-test is not robust to skewed populations, and the large-sample test has low accuracy for finite sample sizes. The first part of this dissertation proposes two new types of tests, the TCFU and the TT tests, for comparing the means of populations with unequal variances. The TCFU test uses Welch’s t-statistic as the test statistic and obtains its critical values from the Cornish-Fisher expansion. The TT tests transform Welch’s t-statistic and use normal percentiles as critical values. Four types of monotone transformations are considered for the TT tests. The power and type I error rates of the different tests are compared theoretically and numerically. Analytical conditions are derived to help practitioners choose a powerful test. Two real-data examples are presented to illustrate the application of the new tests. The second part considers a more challenging situation: testing the equality of two high-dimensional means. When the sample sizes are much smaller than the dimensionality, it is not viable to construct a uniformly most powerful test. Here we propose a new test based on the average squared component-wise t-statistic. Our new test shares some similarities with the generalized component test (GCT) proposed by Gregory et al. (2015), but it differs from the GCT in the following aspects: (i) it uses a different scaling parameter, which can be estimated directly from the data rather than from the sequence of t-statistics; (ii) it does not require the stationarity condition implicitly assumed in the GCT; (iii) the new variance estimator is guaranteed to be non-negative, as a variance estimator should be; (iv) the test works well even when components of the data vector are highly correlated, as long as the correlation decays suitably fast (at least at a polynomial rate) as the separation between component indices increases. The limiting distribution of the test statistic and the power function are derived. The new test is also compared with several existing tests through Monte Carlo experiments. Using acute lymphoblastic leukemia gene expression data, we demonstrate how the new test can give more consistent results than competing tests in detecting differentially expressed Gene Ontology terms. In the last part of the dissertation, we consider power adjustment to address the question of how to fairly compare the power of competing methods in simulation studies when they have different empirical type I error rates. After discussing some existing methods and their drawbacks, we introduce a new power adjustment method. The new method is then used to compare the simulation results from the previous two parts of the dissertation. en_US
dc.language.iso en_US en_US
dc.subject Edgeworth expansion en_US
dc.subject Asymptotic power en_US
dc.subject Local alternative en_US
dc.subject Transformation test and confidence interval en_US
dc.subject High-dimensional data en_US
dc.subject Gene set testing en_US
dc.title More powerful two-sample tests for univariate and high-dimensional data en_US
dc.type Dissertation en_US Doctor of Philosophy en_US
dc.description.level Doctoral en_US
dc.description.department Department of Statistics en_US
dc.description.advisor Haiyan Wang en_US 2019 en_US December en_US
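
The abstract names two quantities that can be illustrated concretely: Welch’s t-statistic (the statistic behind the TCFU and TT tests) and the average squared component-wise t-statistic (the basis of the high-dimensional test). The following is a minimal sketch only, not the dissertation's implementation. It assumes the component-wise statistic takes the unpooled (Welch) form, which the abstract does not state, and it omits the Cornish-Fisher critical values, the monotone transformations, and the scaling and variance estimators, none of which are specified in this record. The function names `welch_t` and `mean_squared_t` are hypothetical.

```python
import numpy as np


def welch_t(x, y):
    """Welch's t-statistic for two samples.

    x and y may be 1-D (univariate) or 2-D with shape (n, p),
    in which case one statistic is returned per column.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n1, n2 = x.shape[0], y.shape[0]
    # Unpooled standard error: each sample keeps its own variance estimate.
    se = np.sqrt(x.var(axis=0, ddof=1) / n1 + y.var(axis=0, ddof=1) / n2)
    return (x.mean(axis=0) - y.mean(axis=0)) / se


def mean_squared_t(x, y):
    """Average of the squared component-wise t-statistics (assumed Welch form).

    This is only the raw average; centering and scaling it into a test
    statistic requires estimators that this record does not describe.
    """
    return float(np.mean(welch_t(x, y) ** 2))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical data: 10 and 12 observations on 500 components.
    x = rng.normal(size=(10, 500))
    y = rng.normal(size=(12, 500))
    print(welch_t(x[:, 0], y[:, 0]))  # one univariate comparison
    print(mean_squared_t(x, y))       # average over all components
```

Averaging squared statistics across components pools weak signals that no single component would reveal, which is why the raw average is a natural starting point when the dimension far exceeds the sample sizes.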
