More powerful two-sample tests for univariate and high-dimensional data

Date

2019-12-01

Abstract

Comparing the means of two populations is a common task in scientific studies. In this dissertation, we consider more powerful tests of the equality of means in univariate and high-dimensional settings. In the univariate case, the classical two-sample t-test is not robust to skewed populations, and the large-sample test has low accuracy for finite sample sizes. The first part of this dissertation proposes two new types of tests, the TCFU and TT tests, for comparing the means of populations with unequal variances. The TCFU test uses Welch’s t-statistic as the test statistic and obtains its critical values from the Cornish-Fisher expansion. The TT tests transform Welch’s t-statistic and use normal percentiles as critical values. Four types of monotone transformations are considered for the TT tests. The power and type I error rates of the different tests are compared both theoretically and numerically. Analytical conditions are derived to help practitioners choose a powerful test. Two real-data examples illustrate the application of the new tests.
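
As a rough illustration of how a Cornish-Fisher critical value and a skewness-removing transformation can be attached to Welch’s t-statistic T, the Python sketch below uses the standard one-term expansion P(T ≤ x) ≈ Φ(x) + (κ/6)(2x² + 1)φ(x), where κ = (γ₁σ₁³/n₁² − γ₂σ₂³/n₂²)/(σ₁²/n₁ + σ₂²/n₂)^(3/2) is the standardized skewness of the mean difference and γ₁, γ₂ are the population skewness coefficients. This is an assumption: the exact expansion behind the TCFU test and the four transformations used by the TT tests are not stated in this abstract. All function names are hypothetical, and `hall_transform` is a cubic transformation of the kind proposed by Hall (1992), shown only as one example of a monotone transformation.

```python
import math
from statistics import NormalDist, fmean


def sample_moments(x):
    """Return (mean, unbiased variance, plug-in skewness) of a sample."""
    n = len(x)
    m = fmean(x)
    var = sum((xi - m) ** 2 for xi in x) / (n - 1)
    m3 = sum((xi - m) ** 3 for xi in x) / n
    return m, var, m3 / var ** 1.5


def welch_t_and_kappa(x, y):
    """Welch's t-statistic and the estimated skewness kappa of mean(x) - mean(y)."""
    mx, vx, gx = sample_moments(x)
    my, vy, gy = sample_moments(y)
    nx, ny = len(x), len(y)
    se2 = vx / nx + vy / ny  # squared standard error of the mean difference
    t = (mx - my) / math.sqrt(se2)
    # third cumulant of the mean difference, standardized by se**3
    kappa = (gx * vx ** 1.5 / nx ** 2 - gy * vy ** 1.5 / ny ** 2) / se2 ** 1.5
    return t, kappa


def cf_quantile(p, kappa):
    """One-term Cornish-Fisher approximation to the p-quantile of Welch's t."""
    z = NormalDist().inv_cdf(p)
    return z - kappa * (2 * z * z + 1) / 6


def hall_transform(t, kappa):
    """Cubic skewness-removing transformation (hypothetical helper).

    Its derivative is (1 + kappa * t / 3) ** 2 >= 0, so it is monotone
    non-decreasing in t; the result is compared with normal percentiles."""
    return t + kappa * t ** 2 / 3 + kappa ** 2 * t ** 3 / 27 + kappa / 6
```

Under these assumptions, a two-sided TCFU-style test at level α would reject when t < cf_quantile(α/2, κ) or t > cf_quantile(1 − α/2, κ); because both tails are shifted by the same amount, the rejection region is asymmetric about zero whenever κ ≠ 0. A TT-style test would instead compare hall_transform(t, κ) with ±z_(1−α/2).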

The second part considers a more challenging situation: testing the equality of two high-dimensional means. When the sample sizes are much smaller than the dimensionality, it is not viable to construct a uniformly most powerful test. Here we propose a new test based on the average of the squared component-wise t-statistics. Our new test shares some similarities with the generalized component test (GCT) proposed by Gregory et al. (2015), but differs from it in the following aspects: (i) it uses a different scaling parameter that can be estimated directly from the data rather than from the sequence of t-statistics; (ii) it does not require the stationarity condition implicitly assumed by the GCT; (iii) its variance estimator is guaranteed to be non-negative, as a variance estimate should be; and (iv) it works well even when the components of the data vector are highly correlated, as long as the correlation decays sufficiently fast (at least at a polynomial rate) as the separation between component indices increases. The limiting distribution of the test statistic and the power function are derived. The new test is also compared with several existing tests through Monte Carlo experiments. Using acute lymphoblastic leukemia gene expression data, we demonstrate how the new test can give more consistent results than competing tests in detecting differentially expressed Gene Ontology terms.
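
A minimal Python sketch of the raw quantity the test is built on, the average of the squared component-wise Welch t-statistics, is given below. It assumes each sample is stored as a list of length-p observation vectors; the function name is hypothetical, and the centering and scaling (including the scaling parameter and variance estimator described above) that give the statistic its limiting distribution are not reproduced here.

```python
import math
from statistics import fmean, variance


def avg_squared_t(X, Y):
    """Average of squared component-wise Welch t-statistics (hypothetical helper).

    X, Y: lists of observations; each observation is a length-p sequence."""
    p = len(X[0])
    n1, n2 = len(X), len(Y)
    total = 0.0
    for j in range(p):
        xj = [row[j] for row in X]  # j-th component of every observation in X
        yj = [row[j] for row in Y]
        se2 = variance(xj) / n1 + variance(yj) / n2  # Welch squared standard error
        tj = (fmean(xj) - fmean(yj)) / math.sqrt(se2)
        total += tj * tj
    return total / p
```

Under the null hypothesis each squared t-statistic has mean close to 1 in large samples, so averages well above 1 suggest a difference in mean vectors; turning this into a calibrated test requires the standardization and limiting distribution derived in the dissertation.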

In the last part of the dissertation, we consider power adjustments to address the question of how to fairly compare the power of competing methods in simulation studies when they have different empirical type I error rates. After discussing some existing methods and their drawbacks, we introduce a new power adjustment method and use it to compare the simulation results from the first two parts of the dissertation.
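
For context on the power adjustment problem, the sketch below implements one familiar existing approach, often called size-adjusted power: the nominal critical value is replaced by an empirical quantile of the statistic simulated under the null hypothesis, so every method is calibrated to the same empirical type I error before its power is computed. This is not the new adjustment method proposed in the dissertation; the function name and the reject-for-large-values convention are assumptions.

```python
import math


def size_adjusted_power(null_stats, alt_stats, alpha=0.05):
    """Size-adjusted power for a test that rejects for large statistic values.

    null_stats: statistics simulated under H0; alt_stats: simulated under H1.
    Returns (empirical critical value, proportion of alt_stats above it)."""
    s = sorted(null_stats)
    m = len(s)
    r = math.floor(alpha * m)  # number of null replicates allowed to exceed crit
    crit = s[m - r - 1]  # at most r null values lie strictly above crit
    power = sum(a > crit for a in alt_stats) / len(alt_stats)
    return crit, power
```

One commonly cited drawback is that this calibration relies on knowing the null distribution in the simulation, so the adjusted power is not attainable by a practitioner who must use the nominal critical value; ties among the null statistics can also push the empirical size below α.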

Keywords

Edgeworth expansion, Asymptotic power, Local alternative, Transformation test and confidence interval, High-dimensional data, Gene set testing

Graduation Month

December

Degree

Doctor of Philosophy

Department

Department of Statistics

Major Professor

Haiyan Wang

Date

2019

Type

Dissertation
