Design-based efficiency for analyzing cluster-randomized experiments

Xiong, Yeng

dc.contributor.author	Xiong, Yeng
dc.date.accessioned	2020-08-13T14:52:26Z
dc.date.available	2020-08-13T14:52:26Z
dc.date.graduationmonth	August
dc.date.issued	2020-08-01
dc.description.abstract	Cluster-randomized experiments (CREs) have three defining features: (i) treatments are randomized to clusters, or groups of units, rather than to the units themselves; (ii) clusters are formed prior to experimentation and without researcher intervention; and (iii) the research objective and analysis are still centered on units. CREs are common, particularly in intervention studies in public health and political science. Yet despite their growing popularity, debate continues, even among experts, over how to analyze and design them. We focus on design-based estimators of the population average treatment effect (PATE) and of the corresponding standard error (SE) under the Neyman-Rubin potential outcomes framework. The inherent disparity between the experimental and observational units in CREs can create analytical and design challenges, such as bias, large variability, and lack of location invariance. Moreover, randomizing treatments to clusters is known to be less efficient than randomizing to individual units. Conventionally, clusters in CREs are sampled using simple random sampling. Stratifying clusters or matching them into pairs based on important covariates can improve the precision of estimation. We instead propose a different sampling scheme: sampling with probability proportional to size without replacement. This modification leads to a new estimator of the PATE that can accommodate the clustering structure in CREs without compromising desirable statistical properties. We then derive a conservative estimator for the variance of our estimator. We also synthesize the myriad perspectives on designing CREs and offer recommendations on best design practices. Finally, we introduce our R package analyzeCRE, which implements the theoretical work in this dissertation, and provide a guide to using its functions for analyzing and designing CREs.
dc.description.advisor	Michael J. Higgins
dc.description.degree	Doctor of Philosophy
dc.description.department	Department of Statistics
dc.description.level	Doctoral
dc.identifier.uri	https://hdl.handle.net/2097/40819
dc.language.iso	en_US
dc.publisher	Kansas State University
dc.rights	© the author. This Item is protected by copyright and/or related rights. You are free to use this Item in any way that is permitted by the copyright and related rights legislation that applies to your use. For other uses you need to obtain permission from the rights-holder(s).
dc.rights.uri	http://rightsstatements.org/vocab/InC/1.0/
dc.subject	Cluster-randomized experiments
dc.subject	Probability-proportional-to-size sampling
dc.subject	Neyman-Rubin causal model
dc.subject	Potential outcomes
dc.subject	Population average treatment effect
dc.title	Design-based efficiency for analyzing cluster-randomized experiments
dc.type	Dissertation