Mean-weighted case specific random forests for estimating causal effects

Date

2021-08-01

Abstract

Causal inference is a branch of statistics that deals with determining how responses are affected by treatments. In this dissertation, we examine two problems in causal inference under the Neyman-Rubin causal model (NRCM). The first is the estimation of counterfactuals, the hypothetical unobserved responses of units under different treatment conditions. The second is treatment effect estimation under treatment spillover, which occurs when the treatment status of one unit affects the response of another. First, we extend the case specific random forest (CSRF) methodology to develop mean-weighted case specific random forests (MWCSRF) for estimating the average treatment effect for the treated (ATT). We consider a setting in which the data contain many control units and very few treated units, and the covariate space of the treated units is a small subspace of that of the control units. For example, the treated units may be those that underwent an experimental procedure, and the control units may be the set of units in a national database. Our approach is as follows. First, we compute bootstrap sample weights for each treated unit that oversample control units near that treated unit. Then, we average these weights to construct a single set of “treated” sample weights. Next, we use a random forest to estimate the prognostic score (the expected control outcome given a set of covariates) for each treated unit. Finally, we estimate the ATT as the average difference between the observed responses and the estimated prognostic scores across all treated units. We show via a simulation study that MWCSRF performs favorably compared with the standard random forest, causal forests, and genetic matching under both homogeneous and heterogeneous treatment effect settings, especially when the number of treated units is small. We also demonstrate that, when parallelization is not available, MWCSRF requires significantly less runtime than CSRF. We confirm our findings in a study of the efficacy of the National Supported Work Demonstration.
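The four steps above can be sketched in Python with scikit-learn. This is a minimal, hypothetical sketch and not the dissertation's R package. The helper names `mean_weights` and `mwcsrf_att` are illustrative. The CSRF-style weights are also an assumption: for each treated unit, a preliminary forest gives every control a credit of 1/(leaf size) in each tree where the two share a leaf. The tuning choices (`min_samples_leaf=5`, one third of the features per split) are assumed as well.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.tree import DecisionTreeRegressor


def mean_weights(forest, X_control, X_treated):
    """Assumed CSRF-style weights: for each treated unit, a control gets
    1/(number of controls in the leaf) in every tree where it shares a leaf
    with that treated unit, averaged over trees (so each unit's weights sum
    to 1). Returns the mean of these per-treated-unit weight vectors."""
    leaves_c = forest.apply(X_control)  # (n_control, n_trees) leaf ids
    leaves_t = forest.apply(X_treated)  # (n_treated, n_trees) leaf ids
    w = np.zeros(X_control.shape[0])
    for lt in leaves_t:
        same = leaves_c == lt                    # controls sharing each tree's leaf
        sizes = np.maximum(same.sum(axis=0), 1)  # controls per shared leaf
        w += (same / sizes).mean(axis=1)         # this treated unit's weights
    return w / len(leaves_t)                     # mean weighting


def mwcsrf_att(X_control, y_control, X_treated, y_treated, n_trees=100, seed=0):
    rng = np.random.default_rng(seed)
    # Preliminary forest on the controls, used only to define neighborhoods.
    base = RandomForestRegressor(n_estimators=n_trees, min_samples_leaf=5,
                                 random_state=seed).fit(X_control, y_control)
    # Steps 1-2: per-treated-unit weights, averaged into one "treated" set.
    w = mean_weights(base, X_control, X_treated)
    w = w / w.sum()
    # Step 3: one forest whose bootstrap samples draw controls with probability w.
    n = len(y_control)
    preds = np.zeros(len(X_treated))
    for _ in range(n_trees):
        idx = rng.choice(n, size=n, replace=True, p=w)
        tree = DecisionTreeRegressor(max_features=1 / 3, min_samples_leaf=5,
                                     random_state=int(rng.integers(2**31 - 1)))
        tree.fit(X_control[idx], y_control[idx])
        preds += tree.predict(X_treated)
    prognostic = preds / n_trees  # estimated control outcome for each treated unit
    # Step 4: ATT = mean of (observed treated response - prognostic score).
    return float(np.mean(y_treated - prognostic)), prognostic


if __name__ == "__main__":
    # Synthetic data: treated units occupy a small corner of the control space.
    rng = np.random.default_rng(1)
    Xc = rng.uniform(0, 10, size=(2000, 2))
    yc = Xc[:, 0] + rng.normal(0, 0.5, size=2000)
    Xt = rng.uniform(0, 2, size=(15, 2))
    yt = Xt[:, 0] + 3.0 + rng.normal(0, 0.5, size=15)  # true effect = 3
    att, _ = mwcsrf_att(Xc, yc, Xt, yt)
    print(f"estimated ATT: {att:.2f}")
```

Averaging the weights happens before any weighted forest is grown. As a result, this sketch fits a single weighted forest for all treated units instead of one case-specific forest per treated unit. That design is in line with the runtime advantage over CSRF reported above when parallelization is unavailable.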
We also develop an R package implementing MWCSRF.

Second, we discuss the problem of treatment spillover in the context of Fisher’s Lady Tasting Tea experiment. We show that, by design, Lady Tasting Tea can violate the stable unit treatment value assumption (SUTVA), which requires the response of a unit to be affected only by the treatment status of that unit. We show that SUTVA may be violated under this model even when, for a given cup, the Lady’s milk-first likelihood is always higher when that cup actually receives milk first. Moreover, we show that SUTVA holds under two conditions: one in which the Lady’s likelihood for a cup is the same regardless of whether that cup was given milk first or tea first, and one in which the Lady always guesses perfectly. These results further emphasize that SUTVA violations cannot be classified solely as treatment spillover problems but can be inherent in the design of an experiment. This result may also have implications for teaching causal inference, as it may be preferable to introduce randomized experiments using examples that do not inherently violate SUTVA.
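The SUTVA claims can be made concrete with a toy model. Its assumptions are not necessarily the dissertation's exact formulation. Eight cups are used, and four receive milk first. The Lady gives each cup a milk-first likelihood that depends only on that cup's own treatment. She labels the four highest-likelihood cups as milk-first, breaking ties by cup index. The response of a cup is her label for it. SUTVA holds only if every cup's label is determined by its own treatment alone. The hypothetical helper `sutva_holds` checks this by enumerating all 70 possible assignments.

```python
from itertools import combinations


def lady_labels(z, lik_milk, lik_tea, k=4):
    """Labels (1 = "milk first") under the assumed model: cup i scores
    lik_milk[i] if it truly got milk first, else lik_tea[i]; the k
    highest-scoring cups are labeled milk-first, ties broken by cup index."""
    scores = [lik_milk[i] if z[i] else lik_tea[i] for i in range(len(z))]
    top = sorted(range(len(z)), key=lambda i: (-scores[i], i))[:k]
    return tuple(1 if i in top else 0 for i in range(len(z)))


def sutva_holds(lik_milk, lik_tea, n=8, k=4):
    """True iff each cup's label depends only on its own treatment,
    checked over all C(n, k) randomizations."""
    seen = {}  # (cup, own treatment) -> label first observed
    for milk_cups in combinations(range(n), k):
        z = tuple(1 if i in milk_cups else 0 for i in range(n))
        y = lady_labels(z, lik_milk, lik_tea, k)
        for i in range(n):
            if seen.setdefault((i, z[i]), y[i]) != y[i]:
                return False  # same own treatment, different label: spillover
    return True


# Milk-first likelihood is higher for every cup when it truly gets milk first,
# yet cup 0's high tea-first likelihood can push a milk cup out of the top 4.
print(sutva_holds([0.9] + [0.6] * 7, [0.7] + [0.2] * 7))  # False
# Likelihood ignores the cup's own treatment: labels never change.
lik = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]
print(sutva_holds(lik, lik))  # True
# Perfect guesses: labels equal the true treatments.
print(sutva_holds([1.0] * 8, [0.0] * 8))  # True
```

The first example shows the spillover. Cup 4 receives milk first under both of the following assignments. It is labeled milk-first when the milk cups are {0, 4, 5, 6}, but not when they are {1, 2, 3, 4}. The only difference is the treatment of other cups.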

Keywords

Causal inference, Random forest, Stable unit treatment value assumption, Mean-weighting, Counterfactual

Graduation Month

August

Degree

Doctor of Philosophy

Department

Department of Statistics

Major Professor

Michael J. Higgins

Type

Dissertation
