Sample size calculation for competing risk models and design considerations for agricultural experiments
Abstract
This dissertation develops methods for the design and analysis of experiments in agricultural settings. First, we develop methods for determining the sample size required to attain a pre-specified power in competing risk models, a class of multistate models for survival analysis. Typically, before conducting an "official" elaborate and expensive study, a small pilot study is performed to better understand the target population and the differences between treatments. For classic first-event survival analysis, a constant hazard rate is often assumed when computing the sample size for the official experiment. For competing risk models, however, the hazard for each type of event must be specified. Additionally, the number of occurrences of each event type may be small, especially once the data are further stratified by treatment or covariates. Commonly used approaches for modeling pilot data, such as fitting a parametric survival model with an exponential distribution, are therefore unlikely to extract useful information from the pilot study. We instead propose a flexible parametric survival model, the generalized log-gamma survival model, to extract information from the pilot study. We then simulate data from the resulting competing-risks data-generating mechanism and apply the statistical test pre-specified for the official study. Repeating this step a large number of times yields an estimate of power, and we adjust the size of the simulated dataset until the estimated power reaches the pre-specified level. The minimal sample size that attains the pre-specified power is the one we recommend for the official experiment.

Next, cluster-randomized experiments, in which units are grouped together and treatment is assigned to groups of units rather than to individual units, are common in agricultural settings. For example, a study assessing the efficacy of a diet on the weight gain of pigs may assign the diet to pens rather than to individual pigs, as administering the diet at the pen level is much less costly. In many settings, however, researchers have some ability to form the clusters of units before randomizing treatment. In this study, we determine best practices for forming clusters of units when such an option is available. Under the Neyman-Rubin Causal Model (NRCM), we derive expressions for the efficiency loss of cluster randomization relative to complete randomization in a simplified setting with equal cluster sizes. We show that the efficiency loss is a function of the intra-cluster correlation coefficient and the correlation between potential outcomes. The efficiency loss is minimized, and cluster-randomized experiments can in fact outperform completely randomized experiments, when the "discrepancy" between clusters is small, that is, when the distribution of units within each cluster is as similar as possible across clusters. We then introduce a heuristic algorithm for assigning units to clusters so that the between-cluster discrepancy is small, and we verify our results with thorough simulation studies. We conclude by applying our heuristic algorithm to improve divide-and-conquer methods for performing kernel ridge regression in large-to-massive data settings.
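To make the simulation-based sample-size search concrete, the sketch below is a minimal illustration, not the dissertation's exact procedure: it assumes cause-specific exponential hazards in place of the fitted generalized log-gamma model, and uses a two-proportion z-test on cause-1 event indicators as a stand-in for the pre-specified competing-risks test. All hazard rates, the censoring time, and the effect size are illustrative assumptions.

```python
# Hypothetical sketch of the simulation-based sample-size search.
# Hazards, effect size, and the z-test stand-in are illustrative only.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

def simulate_arm(n, h1, h2, tau):
    """Simulate n subjects with competing exponential hazards h1, h2,
    administratively censored at time tau; return indicator of a
    cause-1 event observed before tau."""
    t1 = rng.exponential(1 / h1, n)   # latent time to cause-1 event
    t2 = rng.exponential(1 / h2, n)   # latent time to cause-2 event
    return (t1 < t2) & (t1 < tau)     # cause 1 occurs first, before censoring

def estimated_power(n_per_arm, n_sim=2000, tau=2.0,
                    h1_ctrl=0.5, h1_trt=0.8, h2=0.3, alpha=0.05):
    """Monte Carlo power of a two-proportion z-test comparing cause-1
    cumulative incidence between treatment arms."""
    rejections = 0
    for _ in range(n_sim):
        e_ctrl = simulate_arm(n_per_arm, h1_ctrl, h2, tau)
        e_trt = simulate_arm(n_per_arm, h1_trt, h2, tau)
        p_pool = (e_ctrl.sum() + e_trt.sum()) / (2 * n_per_arm)
        se = np.sqrt(p_pool * (1 - p_pool) * 2 / n_per_arm)
        if se == 0:
            continue
        z = (e_trt.mean() - e_ctrl.mean()) / se
        rejections += abs(z) > stats.norm.ppf(1 - alpha / 2)
    return rejections / n_sim

# Increase n until the estimated power reaches the pre-specified level.
target = 0.80
n = 25
while estimated_power(n) < target:
    n += 25
print(f"recommended sample size per arm: about {n}")
```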
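For intuition on the efficiency loss from cluster randomization, the display below gives the classical model-based design effect; it is a standard illustration under stated assumptions, not the NRCM expression derived in the dissertation, which additionally involves the correlation between potential outcomes.

```latex
% Classical design-effect illustration: J clusters of equal size m,
% intra-cluster correlation coefficient rho. Model-based, not the
% dissertation's NRCM derivation.
\[
  \frac{\operatorname{Var}_{\text{cluster}}(\hat{\tau})}
       {\operatorname{Var}_{\text{complete}}(\hat{\tau})}
  = 1 + (m - 1)\rho
\]
% The inflation vanishes as rho -> 0, i.e., when units within a cluster
% are no more alike than units drawn from different clusters.
```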
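The following is one natural heuristic consistent with the goal of small between-cluster discrepancy: sort units on a prognostic covariate and deal them out round-robin, so every cluster receives a similar spread of units. This is an illustrative stand-in, not the dissertation's algorithm, and the covariate and cluster count are hypothetical.

```python
# Illustrative stand-in for a discrepancy-minimizing clustering heuristic.
import numpy as np

def round_robin_clusters(x, n_clusters):
    """Assign units with covariate values x to n_clusters clusters of
    (nearly) equal size and small between-cluster discrepancy."""
    order = np.argsort(x)                            # smallest to largest
    labels = np.empty(len(x), dtype=int)
    labels[order] = np.arange(len(x)) % n_clusters   # deal out in sorted order
    return labels

x = np.random.default_rng(1).normal(size=120)        # e.g., baseline pig weights
labels = round_robin_clusters(x, n_clusters=12)
# Cluster means are nearly identical, i.e., the discrepancy is small:
print(np.round([x[labels == k].mean() for k in range(12)], 2))
```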
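Finally, the sketch below shows baseline divide-and-conquer kernel ridge regression in the style of Zhang, Duchi, and Wainwright: randomly partition the data, fit KRR on each block, and average the block predictions. The kernel, regularization, and block count are illustrative; the dissertation's improvement, replacing the random partition with low-discrepancy clusters, is not shown.

```python
# Baseline divide-and-conquer KRR with a random partition (the step the
# dissertation improves by forming low-discrepancy blocks instead).
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def krr_fit(X, y, lam=1e-2, gamma=1.0):
    K = rbf_kernel(X, X, gamma)
    alpha = np.linalg.solve(K + lam * len(X) * np.eye(len(X)), y)
    return X, alpha

def krr_predict(model, X_new, gamma=1.0):
    X, alpha = model
    return rbf_kernel(X_new, X, gamma) @ alpha

rng = np.random.default_rng(2)
X = rng.uniform(-3, 3, size=(3000, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.3, size=3000)

# Each block is fit independently (so the fits can run in parallel),
# then the block predictions are averaged.
blocks = np.array_split(rng.permutation(3000), 10)
models = [krr_fit(X[b], y[b]) for b in blocks]
X_test = np.linspace(-3, 3, 5)[:, None]
y_hat = np.mean([krr_predict(m, X_test) for m in models], axis=0)
print(np.round(y_hat, 2), np.round(np.sin(X_test[:, 0]), 2))
```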