High-dimensional variable selection in longitudinal and nonlinear gene-environment interaction studies

dc.contributor.author: Zhou, Fei
dc.date.accessioned: 2021-07-28T14:48:05Z
dc.date.available: 2021-07-28T14:48:05Z
dc.date.graduationmonth: August (en_US)
dc.date.published: 2021 (en_US)
dc.description.abstract: Variable selection in both the frequentist and Bayesian frameworks has gained increasing popularity in the analysis of high-dimensional genomic data. Despite the success of existing studies, challenges remain: tailored methods for sparse interaction structures are not available when the response variables are repeatedly measured and/or have heavy-tailed distributions. These challenges have motivated the novel variable selection methods proposed in the following projects. Software packages from these projects are publicly available to facilitate fast and reliable computation, as well as reproducible research.
In the first project, we have developed a novel penalized variable selection method to identify important lipid–environment interactions in a longitudinal lipidomics study, where the environment factors are a group of dummy variables corresponding to a four-level treatment factor. An efficient Newton–Raphson based algorithm was proposed within the generalized estimating equation (GEE) framework. Simulation studies have demonstrated the superior performance of our method over alternatives, in terms of both identification accuracy and prediction performance. Analysis of the high-dimensional lipid datasets collected from mice in the skin cancer prevention study identified meaningful markers that provide fresh insight into the mechanism underlying the cancer-preventive effects.
In the second project, we have proposed a sparse group penalization method for bi-level GxE interaction studies with repeatedly measured phenotypes, which accommodates more general environment factors. Within the quadratic inference function (QIF) framework, the proposed method achieves simultaneous identification of main and interaction effects at both the group and individual levels. We conducted simulation studies to establish the advantage of the proposed regularization method. In the case study, the environment factors include age, gender and treatment, which are either continuous or categorical. Our method leads to improved prediction and identification of main and interaction effects with important implications.
In the third project, a sparse Bayesian quantile varying coefficient model has been developed for non-linear GxE studies. The proposed model can accommodate heavy-tailed errors and outliers in the disease phenotypes while pinpointing important non-linear interactions through Bayesian variable selection based on spike-and-slab priors. Fast computation is facilitated by an efficient Gibbs sampler. Simulation studies and real data analysis with age as the univariate environment factor have been performed to show the superiority of the proposed method over multiple competing alternatives.
Open source R packages with C++ implementations of all the methods under comparison are provided along with this dissertation. The R packages interep and springer, for the first two projects respectively, are available on CRAN. The R package for the last project, on the Bayesian regularized quantile varying coefficient model, will be released to the public soon. (en_US)
dc.description.advisor: Cen Wu (en_US)
dc.description.degree: Doctor of Philosophy (en_US)
dc.description.department: Department of Statistics (en_US)
dc.description.level: Doctoral (en_US)
dc.identifier.uri: https://hdl.handle.net/2097/41579
dc.language.iso: en_US (en_US)
dc.subject: Gene-environment interaction (en_US)
dc.subject: Regularization (en_US)
dc.subject: Longitudinal (en_US)
dc.subject: Bayesian variable selection (en_US)
dc.subject: Quantile regression (en_US)
dc.subject: Non-parametric modeling (en_US)
dc.title: High-dimensional variable selection in longitudinal and nonlinear gene-environment interaction studies (en_US)
dc.type: Dissertation (en_US)
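
Note on the quantile-based third project: the abstract states that the quantile model accommodates heavy-tailed errors and outliers. The short Python sketch below is an illustration only and is not taken from the dissertation or its R packages. It uses the standard quantile check loss, rho_tau(u) = u * (tau - 1[u < 0]), on a small hypothetical sample with one outlier; the function names and the data are assumptions made for this example.

```python
from statistics import mean


def check_loss(residual: float, tau: float) -> float:
    """Standard quantile check loss: rho_tau(u) = u * (tau - 1[u < 0])."""
    return residual * (tau - (1.0 if residual < 0 else 0.0))


def total_check_loss(values, location, tau):
    """Sum of check losses of the residuals (value - location)."""
    return sum(check_loss(v - location, tau) for v in values)


def best_location(values, tau):
    """Pick the observed value with the smallest total check loss.

    A minimiser of the total check loss always exists among the data
    points, so searching over the sample is enough for this toy case.
    """
    return min(sorted(values), key=lambda c: total_check_loss(values, c, tau))


# Hypothetical sample with a single large outlier (not real study data).
sample = [1.0, 2.0, 3.0, 4.0, 100.0]

robust_fit = best_location(sample, tau=0.5)  # minimises absolute-type loss
least_squares_fit = mean(sample)             # minimises squared loss

print("check-loss fit at tau=0.5:", robust_fit)
print("least-squares fit (mean):", least_squares_fit)
```

With tau = 0.5 the check-loss fit is the sample median, which the outlier does not move, whereas the mean is pulled toward it. The dissertation's model is far richer (varying coefficients, spike-and-slab priors, Gibbs sampling); this sketch only shows why a quantile loss is less sensitive to extreme observations.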

Files

Original bundle
Name: FeiZhou2021.pdf
Size: 1.45 MB
Format: Adobe Portable Document Format
License bundle
Name: license.txt
Size: 1.62 KB
Format: Item-specific license agreed upon to submission