Penalized variable selection for gene-environment interactions

Date

2021-05-01

Abstract

Gene-environment (GxE) interactions are critical for understanding the genetic basis of complex diseases beyond genetic and environmental main effects. In addition to existing tools for interaction studies, penalized variable selection has emerged as a promising alternative for dissecting GxE interactions. Despite its success, variable selection remains limited in the following respects. First, multidimensional measurements have not been fully taken into account in interaction studies: published variable selection methods cannot accommodate structured sparsity when integrating multi-omics data for disease outcomes. Second, no variable selection method has so far been developed to conduct tailored interaction analysis in the big data context. Third, no solution is yet available for case-control GxE association studies with high-dimensional genomic variants in the big data context. In this dissertation, we tackle these challenges arising from GxE interaction studies in the modern era through the following projects.

In the first project, we develop a novel variable selection method to integrate multi-omics measurements in GxE interaction studies. Extensive studies have shown that analyzing omics data across multiple platforms is not only biologically sensible but also improves identification and prediction performance. Our integrative model efficiently pinpoints important regulators of gene expression through sparse dimensionality reduction and links disease outcomes to multiple effects in integrative GxE studies by accommodating a sparse bi-level structure. Simulation studies show that the integrative model identifies GxE interactions and regulators better than the alternative methods. In two GxE lung cancer studies with high-dimensional multi-omics data, the integrative model leads to improved prediction and findings with important biological implications.

In the second project, we conduct interaction studies in the big data context by adopting a divide-and-conquer strategy. In particular, sparse group variable selection for important GxE effects is developed within the framework of the alternating direction method of multipliers (ADMM). To accommodate data that are large-scale in terms of either samples or features, we develop two novel parallel ADMM-based variable selection methods that split the computation across samples and across features, respectively. The corresponding parallel algorithms can be implemented efficiently on distributed computing platforms. Simulation studies demonstrate that the parallel ADMM-based penalization methods significantly improve computational speed when analyzing large-scale data from GxE interaction studies, with satisfactory identification and prediction performance.
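
As a rough illustration of the across-samples splitting idea, the sketch below runs a generic consensus ADMM for a squared-error loss with a sparse group penalty. It is not the dissertation's algorithm: the function names (`sparse_group_prox`, `consensus_admm_sgl`), the squared-error loss, the penalty weights `lam1` and `lam2`, and the step parameter `rho` are all assumptions made for this example. Each sample block solves a small ridge-type problem on its own data, and the blocks are then reconciled through a shared coefficient vector that is shrunk at both the group level and the individual level.

```python
import numpy as np


def sparse_group_prox(v, groups, t1, t2):
    """Proximal operator of t1*||z||_1 + t2*sum_g ||z_g||_2 evaluated at v."""
    # Element-wise soft-thresholding handles individual-level sparsity.
    s = np.sign(v) * np.maximum(np.abs(v) - t1, 0.0)
    z = np.zeros_like(s)
    # Group-wise shrinkage handles group-level sparsity.
    for idx in groups:
        norm = np.linalg.norm(s[idx])
        if norm > t2:
            z[idx] = (1.0 - t2 / norm) * s[idx]
    return z


def consensus_admm_sgl(blocks, groups, lam1, lam2, rho=1.0, n_iter=300):
    """Consensus ADMM over sample blocks (hypothetical helper).

    blocks : list of (A_i, b_i) pairs holding disjoint subsets of the samples
    groups : list of index lists, e.g. one group per gene containing its
             main effect and its interactions with the environment factors
    """
    n_blocks = len(blocks)
    p = blocks[0][0].shape[1]
    z = np.zeros(p)              # shared (consensus) coefficients
    x = np.zeros((n_blocks, p))  # local coefficients, one row per block
    u = np.zeros((n_blocks, p))  # scaled dual variables
    # Cache the per-block inverse; acceptable for small p in a sketch.
    cache = [(np.linalg.inv(A.T @ A + rho * np.eye(p)), A.T @ b)
             for A, b in blocks]
    for _ in range(n_iter):
        # Local updates: independent across blocks, so they could run in parallel.
        for i, (M, Atb) in enumerate(cache):
            x[i] = M @ (Atb + rho * (z - u[i]))
        # Global update: average, then apply the sparse group shrinkage.
        v = x.mean(axis=0) + u.mean(axis=0)
        z = sparse_group_prox(v, groups,
                              lam1 / (rho * n_blocks),
                              lam2 / (rho * n_blocks))
        # Dual update: accumulate each block's disagreement with the consensus.
        u += x - z
    return z
```

In this sketch only the local updates touch the raw data, so in a distributed setting each worker would need to communicate just its length-`p` vectors per iteration. Choosing `rho` on the same scale as the eigenvalues of the per-block Gram matrices tends to speed up convergence.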

In the third project, we extend the proposed parallel ADMM-based variable selection for GxE interactions to a case-control association study of type 2 diabetes. Within the parallel computation framework, we develop a penalized logistic regression model that accommodates bi-level selection tailored to the case-control GxE interaction study. The advantage of the proposed parallel penalization method is illustrated in the distributed learning scenario. Simulation studies show that the proposed method dramatically reduces computation time while maintaining competitive performance compared with its non-parallel counterparts. In the case study of type 2 diabetes with environmental factors and high-dimensional SNP measurements, the proposed parallel penalization method identifies biologically important interaction effects.

Keywords

Gene-environment interaction, Case-control study, Alternating direction method of multipliers (ADMM), Integrated analysis, High-dimensional variable selection

Graduation Month

May

Degree

Doctor of Philosophy

Department

Department of Statistics

Major Professor

Cen Wu

Date

2021

Type

Dissertation
