Prediction and variable selection in sparse ultrahigh dimensional additive models

K-REx Repository

Ramirez, Girly Manguba 2013-07-17T16:34:11Z 2013-07-17T16:34:11Z 2013-07-17
dc.description.abstract Advances in technology have enabled many fields to collect datasets in which the number of covariates (p) tends to be much larger than the number of observations (n), the so-called ultrahigh dimensionality. In this setting, classical regression methodologies are invalid. There is a great need to develop methods that can explain the variation in the response variable using only a parsimonious set of covariates. In recent years, there have been significant developments in variable selection procedures. However, the available procedures usually select too many false variables. In addition, most of them are appropriate only when the response variable is linearly associated with the covariates. Motivated by these concerns, we propose another procedure for variable selection in the ultrahigh dimensional setting that can reduce the number of false positive variables. Moreover, this procedure can be applied when the response variable is continuous or binary, and when the response variable is linearly or nonlinearly related to the covariates. Inspired by the Least Angle Regression approach, we develop two multi-step algorithms to select variables in sparse ultrahigh dimensional additive models. The variables go through a series of nonlinear dependence evaluations following a Most Significant Regression (MSR) algorithm. The MSR algorithm is also designed to predict the response variable. The first algorithm, MSR-continuous (MSRc), is appropriate for a dataset with a continuous response variable. Simulation results demonstrate that this algorithm works well. Comparisons with other methods, such as greedy-INIS by Fan et al. (2011) and the generalized correlation procedure by Hall and Miller (2009), showed that MSRc not only has a false positive rate significantly lower than both methods, but also has accuracy and a true positive rate comparable to greedy-INIS. The second algorithm, MSR-binary (MSRb), is appropriate when the response variable is binary. Simulations demonstrate that MSRb is competitive in terms of prediction accuracy and true positive rate, and better than GLMNET in terms of false positive rate. An application of MSRb to real datasets is also presented. In general, the MSR algorithm usually selects fewer variables while preserving the accuracy of predictions. en_US
dc.language.iso en_US en_US
dc.publisher Kansas State University en
dc.subject Variable selection en_US
dc.subject Prediction en_US
dc.subject Smoothing en_US
dc.subject Additive models en_US
dc.subject Sparsity en_US
dc.subject Ultrahigh dimensional en_US
dc.title Prediction and variable selection in sparse ultrahigh dimensional additive models en_US
dc.type Dissertation en_US Doctor of Philosophy en_US
dc.description.level Doctoral en_US
dc.description.department Department of Statistics en_US
dc.description.advisor Haiyan Wang en_US
dc.subject.umi Statistics (0463) en_US 2013 en_US August en_US
