Prediction and variable selection in sparse ultrahigh dimensional additive models

Ramirez, Girly Manguba

dc.contributor.author	Ramirez, Girly Manguba
dc.date.accessioned	2013-07-17T16:34:11Z
dc.date.available	2013-07-17T16:34:11Z
dc.date.graduationmonth	August	en_US
dc.date.issued	2013-07-17
dc.date.published	2013	en_US
dc.description.abstract	Advances in technology have enabled many fields to collect datasets in which the number of covariates (p) tends to be much larger than the number of observations (n), a setting known as ultrahigh dimensionality. In this setting, classical regression methodologies are no longer valid, and there is a great need for methods that can explain the variation in the response variable using only a parsimonious set of covariates. Recent years have seen significant developments in variable selection procedures. However, the available procedures usually select too many false variables, and most of them are appropriate only when the response variable is linearly associated with the covariates. Motivated by these concerns, we propose another procedure for variable selection in the ultrahigh dimensional setting that is able to reduce the number of false positive variables. Moreover, this procedure can be applied when the response variable is continuous or binary, and when the response variable is linearly or nonlinearly related to the covariates. Inspired by the Least Angle Regression approach, we develop two multi-step algorithms to select variables in sparse ultrahigh dimensional additive models. The variables go through a series of nonlinear dependence evaluations following a Most Significant Regression (MSR) algorithm. The MSR algorithm is also designed to predict the response variable. The first algorithm, MSR-continuous (MSRc), is appropriate for a dataset with a continuous response variable. Simulation results demonstrate that this algorithm works well. Comparisons with other methods, such as greedy-INIS by Fan et al. (2011) and the generalized correlation procedure by Hall and Miller (2009), showed that MSRc not only has a false positive rate significantly lower than both methods, but also has accuracy and a true positive rate comparable with those of greedy-INIS. The second algorithm, MSR-binary (MSRb), is appropriate when the response variable is binary. Simulations demonstrate that MSRb is competitive with GLMNET in terms of prediction accuracy and true positive rate, and better than GLMNET in terms of false positive rate. An application of MSRb to real datasets is also presented. In general, the MSR algorithm selects fewer variables while preserving the accuracy of predictions.	en_US
dc.description.advisor	Haiyan Wang	en_US
dc.description.degree	Doctor of Philosophy	en_US
dc.description.department	Department of Statistics	en_US
dc.description.level	Doctoral	en_US
dc.identifier.uri	http://hdl.handle.net/2097/15989
dc.language.iso	en_US	en_US
dc.publisher	Kansas State University	en
dc.subject	Variable selection	en_US
dc.subject	Prediction	en_US
dc.subject	Smoothing	en_US
dc.subject	Additive models	en_US
dc.subject	Sparsity	en_US
dc.subject	Ultrahigh dimensional	en_US
dc.subject.umi	Statistics (0463)	en_US
dc.title	Prediction and variable selection in sparse ultrahigh dimensional additive models	en_US
dc.type	Dissertation	en_US

Files

Original bundle

Name: GirlyRamirez2013.pdf
Size: 547.2 KB
Format: Adobe Portable Document Format

Collections

K-State Electronic Theses, Dissertations, and Reports: 2004 -