Prediction and variable selection in sparse ultrahigh dimensional additive models

dc.contributor.author: Ramirez, Girly Manguba
dc.date.accessioned: 2013-07-17T16:34:11Z
dc.date.available: 2013-07-17T16:34:11Z
dc.date.graduationmonth: August
dc.date.issued: 2013-07-17
dc.date.published: 2013
dc.description.abstract: Advances in technology have enabled many fields to collect datasets in which the number of covariates (p) tends to be much larger than the number of observations (n), the so-called ultrahigh dimensionality. In this setting, classical regression methodologies are invalid. There is a great need to develop methods that can explain the variation in the response variable using only a parsimonious set of covariates. In recent years, there have been significant developments in variable selection procedures. However, the available procedures usually result in the selection of too many false variables. In addition, most of them are appropriate only when the response variable is linearly associated with the covariates. Motivated by these concerns, we propose another procedure for variable selection in the ultrahigh dimensional setting that can reduce the number of false positive variables. Moreover, this procedure can be applied when the response variable is continuous or binary, and when the response variable is linearly or nonlinearly related to the covariates. Inspired by the Least Angle Regression approach, we develop two multi-step algorithms to select variables in sparse ultrahigh dimensional additive models. The variables go through a series of nonlinear dependence evaluations following a Most Significant Regression (MSR) algorithm. In addition, the MSR algorithm is designed to implement prediction of the response variable. The first algorithm, called MSR-continuous (MSRc), is appropriate for a dataset with a continuous response variable. Simulation results demonstrate that this algorithm works well. Comparisons with other methods, such as greedy-INIS by Fan et al. (2011) and the generalized correlation procedure by Hall and Miller (2009), showed that MSRc not only has a false positive rate that is significantly lower than that of both methods, but also has accuracy and a true positive rate comparable with greedy-INIS. The second algorithm, called MSR-binary (MSRb), is appropriate when the response variable is binary. Simulations demonstrate that MSRb is competitive in terms of prediction accuracy and true positive rate, and better than GLMNET in terms of false positive rate. Application of MSRb to real datasets is also presented. In general, the MSR algorithm usually selects fewer variables while preserving the accuracy of predictions.
dc.description.advisor: Haiyan Wang
dc.description.degree: Doctor of Philosophy
dc.description.department: Department of Statistics
dc.description.level: Doctoral
dc.identifier.uri: http://hdl.handle.net/2097/15989
dc.language.iso: en_US
dc.publisher: Kansas State University
dc.rights: © the author. This Item is protected by copyright and/or related rights. You are free to use this Item in any way that is permitted by the copyright and related rights legislation that applies to your use. For other uses you need to obtain permission from the rights-holder(s).
dc.rights.uri: http://rightsstatements.org/vocab/InC/1.0/
dc.subject: Variable selection
dc.subject: Prediction
dc.subject: Smoothing
dc.subject: Additive models
dc.subject: Parsity
dc.subject: Ultrahigh dimensional
dc.subject.umi: Statistics (0463)
dc.title: Prediction and variable selection in sparse ultrahigh dimensional additive models
dc.type: Dissertation

Files

Original bundle

Name: GirlyRamirez2013.pdf
Size: 547.2 KB
Format: Adobe Portable Document Format
