Using statistical learning to predict survival of passengers on the RMS Titanic

dc.contributor.author: Whitley, Michael Aaron
dc.date.accessioned: 2015-11-19T19:22:08Z
dc.date.available: 2015-11-19T19:22:08Z
dc.date.graduationmonth: December
dc.date.issued: 2015-12-01
dc.description.abstract: Predictive analytics techniques have proven effective for exploring data. In this report, the efficiency of several predictive analytics methods is explored. At the time of this study, Kaggle.com, a data science competition website, was hosting the predictive modeling competition "Titanic: Machine Learning from Disaster." This competition posed a classification problem: build a model to predict the survival of passengers on the RMS Titanic. Our approach focused on applying a traditional classification and regression tree algorithm. The algorithm is greedy and can overfit the training data, which can yield suboptimal prediction accuracy. To correct these issues, we implemented cost complexity pruning and ensemble methods such as bagging and random forests. However, no improvement was observed, which may be an artifact of the Titanic data and may not be representative of those methods’ performance. The decision trees and prediction accuracy of each method are presented and compared. Results indicate that sex/title, fare price, age, and passenger class are the most important predictors of passenger survival.
dc.description.advisor: Christopher I. Vahl
dc.description.degree: Master of Science
dc.description.department: Statistics
dc.description.level: Masters
dc.identifier.uri: http://hdl.handle.net/2097/20541
dc.language.iso: en_US
dc.publisher: Kansas State University
dc.rights: © the author. This Item is protected by copyright and/or related rights. You are free to use this Item in any way that is permitted by the copyright and related rights legislation that applies to your use. For other uses you need to obtain permission from the rights-holder(s).
dc.rights.uri: http://rightsstatements.org/vocab/InC/1.0/
dc.subject: Decision tree
dc.subject: Ensemble
dc.subject: Kaggle
dc.subject: Titanic
dc.subject.umi: Statistics (0463)
dc.title: Using statistical learning to predict survival of passengers on the RMS Titanic
dc.type: Report

Files

Original bundle

Name: MichaelWhitley2015.pdf
Size: 694.03 KB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 1.62 KB
Format: Item-specific license agreed upon to submission