Machine learning and data science for a household-specific poverty level prediction task

dc.contributor.author: Venkatramolla, Sudesh Kumar
dc.date.accessioned: 2019-04-16T20:57:18Z
dc.date.available: 2019-04-16T20:57:18Z
dc.date.graduationmonth: May
dc.date.issued: 2019-05-01
dc.date.published: 2019
dc.description.abstract: This project addresses a prediction task from the Kaggle data science challenge site: predicting the poverty level of individual households using supervised classification learning. In Latin America, the Proxy Means Test (PMT) is the most popular method for verifying income qualification. The PMT considers observable properties of a household, such as the walls, ceilings, and electric devices in a family home. These and other general assets are used to classify the poverty level by assigning one of four labels: (1) extreme poverty, (2) moderate poverty, (3) vulnerable households, and (4) non-vulnerable households. The accuracy of learned classification models submitted as solutions to this data challenge has tended to decrease as a function of dataset size. Therefore, this project focuses on methods for improving accuracy in detecting poverty level using committee machines (bagging, boosting, etc.) for supervised inductive learning. Because the task is classification learning, my first approach is to apply random forests (a decision tree ensemble method); depending on the resulting accuracy, I will proceed to more advanced methods, such as light gradient-boosting methods (GBMs) and neural networks, which are frequently used on large, complex multivariate classification tasks. The inference task is to predict the poverty level of a new household using attributes of the family home and other attributes found to be relevant by the learning algorithm. This enables use cases of artificial intelligence for social good, such as helping governments and relief and economic development agencies identify communities in need.
dc.description.advisor: William H. Hsu
dc.description.degree: Master of Science
dc.description.department: Department of Computer Science
dc.description.level: Masters
dc.identifier.uri: http://hdl.handle.net/2097/39520
dc.language.iso: en_US
dc.subject: Machine Learning
dc.subject: Data Science
dc.subject: Prediction
dc.subject: Classification
dc.title: Machine learning and data science for a household-specific poverty level prediction task
dc.type: Report

Files

Original bundle
Name: SudeshkumarVenkatramolla2019.pdf
Size: 486.93 KB
Format: Adobe Portable Document Format
Description: Masters Report
License bundle
Name: license.txt
Size: 1.62 KB
Format: Item-specific license agreed upon to submission