Classification of Twitter disaster data using a hybrid feature-instance adaptation approach

dc.contributor.authorMazloom, Reza
dc.date.accessioned2018-04-20T19:33:56Z
dc.date.available2018-04-20T19:33:56Z
dc.date.graduationmonthMayen_US
dc.date.issued2018-05-01en_US
dc.date.published2018en_US
dc.description.abstractHuge amounts of data that are generated on social media during emergency situations are regarded as troves of critical information. The use of supervised machine learning techniques in the early stages of a disaster is challenged by the lack of labeled data for that particular disaster. Furthermore, supervised models trained on labeled data from a prior disaster may not produce accurate results. To address these challenges, domain adaptation approaches, which learn models for predicting the target, by using unlabeled data from the target disaster in addition to labeled data from prior source disasters, can be used. However, the resulting models can still be affected by the variance between the target domain and the source domain. In this context, we propose to use a hybrid feature-instance adaptation approach based on matrix factorization and the k-nearest neighbors algorithm, respectively. The proposed hybrid adaptation approach is used to select a subset of the source disaster data that is representative of the target disaster. The selected subset is subsequently used to learn accurate supervised or domain adaptation Naïve Bayes classifiers for the target disaster. In other words, this study focuses on transforming the existing source data to bring it closer to the target data, thus overcoming the domain variance which may prevent effective transfer of information from source to target. A combination of selective and transformative methods are used on instances and features, respectively. We show experimentally that the proposed approaches are effective in transferring information from source to target. Furthermore, we provide insights with respect to what types and combinations of selections/transformations result in more accurate models for the target.en_US
dc.description.advisorDoina Carageaen_US
dc.description.degreeMaster of Scienceen_US
dc.description.departmentDepartment of Computer Scienceen_US
dc.description.levelMastersen_US
dc.identifier.urihttp://hdl.handle.net/2097/38872
dc.language.isoen_USen_US
dc.subjectTweet classificationen_US
dc.subjectDomain adaptation
dc.subjectMatrix factorization
dc.subjectK-nearest neighbors
dc.subjectDisaster response
dc.titleClassification of Twitter disaster data using a hybrid feature-instance adaptation approachen_US
dc.typeThesisen_US

Files

Original bundle
Now showing 1 - 1 of 1
Loading...
Thumbnail Image
Name:
RezaMazloom2018.pdf
Size:
465.04 KB
Format:
Adobe Portable Document Format
Description:
Thesis
License bundle
Now showing 1 - 1 of 1
No Thumbnail Available
Name:
license.txt
Size:
1.62 KB
Format:
Item-specific license agreed upon to submission
Description: