Domain adaptation for classifying disaster-related Twitter data

K-REx Repository

dc.contributor.author Sopova, Oleksandra
dc.date.accessioned 2017-04-17T15:05:00Z
dc.date.available 2017-04-17T15:05:00Z
dc.date.issued 2017-05-01 en_US
dc.identifier.uri http://hdl.handle.net/2097/35388
dc.description.abstract Machine learning is the subfield of artificial intelligence that gives computers the ability to learn without being explicitly programmed, as defined by Arthur Samuel, the American pioneer in the fields of computer gaming and artificial intelligence who was born in Emporia, Kansas. Supervised machine learning focuses on building predictive models from labeled training data. Data may come from a variety of sources, for instance, social media networks. In our research, we use Twitter data, specifically user-generated tweets about disasters such as floods, hurricanes, and terrorist attacks, to build classifiers that could help disaster management teams identify useful information. A supervised classifier trained on data (training data) from a particular domain (i.e., a particular disaster) is expected to give accurate predictions on unseen data (test data) from the same domain, assuming that the training and test data have similar characteristics. Labeled data is not readily available for a current (target) disaster. However, labeled data from a prior (source) disaster is presumably available and can be used to learn a supervised classifier for the target disaster. Unfortunately, the source and target disaster data may not share the same characteristics, so a classifier learned from the source may not perform well on the target. Domain adaptation techniques, which use unlabeled target data in addition to labeled source data, can be used to address this problem. We study single-source and multi-source domain adaptation techniques using the Naïve Bayes classifier. Experimental results on Twitter datasets corresponding to six disasters show that domain adaptation techniques improve overall performance compared with basic supervised learning classifiers. Domain adaptation is crucial for many machine learning applications, as it enables the use of unlabeled data in domains where labeled data is not available. en_US
dc.language.iso en_US en_US
dc.publisher Kansas State University en
dc.subject Machine learning en_US
dc.subject Domain adaptation
dc.subject Classification
dc.subject Twitter
dc.title Domain adaptation for classifying disaster-related Twitter data en_US
dc.type Report en_US
dc.description.degree Master of Science en_US
dc.description.level Masters en_US
dc.description.department Department of Computing and Information Sciences en_US
dc.description.advisor Doina Caragea en_US
dc.date.published 2017 en_US
dc.date.graduationmonth May en_US


Center for the Advancement of Digital Scholarship
118 Hale Library
Manhattan KS 66506
(785) 532-7444
cads@k-state.edu