Domain adaptation for classifying disaster-related Twitter data

K-REx Repository

Show simple item record Sopova, Oleksandra 2017-04-17T15:05:00Z 2017-04-17T15:05:00Z 2017-05-01 en_US
dc.description.abstract Machine learning is the subfield of Artificial intelligence that gives computers the ability to learn without being explicitly programmed, as it was defined by Arthur Samuel - the American pioneer in the field of computer gaming and artificial intelligence who was born in Emporia, Kansas. Supervised Machine Learning is focused on building predictive models given labeled training data. Data may come from a variety of sources, for instance, social media networks. In our research, we use Twitter data, specifically, user-generated tweets about disasters such as floods, hurricanes, terrorist attacks, etc., to build classifiers that could help disaster management teams identify useful information. A supervised classifier trained on data (training data) from a particular domain (i.e. disaster) is expected to give accurate predictions on unseen data (testing data) from the same domain, assuming that the training and test data have similar characteristics. Labeled data is not easily available for a current target disaster. However, labeled data from a prior source disaster is presumably available, and can be used to learn a supervised classifier for the target disaster. Unfortunately, the source disaster data and the target disaster data may not share the same characteristics, and the classifier learned from the source may not perform well on the target. Domain adaptation techniques, which use unlabeled target data in addition to labeled source data, can be used to address this problem. We study single-source and multi-source domain adaptation techniques, using Nave Bayes classifier. Experimental results on Twitter datasets corresponding to six disasters show that domain adaptation techniques improve the overall performance as compared to basic supervised learning classifiers. Domain adaptation is crucial for many machine learning applications, as it enables the use of unlabeled data in domains where labeled data is not available. en_US
dc.language.iso en_US en_US
dc.publisher Kansas State University en
dc.subject Machine learning en_US
dc.subject Domain adaptation
dc.subject Classification
dc.subject Twitter
dc.title Domain adaptation for classifying disaster-related Twitter data en_US
dc.type Report en_US Master of Science en_US
dc.description.level Masters en_US
dc.description.department Department of Computing and Information Sciences en_US
dc.description.advisor Doina Caragea en_US 2017 en_US May en_US

Files in this item

This item appears in the following Collection(s)

Show simple item record

Search K-REx

Advanced Search


My Account


Center for the

Advancement of Digital