Domain adaptation for classifying disaster-related Twitter data

Sopova, Oleksandra

Files

OleksandraSopova2017.pdf (307.06 KB)

Date

2017-05-01

Authors

Sopova, Oleksandra

Publisher

Kansas State University

Abstract

Machine learning is the subfield of artificial intelligence that gives computers the ability to learn without being explicitly programmed, as defined by Arthur Samuel, the American pioneer in the field of computer gaming and artificial intelligence, who was born in Emporia, Kansas. Supervised machine learning focuses on building predictive models from labeled training data. Data may come from a variety of sources, for instance, social media networks. In our research, we use Twitter data, specifically user-generated tweets about disasters such as floods, hurricanes, and terrorist attacks, to build classifiers that could help disaster management teams identify useful information. A supervised classifier trained on data (training data) from a particular domain (i.e., a disaster) is expected to give accurate predictions on unseen data (test data) from the same domain, assuming that the training and test data have similar characteristics. Labeled data is not easily available for a current target disaster. However, labeled data from a prior source disaster is presumably available and can be used to learn a supervised classifier for the target disaster. Unfortunately, the source disaster data and the target disaster data may not share the same characteristics, and the classifier learned from the source may not perform well on the target. Domain adaptation techniques, which use unlabeled target data in addition to labeled source data, can be used to address this problem. We study single-source and multi-source domain adaptation techniques using the Naïve Bayes classifier. Experimental results on Twitter datasets corresponding to six disasters show that domain adaptation techniques improve overall performance compared to basic supervised learning classifiers. Domain adaptation is crucial for many machine learning applications, as it enables the use of unlabeled data in domains where labeled data is not available.
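
The abstract does not spell out the specific adaptation algorithm, so the following is only a minimal sketch of one common way to combine labeled source data with unlabeled target data: self-training a multinomial Naïve Bayes classifier with pseudo-labels. The function names, the confidence threshold, the number of rounds, and the toy tweets are all assumptions made for illustration and are not taken from the report.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase whitespace tokenizer (an assumption; real tweets need more cleaning)."""
    return text.lower().split()


def train_nb(texts, labels, alpha=1.0):
    """Fit a multinomial Naive Bayes model with Laplace smoothing."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in zip(texts, labels):
        tokens = tokenize(text)
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return {"class_counts": class_counts, "word_counts": word_counts,
            "vocab": vocab, "alpha": alpha}


def log_scores(model, text):
    """Unnormalized log posterior for each class; out-of-vocabulary tokens are skipped."""
    total_docs = sum(model["class_counts"].values())
    vocab_size = len(model["vocab"])
    scores = {}
    for label, n_docs in model["class_counts"].items():
        counts = model["word_counts"][label]
        denom = sum(counts.values()) + model["alpha"] * vocab_size
        score = math.log(n_docs / total_docs)
        for tok in tokenize(text):
            if tok in model["vocab"]:
                score += math.log((counts[tok] + model["alpha"]) / denom)
        scores[label] = score
    return scores


def predict_with_confidence(model, text):
    """Return (best label, normalized posterior probability of that label)."""
    scores = log_scores(model, text)
    top = max(scores.values())
    exp = {label: math.exp(s - top) for label, s in scores.items()}
    best = max(exp, key=exp.get)
    return best, exp[best] / sum(exp.values())


def predict(model, text):
    return predict_with_confidence(model, text)[0]


def self_train(source_texts, source_labels, target_texts, rounds=3, threshold=0.8):
    """Hypothetical adaptation loop: pseudo-label confident target tweets, then retrain."""
    model = train_nb(source_texts, source_labels)
    for _ in range(rounds):
        pseudo_texts, pseudo_labels = [], []
        for text in target_texts:
            label, conf = predict_with_confidence(model, text)
            if conf >= threshold:
                pseudo_texts.append(text)
                pseudo_labels.append(label)
        model = train_nb(source_texts + pseudo_texts, source_labels + pseudo_labels)
    return model


# Toy data (invented): a labeled "source" flood and an unlabeled "target" hurricane.
source_texts = ["flood water rising need help", "rescue teams needed flood victims",
                "great pizza tonight with friends", "watching a movie tonight"]
source_labels = ["relevant", "relevant", "other", "other"]
target_texts = ["hurricane victims need help rescue", "hurricane damage need rescue teams",
                "movie night with friends", "pizza and a movie"]

source_only = train_nb(source_texts, source_labels)
adapted = self_train(source_texts, source_labels, target_texts)
print(predict(adapted, "hurricane damage reported"))
```

In this toy setting, the source-only model has never seen the word "hurricane", so it has no evidence for a tweet that uses only target-specific vocabulary. The self-trained model picks up that vocabulary from the confidently pseudo-labeled target tweets. Whether this particular scheme matches the single-source or multi-source techniques studied in the report is not stated in the abstract.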

Keywords

Domain adaptation, Classification, Twitter, Machine learning

Graduation Month

May

Degree

Master of Science

Department

Department of Computing and Information Sciences

Major Professor

Doina Caragea

Type

Report

URI

http://hdl.handle.net/2097/35388

Collections

K-State Electronic Theses, Dissertations, and Reports: 2004 -
