Entity extraction, animal disease-related event recognition and classification from web

dc.contributor.author: Volkova, Svitlana
dc.date.accessioned: 2010-08-10T13:18:27Z
dc.date.available: 2010-08-10T13:18:27Z
dc.date.graduationmonth: August [en_US]
dc.date.issued: 2010-08-10T13:18:27Z
dc.date.published: 2010 [en_US]
dc.description.abstract: Global epidemic surveillance is an essential task for national biosecurity management and bioterrorism prevention. The main goal is to protect the public from major health threats. To perform this task effectively, one requires reliable, timely and accurate medical information from a wide range of sources. Towards this goal, we present a framework for epidemiological analytics that can be used to automatically extract and visualize infectious disease outbreaks from a variety of unstructured web sources. More precisely, in this thesis, we consider several research tasks including document relevance classification, entity extraction and animal disease-related event recognition in the veterinary epidemiology domain. First, we crawl web sources and classify the collected documents by topical relevance using supervised learning algorithms. Next, we propose a novel approach for automated ontology construction in the veterinary medicine domain. Our approach is based on semantic relationship discovery using syntactic patterns. We then apply our automatically constructed ontology to the domain-specific entity extraction task. Moreover, we compare our ontology-based entity extraction results with an alternative sequence labeling approach. We introduce a sequence labeling method for entity tagging that relies on syntactic feature extraction using a sliding window. Finally, we present our novel sentence-based event recognition approach, which includes three main steps: entity extraction of animal diseases, species, locations, dates and confirmation status n-grams; event-related sentence classification into two categories, suspected or confirmed; and automated event tuple generation and aggregation. We show that our document relevance classification results, as well as our entity extraction and disease-related event recognition results, are significantly better than the results reported by other animal disease surveillance systems. [en_US]
dc.description.advisor: William H. Hsu [en_US]
dc.description.degree: Master of Science [en_US]
dc.description.department: Department of Computing and Information Sciences [en_US]
dc.description.level: Masters [en_US]
dc.description.sponsorship: National Agriculture Biosecurity Center [en_US]
dc.identifier.uri: http://hdl.handle.net/2097/4593
dc.language.iso: en_US [en_US]
dc.publisher: Kansas State University [en]
dc.subject: entity extraction [en_US]
dc.subject: event recognition and classification [en_US]
dc.subject: web mining [en_US]
dc.subject: document classification [en_US]
dc.subject: named entity recognition [en_US]
dc.subject.umi: Computer Science (0984) [en_US]
dc.title: Entity extraction, animal disease-related event recognition and classification from web [en_US]
dc.type: Thesis [en_US]

Files

Original bundle
Name: SvitlanaVolkova2010.pdf
Size: 4.05 MB
Format: Adobe Portable Document Format
License bundle
Name: license.txt
Size: 1.69 KB
Format: Item-specific license agreed upon to submission