K-State Research Exchange

Please use this identifier to cite or link to this item: http://hdl.handle.net/2097/4593

Title: Entity extraction, animal disease-related event recognition and classification from web
Authors: Volkova, Svitlana
Publication Date: 2010
Graduation Month: August
Type: Thesis
Degree: Master of Science
Department: Department of Computing and Information Sciences
Major Professor: William H. Hsu
Keywords: entity extraction
event recognition and classification
web mining
document classification
named entity recognition
Abstract: Global epidemic surveillance is an essential task for national biosecurity management and bioterrorism prevention. The main goal is to protect the public from major health threats. Performing this task effectively requires reliable, timely and accurate medical information from a wide range of sources. Towards this goal, we present a framework for epidemiological analytics that can be used to automatically extract and visualize infectious disease outbreaks from a variety of unstructured web sources. More precisely, in this thesis, we consider several research tasks including document relevance classification, entity extraction and animal disease-related event recognition in the veterinary epidemiology domain. First, we crawl web sources and classify the collected documents by topical relevance using supervised learning algorithms. Next, we propose a novel approach for automated ontology construction in the veterinary medicine domain. Our approach is based on semantic relationship discovery using syntactic patterns. We then apply our automatically constructed ontology to the domain-specific entity extraction task. Moreover, we compare our ontology-based entity extraction results with an alternative sequence labeling approach. We introduce a sequence labeling method for entity tagging that relies on syntactic feature extraction using a sliding window. Finally, we present our novel sentence-based event recognition approach, which includes three main steps: entity extraction of animal diseases, species, locations, dates and confirmation status n-grams; classification of event-related sentences into two categories, suspected or confirmed; and automated event tuple generation and aggregation. We show that our document relevance classification results, as well as our entity extraction and disease-related event recognition results, are significantly better than those reported by other animal disease surveillance systems.
Appears in Collections: K-State Electronic Theses, Dissertations, and Reports: 2004 -

Files in This Item:

File: SvitlanaVolkova2010.pdf
Size: 4.15 MB
Format: Adobe PDF

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.

 
