Querying semantically heterogeneous data sources using ontologies

Date

2008-12-19T14:34:13Z

Publisher

Kansas State University

Abstract

In recent years, the number, size, and diversity of available data sources have increased significantly in many application domains. Data sources in a given domain are created and maintained autonomously, and are therefore distributed and semantically heterogeneous. This thesis addresses the problem of querying such semantically heterogeneous data sources from a user's perspective, using ontologies and mappings between ontologies. We have designed and implemented a system that answers queries in a way that is transparent to the user. Its main components are an ontology mapping algorithm that maps user ontologies to data source ontologies, and a query processing engine that translates user queries into queries that the data sources in the system can answer. We also show that machine learning algorithms can be incorporated into the system, making it possible to learn classifiers (in particular, generative models such as Naïve Bayes) from distributed, semantically heterogeneous data sources. Because many data sources today are relational, this work deals specifically with relational data sources, as opposed to flat files, XML, or object-oriented data sources. However, the system can be easily extended to other types of data sources.
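
As a rough illustration of the idea described in the abstract, the sketch below shows one way the counts needed by a Naïve Bayes classifier could be gathered from several sources whose local vocabularies differ. It is not the thesis implementation: the source names, the term mappings, the sample rows, and all function names are hypothetical, and in-memory lists stand in for relational tables. Each source translates its local values into the user's ontology terms, counts locally, and only the counts are combined.

```python
from collections import Counter

# Hypothetical mappings from each source's local terms to user-ontology terms.
MAPPINGS = {
    "source_a": {"grad": "Graduate", "ugrad": "Undergraduate"},
    "source_b": {"MS": "Graduate", "PhD": "Graduate", "BS": "Undergraduate"},
}

# Hypothetical rows held by each source: (local attribute value, class label).
SOURCES = {
    "source_a": [("grad", "funded"), ("ugrad", "unfunded"), ("grad", "funded")],
    "source_b": [("MS", "unfunded"), ("PhD", "funded"), ("BS", "unfunded")],
}


def local_counts(source_name):
    """Count (user-ontology value, label) pairs at a single source."""
    mapping = MAPPINGS[source_name]
    joint = Counter()
    for value, label in SOURCES[source_name]:
        joint[(mapping[value], label)] += 1
    return joint


def combined_counts():
    """Sum the per-source counts; raw rows never leave their source."""
    total = Counter()
    for name in SOURCES:
        total.update(local_counts(name))
    return total


def conditional_probability(counts, value, label, num_values=2, alpha=1.0):
    """Estimate P(value | label) from the combined counts, with Laplace smoothing."""
    label_total = sum(c for (_, lbl), c in counts.items() if lbl == label)
    return (counts[(value, label)] + alpha) / (label_total + alpha * num_values)


if __name__ == "__main__":
    counts = combined_counts()
    print(conditional_probability(counts, "Graduate", "funded"))
```

Because a Naïve Bayes model depends on the data only through such counts, summing them across sources gives the same estimates as counting over a single merged table, provided the mappings are applied consistently.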

Keywords

Ontologies, Querying, Semantically Heterogeneous, Relational Data Sources, Protege, Oracle

Graduation Month

December

Degree

Master of Science

Department

Department of Computing and Information Sciences

Major Professor

Doina Caragea

Date

2008

Type

Thesis
