Querying semantically heterogeneous data sources using ontologies
Abstract
In recent years, the number, size and diversity of available data sources have increased significantly in many application domains. Data sources in a particular domain are created and maintained autonomously, and are therefore distributed and semantically heterogeneous. This thesis addresses the problem of querying such semantically heterogeneous data sources from a user's perspective. We approach the problem using ontologies and mappings between ontologies. We have designed and implemented a system that answers queries in a way that is transparent to the user. Its main components are an ontology mapping algorithm that maps user ontologies to data source ontologies, and a query processing engine that translates user queries into queries that the data sources in the system can answer. We also show that machine learning algorithms can be incorporated into the system, making it possible to learn classifiers (in particular, generative models such as Naïve Bayes) from distributed, semantically heterogeneous data sources. Because many data sources today are relational, this work deals specifically with relational data sources, as opposed to flat files, XML or object-oriented data sources. However, the system can be easily extended to other types of data sources.
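To make the idea concrete, the following is a minimal sketch, not the system described in the thesis. It assumes two hypothetical relational sources (`students` and `enrollment`) with different schemas and vocabularies, a hand-written mapping from each source's terms to a hypothetical user ontology (`undergraduate`, `graduate`), and a simple count-aggregation step of the kind a Naïve Bayes learner could use. All table names, column names, values and helper functions are illustrative assumptions; the abstract does not specify how mappings are represented or how statistics are gathered.

```python
import sqlite3
from collections import Counter

def make_source(table, columns, rows):
    """Create an in-memory relational source with its own local schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute(f"CREATE TABLE {table} ({columns[0]} TEXT, {columns[1]} TEXT)")
    conn.executemany(f"INSERT INTO {table} VALUES (?, ?)", rows)
    return conn

# Hypothetical sources: each uses its own table, column names and value terms.
# "columns" maps user-ontology attributes to local columns; "values" maps
# local terms to user-ontology terms (several local terms may map to one).
SOURCES = [
    {
        "conn": make_source("students", ("standing", "funded"),
                            [("UG", "no"), ("UG", "no"), ("GR", "yes")]),
        "table": "students",
        "columns": {"status": "standing", "funded": "funded"},
        "values": {"UG": "undergraduate", "GR": "graduate"},
    },
    {
        "conn": make_source("enrollment", ("level", "has_support"),
                            [("bachelor", "no"), ("master", "yes"),
                             ("phd", "yes"), ("phd", "no")]),
        "table": "enrollment",
        "columns": {"status": "level", "funded": "has_support"},
        "values": {"bachelor": "undergraduate", "master": "graduate",
                   "phd": "graduate"},
    },
]

def source_counts(source, attribute, label):
    """Rewrite a user-level count query into the source's schema, run it,
    and translate the returned values back into user-ontology terms."""
    cols = source["columns"]
    sql = (f"SELECT {cols[attribute]}, {cols[label]}, COUNT(*) "
           f"FROM {source['table']} "
           f"GROUP BY {cols[attribute]}, {cols[label]}")
    counts = Counter()
    for local_value, cls, n in source["conn"].execute(sql):
        counts[(source["values"][local_value], cls)] += n
    return counts

def combined_counts(attribute, label):
    """Sum per-source counts; only counts leave each source, not rows."""
    total = Counter()
    for source in SOURCES:
        total.update(source_counts(source, attribute, label))
    return total

def conditional_probability(counts, value, cls, n_values):
    """Laplace-smoothed estimate of P(attribute = value | class = cls)."""
    class_total = sum(n for (_, c), n in counts.items() if c == cls)
    return (counts[(value, cls)] + 1) / (class_total + n_values)

counts = combined_counts("status", "funded")
p = conditional_probability(counts, "graduate", "yes", n_values=2)
```

In this sketch the user asks for counts over `status` and `funded` in their own vocabulary, and each source answers in its local terms. Because only aggregated counts are combined, the raw rows never have to be moved or merged into a single schema.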