Diploma Thesis
My Diploma Thesis (roughly equivalent to a Master's thesis, apart from some formalities) was entitled "Integrated Processing of Object-Relational and XML Databases with SQL:1999". I wrote it at the VSIS group (Distributed Systems and Information Systems) at the Department of Computer Science of the University of Hamburg. My first advisor was Prof. Dr. Norbert Ritter, and my second advisor was Doz. Dr. Martin Lehmann. It was a joint work with Natalia Hänikel and research assistant Iryna Kozlova, and it is part of the SQXML Project.
- Diplomarbeit.pdf – my Diploma Thesis
- SQXML.ppt.zip – presentation slides
Abstract
"Database integration, also known as Enterprise Information Integration (EII), has become increasingly important
recently due to the success of XML and the need to store and manage XML data. The goal of database integration is
to provide transparent, integrated access to multiple data sources, which makes it a key requirement for organizations
that need to integrate data from multiple sources. In schema integration, which is one EII approach, the database systems
remain fully autonomous, and a middleware layer is introduced that presents a global view of the database schemata to the
user and performs query transformation to answer global queries by sending partial queries to the databases and combining
the results.
In this thesis, a schema integration architecture is proposed for the two prevalent types of database management systems,
namely object-relational SQL:1999 and native XML Schema database systems. In particular, the case of a global SQL:1999
schema is studied. The main challenge is to create the global schema despite the heterogeneity between the original
schemata. In order to overcome the inherent conceptual differences between SQL:1999 and XML Schema, a new approach based
on the Common Warehouse Metamodel (CWM) standard is proposed in this thesis. For matching the schemata, i.e., finding their
correspondences, and merging them into the global schema, state-of-the-art algorithms are further improved. Furthermore,
for eliminating modeling conflicts that may occur between the schemata due to design autonomy, possible conflicts are
studied and resolved. Finally, a query processing architecture is proposed for splitting queries on the global schema
into partial queries on the original database schemata and for integrating their results."
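The mediator idea described in the abstract (split a query on the global schema into partial queries, send them to the sources, and combine the results) can be illustrated with a toy sketch. It is not the architecture from the thesis: the attribute mapping `MAPPING`, the source names `ordb` and `xmldb`, the sample data, and all helper functions are hypothetical. Partial queries here are plain projections over in-memory records rather than SQL:1999 or XML queries, and the CWM-based schema mapping is not modeled.

```python
from collections import defaultdict

# Hypothetical global schema: each global attribute maps to the local attribute
# that holds it in one or more sources. "ordb" stands in for the object-relational
# SQL:1999 database, "xmldb" for the native XML database (both are assumptions).
MAPPING = {
    "id":   {"ordb": "cust_id", "xmldb": "@id"},
    "name": {"ordb": "cust_name"},
    "city": {"xmldb": "address/city"},
}
KEY = "id"  # assumed shared key used to combine partial results

# Toy in-memory contents of the two hypothetical sources.
SOURCES = {
    "ordb": [{"cust_id": 1, "cust_name": "Ada"}, {"cust_id": 2, "cust_name": "Bob"}],
    "xmldb": [{"@id": 2, "address/city": "Berlin"}, {"@id": 1, "address/city": "Hamburg"}],
}


def split_query(attrs):
    """Split a global projection into one partial projection per source."""
    partial = defaultdict(dict)  # source -> {global attribute: local attribute}
    for attr in attrs:
        # Take the first source listed for the attribute.
        source, local = next(iter(MAPPING[attr].items()))
        partial[source][attr] = local
    # Every partial query also fetches the key so the results can be combined.
    for source in partial:
        partial[source][KEY] = MAPPING[KEY][source]
    return dict(partial)


def execute(source, projection):
    """Evaluate a partial query on a toy source, renaming local to global attributes."""
    return [{glob: rec[loc] for glob, loc in projection.items()} for rec in SOURCES[source]]


def combine(results):
    """Merge rows from all sources that share the same key (a full outer join)."""
    merged = {}
    for rows in results.values():
        for row in rows:
            merged.setdefault(row[KEY], {}).update(row)
    return [merged[k] for k in sorted(merged)]


def answer(attrs):
    """Answer a global query: split it, run the partial queries, combine, project."""
    parts = split_query(attrs)
    results = {src: execute(src, proj) for src, proj in parts.items()}
    return [{a: row.get(a) for a in attrs} for row in combine(results)]


print(answer(["name", "city"]))
```

Running the script prints `[{'name': 'Ada', 'city': 'Hamburg'}, {'name': 'Bob', 'city': 'Berlin'}]`. The XML rows arrive in a different order and are matched by key rather than by position. A real mediator would also have to handle selections, modeling conflicts, and translation into the query languages of the sources, all of which this sketch omits.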