Automatic Schema Matching

Semantic mappings are expressions that relate data in different data sources. They play a crucial role in any data sharing architecture. Since the schemas of the data sources in such architectures are independently designed, differences between them are inevitable. These differences range from the naming and choice of elements to different normalizations and different data models. Mappings between the schemas provide the glue that binds them together, thus enabling data sharing. The construction of these mappings is hence a huge roadblock to the widespread adoption of these data sharing architectures. Schema matching is the first step towards the construction of mappings. A match between two schemas identifies the elements in the two schemas that are similar to each other. These matches can be used as building blocks to construct complete mappings.

Our solution to the schema matching problem is characterized by our ability to combine multiple pieces of evidence in an extensible framework and our ability to learn from past experience. We believe that the expertise of a human expert employed on a schema matching task lies in the ability to put together different types of evidence in the schema and to extrapolate patterns from having seen numerous related schemas in the past (and having performed mappings between them). Our current focus is on techniques that try to answer the following question: given a collection (or corpus) of related schemas and mappings, can we improve our ability to match two new schemas that are not part of the corpus?
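To make the idea of combining multiple pieces of evidence concrete, here is a minimal sketch in Python. It is not the algorithm used in this project or in any of the publications listed below; it only illustrates how two simple signals, element-name similarity and data-type agreement, might be blended into a single match score. The function names, the weights, the threshold, and the example schemas are all hypothetical.

```python
from difflib import SequenceMatcher


def name_similarity(a, b):
    """String similarity of two element names, in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def type_similarity(a, b):
    """1.0 when the declared data types agree, else 0.0."""
    return 1.0 if a == b else 0.0


def match_schemas(source, target, name_weight=0.7, type_weight=0.3,
                  threshold=0.5):
    """Pair each source element with its best-scoring target element.

    `source` and `target` map element names to data types. The weights
    and threshold are illustrative assumptions, not tuned values.
    Returns a list of (source_element, target_element, score) tuples.
    """
    matches = []
    for s_name, s_type in source.items():
        best = None
        for t_name, t_type in target.items():
            # Weighted combination of the two evidence sources.
            score = (name_weight * name_similarity(s_name, t_name)
                     + type_weight * type_similarity(s_type, t_type))
            if best is None or score > best[2]:
                best = (s_name, t_name, score)
        # Keep the best candidate only if it clears the threshold.
        if best is not None and best[2] >= threshold:
            matches.append(best)
    return matches


if __name__ == "__main__":
    # Hypothetical schemas, made up for this example.
    source = {"cust_name": "string", "cust_phone": "string",
              "order_total": "decimal"}
    target = {"customerName": "string", "phoneNumber": "string",
              "totalAmount": "decimal"}
    for s, t, score in match_schemas(source, target):
        print(f"{s} -> {t} (score {score:.2f})")
```

Each evidence source is a separate function, so further signals (for example, similarity of data instances or of schema structure) could be added as extra weighted terms. A corpus-based approach would additionally learn from previously matched schemas, which this sketch does not attempt.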

Project Members

Publications

Corpus-based Schema Matching, Jayant Madhavan, Philip A. Bernstein, Kuang Chen, Alon Halevy, and Pradeep Shenoy, at the Workshop on Information Integration on the Web at the Eighteenth International Joint Conference on Artificial Intelligence (IJCAI'2003), Acapulco, Mexico.
Learning to Map between Ontologies on the Semantic Web, AnHai Doan, Jayant Madhavan, Pedro Domingos, and Alon Halevy, at the Eleventh International World Wide Web Conference (WWW'2002), Hawaii, USA. Extended Journal Version.
Generic Schema Matching with Cupid, Jayant Madhavan, Philip A. Bernstein, and Erhard Rahm, at the Twenty Seventh International Conference on Very Large Databases (VLDB'2001), Roma, Italy. Extended Technical Report.
Reconciling Schemas of Disparate Data Sources: A Machine Learning Approach, AnHai Doan, Pedro Domingos, and Alon Halevy. Proceedings of the ACM SIGMOD Conference on Management of Data (SIGMOD’2001), Santa Barbara, USA.