Automatic Schema Matching

Semantic mappings are expressions that relate data in different data sources. They play a crucial role in any data sharing architecture. Since the schemas of the data sources in such architectures are independently designed, it is inevitable that differences exist between them. These differences include the naming of elements, the choice of elements, different normalizations, and different data models. Mappings between the schemas provide the glue that binds them together, thus enabling data sharing. Constructing these mappings is therefore a huge roadblock to the widespread adoption of data sharing architectures. Schema matching is the first step towards the construction of mappings. A match between two schemas identifies the elements in the two schemas that are similar to each other. These matches can be used as building blocks to construct complete mappings.

Our solution to the schema matching problem is characterized by our ability to combine multiple pieces of evidence in an extensible framework and our ability to learn from past experience. We believe that the expertise of a human expert, who might be employed for a schema matching task, lies in the ability to put together different types of evidence in the schema and the ability to extrapolate patterns in schemas from having seen numerous related schemas in the past (and having performed mappings between them). Our current focus is on techniques that try to answer the following question: given a collection (or corpus) of related schemas and mappings, can we improve our ability to match two new schemas that are not part of our corpus?
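
To make the idea of combining multiple pieces of evidence concrete, here is a minimal sketch in Python. It is an illustration only, not the project's actual system: the two matchers (name similarity and data-type agreement), the weights, the threshold, and all function names are assumptions chosen for the example. Schema elements are represented as plain dictionaries with hypothetical "name" and "type" keys.

```python
from difflib import SequenceMatcher

def name_similarity(a, b):
    # Evidence 1 (assumed): string similarity of element names, in [0, 1].
    return SequenceMatcher(None, a["name"].lower(), b["name"].lower()).ratio()

def type_similarity(a, b):
    # Evidence 2 (assumed): 1.0 if the declared data types agree, else 0.0.
    return 1.0 if a["type"] == b["type"] else 0.0

# Extensible list of (matcher, weight) pairs; the weights are illustrative.
MATCHERS = [(name_similarity, 0.7), (type_similarity, 0.3)]

def combined_score(a, b, matchers=MATCHERS):
    # Weighted average of the scores produced by the individual matchers.
    total_weight = sum(weight for _, weight in matchers)
    return sum(weight * matcher(a, b) for matcher, weight in matchers) / total_weight

def match_schemas(source, target, threshold=0.5):
    # For each source element, keep the best-scoring target element
    # if its combined score reaches the (assumed) threshold.
    matches = []
    for s in source:
        best = max(target, key=lambda t: combined_score(s, t))
        score = combined_score(s, best)
        if score >= threshold:
            matches.append((s["name"], best["name"], score))
    return matches
```

A new kind of evidence, such as a score learned from a corpus of previously matched schemas, could be added in this sketch by appending another (matcher, weight) pair to the list, which is the sense in which such a framework is extensible.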

Project Members