An Open World Database for Automatic Data Debiasing
Open world database management systems, which assume that tuples absent from the database may still exist, are an increasingly important area of research. Themis is the first open world database that automatically rebalances arbitrarily biased samples to approximately answer queries as if they were issued over the entire population. We leverage a priori population aggregate information to develop and combine two different approaches to automatic debiasing: sample reweighting and Bayesian network probabilistic modeling. We build a prototype of Themis and demonstrate that it achieves higher query accuracy than a baseline uniform reweighting, an alternative sample reweighting technique, and a variety of Bayesian network probabilistic models, while maintaining interactive query response times. We also show that Themis is robust to differences in support between the sample and the population.
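To make the idea of sample reweighting concrete, the following is a minimal sketch of single-attribute post-stratification, a standard textbook technique and not necessarily the method Themis implements. The function names, the `state` and `delayed` attributes, and the population counts are all hypothetical and chosen only for illustration.

```python
from collections import Counter

def poststratify(sample, attr, population_counts):
    """Assign each sample row a weight so that the weighted sample
    matches the known population count for every value of `attr`.
    (Illustrative technique; not taken from the Themis prototype.)"""
    sample_counts = Counter(row[attr] for row in sample)
    return [population_counts[row[attr]] / sample_counts[row[attr]]
            for row in sample]

def weighted_count(sample, weights, predicate):
    """Approximate a population-level COUNT(*) WHERE predicate."""
    return sum(w for row, w in zip(sample, weights) if predicate(row))

# Hypothetical biased sample: WA is over-represented relative to NY.
sample = [
    {"state": "WA", "delayed": True},
    {"state": "WA", "delayed": False},
    {"state": "WA", "delayed": False},
    {"state": "NY", "delayed": True},
]
# Hypothetical a priori aggregate: population size per state.
population_counts = {"WA": 300, "NY": 700}

weights = poststratify(sample, "state", population_counts)
estimate = weighted_count(sample, weights, lambda r: r["delayed"])

# Uniform reweighting scales every row by population size / sample size.
uniform_weight = sum(population_counts.values()) / len(sample)
uniform_estimate = uniform_weight * sum(1 for r in sample if r["delayed"])
```

In this toy example the uniform estimate ignores the over-representation of WA, while the post-stratified weights correct for it. Note that a population value that never appears in the sample receives no weight at all under this scheme, which is one reason reweighting alone can struggle when the sample and population differ in support.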
Contact Laurel Orr.
This work is supported by the National Science Foundation through NSF grants AITF 1535565 and III-1614738 and through a gift from Intel.