
The VLDB Journal, 18:1021

\({10^{(10^{6})}}\) worlds and beyond: efficient representation and processing of incomplete information

  • Lyublena Antova
  • Christoph Koch
  • Dan Olteanu
Special Issue Paper

Abstract

We present a decomposition-based approach to managing probabilistic information. We introduce world-set decompositions (WSDs), a space-efficient and complete representation system for finite sets of worlds. We study the problem of efficiently evaluating relational algebra queries on world-sets represented by WSDs. We also evaluate our technique experimentally in a large census data scenario and show that it is both scalable and efficient.
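To make the idea of a decomposition concrete, the following is a minimal sketch, not the authors' implementation. It assumes that a WSD can be viewed as a list of independent components, where each component lists the alternative joint values of the fields it covers and a world is obtained by picking one alternative from every component. The field names (`r1.SSN`, `r2.Status`, and so on), the values, and the helper functions are hypothetical and chosen only for illustration.

```python
from itertools import product
from math import prod

# Hypothetical toy decomposition (an assumption, not data from the paper).
# Each component is a list of alternatives; each alternative assigns values
# to the fields that the component covers.
components = [
    # The two SSN fields are correlated, so they share one component.
    [{"r1.SSN": 185, "r2.SSN": 186}, {"r1.SSN": 785, "r2.SSN": 185}],
    # An independent field with two possible values.
    [{"r1.Status": "single"}, {"r1.Status": "married"}],
    # A certain field: only one alternative.
    [{"r2.Status": "widowed"}],
]

def count_worlds(components):
    """Number of represented worlds: product of the component sizes."""
    return prod(len(component) for component in components)

def storage_rows(components):
    """Rows actually stored: sum of the component sizes."""
    return sum(len(component) for component in components)

def enumerate_worlds(components):
    """Yield every world by picking one alternative per component."""
    for choice in product(*components):
        world = {}
        for alternative in choice:
            world.update(alternative)
        yield world

worlds = list(enumerate_worlds(components))
print(count_worlds(components), "worlds from", storage_rows(components), "stored rows")
```

Under the same assumption, the number of worlds grows as a product while the storage grows as a sum: \(10^{6}\) independent components with 10 alternatives each would need \(10^{7}\) stored rows but represent \(10^{(10^{6})}\) worlds, which is the figure in the title.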

Keywords

  • Incomplete information
  • Uncertain and probabilistic databases
  • Query processing


Copyright information

© Springer-Verlag 2009

Authors and Affiliations

  1. Cornell University, Ithaca, USA
  2. Oxford University, Oxford, UK
