The VLDB Journal, 18:989

Representing uncertain data: models, properties, and algorithms

  • Anish Das Sarma
  • Omar Benjelloun
  • Alon Halevy
  • Shubha Nabar
  • Jennifer Widom
Special Issue Paper

Abstract

In general terms, an uncertain relation encodes a set of possible certain relations. There are many ways to represent uncertainty, ranging from alternative values for attributes to rich constraint languages. Among the possible models for uncertain data, there is a tension between simple and intuitive models, which tend to be incomplete, and complete models, which tend to be nonintuitive and more complex than necessary for many applications. We present a space of models for representing uncertain data based on a variety of uncertainty constructs and tuple-existence constraints. We explore a number of properties and results for these models. We study completeness of the models, as well as closure under relational operations, and we give results relating closure and completeness. We then examine whether different models guarantee unique representations of uncertain data, and for those models that do not, we provide complexity results and algorithms for testing equivalence of representations. The next problem we consider is that of minimizing the size of representation of models, showing that minimizing the number of tuples also minimizes the size of constraints. We show that minimization is intractable in general and study the more restricted problem of maintaining minimality incrementally when performing operations. Finally, we present several results on the problem of approximating uncertain data in an insufficiently expressive model.
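The opening claim, that an uncertain relation encodes a set of possible certain relations, can be illustrated with a minimal sketch. It assumes one of the simplest models the abstract alludes to: each tuple carries a list of alternative values and may be marked as optional, and choices are made independently per tuple. The function name `possible_worlds`, the `(alternatives, optional)` encoding, and the sample data are hypothetical and not taken from the paper.

```python
from itertools import product


def possible_worlds(uncertain_relation):
    """Enumerate the certain relations encoded by an uncertain relation.

    Assumed encoding (illustrative only): each uncertain tuple is a pair
    (alternatives, optional), where `alternatives` lists candidate certain
    tuples, exactly one of which holds if the tuple exists, and `optional`
    says whether the tuple may be absent altogether.
    """
    per_tuple_choices = []
    for alternatives, optional in uncertain_relation:
        choices = list(alternatives)
        if optional:
            choices.append(None)  # None stands for "tuple absent in this world"
        per_tuple_choices.append(choices)

    worlds = set()
    # Each combination of independent per-tuple choices yields one world.
    for combo in product(*per_tuple_choices):
        worlds.add(frozenset(t for t in combo if t is not None))
    return worlds


# Hypothetical data: Amy saw a crow or a raven; Bob may have seen a finch.
sightings = [
    ([("Amy", "crow"), ("Amy", "raven")], False),
    ([("Bob", "finch")], True),
]
worlds = possible_worlds(sightings)
for world in sorted(worlds, key=lambda w: sorted(w)):
    print(sorted(world))
```

Under this encoding the two uncertain tuples yield four possible relations. The sketch also hints at why such simple models tend to be incomplete: because choices are independent per tuple, a set of worlds such as {(Amy, crow), (Bob, finch)} and {(Amy, raven)} alone cannot be expressed without some form of constraint across tuples.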

Keywords

  • Uncertain data
  • Data modeling
  • Uncertainty

Copyright information

© Springer-Verlag 2009

Authors and Affiliations

  • Anish Das Sarma (1)
  • Omar Benjelloun (2)
  • Alon Halevy (2)
  • Shubha Nabar (3)
  • Jennifer Widom (1)

  1. Stanford University, Stanford, USA
  2. Google Inc., Mountain View, USA
  3. Microsoft Corp., Redmond, USA
