Representing uncertain data: models, properties, and algorithms
Abstract
In general terms, an uncertain relation encodes a set of possible certain relations. There are many ways to represent uncertainty, ranging from alternative values for attributes to rich constraint languages. Among the possible models for uncertain data, there is a tension between simple and intuitive models, which tend to be incomplete, and complete models, which tend to be nonintuitive and more complex than necessary for many applications. We present a space of models for representing uncertain data based on a variety of uncertainty constructs and tuple-existence constraints. We explore a number of properties and results for these models. We study completeness of the models, as well as closure under relational operations, and we give results relating closure and completeness. We then examine whether different models guarantee unique representations of uncertain data, and for those models that do not, we provide complexity results and algorithms for testing equivalence of representations. The next problem we consider is that of minimizing the size of a representation, showing that minimizing the number of tuples also minimizes the size of constraints. We show that minimization is intractable in general and study the more restricted problem of maintaining minimality incrementally when performing operations. Finally, we present several results on the problem of approximating uncertain data in an insufficiently expressive model.
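The idea that an uncertain relation encodes a set of possible certain relations can be illustrated with a minimal sketch. It assumes one simple model with just two constructs, alternative values per attribute and optional ("maybe") tuples; the function name `possible_worlds`, the input encoding, and the sample data are hypothetical and are not taken from the paper.

```python
from itertools import product

def possible_worlds(uncertain_relation):
    """Enumerate the certain relations encoded by an uncertain relation.

    Illustrative encoding (an assumption, not the paper's notation):
    each uncertain tuple is a pair (attributes, optional), where
    `attributes` is a list of sets of alternative values and `optional`
    is True if the tuple may be absent from a possible relation.
    """
    per_tuple_choices = []
    for attributes, optional in uncertain_relation:
        # Every combination of alternative values is one concrete tuple.
        choices = list(product(*attributes))
        if optional:
            choices.append(None)  # None marks "tuple absent"
        per_tuple_choices.append(choices)

    worlds = set()
    # Tuples are chosen independently, so worlds are a cross product.
    for combination in product(*per_tuple_choices):
        worlds.add(frozenset(t for t in combination if t is not None))
    return worlds

# Made-up sample data: one tuple with an uncertain attribute value,
# and one tuple whose existence is uncertain.
sightings = [
    ([{"Amy"}, {"crow", "raven"}], False),
    ([{"Bob"}, {"jay"}], True),
]
worlds = possible_worlds(sightings)
for world in sorted(worlds, key=lambda w: sorted(w)):
    print(sorted(world))
```

Because each tuple is resolved independently in this sketch, a set of possible relations in which two tuples must appear together or not at all cannot be produced by any input. That gap is an instance of the incompleteness the abstract attributes to simple models, and it suggests why tuple-existence constraints are considered.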
Keywords
Uncertain data, Data modeling, Uncertainty