Synonyms
Provenance in probabilistic databases
Definition
Lineage, also called Boolean provenance, event expression, or why-provenance, is a form of provenance that records the origin of the answer(s) to a query executed on a database. Lineage is expressed as a Boolean formula over variables assigned to the tuples in the database. Joint usage of tuples (by the database join operation) is captured by Boolean conjunction (AND, ∧), and alternative usage (by projection or union) is captured by Boolean disjunction (OR, ∨). Uncertain data is typically represented as a probabilistic database, which is a compact representation of a probability distribution over a set of deterministic database instances, called possible worlds. When a query is evaluated on such a probabilistic database, the output is not a deterministic set of answer tuples but a probability distribution over the possible answers across the possible worlds. The query evaluation problem on uncertain data aims to compute this...
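The sketch below illustrates the definition. It assumes a tuple-independent probabilistic database, in which each tuple is present independently with its own probability. The relations `R` and `S`, the tuple variables `r1`, `s1`, and so on, and all probabilities are hypothetical. For the Boolean query Q() :- R(x), S(x, y), each join result contributes a conjunction of two tuple variables, and projecting away x and y takes the disjunction of those conjunctions. The probability of the lineage is then computed by summing over all possible worlds. This brute-force enumeration is exponential in the number of tuples and only makes the semantics concrete; it is not an efficient evaluation method.

```python
from itertools import product

# Hypothetical tuple-independent probabilistic database: each tuple is named by
# its Boolean variable and carries (values, independent probability of presence).
R = {"r1": (("a1",), 0.5), "r2": (("a2",), 0.4)}
S = {
    "s1": (("a1", "b1"), 0.3),
    "s2": (("a1", "b2"), 0.6),
    "s3": (("a2", "b3"), 0.9),
}


def lineage_join_project(R, S):
    """Lineage of the Boolean query Q() :- R(x), S(x, y) in disjunctive normal form.

    Each joining pair of tuples yields a conjunction (AND) of their variables;
    projection onto the empty head yields the disjunction (OR) of all conjunctions.
    Returns a list of frozensets, one frozenset of variables per conjunction.
    """
    clauses = []
    for r_var, ((x,), _) in R.items():
        for s_var, ((x2, _y), _) in S.items():
            if x == x2:  # join condition on the shared attribute x
                clauses.append(frozenset({r_var, s_var}))
    return clauses


def eval_dnf(clauses, world):
    """True iff some conjunction has all of its variables true in `world`."""
    return any(all(world[v] for v in clause) for clause in clauses)


def probability(clauses, probs):
    """Exact probability of the lineage by enumerating every possible world.

    A world assigns present/absent to every tuple; its weight is the product of
    p (present) or 1 - p (absent) over all tuples, by tuple independence.
    """
    variables = sorted(probs)
    total = 0.0
    for values in product([False, True], repeat=len(variables)):
        world = dict(zip(variables, values))
        weight = 1.0
        for v, present in world.items():
            weight *= probs[v] if present else 1 - probs[v]
        if eval_dnf(clauses, world):
            total += weight
    return total


probs = {v: p for rel in (R, S) for v, (_, p) in rel.items()}
phi = lineage_join_project(R, S)
print(" ∨ ".join("(" + " ∧ ".join(sorted(c)) + ")" for c in phi))
# prints: (r1 ∧ s1) ∨ (r1 ∧ s2) ∨ (r2 ∧ s3)
print(round(probability(phi, probs), 4))
# prints: 0.5904
```

In this small example the lineage can be rewritten as (r1 ∧ (s1 ∨ s2)) ∨ (r2 ∧ s3), in which every variable occurs once. Its probability therefore also follows directly from independence: 1 − (1 − 0.5 · 0.72)(1 − 0.4 · 0.9) = 0.5904, which matches the world-by-world sum.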
Copyright information
© 2018 Springer Science+Business Media, LLC, part of Springer Nature
About this entry
Cite this entry
Roy, S. (2018). Uncertain Data Lineage. In: Liu, L., Özsu, M.T. (eds) Encyclopedia of Database Systems. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-8265-9_80759
DOI: https://doi.org/10.1007/978-1-4614-8265-9_80759
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4614-8266-6
Online ISBN: 978-1-4614-8265-9
eBook Packages: Computer Science, Reference Module Computer Science and Engineering