Abstract
In uncertain data management, lineage is often used to compute the probabilities of result tuples. However, most existing work focuses on tuple-level lineage, which yields imprecise data derivation and cannot capture correlations among attributes. In this paper, for base tuples with multiple uncertain attributes, we define attribute-level annotations that annotate each attribute individually. Generating the lineage of result tuples from these annotations makes derivation more precise. The same annotations are also used to construct a dependency graph, which represents both constraints on schemas and correlations among attributes. Combining the dependency graph with attribute-level lineage, we can correctly compute the probabilities of result tuples and precisely derive data. Experiments comparing tuple-level and attribute-level lineage show that our method has advantages in derivation precision and storage cost.
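The idea of combining attribute-level lineage with a dependency graph can be illustrated with a minimal Python sketch. It is not the paper's algorithm: the annotation ids (`t1.city`, `t1.zip`), the probability values, and the helper names `prob_independent` and `prob_with_dependency` are all assumptions made for illustration, and the dependency graph is reduced to single-parent edges stored as conditional distributions.

```python
# Hypothetical base tuple t1 with two uncertain attributes.
# Each attribute carries its own annotation id; all numbers are invented.
marginals = {
    "t1.city": {"Wuhan": 0.7, "Beijing": 0.3},
    "t1.zip": {"430072": 0.59, "100000": 0.41},
}

# One dependency-graph edge (city -> zip), stored as P(zip | city).
# Assumption: every attribute has at most one parent.
conditionals = {
    ("t1.city", "t1.zip"): {
        "Wuhan": {"430072": 0.8, "100000": 0.2},
        "Beijing": {"430072": 0.1, "100000": 0.9},
    }
}


def prob_independent(lineage, marginals):
    """Probability of a lineage (annotation -> required value),
    treating all annotated attributes as independent."""
    p = 1.0
    for annotation, value in lineage.items():
        p *= marginals[annotation][value]
    return p


def prob_with_dependency(lineage, marginals, conditionals):
    """Same lineage, but an attribute whose parent also appears in the
    lineage contributes P(child | parent) instead of its marginal."""
    p = 1.0
    for annotation, value in lineage.items():
        factor = marginals[annotation][value]
        for (parent, child), table in conditionals.items():
            if child == annotation and parent in lineage:
                factor = table[lineage[parent]][value]
                break
        p *= factor
    return p


# Lineage of a result tuple that uses both attributes of t1.
lineage = {"t1.city": "Wuhan", "t1.zip": "430072"}
print(round(prob_independent(lineage, marginals), 3))
print(round(prob_with_dependency(lineage, marginals, conditionals), 3))

# A projection onto city alone depends only on the annotation t1.city,
# which a tuple-level lineage pointing at all of t1 could not express.
print(round(prob_with_dependency({"t1.city": "Wuhan"}, marginals, conditionals), 3))
```

With these invented numbers, ignoring the correlation gives roughly 0.41 for the two-attribute lineage, while following the dependency edge gives roughly 0.56, which is the kind of discrepancy the dependency graph is meant to avoid.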
Additional information
Foundation item: Supported by the Key Program of the National Natural Science Foundation of China (61232002); the National Natural Science Foundation of China (61202033); the Program for Innovative Research Team of Wuhan (2014070504020237); the Ph.D. Seed Foundation of Wuhan University (2012211020207); and the Science and Technology Support Program of Hubei Province (2015BAA127).
About this article
Cite this article
Wang, L., Wang, L. & Peng, Z. Attribute level lineage in uncertain data with dependencies. Wuhan Univ. J. Nat. Sci. 21, 376–386 (2016). https://doi.org/10.1007/s11859-016-1184-3
DOI: https://doi.org/10.1007/s11859-016-1184-3