Attribute level lineage in uncertain data with dependencies

  • Computer Science
  • Published:
Wuhan University Journal of Natural Sciences

Abstract

In uncertain data management, lineage is often used to compute the probabilities of result tuples. However, most existing work focuses on tuple-level lineage, which yields imprecise data derivation and cannot capture correlations among attributes. In this paper, for base tuples with multiple uncertain attributes, we define attribute-level annotations that annotate each attribute. Using these annotations to generate the lineage of result tuples makes derivation more precise. The same annotations can also be used to construct a dependency graph, which represents not only constraints on schemas but also correlations among attributes. Combining the dependency graph with attribute-level lineage, we can correctly compute the probabilities of result tuples and precisely derive data. Experiments comparing tuple-level and attribute-level lineage show that our method has advantages in derivation precision and storage cost.
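To make the distinction concrete, the following is a minimal sketch of the general idea, not the paper's algorithm. It assumes a single hypothetical base tuple `t1` whose uncertain attributes are mutually independent (so no dependency graph is modeled), and the annotation format `("t1", "city")`, the sample values, and the function name `probability` are all illustrative assumptions. It computes the probability of a result tuple by enumerating possible worlds, once with a lineage that names only the attribute the result uses and once with a coarser lineage that references every attribute of the tuple.

```python
from itertools import product

# Hypothetical base tuple t1 with two uncertain attributes. Each annotation
# (tuple id, attribute name) maps alternative values to probabilities.
# The annotation format and the data are illustrative assumptions.
base = {
    ("t1", "city"): {"Wuhan": 0.7, "Beijing": 0.3},
    ("t1", "age"): {30: 0.6, 31: 0.4},
}

def probability(lineage, base):
    """Probability that every (annotation, value) pair in `lineage` holds,
    assuming the annotated attributes are mutually independent."""
    annotations = sorted(base)
    total = 0.0
    # Enumerate possible worlds: one (value, probability) choice per attribute.
    for world in product(*(base[a].items() for a in annotations)):
        assignment = {a: value for a, (value, _) in zip(annotations, world)}
        weight = 1.0
        for _, p in world:
            weight *= p
        if all(assignment[a] == value for a, value in lineage):
            total += weight
    return total

# Result of projecting t1 onto city with value "Wuhan".
# Attribute-level lineage: only the attribute the result actually uses.
attr_lineage = [(("t1", "city"), "Wuhan")]
# Coarser lineage that also references an attribute the result does not use.
coarse_lineage = [(("t1", "city"), "Wuhan"), (("t1", "age"), 30)]

p_attr = probability(attr_lineage, base)
p_coarse = probability(coarse_lineage, base)
```

Under these assumptions the attribute-level lineage yields the marginal probability of the projected value, while the coarser lineage ties the result to an attribute it never used and so gives a smaller value. Correlated attributes would need the joint distribution in place of the independent product used here.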

Author information

Corresponding author

Correspondence to Zhiyong Peng.

Additional information

Foundation item: Supported by the Key Program of the National Natural Science Foundation of China (61232002); the National Natural Science Foundation of China (61202033); the Program for Innovative Research Team of Wuhan (2014070504020237); the Ph.D. Seed Foundation of Wuhan University (2012211020207); and the Science and Technology Support Program of Hubei Province (2015BAA127).

About this article

Cite this article

Wang, L., Wang, L. & Peng, Z. Attribute level lineage in uncertain data with dependencies. Wuhan Univ. J. Nat. Sci. 21, 376–386 (2016). https://doi.org/10.1007/s11859-016-1184-3

  • DOI: https://doi.org/10.1007/s11859-016-1184-3
