A survey of uncertain data management

  • Review Article
  • Frontiers of Computer Science

Abstract

Uncertain data are data accompanied by uncertainty information, and they arise widely in database applications. In recent years, uncertainty in data has posed challenges in almost all areas of database management, including data modeling, query representation, query processing, and data mining. As a result, uncertain data management has become an active research topic in the field of data management. In this study, we explore the problems in managing uncertain data, present state-of-the-art solutions, and outline future research directions in this area. The techniques discussed cover data modeling, query processing, and data mining over uncertain data in relational, XML, graph, and stream forms.

References

  1. Fuhr N, Rölleke T. A probabilistic relational algebra for the integration of information retrieval and database systems. ACM Transactions on Information Systems, 1997, 15(1): 32–66

    Google Scholar 

  2. Imieliński T, Lipski W. Incomplete information in relational databases. Journal of the ACM, 1984, 31(4): 761–791

    MathSciNet  MATH  Google Scholar 

  3. Barbará D, Garcia–Molina H, Porter D. The management of probabilistic data. IEEE Transactions on Knowledge and Data Engineering, 1992, 4(5): 487–502

    Google Scholar 

  4. Lakshmanan L V, Leone N, Ross R, Subrahmanian V S. Probview: a flexible probabilistic database system. ACM Transactions on Database Systems, 1997, 22(3): 419–469

    Google Scholar 

  5. Zimányi E. Query evaluation in probabilistic relational databases. Theoretical Computer Science, 1997, 171(1): 179–219

    MathSciNet  MATH  Google Scholar 

  6. Sen P, Deshpande A. Representing and querying correlated tuples in probabilistic databases. In: Proceedings of the 23rd International Conference on Data Engineering. 2007, 596–605

    Google Scholar 

  7. Suciu D. Probabilistic databases. SIGACT News, 2008, 39(2): 111–124

    Google Scholar 

  8. Cavallo R, Pittarelli M. The theory of probabilistic databases. In: Proceedings of the International Conference on Very Large Data Bases, 1987, 87: 1–4

    Google Scholar 

  9. Benjelloun O, Sarma A D, Halevy A, Widom J. ULDBS: databases with uncertainty and lineage. In: Proceedings of the 32nd International Conference on Very Large Data Bases. VLDB Endowment, 2006, 953–964

    Google Scholar 

  10. Sen P, Deshpande A, Getoor L. Read–once functions and query evaluation in probabilistic databases. Proceedings of the VLDB Endowment, 2010, 3(1–2): 1068–1079

    Google Scholar 

  11. Olteanu D, Huang J. Using OBDDs for efficient query evaluation on probabilistic databases. In: Proceeding of the International Conference on Scalable Uncertainty Management. 2008, 326–340

    Google Scholar 

  12. Roy S, Perduca V, Tannen V. Faster query answering in probabilistic databases using read–once functions. In: Proceedings of the 14th International Conference on Database Theory. 2011, 232–243

    Google Scholar 

  13. Kenig B, Gal A, Strichman O. A new class of lineage expressions over probabilistic databases computable in P–time. In: Proceedings of the 7th International Conference on Scalable Uncertainty Management. 2013, 219–232

    Google Scholar 

  14. Widom J. Trio: a system for integrated management of data, accuracy, and lineage. Stanford Infolab, 2004

    Google Scholar 

  15. Antova L, Koch C, Olteanu D. Maybms: managing incomplete information with probabilistic world–set decompositions. In: Proceedings of the 23rd International Conference on Data Engineering. 2007, 1479–1480

    Google Scholar 

  16. Cheng R, Singh S, Prabhakar S. U–DBMS: a database system for managing constantly–evolving data. In: Proceedings of the 31st International Conference on Very Large Data Bases. VLDB Endowment, 2005, 1271–1274

    Google Scholar 

  17. Boulos J, Dalvi N, Mandhani B, Mathur S, Re C, Suciu D. Mystiq: a system for finding more answers by using probabilities. In: Proceedings of the 2005 ACM SIGMOD International Conference on Management of Data. 2005, 891–893

    Google Scholar 

  18. Olteanu D, Huang J, Koch C. Sprout: Lazy vs. eager query plans for tuple–independent probabilistic databases. In: Proceedings of the 25th International Conference on Data Engineering. 2009, 640–651

    Google Scholar 

  19. Kimelfeld B, Kosharovsky Y, Sagiv Y. Query evaluation over probabilistic XML. The International Journal on Very Large Data Bases, 2009, 18(5): 1117–1140

    Google Scholar 

  20. Senellart P, Souihli A. Proapprox: a lightweight approximation query processor over probabilistic trees. In: Proceedings of the 2011 ACM SIGMOD International Conference on Management of Data. 2011, 1295–1298

    Google Scholar 

  21. Welbourne E, Khoussainova N, Letchner J, Li Y, Balazinska M, Borriello G, Suciu D. Cascadia: a system for specifying, detecting, and managing rfid events. In: Proceedings of the 6th International Conference on Mobile Systems, Applications, and Services. 2008, 281–294

    Google Scholar 

  22. Tran T T, Peng L, Li B, Diao Y, Liu A. PODS: a new model and processing algorithms for uncertain data streams. In: Proceedings of the 2010 ACM SIGMOD International Conference on Management of Data. 2010, 159–170

    Google Scholar 

  23. Tran T T, Peng L, Diao Y, McGregor A, Liu A. Claro: modeling and processing uncertain data streams. The International Journal on Very Large Data Bases, 2012, 21(5): 651–676

    Google Scholar 

  24. Aggarwal C C, Yu P S. A survey of uncertain data algorithms and applications. IEEE Transactions on Knowledge and Data Engineering, 2009, 21(5): 609–623

    Google Scholar 

  25. Zhou A Y. A survey on the management of uncertain data. Chinese Journal of Computers, 2009, 32(1): 1–16

    Google Scholar 

  26. Kimelfeld B, Senellart P. Probabilistic XML: Models and Complexity. Advances in Probabilistic Databases for Uncertain Information Management, Springer, Berlin, Heidelberg, 2013, 39–66

    Google Scholar 

  27. Sarma A D, Benjelloun O, Halevy A, Widom J. Working models for uncertain data. In: Proceedings of the 22nd International Conference on Data Engineering. 2006, 7

    Google Scholar 

  28. Green T J, Tannen V. Models for incomplete and probabilistic information. In: Proceedings of the International Conference on Extending Database Technology. 2006, 278–296

    Google Scholar 

  29. Sen P, Deshpande A, Getoor L. PRDB: managing and exploiting rich correlations in probabilistic databases. The International Journal on Very Large Data Bases, 2009, 18(5): 1065–1090

    Google Scholar 

  30. Chen R, Mao Y, Kiringa I. GRN model of probabilistic databases: construction, transition and querying. In: Proceedings of the 2010 ACM SIGMOD International Conference on Management of Data. 2010, 291–302

    Google Scholar 

  31. Cheng R, Xia Y, Prabhakar S, Shah R, Vitter J S. Efficient indexing methods for probabilistic threshold queries over uncertain data. In: Proceedings of the 30th International Conference on Very Large Data Bases. VLDB Endowment, 2004, 876–887

    Google Scholar 

  32. Tao Y, Cheng R, Xiao X, Ngai W K, Kao B, Prabhakar S. Indexing multi–dimensional uncertain data with arbitrary probability density functions. In: Proceedings of the 31st International Conference on Very Large Data Bases. VLDB Endowment, 2005, 922–933

    Google Scholar 

  33. Burdick D, Deshpande P M, Jayram T, Ramakrishnan R, Vaithyanathan S. Olap over uncertain and imprecise data. In: Proceedings of the 31st International Conference on Very Large Data Bases. VLDB Endowment, 2005, 970–981

    Google Scholar 

  34. Jayram T, Kale S, Vee E. Efficient aggregation algorithms for probabilistic data. In: Proceedings of the 18th Annual ACM–SIAM Symposium on Discrete Algorithms. Society for Industrial and Applied Mathematics, 2007, 346–355

    MATH  Google Scholar 

  35. Dalvi N, Suciu D. Efficient query evaluation on probabilistic databases. In: Proceedings of the 30th International Conference on Very Large Data Bases. VLDB Endowment, 2004, 864–875

    Google Scholar 

  36. Cormode G, Garofalakis M. Sketching probabilistic data streams. In: Proceedings of the ACM SIGMOD International Conference onManagement of Data. 2007, 281–292

    Google Scholar 

  37. Ross R, Subrahmanian V, Grant J. Aggregate operators in probabilistic databases. Journal of the ACM, 2005, 52(1): 54–101

    MathSciNet  MATH  Google Scholar 

  38. Kanagal B, Deshpande A. Efficient query evaluation over temporally correlated probabilistic streams. In: Proceedings of the 25th International Conference on Data Engineering. 2009, 1315–1318

    Google Scholar 

  39. Burdick D, Deshpande P M, Jayram T, Ramakrishnan R, Vaithyanathan S. Efficient allocation algorithms for olap over imprecise data. In: Proceedings of the 32nd International Conference on Very Large Data Bases. VLDB Endowment, 2006, 391–402

    Google Scholar 

  40. Ré C, Suciu D. The trichotomy of having queries on a probabilistic database. The International Journal on Very Large Data Bases, 2009, 18(5): 1091–1116

    Google Scholar 

  41. Fink R, Han L, Olteanu D. Aggregation in probabilistic databases via knowledge compilation. Proceedings of the VLDB Endowment. 2012, 5(5): 490–501

    Google Scholar 

  42. Ngai W K, Kao B, Chui C K, Cheng R, Chau M, Yip K Y. Efficient clustering of uncertain data. In: Proceedings of the 6th International Conference on Data Mining. 2006, 436–445

    Google Scholar 

  43. Agrawal P, Widom J. Confidence–aware join algorithms. In: Proceedings of the 25th International Conference on Data Engineering. 2009, 628–639

    Google Scholar 

  44. Cheng R, Singh S, Prabhakar S, Shah R, Vitter J S, Xia Y. Efficient join processing over uncertain data. In: Proceedings of the 15th ACM International Conference on Information and Knowledge Management. 2006, 738–747

    Google Scholar 

  45. Kriegel H P, Kunath P, Pfeifle M, Renz M. Probabilistic similarity join on uncertain data. In: Proceedings of the International Conference on Database Systems for Advanced Applications. 2006, 295–309

    Google Scholar 

  46. Ljosa V, Singh A K. Top–k spatial joins of probabilistic objects. In: Proceedings of the 24th International Conference on Data Engineering. 2008, 566–575

    Google Scholar 

  47. Jestes J, Li F, Yan Z, Yi K. Probabilistic string similarity joins. In: Proceedings of the 2010 ACM SIGMOD International Conference on Management of Data. 2010, 327–338

    Google Scholar 

  48. Lian X, Chen L. Set similarity join on probabilistic data. Proceedings of the VLDB Endowment. 2010, 3(1–2): 650–659

    Google Scholar 

  49. Andritsos P, Fuxman A, Miller R J. Clean answers over dirty databases: probabilistic approach. In: Proceedings of the 22nd International Conference on Data Engineering. 2006, 30

    Google Scholar 

  50. Wick M, McCallum A, Miklau G. Scalable probabilistic databases with factor graphs and mcmc. Proceedings of the VLDB Endowment. 2010, 3(1–2): 794–804

    Google Scholar 

  51. Qi Y, Jain R, Singh S, Prabhakar S. Threshold query optimization for uncertain data. In: Proceedings of the 2010 ACM SIGMOD International Conference on Management of Data. 2010, 315–326

    Google Scholar 

  52. Moore K F, Rastogi V, Ré C, Suciu D. Query containment of tier–2 queries over a probabilistic database. In: Proceedings of the VLDB Workshop on Management of Uncertain Data. 2010, 47–62

    Google Scholar 

  53. Ge T, Grabiner D, Zdonik S. Monte carlo query processing of uncertain multidimensional array data. In: Proceedings of the 27th International Conference on Data Engineering. 2011, 936–947

    Google Scholar 

  54. Soliman M A, Ilyas I F, Chang K C C. Top–k query processing in uncertain databases. In: Proceedings of the 23rd International Conference on Data Engineering. 2007, 896–905

    Google Scholar 

  55. Yi K, Li F, Kollios G, Srivastava D. Efficient processing of top–k queries in uncertain databases with x–relations. IEEE Transactions on Knowledge and Data Engineering, 2008, 20(12): 1669–1682

    Google Scholar 

  56. Huang Y K, Chen C C, Lee C. Continuous k–nearest neighbor query for moving objects with uncertain velocity. GeoInformatica, 2009, 13(1): 1–25

    Google Scholar 

  57. Zhang X, Chomicki J. Semantics and evaluation of top–k queries in probabilistic databases. Distributed and Parallel Databases, 2009, 26(1): 67–126

    Google Scholar 

  58. Hua M, Pei J, Zhang W, Lin X. Ranking queries on uncertain data: a probabilistic threshold approach. In: Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data. 2008, 673–686

    Google Scholar 

  59. Cormode G, Li F, Yi K. Semantics of ranking queries for probabilistic data and expected ranks. In: Proceedings of the 25th International Conference on Data Engineering. 2009, 305–316

    Google Scholar 

  60. Ge T, Zdonik S, Madden S. Top–k queries on uncertain data: on score distribution and typical answers. In: Proceedings of the 35th ACM SIGMOD International Conference on Management of Data. 2009, 375–388

    Google Scholar 

  61. Soliman M A, Ilyas I F. Ranking with uncertain scores. In: Proceedings of the 25th International Conference on Data Engineering. 2009, 317–328

    Google Scholar 

  62. Li J, Deshpande A. Ranking continuous probabilistic datasets. Proceedings of the VLDB Endowment, 2010, 3(1–2): 638–649

    Google Scholar 

  63. Cheng R, Chen J, Mokbel M, Chow C Y. Probabilistic verifiers: evaluating constrained nearest–neighbor queries over uncertain data. In: Proceedings of the 24th International Conference on Data Engineering. 2008, 973–982

    Google Scholar 

  64. Cheng R, Chen L, Chen J, Xie X. Evaluating probability threshold k–nearest–neighbor queries over uncertain data. In: Proceedings of the 12th International Conference on Extending Database Technology: Advances in Database Technology. 2009, 672–683

    Google Scholar 

  65. Zhang Y, Lin X, Zhu G, Zhang W, Lin Q. Efficient rank based knn query processing over uncertain data. In: Proceedings of the 26th International Conference on Data Engineering. 2010, 28–39

    Google Scholar 

  66. Lian X, Chen L. Probabilistic group nearest neighbor queries in uncertain databases. IEEE Transactions on Knowledge and Data Engineering, 2008, 20(6): 809–824

    Google Scholar 

  67. Yuen S M, Tao Y, Xiao X, Pei J, Zhang D. Superseding nearest neighbor search on uncertain spatial databases. IEEE Transactions on Knowledge and Data Engineering, 2010, 22(7): 1041–1055

    Google Scholar 

  68. Cheema M A, Lin X, Wang W, Zhang W, Pei J. Probabilistic reverse nearest neighbor queries on uncertain data. IEEE Transactions on Knowledge and Data Engineering, 2010, 22(4): 550–564

    Google Scholar 

  69. Lian X, Chen L. Probabilistic inverse ranking queries in uncertain databases. The International Journal on Very Large Data Bases, 2011, 20(1): 107–127

    MathSciNet  Google Scholar 

  70. Lian X, Chen L. Efficient processing of probabilistic reverse nearest neighbor queries over uncertain data. The International Journal on Very Large Data Bases, 2009, 18(3): 787–808

    Google Scholar 

  71. Pei J, Jiang B, Lin X, Yuan Y. Probabilistic skylines on uncertain data. In: Proceedings of the 33rd International Conference on Very Large Data Bases. VLDB Endowment, 2007, 15–26

    Google Scholar 

  72. Yuan Y, Wang G. Answering probabilistic reachability queries over uncertain graphs. Chinese Journal of Computers, 2010, 33(8): 1378–1386

    MathSciNet  Google Scholar 

  73. Lian X, Chen L. Top–k dominating queries in uncertain databases. In: Proceedings of the 12th International Conference on Extending Database Technology: Advances in Database Technology. 2009, 660–671

    Google Scholar 

  74. Grädel E, Gurevich Y, Hirsch C. The complexity of query reliability. In: Proceedings of the 17th ACM SIGACT–SIGMOD–SIGART Symposium on Principles of Database Systems. 1998, 227–234

    Google Scholar 

  75. Dalvi N, Suciu D. The dichotomy of conjunctive queries on probabilistic structures. In: Proceedings of the 26th ACM SIGMODSIGACT–SIGART Symposium on Principles of Database Systems. 2007, 293–302

    Google Scholar 

  76. Fagin R, Lotem A, Naor M. Optimal aggregation algorithms for middleware. In: Proceedings of the 20th ACM SIGMOD–SIGACTSIGART Symposium on Principles of Database Systems. 2001, 102–113

    Google Scholar 

  77. Li J, Saha B, Deshpande A. A unified approach to ranking in probabilistic databases. Proceedings of the VLDB Endowment. 2009, 2(1): 502–513

    Google Scholar 

  78. Li F, Yi K, Jestes J. Ranking distributed probabilistic data. In: Proceedings of the 2009 ACM SIGMOD International Conference on Management of Data. 2009, 361–374

    Google Scholar 

  79. Dai X, Yiu M L, Mamoulis N, Tao Y, Vaitis M. Probabilistic spatial queries on existentially uncertain data. Advances in Spatial and Temporal Databases, 2005, 400–417

    Google Scholar 

  80. Yiu M L, Mamoulis N, Dai X, Tao Y, Vaitis M. Efficient evaluation of probabilistic advanced spatial queries on existentially uncertain data. IEEE Transactions on Knowledge and Data Engineering, 2009, 21(1): 108–122

    Google Scholar 

  81. Cheng R, Kalashnikov D V, Prabhakar S. Evaluating probabilistic queries over imprecise data. In: Proceedings of the 2003 ACM SIGMOD International Conference on Management of Data. 2003, 551–562

    Google Scholar 

  82. Kriegel H P, Kunath P, Renz M. Probabilistic nearest–neighbor query on uncertain objects. In: Proceedings of the International Conference on Database Systems for Advanced Applications. 2007, 337–348

    Google Scholar 

  83. Lian X, Chen L. Probabilistic inverse ranking queries over uncertain data. In: Proceedings of the International Conference on Database Systems for Advanced Applications. 2009, 35–50

    Google Scholar 

  84. Lian X, Chen L. Monochromatic and bichromatic reverse skyline search over uncertain databases. In: Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data. ACM, 2008, 213–226

    Google Scholar 

  85. Tao Y, Xiao X, Cheng R. Range search on multidimensional uncertain data. ACM Transactions on Database Systems, 2007, 32(3): 15

    Google Scholar 

  86. Bohm C, Pryakhin A, Schubert M. The gauss–tree: efficient object identification in databases of probabilistic feature vectors. In: Proceedings of the 22nd International Conference on Data Engineering. 2006, 9

    Google Scholar 

  87. Ljosa V, Singh A K. APLA: indexing arbitrary probability distributions. In: Proceedings of the 23rd International Conference on Data Engineering. 2007, 946–955

    Google Scholar 

  88. Cheng R, Xie X, Yiu M L, Chen J, Sun L. UV–diagram: a voronoi diagram for uncertain data. In: Proceedings of the 26th International Conference on Data Engineering, 2010, 796–807

    Google Scholar 

  89. Angiulli F, Fassetti F. Indexing uncertain data in general metric spaces. IEEE Transactions on Knowledge and Data Engineering, 2012, 24(9): 1640–1657

    Google Scholar 

  90. Singh S, Mayfield C, Prabhakar S, Shah R, Hambrusch S. Indexing uncertain categorical data. In: Proceedings of the 23rd International Conference on Data Engineering. 2007, 616–625

    Google Scholar 

  91. Kanagal B, Deshpande A. Indexing correlated probabilistic databases. In: Proceedings of the 35th SIGMOD International Conference on Management of Data. 2009, 455–468

    Google Scholar 

  92. Chau M, Cheng R, Kao B, Ng J. Uncertain data mining: an example in clustering location data. In: Proceedings of the Pacific–Asia Conference on Knowledge Discovery and Data Mining. 2006, 199–204

    Google Scholar 

  93. Li Y, Han J, Yang J. Clustering moving objects. In: Proceedings of the 10th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2004, 617–622

    Google Scholar 

  94. Lee S D, Kao B, Cheng R. Reducing UK–means to K–means. In: Proceedings of the 7th International Conference on Data Mining Workshops, 2007, 483–488

    Google Scholar 

  95. Kao B, Lee S D, Cheung DW, Ho WS, Chan K. Clustering uncertain data using voronoi diagrams. In: Proceedings of the 8th International Conference on Data Mining. 2008, 333–342

    Google Scholar 

  96. Dehne F, Noltemeier H. Voronoi trees and clustering problems. Information Systems, 1987, 12(2): 171–175

    Google Scholar 

  97. Gullo F, Ponti G, Tagarelli A. Clustering uncertain data via Kmedoids. In: Proceedings of the International Conference on Scalable Uncertainty Management. 2008, 229–242

    Google Scholar 

  98. Cormode G, McGregor A. Approximation algorithms for clustering uncertain data. In: Proceedings of the 27th ACM SIGMOD–SIGACTSIGART Symposium on Principles of Database Systems. 2008, 191–200

    Google Scholar 

  99. Kriegel H P, Pfeifle M. Density–based clustering of uncertain data. In: Proceedings of the 11th ACM SIGKDD International Conference on Knowledge Discovery in Data Mining. 2005, 672–677

    Google Scholar 

  100. Kriegel H P, Pfeifle M. Hierarchical density–based clustering of uncertain data. In: Proceedings of the 5th IEEE International Conference on Data Mining. 2005, 4

    Google Scholar 

  101. Xu H, Li G. Density–based probabilistic clustering of uncertain data. In: Proceedings of the International Conference on Computer Science and Software Engineering. 2008, 474–477

    Google Scholar 

  102. Hamdan H, Govaert G. Mixture model clustering of uncertain data. In: Proceedings of the 14th IEEE International Conference on Fuzzy Systems. 2005, 879–884

    Google Scholar 

  103. Xiao L, Hung E. An efficient distance calculation method for uncertain objects. In: Proceedings of the IEEE Symposium on Computational Intelligence and Data Mining. 2007, 10–17

    Google Scholar 

  104. Bi J, Zhang T. Support vector classification with input data uncertainty. Advances in Neural Information Processing Systems. 2004, 17: 161–169

    Google Scholar 

  105. Bhattacharyya C, Pannagadatta K, Smola A J. A second order cone programming formulation for classifying missing data. Advances in Neural Information Processing Systems. 2005, 17: 153–160

    Google Scholar 

  106. Yang J, Gunn S. Exploiting uncertain data in support vector classification. In: Proceedings of the International Conference on Knowledge–Based Intelligent Information and Engineering Systems. 2007, 148–155

    Google Scholar 

  107. Yang J, Gunn S. Iterative constraints in support vector classification with uncertain information. Constraint–based Mining and Learning, 2007: 49

  108. Demichelis F, Magni P, Piergiorgi P, Rubin M A, Bellazzi R. A hierarchical naive bayes model for handling sample heterogeneity in classification problems: an application to tissue microarrays. BMC Bioinformatics, 2006, 7(1): 514

    Google Scholar 

  109. Chui C K, Kao B, Hung E. Mining frequent itemsets from uncertain data. In: Proceedings of the Pacific–Asia Conference on Knowledge Discovery and Data Mining. 2007, 47–58

    Google Scholar 

  110. Chui C K, Kao B. A decremental approach for mining frequent itemsets from uncertain data. In: Proceedings of the Pacific–Asia Conference on Knowledge Discovery and Data Mining. 2008, 64–75

    Google Scholar 

  111. Leung C S, Carmichael C L, Hao B. Efficient mining of frequent patterns from uncertain data. In: Proceedings of the 7th International Conference on Data Mining Workshops. 2007, 489–494

    Google Scholar 

  112. Leung C K S, Brajczuk D A. Efficient mining of frequent itemsets from data streams. In: Proceedings of the British National Conference on Databases. 2008, 2–14

    Google Scholar 

  113. Leung C K S, Mateo M A F, Brajczuk D A. A tree–based approach for frequent pattern mining from uncertain data. In: Proceedings of the Pacific–Asia Conference on Knowledge Discovery and Data Mining. 2008, 653–661

    Google Scholar 

  114. Hewawasam K, Premaratne K, Subasingha S, Shyu ML. Rule mining and classification in imperfect databases. In: Proceedings of the 8th International Conference on Information Fusion. 2005, 661–668

    Google Scholar 

  115. Tobji M A B, Yaghlane B B, Mellouli K. A new algorithm for mining frequent itemsets from evidential databases. Proceedings of Information Processing and Management of Uncertainty. 2008, 8: 1535–1542

    Google Scholar 

  116. Tobji M A B, Yaghlane B B, Mellouli K. Frequent itemset mining from databases including one evidential attribute. In: Proceedings of the International Conference on Scalable Uncertainty Management. 2008, 19–32

    Google Scholar 

  117. Abiteboul S, Kimelfeld B, Sagiv Y, Senellart P. On the expressiveness of probabilistic XML models. The International Journal on Very Large Data Bases, 2009, 18(5): 1041–1064

    Google Scholar 

  118. Li T, Shao Q, Chen Y. PEPX: a query–friendly probabilistic XML database. In: Proceedings of the 15th ACM International Conference on Information and Knowledge Management. 2006, 848–849

    Google Scholar 

  119. Nierman A, Jagadish H. ProTDB: probabilistic data in XML. In: Proceedings of the 28th International Conference on Very Large Data Bases. VLDB Endowment, 2002, 646–657

    Google Scholar 

  120. Abiteboul S, Senellart P. Querying and updating probabilistic information in XML. In: In: Proceedings of the International Conference on Extending Database Technology. 2006, 1059–1068

    Google Scholar 

  121. Senellart P, Abiteboul S. On the complexity of managing probabilistic XML data. In: Proceedings of the 26th ACM SIGMOD–SIGACTSIGART Symposium on Principles of Database Systems. 2007, 283–292

    Google Scholar 

  122. Hung E, Getoor L, Subrahmanian V. Probabilistic interval XML. In: Proceedings of International Conference on Database Theory. 2003, 361–377

    Google Scholar 

  123. Hung E, Getoor L, Subrahmanian V. PXML: a probabilistic semistructured data model and algebra. In: Proceedings of the 19th International Conference on Data Engineering. 2003, 467–478

    Google Scholar 

  124. Abiteboul S, Chan T H H, Kharlamov E, Nutt W, Senellart P. Aggregate queries for discrete and continuous probabilistic XML. In: Proceedings of the 13th International Conference on Database Theory. 2010, 50–61

    Google Scholar 

  125. Kimelfeld B, Kosharovsky Y, Sagiv Y. Query efficiency in probabilistic XML models. In: Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data. 2008, 701–714

    Google Scholar 

  126. Zhao W, Dekhtyar A, Goldsmith J. Databases for interval probabilities. International Journal of Intelligent Systems, 2004, 19(9): 789–815

    MATH  Google Scholar 

  127. Zhao W, Dekhtyar A, Goldsmith J. A framework for management of semistructured probabilistic data. Journal of Intelligent Information Systems, 2005, 25(3): 293–332

    MATH  Google Scholar 

  128. Dekhtyar A, Goldsmith J, Hawkes S R. Semistructured probabilistic databases. In: Proceedings of the 13th International Conference on Scientific and Statistical Database Management. 2001, 36–45

    Google Scholar 

  129. Hung E. Managing uncertainty and ontologies in databases. UMD Theses and Dissertations, 2005

    Google Scholar 

  130. Magnani M, Montesi D. Management of interval probabilistic data. Acta Informatica, 2008, 45(2): 93–130

    MathSciNet  MATH  Google Scholar 

  131. Cohen S, Kimelfeld B, Sagiv Y. Incorporating constraints in probabilistic XML. In: Proceedings of the 27th ACM SIGMOD–SIGACTSIGART Symposium on Principles of Database Systems. 2008, 109–118

    Google Scholar 

  132. Kimelfeld B, Sagiv Y. Matching twigs in probabilistic XML. In: Proceedings of the 33rd International Conference on Very Large Data Bases. VLDB Endowment, 2007, 27–38

    Google Scholar 

  133. Adar E, Ré C. Managing uncertainty in social networks. IEEE Data Eng. Bull, 2007, 30(2): 15–22

    Google Scholar 

  134. Hintsanen P. The most reliable subgraph problem. In: Proceedings of the European Conference on Principles of Data Mining and Knowledge Discovery. 2007, 471–478

    Google Scholar 

  135. Hintsanen P, Toivonen H. Finding reliable subgraphs from large probabilistic graphs. Data Mining and Knowledge Discovery, 2008, 17(1): 3–23

    MathSciNet  Google Scholar 

  136. Zou Z, Li J, Gao H, Zhang S. Frequent subgraph pattern mining on uncertain graph data. In: Proceedings of the 18th ACM Conference on Information and Knowledge Management. 2009, 583–592

    Google Scholar 

  137. Zou Z, Li J, Gao H, Zhang S. Mining frequent subgraph patterns from uncertain graphs. Journal of Software, 2009, 20(11): 2965–2976

    Google Scholar 

  138. Zou Z, Li J, Gao H, Zhang S. Mining frequent subgraph patterns from uncertain graph data. IEEE Transactions on Knowledge and Data Engineering, 2010, 22(9): 1203–1218

    Google Scholar 

  139. Potamias M, Bonchi F, Gionis A, Kollios G. K–nearest neighbors in uncertain graphs. Proceedings of the VLDB Endowment. 2010, 3(1–2): 997–1008

    Google Scholar 

  140. Yuan Y, Chen L, Wang G. Efficiently answering probability thresholdbased shortest path queries over uncertain graphs. In: Proceedings of the International Conference on Database Systems for Advanced Applications. 2010, 155–170

    Google Scholar 

  141. Papapetrou O, Ioannou E, Skoutas D. Efficient discovery of frequent subgraph patterns in uncertain graph databases. In: Proceedings of the 14th International Conference on Extending Database Technology. 2011, 355–366

    Google Scholar 

  142. Han M, Zhang W, Li J Z. Raking: an efficient k–maximal frequent pattern mining algorithm on uncertain graph database. Chinese Journal of Computers, 2010, 33(8): 1387–1395

    MathSciNet  Google Scholar 

  143. Zou Z, Gao H, Li J. Discovering frequent subgraphs over uncertain graph databases under probabilistic semantics. In: Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2010, 633–642

    Google Scholar 

  144. Zou Z, Li J, Gao H, Zhang S. Finding top–k maximal cliques in an uncertain graph. In: Proceedings of the 26th International Conference on Data Engineering. 2010, 649–652

    Google Scholar 

  145. Yuan Y, Wang G, Wang H, Chen L. Efficient subgraph search over large uncertain graphs. Proceedings of the VLDB Endowment, 2011, 4(11): 876–886

    Google Scholar 

  146. Yuan Y, Wang G, Chen L, Wang H. Efficient subgraph similarity search on large probabilistic graph databases. Proceedings of the VLDB Endowment, 2012, 5(9): 800–811

    Google Scholar 

  147. Koyutürk M, Grama A, Szpankowski W. An efficient algorithm for detecting frequent subgraphs in biological networks. Bioinformatics. 2004, 20(Suppl 1): 200–207

    Google Scholar 

  148. Valiant L G. The complexity of enumeration and reliability problems. SIAM Journal on Computing, 1979, 8(3): 410–421

    MathSciNet  MATH  Google Scholar 

  149. Jin C, Yi K, Chen L, Yu J X, Lin X. Sliding–window top–k queries on uncertain streams. Proceedings of the VLDB Endowment, 2008, 1(1): 301–312

    Google Scholar 

  150. Ré C, Letchner J, Balazinksa M, Suciu D. Event queries on correlated probabilistic streams. In: Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data. 2008, 715–728

    Google Scholar 

  151. Alon N, Matias Y, Szegedy M. The space complexity of approximating the frequency moments. In: Proceedings of the 28th Annual ACM Symposium on Theory of Computing. 1996, 20–29

    MATH  Google Scholar 

  152. Flajolet P, Martin G N. Probabilistic counting algorithms for data base applications. Journal of Computer and System Sciences, 1985, 31(2): 182–209

  153. Zhang T, Ramakrishnan R, Livny M. BIRCH: an efficient data clustering method for very large databases. ACM SIGMOD Record, 1996, 25(2): 103–114

  154. Aggarwal C C, Han J, Wang J, Yu P S. A framework for clustering evolving data streams. In: Proceedings of the 29th International Conference on Very Large Data Bases. VLDB Endowment, 2003, 81–92

  155. Aggarwal C C, Yu P S. A framework for clustering uncertain data streams. In: Proceedings of the 24th International Conference on Data Engineering. 2008, 150–159

  156. Li Z, Ge T. Online windowed subsequence matching over probabilistic sequences. In: Proceedings of the International Conference on Management of Data. 2012, 277–288

  157. Lian X, Chen L. Efficient join processing on uncertain data streams. In: Proceedings of the 18th ACM Conference on Information and Knowledge Management. 2009, 857–866

  158. Ge T, Liu F. Accuracy–aware uncertain stream databases. In: Proceedings of the 28th International Conference on Data Engineering. 2012, 174–185

  159. Peng L, Diao Y, Liu A. Optimizing probabilistic query processing on continuous uncertain data. Proceedings of the VLDB Endowment, 2011, 4(11): 1169–1180

  160. Jayram T, McGregor A, Muthukrishnan S, Vee E. Estimating statistical aggregates on probabilistic data streams. In: Proceedings of the 26th ACM SIGMOD–SIGACT–SIGART Symposium on Principles of Database Systems. 2007, 243–252

  161. Zhang Q, Li F, Yi K. Finding frequent items in probabilistic data. In: Proceedings of the ACM SIGMOD International Conference on Management of Data. 2008, 819–832

  162. Aggarwal C C, Han J, Wang J, Yu P S. On high dimensional projected clustering of data streams. Data Mining and Knowledge Discovery, 2005, 10(3): 251–273

  163. Zhang C, Gao M, Zhou A. Tracking high quality clusters over uncertain data streams. In: Proceedings of the 25th International Conference on Data Engineering. 2009, 1641–1648

  164. Zhang W, Lin X, Zhang Y, Wang W, Zhu G, Yu J X. Probabilistic skyline operator over sliding windows. Information Systems, 2013, 38(8): 1212–1233

  165. Subramaniam S, Palpanas T, Papadopoulos D, Kalogeraki V, Gunopulos D. Online outlier detection in sensor data using non–parametric models. In: Proceedings of the 32nd International Conference on Very Large Data Bases. VLDB Endowment, 2006, 187–198

  166. Deshpande A, Guestrin C, Madden S R, Hellerstein J M, Hong W. Model–driven data acquisition in sensor networks. In: Proceedings of the 30th International Conference on Very Large Data Bases. VLDB Endowment, 2004, 588–599

  167. Hida Y, Huang P, Nishtala R. Aggregation query under uncertainty in sensor networks. Technical Report, 2004

  168. Welbourne E, Khoussainova N, Letchner J, Li Y, Balazinska M, Borriello G, Suciu D. Cascadia: a system for specifying, detecting, and managing RFID events. In: Proceedings of the 6th International Conference on Mobile Systems, Applications, and Services. 2008, 281–294

  169. Kanagal B, Deshpande A. Online filtering, smoothing and probabilistic modeling of streaming data. In: Proceedings of the 24th IEEE International Conference on Data Engineering. 2008, 1160–1169

  170. Zhang C J, Chen L, Tong Y, Liu Z. Cleaning uncertain data with a noisy crowd. In: Proceedings of the 31st IEEE International Conference on Data Engineering. 2015, 6–17

  171. Mo L, Cheng R, Li X, Cheung D W, Yang X S. Cleaning uncertain data for top–k queries. In: Proceedings of the 29th IEEE International Conference on Data Engineering. 2013, 134–145

  172. Panse F, Van Keulen M, De Keijzer A, Ritter N. Duplicate detection in probabilistic data. In: Proceedings of the ICDE Workshops. 2010, 179–182

  173. Van Keulen M, De Keijzer A. Qualitative effects of knowledge rules and user feedback in probabilistic data integration. The VLDB Journal, 2009, 18(5): 1191–1217

  174. Cheng R, Chen J, Xie X. Cleaning uncertain data with quality guarantees. Proceedings of the VLDB Endowment, 2008, 1(1): 722–735

  175. Dong X L, Halevy A, Yu C. Data integration with uncertainty. The VLDB Journal, 2009, 18(2): 469–500

Acknowledgements

This work was partially supported by the NSFC (61602159, U1509216, 61472099, 61133002), the National Sci-Tech Support Plan (2015BAH10F01), and the Scientific Research Foundation for the Returned Overseas Chinese Scholars of Heilongjiang Province (LC2016026).

Author information

Corresponding author

Correspondence to Hongzhi Wang.

Additional information

Lingli Li is an associate professor at Heilongjiang University, China. She obtained her PhD degree from Harbin Institute of Technology in 2015. Her research interests include data management, data quality, and entity resolution. She has published more than 10 papers in refereed journals and conferences such as IEEE Transactions on Knowledge and Data Engineering.

Hongzhi Wang is a professor and doctoral supervisor at Harbin Institute of Technology, China. His research area is data management, including data quality, XML data management, and graph management. He has published more than 100 papers in refereed journals and conferences. He is a recipient of the CCF Outstanding Dissertation Award, a Microsoft Fellowship, and an IBM PhD Fellowship.

Jianzhong Li is a professor and doctoral supervisor at Harbin Institute of Technology, China. He is a senior member of CCF. His research interests include databases, parallel computing, and wireless sensor networks.

Hong Gao is a professor and doctoral supervisor at Harbin Institute of Technology, China. She is a senior member of CCF. Her research interests include data management, wireless sensor networks, and graph databases.

About this article

Cite this article

Li, L., Wang, H., Li, J. et al. A survey of uncertain data management. Front. Comput. Sci. 14, 162–190 (2020). https://doi.org/10.1007/s11704-017-7063-z

  • DOI: https://doi.org/10.1007/s11704-017-7063-z
