A survey of uncertain data management

Li, Lingli; Wang, Hongzhi; Li, Jianzhong; Gao, Hong

doi:10.1007/s11704-017-7063-z

Review Article
Published: 06 September 2018

Frontiers of Computer Science, Volume 14, pages 162–190 (2020)

Abstract

Uncertain data are data that carry uncertainty information, and they arise widely in database applications. In recent years, uncertainty in data has posed challenges in almost all areas of database management, including data modeling, query representation, query processing, and data mining. As a result, uncertain data management has become an active research topic in the field of data management. In this study, we explore problems in managing uncertain data, present state-of-the-art solutions, and outline future research directions in this area. The techniques discussed cover data modeling, query processing, and data mining over uncertain data in relational, XML, graph, and stream forms.

References

Fuhr N, Rölleke T. A probabilistic relational algebra for the integration of information retrieval and database systems. ACM Transactions on Information Systems, 1997, 15(1): 32–66
Google Scholar
Imieliński T, Lipski W. Incomplete information in relational databases. Journal of the ACM, 1984, 31(4): 761–791
MathSciNet MATH Google Scholar
Barbará D, Garcia–Molina H, Porter D. The management of probabilistic data. IEEE Transactions on Knowledge and Data Engineering, 1992, 4(5): 487–502
Google Scholar
Lakshmanan L V, Leone N, Ross R, Subrahmanian V S. Probview: a flexible probabilistic database system. ACM Transactions on Database Systems, 1997, 22(3): 419–469
Google Scholar
Zimányi E. Query evaluation in probabilistic relational databases. Theoretical Computer Science, 1997, 171(1): 179–219
MathSciNet MATH Google Scholar
Sen P, Deshpande A. Representing and querying correlated tuples in probabilistic databases. In: Proceedings of the 23rd International Conference on Data Engineering. 2007, 596–605
Google Scholar
Suciu D. Probabilistic databases. SIGACT News, 2008, 39(2): 111–124
Google Scholar
Cavallo R, Pittarelli M. The theory of probabilistic databases. In: Proceedings of the International Conference on Very Large Data Bases, 1987, 87: 1–4
Google Scholar
Benjelloun O, Sarma A D, Halevy A, Widom J. ULDBS: databases with uncertainty and lineage. In: Proceedings of the 32nd International Conference on Very Large Data Bases. VLDB Endowment, 2006, 953–964
Google Scholar
Sen P, Deshpande A, Getoor L. Read–once functions and query evaluation in probabilistic databases. Proceedings of the VLDB Endowment, 2010, 3(1–2): 1068–1079
Google Scholar
Olteanu D, Huang J. Using OBDDs for efficient query evaluation on probabilistic databases. In: Proceeding of the International Conference on Scalable Uncertainty Management. 2008, 326–340
Google Scholar
Roy S, Perduca V, Tannen V. Faster query answering in probabilistic databases using read–once functions. In: Proceedings of the 14th International Conference on Database Theory. 2011, 232–243
Google Scholar
Kenig B, Gal A, Strichman O. A new class of lineage expressions over probabilistic databases computable in P–time. In: Proceedings of the 7th International Conference on Scalable Uncertainty Management. 2013, 219–232
Google Scholar
Widom J. Trio: a system for integrated management of data, accuracy, and lineage. Stanford Infolab, 2004
Google Scholar
Antova L, Koch C, Olteanu D. Maybms: managing incomplete information with probabilistic world–set decompositions. In: Proceedings of the 23rd International Conference on Data Engineering. 2007, 1479–1480
Google Scholar
Cheng R, Singh S, Prabhakar S. U–DBMS: a database system for managing constantly–evolving data. In: Proceedings of the 31st International Conference on Very Large Data Bases. VLDB Endowment, 2005, 1271–1274
Google Scholar
Boulos J, Dalvi N, Mandhani B, Mathur S, Re C, Suciu D. Mystiq: a system for finding more answers by using probabilities. In: Proceedings of the 2005 ACM SIGMOD International Conference on Management of Data. 2005, 891–893
Google Scholar
Olteanu D, Huang J, Koch C. Sprout: Lazy vs. eager query plans for tuple–independent probabilistic databases. In: Proceedings of the 25th International Conference on Data Engineering. 2009, 640–651
Google Scholar
Kimelfeld B, Kosharovsky Y, Sagiv Y. Query evaluation over probabilistic XML. The International Journal on Very Large Data Bases, 2009, 18(5): 1117–1140
Google Scholar
Senellart P, Souihli A. Proapprox: a lightweight approximation query processor over probabilistic trees. In: Proceedings of the 2011 ACM SIGMOD International Conference on Management of Data. 2011, 1295–1298
Google Scholar
Welbourne E, Khoussainova N, Letchner J, Li Y, Balazinska M, Borriello G, Suciu D. Cascadia: a system for specifying, detecting, and managing rfid events. In: Proceedings of the 6th International Conference on Mobile Systems, Applications, and Services. 2008, 281–294
Google Scholar
Tran T T, Peng L, Li B, Diao Y, Liu A. PODS: a new model and processing algorithms for uncertain data streams. In: Proceedings of the 2010 ACM SIGMOD International Conference on Management of Data. 2010, 159–170
Google Scholar
Tran T T, Peng L, Diao Y, McGregor A, Liu A. Claro: modeling and processing uncertain data streams. The International Journal on Very Large Data Bases, 2012, 21(5): 651–676
Google Scholar
Aggarwal C C, Yu P S. A survey of uncertain data algorithms and applications. IEEE Transactions on Knowledge and Data Engineering, 2009, 21(5): 609–623
Google Scholar
Zhou A Y. A survey on the management of uncertain data. Chinese Journal of Computers, 2009, 32(1): 1–16
Google Scholar
Kimelfeld B, Senellart P. Probabilistic XML: Models and Complexity. Advances in Probabilistic Databases for Uncertain Information Management, Springer, Berlin, Heidelberg, 2013, 39–66
Google Scholar
Sarma A D, Benjelloun O, Halevy A, Widom J. Working models for uncertain data. In: Proceedings of the 22nd International Conference on Data Engineering. 2006, 7
Google Scholar
Green T J, Tannen V. Models for incomplete and probabilistic information. In: Proceedings of the International Conference on Extending Database Technology. 2006, 278–296
Google Scholar
Sen P, Deshpande A, Getoor L. PRDB: managing and exploiting rich correlations in probabilistic databases. The International Journal on Very Large Data Bases, 2009, 18(5): 1065–1090
Google Scholar
Chen R, Mao Y, Kiringa I. GRN model of probabilistic databases: construction, transition and querying. In: Proceedings of the 2010 ACM SIGMOD International Conference on Management of Data. 2010, 291–302
Google Scholar
Cheng R, Xia Y, Prabhakar S, Shah R, Vitter J S. Efficient indexing methods for probabilistic threshold queries over uncertain data. In: Proceedings of the 30th International Conference on Very Large Data Bases. VLDB Endowment, 2004, 876–887
Google Scholar
Tao Y, Cheng R, Xiao X, Ngai W K, Kao B, Prabhakar S. Indexing multi–dimensional uncertain data with arbitrary probability density functions. In: Proceedings of the 31st International Conference on Very Large Data Bases. VLDB Endowment, 2005, 922–933
Google Scholar
Burdick D, Deshpande P M, Jayram T, Ramakrishnan R, Vaithyanathan S. Olap over uncertain and imprecise data. In: Proceedings of the 31st International Conference on Very Large Data Bases. VLDB Endowment, 2005, 970–981
Google Scholar
Jayram T, Kale S, Vee E. Efficient aggregation algorithms for probabilistic data. In: Proceedings of the 18th Annual ACM–SIAM Symposium on Discrete Algorithms. Society for Industrial and Applied Mathematics, 2007, 346–355
MATH Google Scholar
Dalvi N, Suciu D. Efficient query evaluation on probabilistic databases. In: Proceedings of the 30th International Conference on Very Large Data Bases. VLDB Endowment, 2004, 864–875
Google Scholar
Cormode G, Garofalakis M. Sketching probabilistic data streams. In: Proceedings of the ACM SIGMOD International Conference onManagement of Data. 2007, 281–292
Google Scholar
Ross R, Subrahmanian V, Grant J. Aggregate operators in probabilistic databases. Journal of the ACM, 2005, 52(1): 54–101
MathSciNet MATH Google Scholar
Kanagal B, Deshpande A. Efficient query evaluation over temporally correlated probabilistic streams. In: Proceedings of the 25th International Conference on Data Engineering. 2009, 1315–1318
Google Scholar
Burdick D, Deshpande P M, Jayram T, Ramakrishnan R, Vaithyanathan S. Efficient allocation algorithms for olap over imprecise data. In: Proceedings of the 32nd International Conference on Very Large Data Bases. VLDB Endowment, 2006, 391–402
Google Scholar
Ré C, Suciu D. The trichotomy of having queries on a probabilistic database. The International Journal on Very Large Data Bases, 2009, 18(5): 1091–1116
Google Scholar
Fink R, Han L, Olteanu D. Aggregation in probabilistic databases via knowledge compilation. Proceedings of the VLDB Endowment. 2012, 5(5): 490–501
Google Scholar
Ngai W K, Kao B, Chui C K, Cheng R, Chau M, Yip K Y. Efficient clustering of uncertain data. In: Proceedings of the 6th International Conference on Data Mining. 2006, 436–445
Google Scholar
Agrawal P, Widom J. Confidence–aware join algorithms. In: Proceedings of the 25th International Conference on Data Engineering. 2009, 628–639
Google Scholar
Cheng R, Singh S, Prabhakar S, Shah R, Vitter J S, Xia Y. Efficient join processing over uncertain data. In: Proceedings of the 15th ACM International Conference on Information and Knowledge Management. 2006, 738–747
Google Scholar
Kriegel H P, Kunath P, Pfeifle M, Renz M. Probabilistic similarity join on uncertain data. In: Proceedings of the International Conference on Database Systems for Advanced Applications. 2006, 295–309
Google Scholar
Ljosa V, Singh A K. Top–k spatial joins of probabilistic objects. In: Proceedings of the 24th International Conference on Data Engineering. 2008, 566–575
Google Scholar
Jestes J, Li F, Yan Z, Yi K. Probabilistic string similarity joins. In: Proceedings of the 2010 ACM SIGMOD International Conference on Management of Data. 2010, 327–338
Google Scholar
Lian X, Chen L. Set similarity join on probabilistic data. Proceedings of the VLDB Endowment. 2010, 3(1–2): 650–659
Google Scholar
Andritsos P, Fuxman A, Miller R J. Clean answers over dirty databases: probabilistic approach. In: Proceedings of the 22nd International Conference on Data Engineering. 2006, 30
Google Scholar
Wick M, McCallum A, Miklau G. Scalable probabilistic databases with factor graphs and mcmc. Proceedings of the VLDB Endowment. 2010, 3(1–2): 794–804
Google Scholar
Qi Y, Jain R, Singh S, Prabhakar S. Threshold query optimization for uncertain data. In: Proceedings of the 2010 ACM SIGMOD International Conference on Management of Data. 2010, 315–326
Google Scholar
Moore K F, Rastogi V, Ré C, Suciu D. Query containment of tier–2 queries over a probabilistic database. In: Proceedings of the VLDB Workshop on Management of Uncertain Data. 2010, 47–62
Google Scholar
Ge T, Grabiner D, Zdonik S. Monte carlo query processing of uncertain multidimensional array data. In: Proceedings of the 27th International Conference on Data Engineering. 2011, 936–947
Google Scholar
Soliman M A, Ilyas I F, Chang K C C. Top–k query processing in uncertain databases. In: Proceedings of the 23rd International Conference on Data Engineering. 2007, 896–905
Google Scholar
Yi K, Li F, Kollios G, Srivastava D. Efficient processing of top–k queries in uncertain databases with x–relations. IEEE Transactions on Knowledge and Data Engineering, 2008, 20(12): 1669–1682
Google Scholar
Huang Y K, Chen C C, Lee C. Continuous k–nearest neighbor query for moving objects with uncertain velocity. GeoInformatica, 2009, 13(1): 1–25
Google Scholar
Zhang X, Chomicki J. Semantics and evaluation of top–k queries in probabilistic databases. Distributed and Parallel Databases, 2009, 26(1): 67–126
Google Scholar
Hua M, Pei J, Zhang W, Lin X. Ranking queries on uncertain data: a probabilistic threshold approach. In: Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data. 2008, 673–686
Google Scholar
Cormode G, Li F, Yi K. Semantics of ranking queries for probabilistic data and expected ranks. In: Proceedings of the 25th International Conference on Data Engineering. 2009, 305–316
Google Scholar
Ge T, Zdonik S, Madden S. Top–k queries on uncertain data: on score distribution and typical answers. In: Proceedings of the 35th ACM SIGMOD International Conference on Management of Data. 2009, 375–388
Google Scholar
Soliman M A, Ilyas I F. Ranking with uncertain scores. In: Proceedings of the 25th International Conference on Data Engineering. 2009, 317–328
Google Scholar
Li J, Deshpande A. Ranking continuous probabilistic datasets. Proceedings of the VLDB Endowment, 2010, 3(1–2): 638–649
Google Scholar
Cheng R, Chen J, Mokbel M, Chow C Y. Probabilistic verifiers: evaluating constrained nearest–neighbor queries over uncertain data. In: Proceedings of the 24th International Conference on Data Engineering. 2008, 973–982
Google Scholar
Cheng R, Chen L, Chen J, Xie X. Evaluating probability threshold k–nearest–neighbor queries over uncertain data. In: Proceedings of the 12th International Conference on Extending Database Technology: Advances in Database Technology. 2009, 672–683
Google Scholar
Zhang Y, Lin X, Zhu G, Zhang W, Lin Q. Efficient rank based knn query processing over uncertain data. In: Proceedings of the 26th International Conference on Data Engineering. 2010, 28–39
Google Scholar
Lian X, Chen L. Probabilistic group nearest neighbor queries in uncertain databases. IEEE Transactions on Knowledge and Data Engineering, 2008, 20(6): 809–824
Google Scholar
Yuen S M, Tao Y, Xiao X, Pei J, Zhang D. Superseding nearest neighbor search on uncertain spatial databases. IEEE Transactions on Knowledge and Data Engineering, 2010, 22(7): 1041–1055
Google Scholar
Cheema M A, Lin X, Wang W, Zhang W, Pei J. Probabilistic reverse nearest neighbor queries on uncertain data. IEEE Transactions on Knowledge and Data Engineering, 2010, 22(4): 550–564
Google Scholar
Lian X, Chen L. Probabilistic inverse ranking queries in uncertain databases. The International Journal on Very Large Data Bases, 2011, 20(1): 107–127
MathSciNet Google Scholar
Lian X, Chen L. Efficient processing of probabilistic reverse nearest neighbor queries over uncertain data. The International Journal on Very Large Data Bases, 2009, 18(3): 787–808
Google Scholar
Pei J, Jiang B, Lin X, Yuan Y. Probabilistic skylines on uncertain data. In: Proceedings of the 33rd International Conference on Very Large Data Bases. VLDB Endowment, 2007, 15–26
Google Scholar
Yuan Y, Wang G. Answering probabilistic reachability queries over uncertain graphs. Chinese Journal of Computers, 2010, 33(8): 1378–1386
MathSciNet Google Scholar
Lian X, Chen L. Top–k dominating queries in uncertain databases. In: Proceedings of the 12th International Conference on Extending Database Technology: Advances in Database Technology. 2009, 660–671
Google Scholar
Grädel E, Gurevich Y, Hirsch C. The complexity of query reliability. In: Proceedings of the 17th ACM SIGACT–SIGMOD–SIGART Symposium on Principles of Database Systems. 1998, 227–234
Google Scholar
Dalvi N, Suciu D. The dichotomy of conjunctive queries on probabilistic structures. In: Proceedings of the 26th ACM SIGMODSIGACT–SIGART Symposium on Principles of Database Systems. 2007, 293–302
Google Scholar
Fagin R, Lotem A, Naor M. Optimal aggregation algorithms for middleware. In: Proceedings of the 20th ACM SIGMOD–SIGACTSIGART Symposium on Principles of Database Systems. 2001, 102–113
Google Scholar
Li J, Saha B, Deshpande A. A unified approach to ranking in probabilistic databases. Proceedings of the VLDB Endowment. 2009, 2(1): 502–513
Google Scholar
Li F, Yi K, Jestes J. Ranking distributed probabilistic data. In: Proceedings of the 2009 ACM SIGMOD International Conference on Management of Data. 2009, 361–374
Google Scholar
Dai X, Yiu M L, Mamoulis N, Tao Y, Vaitis M. Probabilistic spatial queries on existentially uncertain data. Advances in Spatial and Temporal Databases, 2005, 400–417
Google Scholar
Yiu M L, Mamoulis N, Dai X, Tao Y, Vaitis M. Efficient evaluation of probabilistic advanced spatial queries on existentially uncertain data. IEEE Transactions on Knowledge and Data Engineering, 2009, 21(1): 108–122
Google Scholar
Cheng R, Kalashnikov D V, Prabhakar S. Evaluating probabilistic queries over imprecise data. In: Proceedings of the 2003 ACM SIGMOD International Conference on Management of Data. 2003, 551–562
Google Scholar
Kriegel H P, Kunath P, Renz M. Probabilistic nearest–neighbor query on uncertain objects. In: Proceedings of the International Conference on Database Systems for Advanced Applications. 2007, 337–348
Google Scholar
Lian X, Chen L. Probabilistic inverse ranking queries over uncertain data. In: Proceedings of the International Conference on Database Systems for Advanced Applications. 2009, 35–50
Google Scholar
Lian X, Chen L. Monochromatic and bichromatic reverse skyline search over uncertain databases. In: Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data. ACM, 2008, 213–226
Google Scholar
Tao Y, Xiao X, Cheng R. Range search on multidimensional uncertain data. ACM Transactions on Database Systems, 2007, 32(3): 15
Google Scholar
Bohm C, Pryakhin A, Schubert M. The gauss–tree: efficient object identification in databases of probabilistic feature vectors. In: Proceedings of the 22nd International Conference on Data Engineering. 2006, 9
Google Scholar
Ljosa V, Singh A K. APLA: indexing arbitrary probability distributions. In: Proceedings of the 23rd International Conference on Data Engineering. 2007, 946–955
Google Scholar
Cheng R, Xie X, Yiu M L, Chen J, Sun L. UV–diagram: a voronoi diagram for uncertain data. In: Proceedings of the 26th International Conference on Data Engineering, 2010, 796–807
Google Scholar
Angiulli F, Fassetti F. Indexing uncertain data in general metric spaces. IEEE Transactions on Knowledge and Data Engineering, 2012, 24(9): 1640–1657
Google Scholar
Singh S, Mayfield C, Prabhakar S, Shah R, Hambrusch S. Indexing uncertain categorical data. In: Proceedings of the 23rd International Conference on Data Engineering. 2007, 616–625
Google Scholar
Kanagal B, Deshpande A. Indexing correlated probabilistic databases. In: Proceedings of the 35th SIGMOD International Conference on Management of Data. 2009, 455–468
Google Scholar
Chau M, Cheng R, Kao B, Ng J. Uncertain data mining: an example in clustering location data. In: Proceedings of the Pacific–Asia Conference on Knowledge Discovery and Data Mining. 2006, 199–204
Google Scholar
Li Y, Han J, Yang J. Clustering moving objects. In: Proceedings of the 10th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2004, 617–622
Google Scholar
Lee S D, Kao B, Cheng R. Reducing UK–means to K–means. In: Proceedings of the 7th International Conference on Data Mining Workshops, 2007, 483–488
Google Scholar
Kao B, Lee S D, Cheung DW, Ho WS, Chan K. Clustering uncertain data using voronoi diagrams. In: Proceedings of the 8th International Conference on Data Mining. 2008, 333–342
Google Scholar
Dehne F, Noltemeier H. Voronoi trees and clustering problems. Information Systems, 1987, 12(2): 171–175
Google Scholar
Gullo F, Ponti G, Tagarelli A. Clustering uncertain data via Kmedoids. In: Proceedings of the International Conference on Scalable Uncertainty Management. 2008, 229–242
Google Scholar
Cormode G, McGregor A. Approximation algorithms for clustering uncertain data. In: Proceedings of the 27th ACM SIGMOD–SIGACTSIGART Symposium on Principles of Database Systems. 2008, 191–200
Google Scholar
Kriegel H P, Pfeifle M. Density–based clustering of uncertain data. In: Proceedings of the 11th ACM SIGKDD International Conference on Knowledge Discovery in Data Mining. 2005, 672–677
Google Scholar
Kriegel H P, Pfeifle M. Hierarchical density–based clustering of uncertain data. In: Proceedings of the 5th IEEE International Conference on Data Mining. 2005, 4
Google Scholar
Xu H, Li G. Density–based probabilistic clustering of uncertain data. In: Proceedings of the International Conference on Computer Science and Software Engineering. 2008, 474–477
Google Scholar
Hamdan H, Govaert G. Mixture model clustering of uncertain data. In: Proceedings of the 14th IEEE International Conference on Fuzzy Systems. 2005, 879–884
Google Scholar
Xiao L, Hung E. An efficient distance calculation method for uncertain objects. In: Proceedings of the IEEE Symposium on Computational Intelligence and Data Mining. 2007, 10–17
Google Scholar
Bi J, Zhang T. Support vector classification with input data uncertainty. Advances in Neural Information Processing Systems. 2004, 17: 161–169
Google Scholar
Bhattacharyya C, Pannagadatta K, Smola A J. A second order cone programming formulation for classifying missing data. Advances in Neural Information Processing Systems. 2005, 17: 153–160
Google Scholar
Yang J, Gunn S. Exploiting uncertain data in support vector classification. In: Proceedings of the International Conference on Knowledge–Based Intelligent Information and Engineering Systems. 2007, 148–155
Google Scholar
Yang J, Gunn S. Iterative constraints in support vector classification with uncertain information. Constraint–based Mining and Learning, 2007: 49
Demichelis F, Magni P, Piergiorgi P, Rubin M A, Bellazzi R. A hierarchical naive bayes model for handling sample heterogeneity in classification problems: an application to tissue microarrays. BMC Bioinformatics, 2006, 7(1): 514
Google Scholar
Chui C K, Kao B, Hung E. Mining frequent itemsets from uncertain data. In: Proceedings of the Pacific–Asia Conference on Knowledge Discovery and Data Mining. 2007, 47–58
Google Scholar
Chui C K, Kao B. A decremental approach for mining frequent itemsets from uncertain data. In: Proceedings of the Pacific–Asia Conference on Knowledge Discovery and Data Mining. 2008, 64–75
Google Scholar
Leung C S, Carmichael C L, Hao B. Efficient mining of frequent patterns from uncertain data. In: Proceedings of the 7th International Conference on Data Mining Workshops. 2007, 489–494
Google Scholar
Leung C K S, Brajczuk D A. Efficient mining of frequent itemsets from data streams. In: Proceedings of the British National Conference on Databases. 2008, 2–14
Google Scholar
Leung C K S, Mateo M A F, Brajczuk D A. A tree–based approach for frequent pattern mining from uncertain data. In: Proceedings of the Pacific–Asia Conference on Knowledge Discovery and Data Mining. 2008, 653–661
Google Scholar
Hewawasam K, Premaratne K, Subasingha S, Shyu ML. Rule mining and classification in imperfect databases. In: Proceedings of the 8th International Conference on Information Fusion. 2005, 661–668
Google Scholar
Tobji M A B, Yaghlane B B, Mellouli K. A new algorithm for mining frequent itemsets from evidential databases. Proceedings of Information Processing and Management of Uncertainty. 2008, 8: 1535–1542
Google Scholar
Tobji M A B, Yaghlane B B, Mellouli K. Frequent itemset mining from databases including one evidential attribute. In: Proceedings of the International Conference on Scalable Uncertainty Management. 2008, 19–32
Google Scholar
Abiteboul S, Kimelfeld B, Sagiv Y, Senellart P. On the expressiveness of probabilistic XML models. The International Journal on Very Large Data Bases, 2009, 18(5): 1041–1064
Google Scholar
Li T, Shao Q, Chen Y. PEPX: a query–friendly probabilistic XML database. In: Proceedings of the 15th ACM International Conference on Information and Knowledge Management. 2006, 848–849
Google Scholar
Nierman A, Jagadish H. ProTDB: probabilistic data in XML. In: Proceedings of the 28th International Conference on Very Large Data Bases. VLDB Endowment, 2002, 646–657
Google Scholar
Abiteboul S, Senellart P. Querying and updating probabilistic information in XML. In: In: Proceedings of the International Conference on Extending Database Technology. 2006, 1059–1068
Google Scholar
Senellart P, Abiteboul S. On the complexity of managing probabilistic XML data. In: Proceedings of the 26th ACM SIGMOD–SIGACTSIGART Symposium on Principles of Database Systems. 2007, 283–292
Google Scholar
Hung E, Getoor L, Subrahmanian V. Probabilistic interval XML. In: Proceedings of International Conference on Database Theory. 2003, 361–377
Google Scholar
Hung E, Getoor L, Subrahmanian V. PXML: a probabilistic semistructured data model and algebra. In: Proceedings of the 19th International Conference on Data Engineering. 2003, 467–478
Google Scholar
Abiteboul S, Chan T H H, Kharlamov E, Nutt W, Senellart P. Aggregate queries for discrete and continuous probabilistic XML. In: Proceedings of the 13th International Conference on Database Theory. 2010, 50–61
Google Scholar
Kimelfeld B, Kosharovsky Y, Sagiv Y. Query efficiency in probabilistic XML models. In: Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data. 2008, 701–714
Google Scholar
Zhao W, Dekhtyar A, Goldsmith J. Databases for interval probabilities. International Journal of Intelligent Systems, 2004, 19(9): 789–815
MATH Google Scholar
Zhao W, Dekhtyar A, Goldsmith J. A framework for management of semistructured probabilistic data. Journal of Intelligent Information Systems, 2005, 25(3): 293–332
MATH Google Scholar
Dekhtyar A, Goldsmith J, Hawkes S R. Semistructured probabilistic databases. In: Proceedings of the 13th International Conference on Scientific and Statistical Database Management. 2001, 36–45
Google Scholar
Hung E. Managing uncertainty and ontologies in databases. UMD Theses and Dissertations, 2005
Google Scholar
Magnani M, Montesi D. Management of interval probabilistic data. Acta Informatica, 2008, 45(2): 93–130
MathSciNet MATH Google Scholar
Cohen S, Kimelfeld B, Sagiv Y. Incorporating constraints in probabilistic XML. In: Proceedings of the 27th ACM SIGMOD–SIGACTSIGART Symposium on Principles of Database Systems. 2008, 109–118
Google Scholar
Kimelfeld B, Sagiv Y. Matching twigs in probabilistic XML. In: Proceedings of the 33rd International Conference on Very Large Data Bases. VLDB Endowment, 2007, 27–38
Google Scholar
Adar E, Ré C. Managing uncertainty in social networks. IEEE Data Eng. Bull, 2007, 30(2): 15–22
Google Scholar
Hintsanen P. The most reliable subgraph problem. In: Proceedings of the European Conference on Principles of Data Mining and Knowledge Discovery. 2007, 471–478
Google Scholar
Hintsanen P, Toivonen H. Finding reliable subgraphs from large probabilistic graphs. Data Mining and Knowledge Discovery, 2008, 17(1): 3–23
MathSciNet Google Scholar
Zou Z, Li J, Gao H, Zhang S. Frequent subgraph pattern mining on uncertain graph data. In: Proceedings of the 18th ACM Conference on Information and Knowledge Management. 2009, 583–592
Google Scholar
Zou Z, Li J, Gao H, Zhang S. Mining frequent subgraph patterns from uncertain graphs. Journal of Software, 2009, 20(11): 2965–2976
Google Scholar
Zou Z, Li J, Gao H, Zhang S. Mining frequent subgraph patterns from uncertain graph data. IEEE Transactions on Knowledge and Data Engineering, 2010, 22(9): 1203–1218
Google Scholar
Potamias M, Bonchi F, Gionis A, Kollios G. K–nearest neighbors in uncertain graphs. Proceedings of the VLDB Endowment. 2010, 3(1–2): 997–1008
Google Scholar
Yuan Y, Chen L, Wang G. Efficiently answering probability thresholdbased shortest path queries over uncertain graphs. In: Proceedings of the International Conference on Database Systems for Advanced Applications. 2010, 155–170
Google Scholar
Papapetrou O, Ioannou E, Skoutas D. Efficient discovery of frequent subgraph patterns in uncertain graph databases. In: Proceedings of the 14th International Conference on Extending Database Technology. 2011, 355–366
Google Scholar
Han M, Zhang W, Li J Z. Raking: an efficient k–maximal frequent pattern mining algorithm on uncertain graph database. Chinese Journal of Computers, 2010, 33(8): 1387–1395
MathSciNet Google Scholar
Zou Z, Gao H, Li J. Discovering frequent subgraphs over uncertain graph databases under probabilistic semantics. In: Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2010, 633–642
Google Scholar
Zou Z, Li J, Gao H, Zhang S. Finding top–k maximal cliques in an uncertain graph. In: Proceedings of the 26th International Conference on Data Engineering. 2010, 649–652
Google Scholar
Yuan Y, Wang G, Wang H, Chen L. Efficient subgraph search over large uncertain graphs. Proceedings of the VLDB Endowment, 2011, 4(11): 876–886
Google Scholar
Yuan Y, Wang G, Chen L, Wang H. Efficient subgraph similarity search on large probabilistic graph databases. Proceedings of the VLDB Endowment, 2012, 5(9): 800–811
Google Scholar
Koyutürk M, Grama A, Szpankowski W. An efficient algorithm for detecting frequent subgraphs in biological networks. Bioinformatics. 2004, 20(Suppl 1): 200–207
Google Scholar
Valiant L G. The complexity of enumeration and reliability problems. SIAM Journal on Computing, 1979, 8(3): 410–421
MathSciNet MATH Google Scholar
Jin C, Yi K, Chen L, Yu J X, Lin X. Sliding–window top–k queries on uncertain streams. Proceedings of the VLDB Endowment, 2008, 1(1): 301–312
Google Scholar
Ré C, Letchner J, Balazinksa M, Suciu D. Event queries on correlated probabilistic streams. In: Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data. 2008, 715–728
Google Scholar
Alon N, Matias Y, Szegedy M. The space complexity of approximating the frequency moments. In: Proceedings of the 28th Annual ACM Symposium on Theory of Computing. 1996, 20–29
MATH Google Scholar
Flajolet P, Martin G N. Probabilistic counting algorithms for data base applications. Journal of Computer and System Sciences, 1985, 31(2): 182–209
MathSciNet MATH Google Scholar
Zhang T, Ramakrishnan R, Livny M. Birch: an efficient data clustering method for very large databases. ACM Sigmod Record. 1996, 25(2): 103–114
Google Scholar
Aggarwal C C, Han J, Wang J, Yu P S. A framework for clustering evolving data streams. In: Proceedings of the 29th International Conference on Very Large Data Bases, VLDB Endowment. 2003, 81–92
Google Scholar
Aggarwal C C, Yu P S. A framework for clustering uncertain data streams. In: Proceedings of the 24th International Conference on Data Engineering. 2008, 150–159
Google Scholar
Li Z, Ge T. Online windowed subsequence matching over probabilistic sequences. In: Proceedings of the International Conference on Management of Data. 2012, 277–288
Google Scholar
Lian X, Chen L. Efficient join processing on uncertain data streams. In: Proceedings of the 18th ACM Conference on Information and Knowledge Management. 2009, 857–866
Google Scholar
Ge T, Liu F. Accuracy–aware uncertain stream databases. In: Proceedings of the 28th International Conference on Data Engineering. 2012, 174–185
Google Scholar
Peng L, Diao Y, Liu A. Optimizing probabilistic query processing on continuous uncertain data. Proceedings of the VLDB Endowment, 2011, 4(11): 1169–1180
Google Scholar
Jayram T, McGregor A, Muthukrishnan S, Vee E. Estimating statistical aggregates on probabilistic data streams. In: Proceedings of the 26th ACM SIGMOD–SIGACT–SIGART Symposium on Principles of Database Systems. 2007, 243–252
Google Scholar
Zhang Q, Li F, Yi K. Finding frequent items in probabilistic data. In: Proceedings of the ACM SIGMOD International Conference on Management of Data. 2008, 819–832
Aggarwal C C, Han J, Wang J, Yu P S. On high dimensional projected clustering of data streams. Data Mining and Knowledge Discovery, 2005, 10(3): 251–273
Zhang C, Gao M, Zhou A. Tracking high quality clusters over uncertain data streams. In: Proceedings of the 25th International Conference on Data Engineering. 2009, 1641–1648
Zhang W, Lin X, Zhang Y, Wang W, Zhu G, Yu J X. Probabilistic skyline operator over sliding windows. Information Systems, 2013, 38(8): 1212–1233
Subramaniam S, Palpanas T, Papadopoulos D, Kalogeraki V, Gunopulos D. Online outlier detection in sensor data using non–parametric models. In: Proceedings of the 32nd International Conference on Very Large Data Bases, VLDB Endowment. 2006, 187–198
Deshpande A, Guestrin C, Madden S R, Hellerstein J M, Hong W. Model–driven data acquisition in sensor networks. In: Proceedings of the 30th International Conference on Very Large Data Bases. VLDB Endowment, 2004, 588–599
Hida Y, Huang P, Nishtala R. Aggregation query under uncertainty in sensor networks. Technical Report, 2004
Welbourne E, Khoussainova N, Letchner J, Li Y, Balazinska M, Borriello G, Suciu D. Cascadia: a system for specifying, detecting, and managing RFID events. In: Proceedings of the 6th International Conference on Mobile Systems, Applications, and Services. 2008, 281–294
Kanagal B, Deshpande A. Online filtering, smoothing and probabilistic modeling of streaming data. In: Proceedings of the 24th IEEE International Conference on Data Engineering. 2008, 1160–1169
Zhang C J, Chen L, Tong Y, Liu Z. Cleaning uncertain data with a noisy crowd. In: Proceedings of the 31st IEEE International Conference on Data Engineering. 2015, 6–17
Mo L, Cheng R, Li X, Cheung D W, Yang X S. Cleaning uncertain data for top–k queries. In: Proceedings of the 29th IEEE International Conference on Data Engineering. 2013, 134–145
Panse F, Van Keulen M, De Keijzer A, Ritter N. Duplicate detection in probabilistic data. In: ICDE Workshops. 2010, 179–182
Van Keulen M, De Keijzer A. Qualitative effects of knowledge rules and user feedback in probabilistic data integration. The VLDB Journal, 2009, 18(5): 1191–1217
Cheng R, Chen J, Xie X. Cleaning uncertain data with quality guarantees. Proceedings of The VLDB Endowment, 2008, 1(1): 722–735
Dong X L, Halevy A, Yu C. Data integration with uncertainty. The VLDB Journal, 2009, 18(2): 469–500

Acknowledgements

This paper was partially supported by NSFC (61602159, U1509216, 61472099, 61133002), National Sci-Tech Support Plan (2015BAH10F01) and the Scientific Research Foundation for the Returned Overseas Chinese Scholars of Heilongjiang Province (LC2016026).

Author information

Authors and Affiliations

Department of Computer Science and Technology, Heilongjiang University, Heilongjiang, 150001, China
Lingli Li
Department of Computer Science and Technology, Harbin Institute of Technology, Heilongjiang, 150001, China
Hongzhi Wang, Jianzhong Li & Hong Gao

Corresponding author

Correspondence to Hongzhi Wang.

Additional information

Lingli Li is an associate professor at Heilongjiang University, China. She obtained her PhD degree from Harbin Institute of Technology in 2015. Her research interests include data management, data quality, and entity resolution. She has published more than 10 papers in refereed journals and conferences such as IEEE Transactions on Knowledge and Data Engineering.

Hongzhi Wang is a professor and doctoral supervisor at Harbin Institute of Technology, China. His research area is data management, including data quality, XML data management, and graph management. He has published more than 100 papers in refereed journals and conferences. He is a recipient of the CCF Outstanding Dissertation Award, a Microsoft Fellowship, and an IBM PhD Fellowship.

Jianzhong Li is a professor and doctoral supervisor at Harbin Institute of Technology, China. He is a senior member of CCF. His research interests include databases, parallel computing, and wireless sensor networks.

Hong Gao is a professor and doctoral supervisor at Harbin Institute of Technology, China. She is a senior member of CCF. Her research interests include data management, wireless sensor networks, and graph databases.

Electronic supplementary material

Supplementary material, approximately 111 KB.

About this article

Cite this article

Li, L., Wang, H., Li, J. et al. A survey of uncertain data management. Front. Comput. Sci. 14, 162–190 (2020). https://doi.org/10.1007/s11704-017-7063-z

Received: 20 February 2017
Accepted: 23 June 2017
Published: 06 September 2018
Issue Date: February 2020
DOI: https://doi.org/10.1007/s11704-017-7063-z
