Uncertain Data Mining

  • Reference work entry

Synonyms

Data analytics; Probabilistic data management

Definition

Data mining is the process of discovering potentially useful patterns in large amounts of data, from which interesting knowledge can be extracted [34]. Traditional data mining algorithms and techniques mostly assume that the underlying data, which describe physical objects or observations, are precise and deterministic. In many applications, however, data are imprecise or uncertain: the values of a data object are probabilistic in nature and are often expressed as probability distributions. The study of uncertain data mining concerns adapting traditional models, methods, and algorithms, or inventing new techniques, so that data uncertainty can be handled during the mining process.
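To make the notion of a probabilistic data object concrete, the following Python sketch models an uncertain object as a discrete probability distribution over its possible values. This is a modelling assumption for illustration; continuous probability density functions are also common. The class `UncertainObject` and the functions `sq_dist` and `expected_sq_dist` are hypothetical names, not part of any library or of the specific methods cited in this entry. The example compares a noisy reading with a precise one that has the same mean.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


@dataclass
class UncertainObject:
    """Hypothetical model of an uncertain data object: a discrete
    probability distribution over possible values (instances)."""
    instances: List[Point]
    probs: List[float]

    def __post_init__(self) -> None:
        if len(self.instances) != len(self.probs):
            raise ValueError("each instance needs exactly one probability")
        if abs(sum(self.probs) - 1.0) > 1e-9:
            raise ValueError("probabilities must sum to 1")

    def mean(self) -> Point:
        """Expected value E[X], computed coordinate by coordinate."""
        dim = len(self.instances[0])
        return tuple(
            sum(p * x[d] for x, p in zip(self.instances, self.probs))
            for d in range(dim)
        )


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two precise points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def expected_sq_dist(obj: UncertainObject, c: Sequence[float]) -> float:
    """E[||X - c||^2]: squared distance averaged over every possible value of X."""
    return sum(p * sq_dist(x, c) for x, p in zip(obj.instances, obj.probs))


if __name__ == "__main__":
    # A reading that is equally likely to be (0, 0) or (2, 0) ...
    noisy = UncertainObject([(0.0, 0.0), (2.0, 0.0)], [0.5, 0.5])
    # ... and a precise reading at (1, 0).
    exact = UncertainObject([(1.0, 0.0)], [1.0])
    centre = (1.0, 0.0)
    print(noisy.mean(), exact.mean())  # both means are (1.0, 0.0)
    print(expected_sq_dist(noisy, centre), expected_sq_dist(exact, centre))  # 1.0 0.0
```

Collapsing each object to its mean makes the two readings indistinguishable, while the expected squared distance, which averages over every possible value, tells them apart. For this particular measure the identity E‖X − c‖² = ‖E[X] − c‖² + E‖X − E[X]‖² holds, so the uncertainty contributes a term that does not depend on c; other measures used in mining need not decompose so conveniently.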

Historical Background

Data mining became a highly active research topic in the mid-to-late 1990s, when many flagship data mining conferences were first organized. For example, the first International Conference on Knowledge Discovery and Data Mining (KDD) was held in Montreal in 1995 [31]...

Recommended Reading

  1. Aggarwal CC. Managing and mining uncertain data. Advances in database systems, vol. 35. Kluwer; 2009.

  2. Aggarwal CC. An introduction to cluster analysis. In: Data clustering: algorithms and applications; 2013. p. 1–28.

  3. Aggarwal CC, Han J, Wang J, Yu PS. A framework for clustering evolving data streams. In: Proceedings of the 29th International Conference on Very Large Data Bases; 2003. p. 81–92.

  4. Aggarwal CC, Li Y, Wang J, Wang J. Frequent pattern mining with uncertain data. In: Elder JF IV, Fogelman-Soulié F, Flach PA, Zaki M, editors. Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining; 2009. p. 9–38.

  5. Aggarwal CC, Yu PS. A framework for clustering uncertain data streams. In: Proceedings of the 24th International Conference on Data Engineering; 2008. p. 150–9.

  6. Aggarwal CC, Yu PS. Privacy-preserving data mining: a survey. In: Handbook of database security – applications and trends; 2008. p. 431–60.

  7. Aggarwal CC, Yu PS. A survey of uncertain data algorithms and applications. IEEE Trans Knowl Data Eng. 2009;21(5):609–23.

  8. Agrawal R, Imielinski T, Swami A. Mining association rules between sets of items in large databases. In: Proceedings of the ACM SIGMOD International Conference on Management of Data; 1993. p. 207–16.

  9. Agrawal R, Srikant R. Fast algorithms for mining association rules. In: Proceedings of the 20th International Conference on Very Large Data Bases; 1994.

  10. Agrawal R, Srikant R. Mining sequential patterns. In: Proceedings of the 11th International Conference on Data Engineering; 1995.

  11. Andrews R, Diederich J, Tickle AB. Survey and critique of techniques for extracting rules from trained artificial neural networks. Knowl Based Syst. 1995;8(6):373–89.

  12. Angiulli F, Fassetti F. Nearest neighbor-based classification of uncertain data. ACM Trans Knowl Discov Data. 2013;7(1):1.

  13. Ankerst M, Breunig MM, Kriegel H, Sander J. OPTICS: ordering points to identify the clustering structure. In: Proceedings of the ACM SIGMOD International Conference on Management of Data; 1999. p. 49–60.

  14. Barbará D, Garcia-Molina H, Porter D. The management of probabilistic data. IEEE Trans Knowl Data Eng. 1992;4(5):487–502.

  15. Benjelloun O, Sarma AD, Halevy AY, Theobald M, Widom J. Databases with uncertainty and lineage. VLDB J. 2008;17(2):243–64.

  16. Bernecker T, Kriegel H-P, Renz M, Verhein F, Züfle A. Probabilistic frequent itemset mining in uncertain databases. In: Elder JF IV, Fogelman-Soulié F, Flach PA, Zaki M, editors. KDD. New York: ACM; 2009. p. 119–28.

  17. Bi J, Zhang T. Support vector classification with input data uncertainty. In: Advances in Neural Information Processing Systems 17; 2004. p. 161–8.

  18. Cavallo R, Pittarelli M. The theory of probabilistic databases. In: Proceedings of the 13th International Conference on Very Large Data Bases; 1987. p. 71–81.

  19. Cercone N, Lin TY, Wu X, editors. Proceedings of the 2001 IEEE International Conference on Data Mining; 2001.

  20. Chau M, Cheng R, Kao B, Ngai J. Uncertain data mining: an example in clustering location data. In: Advances in Knowledge Discovery and Data Mining, 10th Pacific-Asia Conference; 2006. p. 199–204.

  21. Cheng R, Chen J, Mokbel MF, Chow C. Probabilistic verifiers: evaluating constrained nearest-neighbor queries over uncertain data. In: Proceedings of the 24th International Conference on Data Engineering; 2008. p. 973–82.

  22. Cheng R, Kalashnikov DV, Prabhakar S. Evaluating probabilistic queries over imprecise data. In: Proceedings of the 2003 ACM SIGMOD international conference on management of data, San Diego; 2003. p. 551–62.

  23. Cheng R, Kalashnikov DV, Prabhakar S. Querying imprecise data in moving object environments. IEEE Trans Knowl Data Eng. 2004;16(9):1112–27.

  24. Cheng R, Prabhakar S, Kalashnikov DV. Querying imprecise data in moving object environments. In: Proceedings of the 19th International Conference on Data Engineering; 2003. p. 723–5.

  25. Cheng R, Xia Y, Prabhakar S, Shah R, Vitter JS. Efficient indexing methods for probabilistic threshold queries over uncertain data. In: Proceedings of the 30th International Conference on Very Large Data Bases; 2004. p. 876–87.

  26. Chui CK, Kao B, Hung E. Mining frequent itemsets from uncertain data. In: Advances in Knowledge Discovery and Data Mining, 11th Pacific-Asia Conference; 2007. p. 47–58.

  27. Codd EF. A relational model of data for large shared data banks. Commun ACM. 1970;13(6):377–87.

  28. Cohen WW, Singer Y. A simple, fast, and effective rule learner. In: Proceedings of the 16th National Conference on Artificial Intelligence and 11th Innovative Applications of Artificial Intelligence Conference; 1999. p. 335–42.

  29. Cormode G, McGregor A. Approximation algorithms for clustering uncertain data. In: Proceedings of the 27th ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems; 2008. p. 191–200.

  30. Ester M, Kriegel H, Sander J, Xu X. A density-based algorithm for discovering clusters in large spatial databases with noise. In: Proceedings of the 2nd International Conference on Knowledge Discovery and Data Mining; 1996. p. 226–31.

  31. Fayyad UM, Uthurusamy R, editors. Proceedings of the 1st International Conference on Knowledge Discovery and Data Mining; 1995.

  32. Gullo F, Ponti G, Tagarelli A. Clustering uncertain data via k-medoids. In: Proceedings of the scalable uncertainty management, second international conference; 2008. p. 229–42.

  33. Gullo F, Tagarelli A. Uncertain centroid based partitional clustering of uncertain data. Proc VLDB Endow. 2012;5(7):610–21.

  34. Han J, Kamber M. Data mining: concepts and techniques. Morgan Kaufmann; 2000.

  35. Han J, Pei J, Yin Y. Mining frequent patterns without candidate generation. In: Proceedings of the ACM International Conference on Management of Data and Symposium on Principles of Database Systems; 2000. p. 1–12.

  36. Imielinski T, Lipski W Jr. Incomplete information in relational databases. J ACM. 1984;31(4):761–91.

  37. Lipski W Jr. On databases with incomplete information. J ACM. 1981;28(1):41–70.

  38. Kao B, Lee SD, Lee FKF, Cheung DW, Ho W. Clustering uncertain data using Voronoi diagrams and R-tree index. IEEE Trans Knowl Data Eng. 2010;22(9):1219–33.

  39. Kriegel H, Pfeifle M. Density-based clustering of uncertain data. In: Proceedings of the 11th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining; 2005. p. 672–7.

  40. Kriegel H, Pfeifle M. Hierarchical density-based clustering of uncertain data. In: Proceedings of the 5th IEEE International Conference on Data Mining; 2005. p. 689–92.

  41. Kuramochi M, Karypis G. Frequent subgraph discovery. In: Proceedings of the IEEE International Conference on Data Mining; 2001.

  42. Langley P, Iba W, Thompson K. An analysis of Bayesian classifiers. In: Proceedings of the 10th National Conference on Artificial Intelligence; 1992. p. 223–8.

  43. Lee M, Kuo FC, Whitmore GA, Sklar J. Importance of replication in microarray gene expression studies: statistical methods and evidence from repetitive cDNA hybridizations. Proc Natl Acad Sci USA. 2000;97(18):9834–9.

  44. Lee SD, Kao B, Cheng R. Reducing UK-means to K-means. In: Proceedings of the 7th IEEE International Conference on Data Mining; 2007. p. 483–8.

  45. Leung CK-S, Mateo MAF, Brajczuk DA. A tree-based approach for frequent pattern mining from uncertain data. In: Washio T, Suzuki E, Ting KM, Inokuchi A, editors. Proceedings of the 12th Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining; 2008. p. 653–61.

  46. MacQueen JB. Some methods for classification and analysis of multivariate observations. In: Le Cam LM, Neyman J, editors. Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability; 1967. p. 281–97.

  47. Ngai WK, Kao B, Cheng R, Chau M, Lee SD, Cheung DW, Yip KY. Metric and trigonometric pruning for clustering of uncertain data in 2D geometric space. Inf Syst. 2011;36(2):476–97.

  48. Ngai WK, Kao B, Chui CK, Cheng R, Chau M, Yip KY. Efficient clustering of uncertain data. In: Proceedings of the 6th IEEE International Conference on Data Mining; 2006. p. 436–45.

  49. Pei J, Han J, Lu H, Nishio S, Tang S, Yang D. H-mine: hyper-structure mining of frequent patterns in large databases. In: Cercone N, Lin TY, Wu X, editors. Proceedings of the 1st IEEE International Conference on Data Mining; 2001. p. 441–8.

  50. Purdue University Database Group. ORION, a database system for managing uncertain data (http://orion.cs.purdue.edu/).

  51. Qin B, Xia Y, Prabhakar S. Rule induction for uncertain data. Knowl Inf Syst. 2011;29(1):103–30.

  52. Qin B, Xia Y, Sathyesh R, Ge J, Prabhakar S. Classify uncertain data with decision tree. In: Proceedings of the 16th International Conference on Database Systems for Advanced Applications; 2011. p. 454–7.

  53. Quinlan JR. C4.5: programs for machine learning. Morgan Kaufmann; 1993.

  54. Ren J, Lee SD, Chen X, Kao B, Cheng R, Cheung DW. Naive Bayes classification of uncertain data. In: Proceedings of the 9th IEEE International Conference on Data Mining; 2009. p. 944–9.

  55. Sammut C, Webb GI, editors. Encyclopedia of machine learning. Springer; 2010.

  56. Shekhar S, Xiong H, editors. Encyclopedia of GIS. New York: Springer; 2008.

  57. Singh S, Mayfield C, Shah R, Prabhakar S, Hambrusch SE, Neville J, Cheng R. Database support for probabilistic attributes and tuples. In: Proceedings of the 24th International Conference on Data Engineering; 2008. p. 1053–61.

  58. Tan P, Steinbach M, Kumar V. Introduction to data mining. Boston: Addison-Wesley; 2005.

  59. Tao Y, Cheng R, Xiao X, Ngai WK, Kao B, Prabhakar S. Indexing multi-dimensional uncertain data with arbitrary probability density functions. In: Proceedings of the 31st International Conference on Very Large Data Bases; 2005. p. 922–33.

  60. Tsang S, Kao B, Yip KY, Ho W, Lee SD. Decision trees for uncertain data. IEEE Trans Knowl Data Eng. 2011;23(1):64–78.

  61. Vapnik VN. The nature of statistical learning theory. New York: Springer; 1995.

  62. Wang J, Karypis G. On mining instance-centric classification rules. IEEE Trans Knowl Data Eng. 2006;18(11):1497–511.

  63. Wolfson O, Sistla AP, Chamberlain S, Yesha Y. Updating and querying databases that track mobile units. Distrib Parallel Databases. 1999;7(3):257–87.

Author information

Correspondence to Ben Kao.

Copyright information

© 2018 Springer Science+Business Media, LLC, part of Springer Nature

Cite this entry

Kao, B., Liu, X. (2018). Uncertain Data Mining. In: Liu, L., Özsu, M.T. (eds) Encyclopedia of Database Systems. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-8265-9_80760