Encyclopedia of Database Systems

2018 Edition
Editors: Ling Liu, M. Tamer Özsu

Clustering Overview and Applications

  • Dimitrios Gunopulos
Reference work entry
DOI: https://doi.org/10.1007/978-1-4614-8265-9_602

Synonyms

Unsupervised learning

Definition

Clustering is the assignment of objects to groups of similar objects (clusters). The objects are typically described as vectors of features (also called attributes). Attributes can be numerical (scalar) or categorical. The assignment can be hard, where each object belongs to exactly one cluster, or fuzzy, where an object can belong to several clusters, each with a degree of membership (for example, a probability). The clusters can overlap, though typically they are disjoint. A distance measure is a function that quantifies how dissimilar two objects are: the smaller the distance, the more similar the objects.
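To make these definitions concrete, the following minimal Python sketch (an illustration, not code from this entry) shows a distance for numerical attributes (Euclidean), a dissimilarity for categorical attributes (simple matching, the measure used by k-modes-style algorithms such as Huang's [7]), and the difference between hard and fuzzy assignment to a given set of cluster centers. The fuzzy degrees follow the fuzzy c-means membership formula [2] with fuzzifier m; all function names and example data are hypothetical.

```python
import math
from typing import List, Sequence


def euclidean(x: Sequence[float], y: Sequence[float]) -> float:
    """Distance between two objects described by numerical attributes."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def matching_dissimilarity(x: Sequence[str], y: Sequence[str]) -> int:
    """Number of categorical attributes on which two objects disagree."""
    return sum(a != b for a, b in zip(x, y))


def hard_assignment(x: Sequence[float], centers: List[Sequence[float]]) -> int:
    """Index of the single nearest center: the object gets exactly one cluster."""
    return min(range(len(centers)), key=lambda j: euclidean(x, centers[j]))


def fuzzy_memberships(x: Sequence[float], centers: List[Sequence[float]],
                      m: float = 2.0) -> List[float]:
    """Fuzzy c-means membership of x in every cluster; the degrees sum to 1.
    m > 1 is the fuzzifier: larger m gives softer (more even) memberships."""
    d = [euclidean(x, c) for c in centers]
    zeros = [j for j, dj in enumerate(d) if dj == 0.0]
    if zeros:  # x coincides with one or more centers: share membership among them
        return [1.0 / len(zeros) if j in zeros else 0.0 for j in range(len(d))]
    p = 2.0 / (m - 1.0)
    return [1.0 / sum((dj / dk) ** p for dk in d) for dj in d]


centers = [(0.0, 0.0), (4.0, 0.0)]
x = (1.0, 0.0)
print(hard_assignment(x, centers))    # 0
print(fuzzy_memberships(x, centers))  # approximately [0.9, 0.1]
print(matching_dissimilarity(("red", "small", "cotton"), ("red", "large", "wool")))  # 2
```

With hard assignment the point (1, 0) belongs only to the first cluster; with fuzzy assignment it belongs mostly to the first cluster and partly to the second, because it is three times closer to the first center.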

Historical Background

Clustering is one of the most useful tasks in data analysis. The goal of clustering is to discover groups of similar objects and to identify interesting patterns in the data. Typically, the clustering problem is to partition a given data set into groups (clusters) such that the data points in a cluster are more similar to each other than to points in different clusters [4, 8]. For example, consider a retail...
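The partitioning view of clustering can be illustrated with k-means. The Python sketch below is a batch (Lloyd-style) variant of the k-means method described by MacQueen [10], written under stated assumptions (Euclidean distance, centers initialized to the first k points, an iteration cap); it is not presented as this entry's own algorithm. It alternates a hard assignment step with a step that moves each center to the mean of its points, until no point changes cluster. The function name `kmeans` and the example data are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def kmeans(points: Sequence[Point], k: int,
           max_iter: int = 100) -> Tuple[List[Point], List[int]]:
    """Batch k-means: alternate hard assignment and mean updates until stable.
    Assumption: centers start at the first k points (deterministic seeding)."""
    centers: List[Point] = [tuple(p) for p in points[:k]]
    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: every point joins the cluster of its nearest center.
        new_labels = [min(range(k), key=lambda j: math.dist(p, centers[j]))
                      for p in points]
        if new_labels == labels:  # no point changed cluster: converged
            break
        labels = new_labels
        # Update step: move each center to the mean of its assigned points.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous center
                centers[j] = tuple(sum(coord) / len(members)
                                   for coord in zip(*members))
    return centers, labels


points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
centers, labels = kmeans(points, k=2)
print(labels)  # [0, 0, 0, 1, 1, 1]
print(centers)
```

On these two well-separated groups the sketch converges in three passes. Because k-means is sensitive to its starting centers, practical implementations typically use randomized seeding (for example, k-means++) and several restarts rather than the fixed initialization assumed here.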


Recommended Reading

  1. Agrawal R, Gehrke J, Gunopulos D, Raghavan P. Automatic subspace clustering of high dimensional data for data mining applications. In: Proceedings of the ACM SIGMOD International Conference on Management of Data; 1998. p. 94–105.
  2. Bezdek JC, Ehrlich R, Full W. FCM: Fuzzy C-Means algorithm. Comput Geosci. 1984;10(2–3):191–203.
  3. Ester M, Kriegel H-P, Sander J, Xu X. A density-based algorithm for discovering clusters in large spatial databases with noise. In: Proceedings of the 2nd International Conference on Knowledge Discovery and Data Mining; 1996. p. 226–31.
  4. Everitt BS, Landau S, Leese M. Cluster analysis. London: Hodder Arnold; 2001.
  5. Fayyad UM, Piatetsky-Shapiro G, Smyth P, Uthurusamy R. Advances in knowledge discovery and data mining. Menlo Park: AAAI Press; 1996.
  6. Han J, Kamber M. Data mining: concepts and techniques. San Francisco: Morgan Kaufmann Publishers; 2001.
  7. Huang Z. A fast clustering algorithm to cluster very large categorical data sets in data mining. In: Proceedings of the ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery; 1997.
  8. Jain AK, Murty MN, Flynn PJ. Data clustering: a review. ACM Comput Surv. 1999;31(3):264–323.
  9. Karypis G, Han E-H, Kumar V. CHAMELEON: a hierarchical clustering algorithm using dynamic modeling. IEEE Comput. 1999;32(8):68–75.
  10. MacQueen JB. Some methods for classification and analysis of multivariate observations. In: Proceedings of the 5th Berkeley Symposium on Mathematical Statistics and Probability; 1967. p. 281–97.
  11. Mitchell T. Machine learning. New York: McGraw-Hill; 1997.
  12. Ng R, Han J. Efficient and effective clustering methods for spatial data mining. In: Proceedings of the 20th International Conference on Very Large Data Bases; 1994. p. 144–55.
  13. Theodoridis S, Koutroumbas K. Pattern recognition. New York: Academic; 1999.
  14. Vazirgiannis M, Halkidi M, Gunopulos D. Uncertainty handling and quality assessment in data mining. New York: Springer; 2003.
  15. Wang W, Yang J, Muntz R. STING: a statistical information grid approach to spatial data mining. In: Proceedings of the 23rd International Conference on Very Large Data Bases; 1997. p. 186–95.
  16. Zhang T, Ramakrishnan R, Livny M. BIRCH: an efficient data clustering method for very large databases. In: Proceedings of the ACM SIGMOD International Conference on Management of Data; 1996. p. 103–14.

Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  1. Department of Computer Science and Engineering, The University of California at Riverside, Bourns College of Engineering, Riverside, USA

Section editors and affiliations

  • Dimitrios Gunopulos
    1. Department of Computer Science and Engineering, The University of California at Riverside, Bourns College of Engineering, Riverside, USA