Clustering

Machine Learning

Abstract

Unsupervised learning aims to discover underlying properties and patterns in unlabeled training samples, laying the foundation for further data analysis. Among unsupervised learning techniques, clustering is the most widely studied and applied.

Author information

Corresponding author

Correspondence to Zhi-Hua Zhou.

Copyright information

© 2021 Springer Nature Singapore Pte Ltd.

About this chapter

Cite this chapter

Zhou, ZH. (2021). Clustering. In: Machine Learning. Springer, Singapore. https://doi.org/10.1007/978-981-15-1967-3_9

  • DOI: https://doi.org/10.1007/978-981-15-1967-3_9

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-15-1966-6

  • Online ISBN: 978-981-15-1967-3

  • eBook Packages: Computer Science, Computer Science (R0)
