Finding Associations in Data Through Cluster Analysis

Applying Predictive Analytics

Abstract

Clustering is a technique for grouping similar observations into smaller groups within a larger population. The resulting groups should be homogeneous: each member of a cluster has more in common with members of its own cluster than with members of other clusters. Cluster analysis is an exploratory data analysis tool that sorts objects into groups so as to maximize the degree of association between objects in the same cluster. This chapter explains several clustering techniques, using a simple example to illustrate each method. Finally, clustering is applied to a subset of the automobile insurance data set.
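
As a rough illustration of the grouping idea described above, the following minimal sketch clusters six two-dimensional points with a plain k-means loop. The abstract does not say which algorithms the chapter covers, so the choice of k-means, the toy data, the function name kmeans, and the use of the first k points as starting centroids are all assumptions made for illustration only.

```python
import math


def kmeans(points, k, iterations=20):
    """Minimal k-means sketch (illustrative only): returns (labels, centroids)."""
    # Assumption: the first k points serve as the initial centroids,
    # which keeps the example deterministic.
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(
                    tuple(sum(v) / len(members) for v in zip(*members))
                )
            else:
                # An empty cluster keeps its previous centroid.
                new_centroids.append(centroids[c])
        if new_centroids == centroids:
            break  # centroids stopped moving, so the assignment is stable
        centroids = new_centroids
    return labels, centroids


# Hypothetical toy data: two visibly separated groups of observations.
points = [(1.0, 1.0), (8.0, 8.0), (1.2, 0.8), (0.9, 1.1), (8.2, 7.9), (7.9, 8.1)]
labels, centroids = kmeans(points, k=2)
print(labels)  # [0, 1, 0, 0, 1, 1]
```

Points near (1, 1) end up in one cluster and points near (8, 8) in the other, so each observation sits closer to members of its own cluster than to members of the other one. Real data would normally call for standardized variables and a more careful choice of starting centroids.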

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this chapter

Cite this chapter

McCarthy, R.V., McCarthy, M.M., Ceccucci, W. (2022). Finding Associations in Data Through Cluster Analysis. In: Applying Predictive Analytics. Springer, Cham. https://doi.org/10.1007/978-3-030-83070-0_8

  • DOI: https://doi.org/10.1007/978-3-030-83070-0_8

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-83069-4

  • Online ISBN: 978-3-030-83070-0

  • eBook Packages: Engineering, Engineering (R0)
