Finding Associations in Data Through Cluster Analysis

McCarthy, Richard V.; McCarthy, Mary M.; Ceccucci, Wendy

doi:10.1007/978-3-030-83070-0_8

Abstract

Clustering is a technique for grouping similar observations into smaller groups within the larger population. The resulting groups should be homogeneous: each member of a cluster should have more in common with the other members of its own cluster than with members of other clusters. Cluster analysis is an exploratory data analysis tool that sorts objects into groups so as to maximize the degree of association among objects in the same cluster. This chapter explains several clustering techniques, using a simple example to illustrate each method, and then applies clustering to a subset of the automobile insurance data set.
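
The homogeneity criterion above (members closer to their own cluster than to other clusters) can be made concrete with a small sketch. The Python below implements one common technique, agglomerative hierarchical clustering with average linkage (often called UPGMA) and Euclidean distance. The linkage rule, distance metric, function names, and toy data are all assumptions chosen for illustration, not necessarily the chapter's own example or method.

```python
import math
from itertools import combinations


def euclidean(a, b):
    """Straight-line distance between two equal-length numeric tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def average_linkage(c1, c2, points):
    """Assumed merge rule: mean pairwise distance between two clusters' members."""
    total = sum(euclidean(points[i], points[j]) for i in c1 for j in c2)
    return total / (len(c1) * len(c2))


def agglomerative(points, k):
    """Repeatedly merge the two closest clusters until only k remain.

    Returns a sorted list of clusters, each a sorted list of point indices.
    """
    clusters = [[i] for i in range(len(points))]  # every point starts alone
    while len(clusters) > k:
        # Find the pair of clusters with the smallest average-linkage distance.
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda ab: average_linkage(clusters[ab[0]], clusters[ab[1]], points),
        )
        merged = sorted(clusters[a] + clusters[b])
        del clusters[b]  # b > a, so deleting b first keeps index a valid
        clusters[a] = merged
    return sorted(clusters)


def mean_within(clusters, points):
    """Average distance between pairs of points in the same cluster."""
    pairs = [(i, j) for c in clusters for i, j in combinations(c, 2)]
    return sum(euclidean(points[i], points[j]) for i, j in pairs) / len(pairs)


def mean_between(clusters, points):
    """Average distance between pairs of points in different clusters."""
    pairs = [(i, j) for c1, c2 in combinations(clusters, 2) for i in c1 for j in c2]
    return sum(euclidean(points[i], points[j]) for i, j in pairs) / len(pairs)


# Hypothetical toy data: two visibly separated groups of 2-D observations.
points = [(1.0, 1.0), (1.5, 2.0), (1.2, 1.4), (8.0, 8.0), (8.5, 9.0), (9.0, 8.2)]
clusters = agglomerative(points, k=2)
print(clusters)  # [[0, 1, 2], [3, 4, 5]]
print(mean_within(clusters, points) < mean_between(clusters, points))  # True
```

Average linkage is only one possible merge rule; single or complete linkage would change only `average_linkage`. Because this sketch recomputes every cluster-to-cluster distance on each merge, it suits small illustrative examples rather than a full data set.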

Author information

Authors and Affiliations

Quinnipiac University, Hamden, CT, USA
Richard V. McCarthy & Wendy Ceccucci
Central Connecticut State University, New Britain, CT, USA
Mary M. McCarthy

About this chapter

Cite this chapter

McCarthy, R.V., McCarthy, M.M., Ceccucci, W. (2022). Finding Associations in Data Through Cluster Analysis. In: Applying Predictive Analytics. Springer, Cham. https://doi.org/10.1007/978-3-030-83070-0_8

DOI: https://doi.org/10.1007/978-3-030-83070-0_8
Published: 01 January 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-83069-4
Online ISBN: 978-3-030-83070-0
eBook Packages: Engineering, Engineering (R0)