Measuring the influence of individual data points in a cluster analysis

Milligan, Glenn W.; Cheng, Richard

doi:10.1007/BF01246105

Published: September 1996

Volume 13, pages 315–335 (1996)

Journal of Classification

Abstract

The problem of measuring the impact of individual data points in a cluster analysis is examined. The purpose is to identify those data points that have an influence on the resulting cluster partitions. Influence of a single data point is considered present when different cluster partitions result from the removal of the element from the data set. The Hubert and Arabie (1985) corrected Rand index was used to provide numerical measures of influence of a data point. Simulated data sets consisting of a variety of cluster structures and error conditions were generated to validate the influence measures. The results showed that the measure of internal influence was 100% accurate in identifying those data elements exhibiting an influential effect. The nature of the influence, whether beneficial or detrimental to the clustering, can be evaluated with the use of the gamma and point-biserial statistics.
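
The removal-and-compare idea in the abstract can be illustrated with a minimal sketch. It is not the authors' procedure: the article's clustering methods and its exact "internal influence" definition are not given here, so the toy two-cluster rule `largest_gap_split` and the helper `leave_one_out_influence` are hypothetical names introduced only for illustration. The only part taken from the cited literature is the Hubert and Arabie (1985) corrected Rand index, computed from the contingency table of the two partitions.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Hubert-Arabie (1985) corrected Rand index between two partitions."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both partitions must label the same objects")
    n = len(labels_a)
    if n < 2:
        return 1.0  # no pairs to compare
    # Pairs placed together in both partitions, in A only, and in B only.
    sum_cells = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    sum_rows = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_cols = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = sum_rows * sum_cols / comb(n, 2)
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:
        return 1.0  # both partitions are trivial and identical
    return (sum_cells - expected) / (max_index - expected)


def largest_gap_split(values):
    """Toy two-cluster rule (an assumption, not from the article):
    cut sorted 1-D data at its widest gap."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    gaps = [values[order[k + 1]] - values[order[k]] for k in range(len(order) - 1)]
    cut = gaps.index(max(gaps))
    labels = [0] * len(values)
    for rank, i in enumerate(order):
        labels[i] = 0 if rank <= cut else 1
    return labels


def leave_one_out_influence(values, cluster_fn):
    """For each point, compare the full-data partition (with that point
    dropped) against the partition obtained by reclustering without it.
    A score below 1 means removing the point changed the partition."""
    full = cluster_fn(values)
    scores = []
    for i in range(len(values)):
        reduced_data = values[:i] + values[i + 1:]
        full_without_i = full[:i] + full[i + 1:]
        scores.append(adjusted_rand_index(full_without_i, cluster_fn(reduced_data)))
    return scores


if __name__ == "__main__":
    data = [0.0, 1.0, 2.0, 6.0, 7.0, 8.0, 20.0]
    for value, score in zip(data, leave_one_out_influence(data, largest_gap_split)):
        print(f"{value:5.1f}  corrected Rand = {score:.3f}")
```

In this toy data set only the outlying value 20.0 changes the partition when removed: with it present, the widest gap isolates it from the rest, and without it the remaining six points split into two groups of three. Any clustering routine that returns one label per object could be passed in place of `largest_gap_split`.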

References

ANDERBERG, M.R. (1973), Cluster Analysis for Applications, New York: Academic Press.
BELBIN, L., FAITH, D., and MILLIGAN, G.W. (1992), “A Comparison of Two Approaches to Beta-Flexible Clustering,” Multivariate Behavioral Research, 27, 417–433.
BRECKENRIDGE, J.N. (1989), “Replicating Cluster Analysis: Method, Consistency, and Validity,” Multivariate Behavioral Research, 24, 147–161.
CHENG, R., and MILLIGAN, G.W. (1995), “Mapping Influence Regions in Hierarchical Clustering,” Multivariate Behavioral Research, 30, 547–576.
CORMACK, R.M. (1971), “A Review of Classification,” Journal of the Royal Statistical Society, Series A, 134, 321–367.
CROVELLO, T. (1968), “The Effect of Change of Number of OTU's in a Numerical Taxonomic Study,” Brittonia, 20, 346–367.
CROVELLO, T. (1969), “Effects of Change of Characters and of Number of Characters in Numerical Taxonomy,” American Midland Naturalist, 81, 68–86.
DUBES, R., and JAIN, A.K. (1980), “Clustering Methodologies in Exploratory Data Analysis,” in Advances in Computers (Vol. 19), Ed., M. C. Yovits, New York: Academic Press, 113–215.
EDELBROCK, C. (1979), “Comparing the Accuracy of Hierarchical Clustering Algorithms: The Problem of Classifying Everybody,” Multivariate Behavioral Research, 14, 367–384.
EVERITT, B.S. (1974), Cluster Analysis, New York: Wiley.
GNANADESIKAN, R., KETTENRING, J.R., and LANDWEHR, J.M. (1977), “Interpreting and Assessing the Results of Cluster Analyses,” Bulletin of the International Statistical Institute, 47, 451–463.
GOODMAN, L.A., and KRUSKAL, W.H. (1954), “Measures of Association for Cross-Classifications,” Journal of the American Statistical Association, 49, 732–764.
GORDON, A.D. (1981), Classification: Methods for the Exploratory Analysis of Multivariate Data, London: Chapman & Hall.
GORDON, A.D. (1987), “A Review of Hierarchical Classification,” Journal of the Royal Statistical Society, Series A, 150, 119–137.
GORDON, A.D., and DE CATA, A. (1988), “Stability and Influence in Sum of Squares Clustering,” Metron, 46, 347–360.
GOWER, J.C., and ROSS, G.J.S. (1969), “Minimum Spanning Trees and Single-Link Cluster Analysis,” Applied Statistics, 18, 54–64.
HUBERT, L.J., and ARABIE, P. (1985), “Comparing Partitions,” Journal of Classification, 2, 193–218.
JAIN, A.K., and DUBES, R.C. (1988), Algorithms for Clustering Data, Englewood Cliffs, NJ: Prentice-Hall.
JOLLIFFE, I.T., JONES, B., and MORGAN, B.J.T. (1988), “Stability and Influence in Cluster Analysis,” in Data Analysis and Informatics, V, Ed., E. Diday, Amsterdam: Elsevier (North-Holland), 507–514.
MCINTYRE, R.M., and BLASHFIELD, R.K. (1980), “A Nearest-Centroid Technique for Evaluating the Minimum-Variance Clustering Procedure,” Multivariate Behavioral Research, 15, 225–238.
MILLIGAN, G.W. (1980), “An Examination of the Effect of Six Types of Error Perturbation on Fifteen Clustering Algorithms,” Psychometrika, 45, 325–342.
MILLIGAN, G.W. (1981), “A Monte Carlo Study of Thirty Internal Criterion Measures for Cluster Analysis,” Psychometrika, 46, 187–199.
MILLIGAN, G.W. (1985), “An Algorithm for Generating Artificial Test Clusters,” Psychometrika, 50, 123–127.
MILLIGAN, G.W. (1989), “A Validation Study of a Variable Weighting Algorithm for Cluster Analysis,” Journal of Classification, 6, 53–71.
MILLIGAN, G.W. (1995), “Clustering Validation: Results and Implications for Applied Analyses,” in Clustering and Classification, Eds., P. Arabie, L. Hubert, and G. De Soete, River Edge, New Jersey: World Scientific Press, 345–375.
MOREY, L.C., BLASHFIELD, R.K., and SKINNER, H.A. (1983), “A Comparison of Cluster Analysis Techniques Within a Sequential Validation Framework,” Multivariate Behavioral Research, 18, 309–329.
SILVESTRI, L., and HILL, I.R. (1964), “Some Problems of the Taxonometric Approach,” in Phenetic and Phylogenetic Classification, Eds., V.H. Heywood and J. McNeill, London: The Systematics Association (The Systematics Association Publication No. 6), 87–104.
SMITH, P.S., and DUBES, R. (1980), “Stability of a Hierarchical Clustering,” Pattern Recognition, 12, 177–187.
SOKAL, R.R., KIM, J., and ROHLF, F.J. (1992), “Character and OTU Stability in Five Taxonomic Groups,” Journal of Classification, 9, 117–140.
WARD, J.H. JR. (1963), “Hierarchical Grouping to Optimize an Objective Function,” Journal of the American Statistical Association, 58, 236–244.

Author information

Glenn W. Milligan & Richard Cheng
Present address: Department of Management Sciences, The Ohio State University, 302 Hagerty Hall, Columbus, Ohio 43210, USA

About this article

Cite this article

Milligan, G.W., Cheng, R. Measuring the influence of individual data points in a cluster analysis. Journal of Classification 13, 315–335 (1996). https://doi.org/10.1007/BF01246105

Issue Date: September 1996