Measuring the influence of individual data points in a cluster analysis

Journal of Classification

Abstract

The problem of measuring the impact of individual data points in a cluster analysis is examined. The purpose is to identify those data points that have an influence on the resulting cluster partitions. Influence of a single data point is considered present when different cluster partitions result from the removal of the element from the data set. The Hubert and Arabie (1985) corrected Rand index was used to provide numerical measures of influence of a data point. Simulated data sets consisting of a variety of cluster structures and error conditions were generated to validate the influence measures. The results showed that the measure of internal influence was 100% accurate in identifying those data elements exhibiting an influential effect. The nature of the influence, whether beneficial or detrimental to the clustering, can be evaluated with the use of the gamma and point-biserial statistics.
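The deletion-based idea in the abstract can be sketched in a few lines. The following is a minimal illustration, not the authors' procedure: it clusters the full data, deletes one point at a time, reclusters, and compares the two partitions of the retained points with the Hubert and Arabie (1985) corrected Rand index. The names `adjusted_rand_index`, `single_link`, and `influence_scores` are hypothetical, and naive single-link clustering with a fixed number of clusters `k` is an assumption chosen only to keep the example self-contained.

```python
from itertools import combinations
from math import comb, dist


def adjusted_rand_index(labels_a, labels_b):
    """Hubert and Arabie (1985) chance-corrected Rand index for two labelings."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both labelings must cover the same points")
    n = len(labels_a)
    if n < 2:
        return 1.0  # no pairs to compare
    cells, rows, cols = {}, {}, {}
    for a, b in zip(labels_a, labels_b):
        cells[(a, b)] = cells.get((a, b), 0) + 1
        rows[a] = rows.get(a, 0) + 1
        cols[b] = cols.get(b, 0) + 1
    sum_cells = sum(comb(c, 2) for c in cells.values())
    sum_rows = sum(comb(c, 2) for c in rows.values())
    sum_cols = sum(comb(c, 2) for c in cols.values())
    expected = sum_rows * sum_cols / comb(n, 2)
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:  # degenerate case, e.g. both partitions trivial
        return 1.0
    return (sum_cells - expected) / (max_index - expected)


def single_link(points, k):
    """Naive single-link agglomeration stopped at k clusters (illustrative only)."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > k:
        # Merge the two clusters whose closest members are nearest to each other.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: min(
                dist(points[p], points[q])
                for p in clusters[ij[0]]
                for q in clusters[ij[1]]
            ),
        )
        clusters[i] += clusters.pop(j)  # j > i, so index i is unaffected
    labels = [0] * len(points)
    for label, members in enumerate(clusters):
        for p in members:
            labels[p] = label
    return labels


def influence_scores(points, k, cluster=single_link):
    """Corrected Rand index between the full-data partition (restricted to the
    retained points) and the partition found after deleting each point in turn.
    A score below 1 means the deletion changed the partition."""
    full = cluster(points, k)
    scores = []
    for i in range(len(points)):
        reduced = cluster(points[:i] + points[i + 1:], k)
        reference = full[:i] + full[i + 1:]
        scores.append(adjusted_rand_index(reference, reduced))
    return scores


# Two tight groups plus one isolated point (made-up data for illustration).
points = [(0, 0), (0, 1), (1, 0), (10, 0), (10, 1), (11, 0), (5, 20)]
scores = influence_scores(points, k=2)
print(scores)
```

With this toy data the isolated point occupies a cluster of its own, which forces the two tight groups to merge when `k=2`. Deleting it lets the groups separate, so its score is 0.0, while deleting any other point leaves the partition unchanged and scores 1.0. Classifying a flagged point as beneficial or detrimental would need a further criterion, such as the gamma or point-biserial statistics the abstract mentions, which this sketch does not cover.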

References

  • ANDERBERG, M.R. (1973), Cluster Analysis for Applications, New York: Academic Press.

  • BELBIN, L., FAITH, D., and MILLIGAN, G.W. (1992), “A Comparison of Two Approaches to Beta-Flexible Clustering,” Multivariate Behavioral Research, 27, 417–433.

  • BRECKENRIDGE, J.N. (1989), “Replicating Cluster Analysis: Method, Consistency, and Validity,” Multivariate Behavioral Research, 24, 147–161.

  • CHENG, R., and MILLIGAN, G.W. (1995), “Mapping Influence Regions in Hierarchical Clustering,” Multivariate Behavioral Research, 30, 547–576.

  • CORMACK, R.M. (1971), “A Review of Classification,” Journal of the Royal Statistical Society, Series A, 134, 321–367.

  • CROVELLO, T. (1968), “The Effect of Change of Number of OTU's in a Numerical Taxonomic Study,” Brittonia, 20, 346–367.

  • CROVELLO, T. (1969), “Effects of Change of Characters and of Number of Characters in Numerical Taxonomy,” American Midland Naturalist, 81, 68–86.

  • DUBES, R., and JAIN, A.K. (1980), “Clustering Methodologies in Exploratory Data Analysis,” in Advances in Computers (Vol. 19), Ed., M. C. Yovits, New York: Academic Press, 113–215.

  • EDELBROCK, C. (1979), “Comparing the Accuracy of Hierarchical Clustering Algorithms: The Problem of Classifying Everybody,” Multivariate Behavioral Research, 14, 367–384.

  • EVERITT, B.S. (1974), Cluster Analysis, New York: Wiley.

  • GNANADESIKAN, R., KETTENRING, J.R., and LANDWEHR, J.M. (1977), “Interpreting and Assessing the Results of Cluster Analyses,” Bulletin of the International Statistical Institute, 47, 451–463.

  • GOODMAN, L.A., and KRUSKAL, W.H. (1954), “Measures of Association for Cross-Classifications,” Journal of the American Statistical Association, 49, 732–764.

  • GORDON, A.D. (1981), Classification: Methods for the Exploratory Analysis of Multivariate Data, London: Chapman & Hall.

  • GORDON, A.D. (1987), “A Review of Hierarchical Classification,” Journal of the Royal Statistical Society, Series A, 150, 119–137.

  • GORDON, A.D., and DE CATA, A. (1988), “Stability and Influence in Sum of Squares Clustering,” Metron, 46, 347–360.

  • GOWER, J.C., and ROSS, G.J.S. (1969), “Minimum Spanning Trees and Single-Link Cluster Analysis,” Applied Statistics, 18, 54–64.

  • HUBERT, L.J., and ARABIE, P. (1985), “Comparing Partitions,” Journal of Classification, 2, 193–218.

  • JAIN, A.K., and DUBES, R.C. (1988), Algorithms for Clustering Data, Englewood Cliffs, NJ: Prentice-Hall.

  • JOLLIFFE, I.T., JONES, B., and MORGAN, B.J.T. (1988), “Stability and Influence in Cluster Analysis,” in Data Analysis and Informatics, V, Ed., E. Diday, Amsterdam: Elsevier (North-Holland), 507–514.

  • MCINTYRE, R.M., and BLASHFIELD, R.K. (1980), “A Nearest-Centroid Technique for Evaluating the Minimum-Variance Clustering Procedure,” Multivariate Behavioral Research, 15, 225–238.

  • MILLIGAN, G.W. (1980), “An Examination of the Effect of Six Types of Error Perturbation on Fifteen Clustering Algorithms,” Psychometrika, 45, 325–342.

  • MILLIGAN, G.W. (1981), “A Monte Carlo Study of Thirty Internal Criterion Measures for Cluster Analysis,” Psychometrika, 46, 187–199.

  • MILLIGAN, G.W. (1985), “An Algorithm for Generating Artificial Test Clusters,” Psychometrika, 50, 123–127.

  • MILLIGAN, G.W. (1989), “A Validation Study of a Variable Weighting Algorithm for Cluster Analysis,” Journal of Classification, 6, 53–71.

  • MILLIGAN, G.W. (1995), “Clustering Validation: Results and Implications for Applied Analyses,” in Clustering and Classification, Eds., P. Arabie, L. Hubert, and G. De Soete, River Edge, New Jersey: World Scientific Press, 345–375.

  • MOREY, L.C., BLASHFIELD, R.K., and SKINNER, H.A. (1983), “A Comparison of Cluster Analysis Techniques Within a Sequential Validation Framework,” Multivariate Behavioral Research, 18, 309–329.

  • SILVESTRI, L., and HILL, I.R. (1964), “Some Problems of the Taxonometric Approach,” in Phenetic and Phylogenetic Classification, Eds., V.H. Heywood and J. McNeill, London: The Systematics Association (The Systematics Association Publication No. 6), 87–104.

  • SMITH, P.S., and DUBES, R. (1980), “Stability of a Hierarchical Clustering,” Pattern Recognition, 12, 177–187.

  • SOKAL, R.R., KIM, J., and ROHLF, F.J. (1992), “Character and OTU Stability in Five Taxonomic Groups,” Journal of Classification, 9, 117–140.

  • WARD, J.H. JR. (1963), “Hierarchical Grouping to Optimize an Objective Function,” Journal of the American Statistical Association, 58, 236–244.

Cite this article

Milligan, G.W., Cheng, R. Measuring the influence of individual data points in a cluster analysis. Journal of Classification 13, 315–335 (1996). https://doi.org/10.1007/BF01246105
