Journal of Classification, Volume 8, Issue 1, pp 65–91

A permutation-based algorithm for block clustering

  • Diane E. Duffy
  • Adolfo J. Quiroz
Article

Abstract

Hartigan (1972) discusses the direct clustering of a matrix of data into homogeneous blocks. He introduces a stepwise divisive method for block clustering within a certain class of block structures which induce clustering trees for both row and column margins. While this class of structures is appealing, the stopping criterion for his method, which is based on asymptotic theory and the assumption that the individual elements of the data matrix are normally distributed, is quite restrictive. In this paper we propose a permutation-based algorithm for block clustering within the same class of block structures. By using permutation arguments to decide where to split and when to stop, our algorithm becomes applicable in a wide variety of cases, including matrices of categorical data and matrices of small-to-moderate size. In addition, our algorithm offers considerable flexibility in how block homogeneity is defined. The algorithm is studied in a series of simulation experiments on matrices of known structure, and illustrated in examples drawn from the fields of taxonomy, political science, and data architecture.
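The idea of using a permutation distribution to decide whether a block should be split can be illustrated with a minimal sketch. This is not the authors' algorithm: the abstract does not specify the homogeneity measure, the splitting statistic, or the permutation scheme. The sketch assumes numeric data, measures homogeneity by the within-block sum of squares, considers only binary row splits after ordering rows by their means, and builds the reference distribution by shuffling all entries of the block. The function names `sse`, `best_row_split`, and `permutation_split_test` are hypothetical.

```python
import numpy as np


def sse(block):
    """Sum of squared deviations from the block mean (assumed homogeneity measure)."""
    return float(((block - block.mean()) ** 2).sum())


def best_row_split(block):
    """Find the binary row split with the largest reduction in SSE.

    Rows are ordered by their means and every cut point in that order is tried.
    Returns (gain, (rows_a, rows_b)); the groups are None if no cut reduces SSE.
    """
    order = np.argsort(block.mean(axis=1))
    ordered = block[order]
    total = sse(ordered)
    best_gain, best_k = 0.0, None
    for k in range(1, ordered.shape[0]):
        gain = total - sse(ordered[:k]) - sse(ordered[k:])
        if gain > best_gain:
            best_gain, best_k = gain, k
    if best_k is None:
        return 0.0, None
    rows_a = sorted(int(i) for i in order[:best_k])
    rows_b = sorted(int(i) for i in order[best_k:])
    return best_gain, (rows_a, rows_b)


def permutation_split_test(block, n_perm=999, seed=0):
    """Permutation p-value for the best row split of a block.

    The entries of the block are shuffled (an assumed permutation scheme) and
    the best-split gain is recomputed each time; the p-value is the share of
    shuffles whose gain is at least as large as the observed one.
    """
    rng = np.random.default_rng(seed)
    observed, groups = best_row_split(block)
    exceed = 0
    for _ in range(n_perm):
        shuffled = rng.permutation(block.ravel()).reshape(block.shape)
        gain, _ = best_row_split(shuffled)
        if gain >= observed:
            exceed += 1
    # Add-one correction so the p-value is never exactly zero.
    return (exceed + 1) / (n_perm + 1), groups
```

Under these assumptions, a stepwise divisive procedure would split a block when the p-value falls below a chosen threshold and stop otherwise. Because the reference distribution comes from the data themselves, no normality assumption or large-sample argument is needed, which is the property the abstract emphasizes.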

Keywords

Binary splitting; Block clustering; Markov chain simulation method; Permutation distribution

References

  1. ADELSON, R. M., NORMAN, J. M., and LAPORTE, G. (1976), “A Dynamic Programming Formulation With Diverse Applications,” Operational Research Quarterly, 27, 119–121.
  2. ALDOUS, D. (1987), “On the Markov Chain Simulation Method for Uniform Combinatorial Distributions and Simulated Annealing,” Probability in the Engineering and Informational Sciences, 1, 33–46.
  3. ARABIE, P., BOORMAN, S. A., and LEVITT, P. R. (1978), “Constructing Blockmodels: How and Why,” Journal of Mathematical Psychology, 17, 21–63.
  4. ARABIE, P., and BOORMAN, S. A. (1982), “Blockmodels: Development and Prospects,” in Classifying Social Data: New Applications of Analytic Methods for Social Science Research, Eds., Herschel C. Hudson and Associates, San Francisco: Jossey-Bass Publisher, Chapter 11.
  5. BILLINGSLEY, P. (1968), Convergence of Probability Measures, New York: Wiley.
  6. BOCK, H. H. (1979), “Simultaneous Clustering of Objects and Variables,” in Analyse de Données et Informatique, Le Chesnay (France): Institut National de Recherche en Informatique et en Automatique, 187–203.
  7. BREIGER, R. L., BOORMAN, S. A., and ARABIE, P. (1975), “An Algorithm for Clustering Relational Data With Applications to Social Network Analysis and Comparison with Multidimensional Scaling,” Journal of Mathematical Psychology, 12, 328–383.
  8. BREIMAN, L., FRIEDMAN, J. H., OLSHEN, R., and STONE, C. J. (1984), Classification and Regression Trees, Belmont, CA: Wadsworth.
  9. DE SOETE, G., DESARBO, W. S., FURNAS, G. W., and CARROLL, J. D. (1984), “The Estimation of Ultrametric and Path Length Trees From Rectangular Proximity Data,” Psychometrika, 49 (3), 289–310.
  10. DEUTSCH, S. B., and MARTIN, J. J. (1971), “An Ordering Algorithm for Analysis of Data Arrays,” Operations Research, 19, 1350–1362.
  11. DIACONIS, P., and SHAHSHAHANI, M. (1981), “Generating a Random Permutation with Random Transpositions,” Zeitschrift fuer Wahrscheinlichkeitstheorie und Verwandte Gebiete, 57, 159–179.
  12. DUFFY, D. E., and KEMPERMAN, J. H. B. (1990), “Entropy-Based Splitting Criteria,” Morristown, NJ: Bell Communications Research Technical Memorandum, in preparation.
  13. DUFFY, D. E., FOWLKES, E. B., and KANE, L. D. (1987), “Cluster Analysis in Strategic Data Architecture Design,” in 1987 Bellcore Database Symposium, Morristown, NJ: Bell Communications Research, Inc., 175–186.
  14. EDELSBRUNNER, H. (1987), Algorithms in Combinatorial Geometry, New York: Springer-Verlag.
  15. GILULA, Z. (1986), “Grouping and Association in Contingency Tables: An Exploratory Canonical Correlation Approach,” Journal of the American Statistical Association, 81, 773–779.
  16. GOODMAN, L. A. (1981), “Criteria for Determining Whether Certain Categories in a Cross-Classification Table Should be Combined With Special Reference to Occupation Categories in an Occupational Mobility Table,” American Journal of Sociology, 87, 612–650.
  17. GOVAERT, G. (1977), “Algorithme de Classification d'un Tableau de Contingence,” Proceedings of the First International Symposium on Data Analysis and Informatics, 2, Le Chesnay (France): Institut National de Recherche en Informatique et Automatique, 487–500.
  18. GREENACRE, M. J. (1988), “Clustering the Rows and Columns of a Contingency Table,” Journal of Classification, 5, 39–51.
  19. HARTIGAN, J. A. (1972), “Direct Clustering of a Data Matrix,” Journal of the American Statistical Association, 67, 123–129.
  20. HARTIGAN, J. A. (1975), Clustering Algorithms, New York: Wiley.
  21. HARTIGAN, J. A. (1976), “Modal Blocks in Dentition of West Coast Mammals,” Systematic Zoology, 25, 149–160.
  22. HEISER, W. J., and MEULMAN, J. (1983), “Analyzing Rectangular Tables by Joint and Constrained Multidimensional Scaling,” Journal of Econometrics, 22, 139–167.
  23. HILL, M. O. (1974), “Correspondence Analysis: A Neglected Multivariate Method,” Applied Statistics, 23, 340–354.
  24. HOLLAND, P. W., and LEINHARDT, S. (1981), “An Exponential Family of Probability Distributions for Directed Graphs (with discussion),” Journal of the American Statistical Association, 76, 33–65.
  25. HUBERT, L. J. (1974), “Problems of Seriation Using a Subject by Item Response Matrix,” Psychological Bulletin, 81, 976–983.
  26. HUBERT, L. J., and GOLLEDGE, R. G. (1981), “Matrix Reorganization and Dynamic Programming: Applications to Paired Comparisons and Unidimensional Seriation,” Psychometrika, 46, 429–441.
  27. LAMBERT, J. M., and WILLIAMS, W. T. (1962), “Multivariate Methods in Plant Ecology IV. Nodal Analysis,” Journal of Ecology, 50, 775–802.
  28. MEHTA, C. R., PATEL, N. R., and GRAY, R. (1985), “Computing an Exact Confidence Interval for the Common Odds Ratio in Several 2×2 Contingency Tables,” Journal of the American Statistical Association, 80, 969–973.
  29. SCHMID, B. (1984), “Niche Width and Variation Within and Between Populations in Colonizing Species (Carex flava group),” Oecologia (Berlin), 63, 1–5.
  30. SOKAL, R. R., and SNEATH, P. H. A. (1963), Principles of Numerical Taxonomy, San Francisco: Freeman.
  31. WANG, Y. J., and WONG, G. Y. (1987), “Stochastic Blockmodels for Directed Graphs,” Journal of the American Statistical Association, 82, 8–19.
  32. WONG, G. Y. (1987), “Bayesian Models for Directed Graphs,” Journal of the American Statistical Association, 82, 140–148.

Copyright information

© Springer-Verlag 1991

Authors and Affiliations

  • Diane E. Duffy (1)
  • Adolfo J. Quiroz (2)
  1. Bellcore, Morristown, USA
  2. Universidad Simon Bolivar, Caracas, Venezuela
