A permutation-based algorithm for block clustering
Hartigan (1972) discusses the direct clustering of a matrix of data into homogeneous blocks. He introduces a stepwise divisive method for block clustering within a certain class of block structures which induce clustering trees for both row and column margins. While this class of structures is appealing, the stopping criterion for his method, which is based on asymptotic theory and the assumption that the individual elements of the data matrix are normally distributed, is quite restrictive. In this paper we propose a permutation-based algorithm for block clustering within the same class of block structures. By using permutation arguments to decide where to split and when to stop, our algorithm becomes applicable in a wide variety of cases, including matrices of categorical data and matrices of small-to-moderate size. In addition, our algorithm offers considerable flexibility in how block homogeneity is defined. The algorithm is studied in a series of simulation experiments on matrices of known structure, and illustrated in examples drawn from the fields of taxonomy, political science, and data architecture.
Keywords: Binary splitting, Block clustering, Markov chain simulation method, Permutation distribution
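To make the idea of using permutation arguments to decide where to split and when to stop concrete, here is a minimal Python sketch. It is not the authors' algorithm: the homogeneity measure (within-block sum of squares), the row ordering by mean, the shuffling of all entries within the block, and the names `sum_sq`, `best_row_split`, and `permutation_p_value` are all assumptions made for illustration. The sketch finds the best binary row split of one block and compares its gain with the gains obtained after randomly permuting the block's entries.

```python
import numpy as np

def sum_sq(block):
    """Within-block sum of squared deviations from the block mean (assumed homogeneity measure)."""
    return float(((block - block.mean()) ** 2).sum())

def best_row_split(block):
    """Order rows by their mean, then find the cut that most reduces the sum of squares."""
    order = np.argsort(block.mean(axis=1))
    sorted_block = block[order]
    total = sum_sq(sorted_block)
    best_gain, best_cut = 0.0, None
    for cut in range(1, sorted_block.shape[0]):
        gain = total - sum_sq(sorted_block[:cut]) - sum_sq(sorted_block[cut:])
        if gain > best_gain:
            best_gain, best_cut = gain, cut
    return best_gain, order, best_cut

def permutation_p_value(block, n_perm=200, seed=0):
    """Estimate how often a shuffled block yields a split at least as good as the observed one."""
    rng = np.random.default_rng(seed)
    observed, _, _ = best_row_split(block)
    count = 0
    for _ in range(n_perm):
        # Assumption: under the null of a homogeneous block, all entries are exchangeable.
        shuffled = rng.permutation(block.ravel()).reshape(block.shape)
        gain, _, _ = best_row_split(shuffled)
        if gain >= observed:
            count += 1
    # Add-one correction so the estimate is never exactly zero.
    return (count + 1) / (n_perm + 1)
```

Under these assumptions, a small p-value would suggest accepting the split and recursing on the two sub-blocks, while a large p-value would suggest stopping. Because the test relies only on exchangeability, `sum_sq` could in principle be swapped for another homogeneity measure, such as one suited to categorical data.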
- ARABIE, P., and BOORMAN, S. A. (1982), “Blockmodels: Development and Prospects,” in Classifying Social Data: New Applications of Analytic Methods for Social Science Research, Eds., Herschel C. Hudson and Associates, San Francisco: Jossey-Bass Publisher, Chapter 11.
- BOCK, H. H. (1979), “Simultaneous Clustering of Objects and Variables,” in Analyse de Données et Informatique. Le Chesnay (France): Institut National de Recherche en Informatique et en Automatique, 187–203.
- DUFFY, D. E., and KEMPERMAN, J. H. B. (1990), “Entropy-Based Splitting Criteria,” Morristown, NJ: Bell Communications Research Technical memorandum, in preparation.
- DUFFY, D. E., FOWLKES, E. B., and KANE, L. D. (1987), “Cluster Analysis in Strategic Data Architecture Design,” in 1987 Bellcore Database Symposium. Morristown, NJ: Bell Communications Research, Inc., 175–186.
- GOVAERT, G. (1977), “Algorithme de Classification d'un Tableau de Contingence,” Proceedings of the First International Symposium on Data Analysis and Informatics, 2, Le Chesnay (France): Institut National de Recherche en Informatique et Automatique, 487–500.
- SOKAL, R. R., and SNEATH, P. H. A. (1963), Principles of Numerical Taxonomy, San Francisco: Freeman.