Abstract
This paper develops a probabilistic clustering model for mixed data. The model allows analysis of variables of mixed type: the variables may be nominal, ordinal and/or quantitative. The model contains the well-known models of latent class analysis as submodels. As in latent class analysis, local independence of the variables is assumed. The parameters of the model are estimated by the EM algorithm. Test statistics and goodness-of-fit measures are proposed for model selection. Two artificial data sets show the usefulness of these tests. An empirical example completes the presentation.
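To make the combination of local independence and EM estimation concrete, the following is a minimal sketch and not the paper's implementation. It assumes only two variables, one nominal (coded 0, 1, ...) and one quantitative (modeled as normal within each class), and leaves out ordinal variables entirely. The function name `em_mixed`, the simulated data, the smoothing constant and the variance floor are all illustrative assumptions.

```python
import math
import random


def normal_pdf(x, mean, var):
    """Density of a normal distribution with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def em_mixed(nominal, quantitative, n_classes, n_categories, n_iter=50, seed=0):
    """EM for a mixture with one nominal and one quantitative variable.

    Local independence: within a class, the joint density is the product
    of a categorical probability and a normal density.
    """
    rng = random.Random(seed)
    n = len(nominal)
    # Random soft initial assignment of cases to classes (hypothetical choice).
    resp = []
    for _ in range(n):
        row = [rng.random() + 0.1 for _ in range(n_classes)]
        total = sum(row)
        resp.append([r / total for r in row])
    loglik_trace = []
    for _ in range(n_iter):
        # M-step: class sizes, category probabilities, means, variances.
        weights, cat_probs, means, variances = [], [], [], []
        for k in range(n_classes):
            nk = sum(resp[i][k] for i in range(n))
            weights.append(nk / n)
            counts = [1e-6] * n_categories  # small constant avoids zero probabilities
            for i in range(n):
                counts[nominal[i]] += resp[i][k]
            total = sum(counts)
            cat_probs.append([c / total for c in counts])
            mean = sum(resp[i][k] * quantitative[i] for i in range(n)) / nk
            var = sum(resp[i][k] * (quantitative[i] - mean) ** 2 for i in range(n)) / nk
            means.append(mean)
            variances.append(max(var, 1e-6))  # floor avoids a degenerate class
        # E-step: posterior class membership probabilities for every case.
        loglik = 0.0
        for i in range(n):
            joint = [
                weights[k]
                * cat_probs[k][nominal[i]]
                * normal_pdf(quantitative[i], means[k], variances[k])
                for k in range(n_classes)
            ]
            total = sum(joint)
            resp[i] = [j / total for j in joint]
            loglik += math.log(total)
        loglik_trace.append(loglik)
    return weights, cat_probs, means, variances, resp, loglik_trace


# Simulated two-class data; the true class k is used only to generate values.
data_rng = random.Random(1)
nominal, quantitative = [], []
for i in range(200):
    k = i % 2
    nominal.append(k if data_rng.random() < 0.9 else 1 - k)
    quantitative.append(data_rng.gauss(10.0 * k, 1.0))

weights, cat_probs, means, variances, resp, trace = em_mixed(
    nominal, quantitative, n_classes=2, n_categories=2
)
print("class sizes:", [round(w, 3) for w in weights])
print("class means:", [round(m, 2) for m in means])
```

Each row of `resp` holds the estimated membership probabilities of one case, which is what makes the clustering probabilistic rather than a hard partition. The log-likelihood values in `trace` are the kind of quantity on which likelihood-based test statistics for comparing numbers of classes are typically built; the specific tests proposed in the paper are not reproduced here.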
Cite this article
Bacher, J. A Probabilistic Clustering Model for Variables of Mixed Type. Quality & Quantity 34, 223–235 (2000). https://doi.org/10.1023/A:1004759101388