Model Selection Criteria for Model-Based Clustering of Categorical Time Series Data: A Monte Carlo Study

Dias, José G.

doi:10.1007/978-3-540-70981-7_3

Part of the book series: Studies in Classification, Data Analysis, and Knowledge Organization (STUDIES CLASS)

Abstract

An open issue in the statistical literature is how to select the number of components in model-based clustering of time series data with a finite number of states (categories) that are observed several times. We specify a finite mixture of Markov chains and compare the performance of selection methods based on different information criteria across a large experimental design. The results show that the performance of the information criteria varies across the design. Overall, AIC3 outperforms more widely used information criteria such as AIC and BIC for these finite mixture models.
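
As a rough illustration of how such criteria are typically applied, the sketch below computes AIC, AIC3 and BIC in their standard textbook forms (minus twice the maximized log-likelihood plus a penalty of 2, 3 or log n per free parameter) and picks the number of components that minimizes each one. It is not taken from the chapter itself. The function names, the parameter count for a mixture of first-order Markov chains (mixing proportions, initial-state probabilities and transition probabilities per component), and the use of the number of observed sequences as n are all assumptions made for this example.

```python
import math


def free_parameters(n_components: int, n_states: int) -> int:
    """Count free parameters of a mixture of first-order Markov chains.

    Assumed parameterization (not taken from the chapter): each component
    has its own initial-state distribution and transition matrix.
    """
    mixing = n_components - 1
    initial = n_components * (n_states - 1)
    transitions = n_components * n_states * (n_states - 1)
    return mixing + initial + transitions


def information_criteria(log_lik: float, n_params: int, n_obs: int) -> dict:
    """Return AIC, AIC3 and BIC for one fitted model (smaller is better)."""
    deviance = -2.0 * log_lik
    return {
        "AIC": deviance + 2.0 * n_params,
        "AIC3": deviance + 3.0 * n_params,
        "BIC": deviance + math.log(n_obs) * n_params,
    }


def select_components(log_liks: dict, n_states: int, n_obs: int) -> dict:
    """Pick, per criterion, the component count with the smallest value.

    log_liks maps a candidate number of components to the maximized
    log-likelihood of the corresponding fitted mixture.
    n_obs is assumed here to be the number of observed sequences.
    """
    scores = {
        s: information_criteria(ll, free_parameters(s, n_states), n_obs)
        for s, ll in log_liks.items()
    }
    return {
        name: min(scores, key=lambda s: scores[s][name])
        for name in ("AIC", "AIC3", "BIC")
    }
```

Because the three criteria differ only in the penalty per parameter, they can disagree on the selected number of components when the gain in log-likelihood from an extra component is modest, which is the kind of situation a simulation study of this sort examines.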

Author information

Authors and Affiliations

Department of Quantitative Methods — GIESTA/UNIDE, ISCTE — Higher Institute of Social Sciences and Business Studies, Av. das Forças Armadas, 1649-026, Lisboa, Portugal
José G. Dias

Editor information

Editors and Affiliations

Department of Business Administration and Economics, Bielefeld University, Universitätsstr. 25, 33501, Bielefeld, Germany
Reinhold Decker
Department of Economics, Freie Universität Berlin, Garystraße 21, 14195, Berlin, Germany
Hans-J. Lenz

Cite this paper

Dias, J.G. (2007). Model Selection Criteria for Model-Based Clustering of Categorical Time Series Data: A Monte Carlo Study. In: Decker, R., Lenz, H.J. (eds) Advances in Data Analysis. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-70981-7_3

DOI: https://doi.org/10.1007/978-3-540-70981-7_3
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-70980-0
Online ISBN: 978-3-540-70981-7
eBook Packages: Mathematics and Statistics (R0)
