
Model Selection Criteria for Model-Based Clustering of Categorical Time Series Data: A Monte Carlo Study

  • Conference paper
Advances in Data Analysis

Abstract

An open issue in the statistical literature is how to select the number of components in model-based clustering of time series data with a finite number of states (categories) that are observed several times. We specify a finite mixture of Markov chains and compare the performance of selection methods based on different information criteria across a large experimental design. The results show that the performance of the information criteria varies across the design. Overall, AIC3 outperforms more widely used information criteria such as AIC and BIC for these finite mixture models.
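As a rough illustration of how such criteria are typically compared, the sketch below computes AIC, AIC3 and BIC for a finite mixture of first-order Markov chains and picks the number of components that minimizes each one. It is a minimal sketch and not the paper's implementation: the function names, the assumption of first-order chains with unrestricted initial and transition probabilities, and the use of the number of observed sequences as the sample size in BIC are all assumptions made here. The maximized log-likelihoods are expected as inputs, so no estimation (for example by EM) is shown.

```python
import math


def n_free_params(n_components, n_states):
    """Free parameters of a mixture of first-order Markov chains.

    Assumes unrestricted initial and transition probabilities in every
    component (an assumption of this sketch, not taken from the paper).
    """
    mixing = n_components - 1                              # mixing proportions
    initial = n_components * (n_states - 1)                # initial-state probabilities
    transition = n_components * n_states * (n_states - 1)  # transition matrix rows
    return mixing + initial + transition


def information_criteria(log_lik, n_params, n_obs):
    """Return AIC, AIC3 and BIC for a fitted model (lower is better)."""
    deviance = -2.0 * log_lik
    return {
        "AIC": deviance + 2 * n_params,
        "AIC3": deviance + 3 * n_params,
        "BIC": deviance + n_params * math.log(n_obs),
    }


def select_n_components(log_liks, n_states, n_obs):
    """Pick the number of components that minimizes each criterion.

    log_liks maps a candidate number of components to the maximized
    log-likelihood of that model (hypothetical input format).
    """
    scores = {
        s: information_criteria(ll, n_free_params(s, n_states), n_obs)
        for s, ll in log_liks.items()
    }
    return {
        name: min(scores, key=lambda s: scores[s][name])
        for name in ("AIC", "AIC3", "BIC")
    }
```

Calling `select_n_components({1: -520.0, 2: -500.0, 3: -495.0}, n_states=3, n_obs=200)` with these made-up log-likelihoods returns one selected number of components per criterion. Because the three criteria differ only in the penalty per free parameter (2, 3, or the log of the sample size), they can disagree on the same fitted models, which is the behaviour a Monte Carlo comparison of this kind examines.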




Copyright information

© 2007 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Dias, J.G. (2007). Model Selection Criteria for Model-Based Clustering of Categorical Time Series Data: A Monte Carlo Study. In: Decker, R., Lenz, H.J. (eds) Advances in Data Analysis. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-70981-7_3
