Sequential Activity Profiling: Latent Dirichlet Allocation of Markov Chains

Girolami, Mark; Kabán, Ata

doi:10.1007/s10618-005-0362-2

Sequential Activity Profiling: Latent Dirichlet Allocation of Markov Chains

Published: May 2005

Volume 10, pages 175–196, (2005)
Cite this article

Data Mining and Knowledge Discovery Aims and scope Submit manuscript

Mark Girolami¹ &
Ata Kabán²

388 Accesses
21 Citations
3 Altmetric
Explore all metrics

Abstract

To provide a parsimonious generative representation of the sequential activity of a number of individuals within a population there is a necessary tradeoff between the definition of individual specific and global representations. A linear-time algorithm is proposed that defines a distributed predictive model for finite state symbolic sequences which represent the traces of the activity of a number of individuals within a group. The algorithm is based on a straightforward generalization of latent Dirichlet allocation to time-invariant Markov chains of arbitrary order. The modelling assumption made is that the possibly heterogeneous behavior of individuals may be represented by a relatively small number of simple and common behavioral traits which may interleave randomly according to an individual-specific distribution. The results of an empirical study on three different application domains indicate that this modelling approach provides an efficient low-complexity and intuitively interpretable representation scheme which is reflected by improved prediction performance over comparable models.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Variational Conditional Dependence Hidden Markov Models for Skeleton-Based Action Recognition

Hidden Markov Models for Surprising Pattern Detection in Discrete Symbol Sequence Data

Temporally Coherent Role-Topic Models (TCRTM): Deinterlacing Overlapping Activity Patterns

References

Anderson, C., Domingos, P., and Weld, D. 2001. Adaptive web navigation for wireless devices. In Proceedings of the 17th International Joint Conference on Artificial Intelligence, pp. 879–884.
Attias, H. 2001. Learning in high dimension: modular mixture models. In Proc. AI and Statistics.
Blei, D.M., Ng, A.Y., and Jordan, M.I. 2003. Latent dirchlet allocation. Journal of Machine Learning Research, 3(5):993–1022.
Google Scholar
Borges, J. and Levene, M. 1999. Data mining of user navigation patterns. WEBKDD, pp. 92–111.
Cadez, I., Heckerman, D., Meek, C., Smyth, P., and White, S. 2003. Model-based clustering and visualisation of navigation patterns on a web site. Journal of data Mining and Knowledge Discovery, 7(4):399–424.
Google Scholar
Cover, T.M and Thomas, J.A. 1991. Elements of Information Theory. New York: Wiley.
Google Scholar
Deshpande, M. and Karypis. to appear. Selective markov models for predicting web-page accesses. ACM Transactions on Internet Technology.
Hofmann, T. 2001. Unsupervised learning by probabilistic latent semantic analysis. Machine Learning, 42:177–196.
Google Scholar
Hofmann, T. 2001. Learning What People (Don’t) Want. In European Conference on Machine Learning, pp. 214–225.
Frydman, H. 1984. Maximum likelihood estimation in the mover-stayer model. Journal of the American Statistical Society, 79:632–638.
Google Scholar
Krogh, A. Hidden Markov Models in computational biology: Applications to protein modelling. Journal of Molecular Biology, 235:1501–1531.
Lee, D. and Sebastian Seung, H. 2001. Algorithms for Non-negative Matrix Factorization. In Advances in Neural Information Processing Systems 13, Todd K. Leen, Thomas G. Dietterich, and Volker. Tresp, (eds.) MIT Press, pp. 556–562.
Linton, F., Joy, D., Schaefer, H.-P. and Charron, A. 2000. OWL: A recommender system for organization-wide learning. Educational Technology & Society, 3.
Mannila, H., and Rusakov, D. 2001. Decomposing event sequences into independent components. In First SIAM Conference on Data Mining.
Manning, C.D. and Schütze, H. 1999. Foundations of Statistical Natural Language Processing. Cambridge, Massachusetts: The MIT Press.
Google Scholar
Minka, T. and Lafferty, J. 2002. Expectation-propogation for the generative aspect model. In Proceedings of the Eighteenth Conference on Uncertainty in Artificial Intelligence.
Mitchell, T. 1996. Machine Learning. New York, US: McGraw-Hill.
Google Scholar
Rabiner, L.R. 1989. A tutorial on hidden Markov models and selected applications in speech recognition. In Proc. of the IEEE, 77(2):257–285.
Google Scholar
Raftery, A. 1985. A model for higher-order Markov chains. Journal of the Royal Statistical Society B, 47(3):528–539.
Google Scholar
Ronning, G. 1989. Maximum likelihood estimation of Dirichlet distributions. Journal of Statistical Computation and Simulation, 32(4):215–221.
Google Scholar
Ross, D.A. and Zemel, R.S. 2003. Multiple-cause vector quantiztion. In Advances in Neural Information Processing Systems 15, S. Becker, S. Thrun and K. Obermayer (Eds.), MIT Press, pp. 1017–1024.
Sarukkai, R. 2000. Link predicition and path analysis using markov chains. Computer Networks, 33(1–6):377–386.
Google Scholar
Saul, L. and Pereira, F. 1997. Aggregate and mixed-order markov models for statistical language processing. In Proceedings of 2nd International Conference on Empirical Methods in Natural Language Processing, pp. 81–89.
Saul, L.K. and Jordan, M.I. 1999. Mixed memory markov models: Decomposing complex stochastic processes as mixtures of simpler ones. Machine Learning, 37:75–87.
Google Scholar
Tino, P. and Dorffner, G. 2001. Predicting the future of discrete sequences from fractal representations of the past. Machine Learning, 45(2):187–218.
Google Scholar
Lappalainen, H. and Miskin, J.W. 2000. Ensemble learning. In Advances in Independent Component Analysis, M. Girolami (Ed.). Springer-Verlag, pp. 75–92.

Download references

Author information

Authors and Affiliations

Bioinformatics Reseach Centre, Department of Computing Science, University of Glasgow, Glasgow, G12 8QQ, UK
Mark Girolami
School of Computer Science, University of Birmingham, Birmingham, B15 2TT, UK
Ata Kabán

Authors

Mark Girolami
View author publications
You can also search for this author in PubMed Google Scholar
Ata Kabán
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Mark Girolami.

Additional information

Mark Girolami is a Reader in the Department of Computing Science at the University of Glasgow. In 2000 he was the TEKES visiting professor at the Laboratory of Computing and Information Science in Helsinki University of Technology. In 1998 and 1999 Dr. Girolami was a research fellow at the Laboratory for Advanced Brain Signal Processing in the Brain Science Institute, RIKEN, Wako-Shi, Japan. He has been a visiting researcher at the Computational Neurobiology Laboratory (CNL) of the Salk Institute. This year (2005) he will take up an MRC funded Discipline Hopping Award in the Department of Biochemistry. Mark Girolami holds a degree in Mechanical Engineering from the University of Glasgow (1985), and a Ph.D. in Computing Science from the University of Paisley (1998). Dr. Girolami was a development engineer with IBM from 1985 until 1995 when he left to pursue an academic career.

Ata Kabán received the B.Sc. degree with honours (1999) in computer science from the University “Babes-Bolyai” of Cluj-Napoca, Romania, and the Ph.D. degree in computer science (2001) from the University of Paisley, UK. She is a lecturer in the School of Computer Science of the University of Birmingham. Her current interests concern probabilistic modelling, machine learning and their applications to automated data analysis. She has been a visiting researcher at Helsinki University of Technology (June–December 2000 and in the summer of 2003). Prior to her career in Computer Science, she received the B.A. degree in musical composition (1994) and the M.A. (1995) and Ph.D. (1999) degrees in musicology from the Music Academy “Gh. Dima” of Cluj-Napoca, Romania.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Girolami, M., Kabán, A. Sequential Activity Profiling: Latent Dirichlet Allocation of Markov Chains. Data Min Knowl Disc 10, 175–196 (2005). https://doi.org/10.1007/s10618-005-0362-2

Download citation

Received: 15 July 2003
Accepted: 27 January 2005
Issue Date: May 2005
DOI: https://doi.org/10.1007/s10618-005-0362-2

Keywords

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Sequential Activity Profiling: Latent Dirichlet Allocation of Markov Chains

Abstract

Access this article

Similar content being viewed by others

Variational Conditional Dependence Hidden Markov Models for Skeleton-Based Action Recognition

Hidden Markov Models for Surprising Pattern Detection in Discrete Symbol Sequence Data

Temporally Coherent Role-Topic Models (TCRTM): Deinterlacing Overlapping Activity Patterns

References

Author information

Authors and Affiliations

Corresponding author

Additional information

Rights and permissions

About this article

Cite this article

Keywords

Navigation

Sequential Activity Profiling: Latent Dirichlet Allocation of Markov Chains

Abstract

Access this article

Similar content being viewed by others

Variational Conditional Dependence Hidden Markov Models for Skeleton-Based Action Recognition

Hidden Markov Models for Surprising Pattern Detection in Discrete Symbol Sequence Data

Temporally Coherent Role-Topic Models (TCRTM): Deinterlacing Overlapping Activity Patterns

References

Author information

Authors and Affiliations

Corresponding author

Additional information

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Search

Navigation