State Aggregation in Higher Order Markov Chains for Finding Online Communities

  • Xin Wang
  • Ata Kabán
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4224)


We develop and investigate probabilistic approaches to state clustering in higher-order Markov chains. A direct extension of the Aggregate Markov model to higher orders turns out to be problematic because of the large number of parameters required. However, in many cases the events in the finite memory are not equally salient in terms of their predictive value. We exploit this to reduce the number of parameters: we use a hidden variable to infer which of the past events is the most predictive, and we develop two different mixed-order approximations of the higher-order aggregate Markov model. We apply these models to the problem of community identification from event sequences produced through online computer-mediated interactions. Our approach bypasses the limitations of static approaches and offers a flexible modelling tool that is able to reveal novel and insightful structural aspects of online interaction dynamics.
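The abstract gives no equations, so the following is only a minimal sketch of the general idea, under stated assumptions. It assumes one plausible mixed-order form in which a hidden lag variable selects a single past event, that event is mapped to a latent cluster, and the cluster emits the next state. The function names, the sharing of parameters across lags, and the parameter-count formulas are illustrative assumptions and are not taken from the paper, which develops two approximations of its own.

```python
import numpy as np


def normalize_rows(a):
    """Scale each row so that it sums to 1 (a conditional distribution)."""
    return a / a.sum(axis=1, keepdims=True)


def mixed_order_aggregate_predict(history, lag_weights,
                                  state_to_cluster, cluster_to_state):
    """Next-state distribution under an assumed mixed-order aggregate form.

    history          : the L most recent states, most recent first
    lag_weights      : shape (L,), P(lag), the hidden "which event matters"
    state_to_cluster : shape (N, K), row x holds P(cluster | past state x)
    cluster_to_state : shape (K, N), row k holds P(next state | cluster k)
    """
    probs = np.zeros(cluster_to_state.shape[1])
    for lag, past_state in enumerate(history):
        # Route the selected past event through the latent clusters.
        probs += lag_weights[lag] * (
            state_to_cluster[past_state] @ cluster_to_state
        )
    return probs


def parameter_counts(n_states, n_clusters, memory):
    """Free parameters: direct higher-order extension vs. the assumed form."""
    emission = n_clusters * (n_states - 1)
    # Direct extension: one cluster distribution per length-`memory` context.
    full = n_states ** memory * (n_clusters - 1) + emission
    # Assumed mixed-order form: lag weights plus one cluster
    # distribution per single past state.
    mixed = (memory - 1) + n_states * (n_clusters - 1) + emission
    return full, mixed


# Hypothetical toy sizes, chosen only for illustration.
rng = np.random.default_rng(0)
N, K, L = 5, 2, 3
state_to_cluster = normalize_rows(rng.random((N, K)))
cluster_to_state = normalize_rows(rng.random((K, N)))
lag_weights = np.array([0.6, 0.3, 0.1])

next_probs = mixed_order_aggregate_predict(
    [4, 0, 2], lag_weights, state_to_cluster, cluster_to_state
)
full_count, mixed_count = parameter_counts(N, K, L)
```

With these toy sizes the direct extension needs 133 free parameters against 15 for the assumed mixed-order form, which illustrates why the number of contexts, growing as N to the power L, makes the direct extension hard to estimate.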


Keywords (added by machine, not by the authors): Markov Model, State Cluster, Past Event, Hand Plot, Memory Depth




Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Xin Wang (1)
  • Ata Kabán (1)
  1. School of Computer Science, The University of Birmingham, Birmingham, UK
