State Aggregation in Higher Order Markov Chains for Finding Online Communities
We develop and investigate probabilistic approaches to state clustering in higher-order Markov chains. A direct extension of the Aggregate Markov model to higher orders turns out to be problematic because of the large number of parameters it requires. However, in many cases the events in the finite memory are not equally salient in terms of their predictive value. We exploit this to reduce the number of parameters: we use a hidden variable to infer which of the past events is the most predictive, and develop two different mixed-order approximations of the higher-order aggregate Markov model. We apply these models to the problem of community identification from event sequences produced through online computer-mediated interactions. Our approach bypasses the limitations of static approaches and offers a flexible modelling tool that can reveal novel and insightful structural aspects of online interaction dynamics.
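To make the idea of a mixed-order approximation concrete, the following is a minimal sketch of one possible reading, not the formulation used in the paper. It assumes a first-order aggregate model in which a transition passes through a latent cluster, and a hidden lag variable that selects which past event drives the prediction. The names `aggregate_transition`, `mixed_order_transition`, `to_cluster`, `from_cluster` and `lag_weights` are hypothetical, the numbers are made up for illustration, and the lag weights are assumed to be independent of the state, which is a simplification.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def aggregate_transition(prev: int, nxt: int,
                         to_cluster: Matrix, from_cluster: Matrix) -> float:
    """First-order aggregate model: P(nxt | prev) = sum_k P(k | prev) * P(nxt | k).

    to_cluster[s][k]   is P(cluster k | previous state s)  (hypothetical name)
    from_cluster[k][s] is P(next state s | cluster k)      (hypothetical name)
    """
    n_clusters = len(from_cluster)
    return sum(to_cluster[prev][k] * from_cluster[k][nxt]
               for k in range(n_clusters))


def mixed_order_transition(history: Sequence[int], nxt: int,
                           lag_weights: Sequence[float],
                           to_cluster: Matrix, from_cluster: Matrix) -> float:
    """Mixture over lags: a hidden variable picks which past event is predictive.

    history[-1] is the most recent state; lag_weights[l - 1] is P(lag = l).
    Assumption: the same cluster parameters are shared across all lags.
    """
    return sum(weight * aggregate_transition(history[-lag], nxt,
                                             to_cluster, from_cluster)
               for lag, weight in enumerate(lag_weights, start=1))


# Illustrative, made-up parameters: 3 states, 2 clusters, memory depth 2.
to_cluster = [[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]]
from_cluster = [[0.7, 0.2, 0.1], [0.1, 0.3, 0.6]]
lag_weights = [0.6, 0.4]
history = [0, 1]  # state 0 occurred two steps ago, state 1 one step ago

next_distribution = [
    mixed_order_transition(history, s, lag_weights, to_cluster, from_cluster)
    for s in range(3)
]
```

Under these assumptions the parameter count grows with the number of states times the number of clusters, plus one weight per lag, rather than exponentially with the memory depth as in a full higher-order transition table.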
Keywords: Markov Model; State Cluster; Past Event; Hand Plot; Memory Depth