IPHITS: An Incremental Latent Topic Model for Link Structure

  • Huifang Ma
  • Weizhong Zhao
  • Zhixin Li
  • Zhongzhi Shi
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5839)


The structure of linked documents is dynamic and changes continually. Although various methods have been proposed to exploit link structure for identifying hubs and authorities in a set of linked documents, no existing approach deals effectively with these changes. This paper explores changes in linked documents and proposes an incremental probabilistic framework for link structure, which we call IPHITS. The model handles online document streams in a faster, scalable way and uses a novel link updating technique that can cope with dynamic changes. Experimental results on two different sources of online information demonstrate the time savings of our method. In addition, we analyze the stability of the rankings under small perturbations to the linkage patterns.


Keywords: link analysis; incremental learning; PHITS; IPHITS





Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • Huifang Ma (1, 2)
  • Weizhong Zhao (1, 2)
  • Zhixin Li (1, 2)
  • Zhongzhi Shi (1, 2)
  1. Key Lab of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China
  2. Graduate University of the Chinese Academy of Sciences, Beijing, China
