Topic-Level Random Walk through Probabilistic Model

  • Zi Yang
  • Jie Tang
  • Jing Zhang
  • Juanzi Li
  • Bo Gao
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5446)


In this paper, we study the problem of topic-level random walk, that is, random walk performed at the level of topics. Several related methods, such as topic-sensitive PageRank, have been proposed previously. However, the topics in these methods are predefined, which makes the methods inapplicable to different domains. We propose a four-step approach for topic-level random walk. We employ a probabilistic topic model to automatically extract topics from documents, and then perform the random walk at the topic level. We also propose an approach to model the topics of the query, and then combine the random walk ranking score with the relevance score based on the modeling results. Experimental results on a real-world data set show that the proposed approach significantly outperforms baseline methods based on a language model and on traditional PageRank.
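
The pipeline described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' formulation: it assumes that each document already has a topic distribution P(z|d) from some topic model, runs one personalized PageRank per topic with teleportation proportional to P(z|d), and combines the result with a relevance score by a simple product. The function names, the damping value, and the combination rule are all hypothetical choices made for illustration.

```python
def topic_level_random_walk(links, doc_topics, damping=0.85, iters=50):
    """Run one random walk per topic (illustrative assumption, not the paper's model).

    links:      dict mapping a document to the list of documents it links to
    doc_topics: dict mapping a document to its topic distribution [P(z|d), ...],
                assumed to come from a previously trained topic model
    Returns a dict: topic index -> {document: rank score}.
    """
    docs = sorted(doc_topics)
    n_topics = len(next(iter(doc_topics.values())))
    ranks = {}
    for z in range(n_topics):
        # Teleport distribution for topic z: proportional to P(z|d).
        # Assumes every topic has nonzero total mass over the documents.
        total = sum(doc_topics[d][z] for d in docs)
        teleport = {d: doc_topics[d][z] / total for d in docs}
        r = dict(teleport)
        for _ in range(iters):
            nxt = {d: (1 - damping) * teleport[d] for d in docs}
            for src in docs:
                outs = links.get(src, [])
                if outs:
                    # Spread the damped score evenly over outgoing links.
                    share = damping * r[src] / len(outs)
                    for dst in outs:
                        nxt[dst] += share
                else:
                    # Dangling document: redistribute via the teleport vector.
                    for d in docs:
                        nxt[d] += damping * r[src] * teleport[d]
            r = nxt
        ranks[z] = r
    return ranks


def combined_score(ranks, query_topics, relevance):
    """Weight per-topic ranks by the query's topic distribution, then
    multiply by a relevance score (one possible combination, assumed here)."""
    return {
        d: relevance[d] * sum(query_topics[z] * ranks[z][d] for z in ranks)
        for d in relevance
    }


# Toy example with four documents and two topics (made-up numbers).
links = {"a": ["b"], "b": ["c"], "c": ["a"], "d": ["c"]}
doc_topics = {"a": [0.9, 0.1], "b": [0.8, 0.2], "c": [0.5, 0.5], "d": [0.1, 0.9]}
ranks = topic_level_random_walk(links, doc_topics)
scores = combined_score(ranks, [0.7, 0.3], {"a": 0.4, "b": 0.3, "c": 0.2, "d": 0.1})
```

Because the teleport vector differs per topic, a document can rank highly for one topic and poorly for another, which is the intuition behind walking at the topic level rather than computing a single global score.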





Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • Zi Yang (1)
  • Jie Tang (1)
  • Jing Zhang (1)
  • Juanzi Li (1)
  • Bo Gao (1)
  1. Department of Computer Science & Technology, Tsinghua University, China
