
Information Retrieval, Volume 14, Issue 5, pp 488–514

Page importance computation based on Markov processes

  • Bin Gao
  • Tie-Yan Liu
  • Yuting Liu
  • Taifeng Wang
  • Zhi-Ming Ma
  • Hang Li
Article

Abstract

This paper is concerned with Markov processes for computing page importance. Page importance is a key factor in Web search. Many algorithms, such as PageRank and its variations, have been proposed for computing it in different scenarios, using different data sources, and under different assumptions. This raises two questions: whether these algorithms can be explained in a unified way, and whether there is a general guideline for designing new algorithms for new scenarios. To answer these questions, we introduce a General Markov Framework. Under the framework, a Web Markov Skeleton Process is used to model the random walk conducted by the web surfer on a given graph. Page importance is then defined as the product of two factors: page reachability, the average possibility that the surfer arrives at the page, and page utility, the average value that the page gives to the surfer in a single visit. These two factors can be computed, respectively, as the stationary probability distribution of the corresponding embedded Markov chain and the mean staying time on each page of the Web Markov Skeleton Process. We show that this general framework covers many existing algorithms, including PageRank, TrustRank, and BrowseRank, as special cases. We also show that the framework can help us design new algorithms to handle more complex problems by constructing graphs from new data sources, employing new family members of the Web Markov Skeleton Process, and using new methods to estimate the two factors. In particular, we demonstrate the use of the framework with a new process, named the Mirror Semi-Markov Process, in which the staying time on a page, as a random variable, is assumed to depend on both the current page and its inlink pages. Our experimental results on both the user browsing graph and the mobile web graph validate that the Mirror Semi-Markov Process is more effective than previous models in several tasks, even in the presence of web spam and when the assumption of preferential attachment does not hold.
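The two-factor definition above can be illustrated with a minimal sketch. It is not the authors' implementation: the toy three-page transition matrix, the mean staying times, and the function names `stationary_distribution` and `page_importance` are all assumptions made for illustration. Reachability is approximated by power iteration on the embedded Markov chain, and importance is taken as reachability times mean staying time, normalised to sum to one.

```python
def stationary_distribution(P, tol=1e-12, max_iter=10_000):
    """Approximate the stationary distribution of a row-stochastic matrix P
    (list of lists) by power iteration, starting from the uniform vector."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(max_iter):
        # One step of the chain: nxt[j] = sum_i pi[i] * P[i][j]
        nxt = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
        if sum(abs(a - b) for a, b in zip(nxt, pi)) < tol:
            return nxt
        pi = nxt
    return pi


def page_importance(P, mean_stay):
    """Importance = reachability (stationary probability of the embedded
    chain) times utility (mean staying time), normalised to sum to 1."""
    reach = stationary_distribution(P)
    raw = [r * t for r, t in zip(reach, mean_stay)]
    total = sum(raw)
    return [x / total for x in raw]


# Hypothetical toy graph with three pages (assumed values, not from the paper).
P = [
    [0.0, 0.5, 0.5],  # page 0 links to pages 1 and 2
    [1.0, 0.0, 0.0],  # page 1 links to page 0
    [0.5, 0.5, 0.0],  # page 2 links to pages 0 and 1
]
mean_stay = [1.0, 2.0, 4.0]  # assumed mean staying time per page

reach = stationary_distribution(P)
importance = page_importance(P, mean_stay)
print("reachability:", [round(x, 4) for x in reach])
print("importance:  ", [round(x, 4) for x in importance])
```

In this toy example page 2 is the least reachable page but receives the highest importance because of its long assumed staying time, which is the kind of effect the utility factor is meant to capture.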

Keywords

  • Page importance
  • PageRank
  • BrowseRank
  • Web Markov skeleton process
  • Mirror semi-Markov process

Notes

Acknowledgments

We thank Chuan Zhou for his valuable suggestions and comments on this work, and thank Liang Tang for his help on part of the experiments.

Copyright information

© Springer Science+Business Media, LLC 2011

Authors and Affiliations

  • Bin Gao (1)
  • Tie-Yan Liu (1)
  • Yuting Liu (2)
  • Taifeng Wang (1)
  • Zhi-Ming Ma (3)
  • Hang Li (1)

  1. Microsoft Research Asia, Sigma Center, Haidian District, Beijing, People’s Republic of China
  2. Beijing Jiaotong University, Haidian District, Beijing, People’s Republic of China
  3. Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Haidian District, Beijing, People’s Republic of China
