Web Spam Identification with User Browsing Graph

  • Huijia Yu
  • Yiqun Liu
  • Min Zhang
  • Liyun Ru
  • Shaoping Ma
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5839)

Abstract

Combating Web spam has become one of the top challenges for Web search engines. Most previous research in link-based Web spam identification focuses on exploiting hyperlink graphs and corresponding user-behavior models. However, the fact that hyperlinks can be easily added and removed by Web spammers makes the hyperlink graph unreliable. We construct a user browsing graph based on users' Web access logs and apply link analysis algorithms to this graph to identify Web spam pages. The constructed graph is much smaller than the original Web graph, so link analysis algorithms can run efficiently on it. Comparative experimental results also show that algorithms performed on the constructed graph outperform those on the original graph.
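
The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the log format (pairs of source URL and destination URL), the weighting of edges by click counts, the damping factor, the iteration count, and the names `build_browsing_graph` and `trust_rank` are all assumptions made for illustration. The propagation step follows the general TrustRank idea of spreading trust from a hand-picked seed set along graph edges.

```python
from collections import defaultdict


def build_browsing_graph(log_records):
    """Aggregate (source_url, destination_url) click records into weighted edges.

    Assumed log format: one pair per observed user navigation.
    Edge weight = number of times users moved from source to destination.
    """
    graph = defaultdict(lambda: defaultdict(int))
    for source, destination in log_records:
        if not source or not destination or source == destination:
            continue  # skip incomplete records and self-loops
        graph[source][destination] += 1
        graph[destination]  # touch the key so the destination exists as a node
    return {node: dict(edges) for node, edges in graph.items()}


def trust_rank(graph, trusted_seeds, damping=0.85, iterations=50):
    """Propagate trust from seed pages along browsing edges (TrustRank-style).

    `damping` and `iterations` are illustrative defaults, not values from the paper.
    """
    nodes = list(graph)
    seeds = [node for node in trusted_seeds if node in graph]
    if not seeds:
        raise ValueError("at least one trusted seed must appear in the graph")

    # Seed distribution: uniform over trusted pages, zero elsewhere.
    seed_score = {node: 0.0 for node in nodes}
    for seed in seeds:
        seed_score[seed] = 1.0 / len(seeds)

    trust = dict(seed_score)
    for _ in range(iterations):
        next_trust = {node: (1.0 - damping) * seed_score[node] for node in nodes}
        for source in nodes:
            out_edges = graph[source]
            total_weight = sum(out_edges.values())
            if total_weight == 0:
                # Dangling page: hand its trust back to the seed set.
                for seed in seeds:
                    next_trust[seed] += damping * trust[source] / len(seeds)
                continue
            for destination, weight in out_edges.items():
                share = weight / total_weight
                next_trust[destination] += damping * trust[source] * share
        trust = next_trust
    return trust


if __name__ == "__main__":
    # Hypothetical log: two mutually visited legitimate pages and an isolated spam pair.
    records = [
        ("portal", "news"), ("portal", "news"), ("news", "portal"),
        ("spam_a", "spam_b"), ("spam_b", "spam_a"),
    ]
    browsing_graph = build_browsing_graph(records)
    scores = trust_rank(browsing_graph, trusted_seeds={"portal"})
    for page, score in sorted(scores.items(), key=lambda item: -item[1]):
        print(page, round(score, 4))
```

In this sketch, pages that users never reach from trusted pages receive no trust, so a low score can be used as a spam signal. Because edges come from observed navigation rather than from hyperlinks, a spammer cannot create them simply by adding links to a page.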

Keywords

  • Spam identification
  • TrustRank
  • User browsing graph

Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • Huijia Yu (1)
  • Yiqun Liu (1)
  • Min Zhang (1)
  • Liyun Ru (1)
  • Shaoping Ma (1)

  1. State Key Lab of Intelligent Technology and Systems, Tsinghua National Laboratory for Information Science and Technology, Department of Computer Science and Technology, Tsinghua University, Beijing, P.R. China