Combating Spamdexing: Incorporating Heuristics in Link-Based Ranking

  • Conference paper
Algorithms and Models for the Web-Graph (WAW 2006)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 4936)

Abstract

Users typically locate useful Web pages by querying a search engine. However, today’s search engines are seriously threatened by malicious spam pages that attempt to subvert the unbiased searching and ranking services the engines provide. Given the large fraction of Web traffic that originates from search engine referrals and the high potential monetary value of this traffic, it is not surprising that some Web site owners try to influence a search engine's ranking function maliciously, thus giving rise to Web spam. Since the algorithmic identification of spam is very difficult, most techniques require either human assistance or extensive training to deal with spam effectively. We explore the possibility of automatically reducing the number of Web spam pages in a Web collection by analyzing the Web graph, coupled with very simple content analysis. We present an empirical evaluation of our approach on 1 million Web pages from the health domain. Our results clearly indicate that we can effectively filter out a significant fraction of Web spam pages.
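The abstract does not spell out the heuristics or the ranking formula, so the following is only a minimal sketch of the general idea of combining Web-graph analysis with simple content analysis. It is not the authors' method. It assumes a hypothetical content check (`flag_suspects`, which flags a page when a single word dominates its text) and a standard PageRank power iteration whose teleport vector skips flagged pages, similar in spirit to trust-biased ranking. All function names, parameters, and thresholds are illustrative assumptions.

```python
from typing import Dict, List, Set


def flag_suspects(texts: Dict[str, str], max_repeat_ratio: float = 0.5) -> Set[str]:
    """Hypothetical content heuristic: flag a page if one word exceeds
    max_repeat_ratio of all its words (a crude keyword-stuffing check)."""
    suspects: Set[str] = set()
    for page, text in texts.items():
        words = text.lower().split()
        if not words:
            continue
        top_count = max(words.count(w) for w in set(words))
        if top_count / len(words) > max_repeat_ratio:
            suspects.add(page)
    return suspects


def biased_pagerank(
    links: Dict[str, List[str]],
    suspects: Set[str],
    damping: float = 0.85,
    iterations: int = 50,
) -> Dict[str, float]:
    """PageRank by power iteration; random jumps land only on non-suspect pages."""
    pages = sorted(links)
    trusted = [p for p in pages if p not in suspects]
    if not trusted:  # assumption: fall back to a uniform jump if everything is flagged
        trusted = pages
    trusted_set = set(trusted)
    teleport = {p: (1.0 / len(trusted) if p in trusted_set else 0.0) for p in pages}

    rank = dict(teleport)
    for _ in range(iterations):
        nxt = {p: (1.0 - damping) * teleport[p] for p in pages}
        for p in pages:
            out = [q for q in links[p] if q in links]
            if out:
                share = damping * rank[p] / len(out)
                for q in out:
                    nxt[q] += share
            else:
                # Dangling page: hand its rank back through the teleport vector.
                for q in pages:
                    nxt[q] += damping * rank[p] * teleport[q]
        rank = nxt
    return rank


# Toy collection: two ordinary pages and a two-page link farm.
links = {"a": ["b"], "b": ["a"], "s1": ["s2"], "s2": ["s1"]}
texts = {
    "a": "health advice on diet and exercise",
    "b": "clinic opening hours and contact details",
    "s1": "buy buy buy buy now",
    "s2": "cheap pills cheap cheap cheap",
}
suspects = flag_suspects(texts)
ranks = biased_pagerank(links, suspects)
```

In this toy graph the link farm receives no teleport mass and no links from unflagged pages, so its pages end up with zero rank. In a real collection, flagged pages could still gain rank through links from unflagged pages, which is why a sketch like this would need further link-level heuristics.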

Editor information

William Aiello Andrei Broder Jeannette Janssen Evangelos Milios

Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Abou-Assaleh, T., Das, T. (2008). Combating Spamdexing: Incorporating Heuristics in Link-Based Ranking. In: Aiello, W., Broder, A., Janssen, J., Milios, E. (eds) Algorithms and Models for the Web-Graph. WAW 2006. Lecture Notes in Computer Science, vol 4936. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-78808-9_9

  • DOI: https://doi.org/10.1007/978-3-540-78808-9_9

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-78807-2

  • Online ISBN: 978-3-540-78808-9

  • eBook Packages: Computer Science, Computer Science (R0)
