Spam Host Detection Using Ant Colony Optimization

  • Conference paper
IT Convergence and Services

Part of the book series: Lecture Notes in Electrical Engineering ((LNEE,volume 107))

Abstract

Web spamming, the manipulation of web pages to push them to the top of a search result, is an important problem that reduces the efficiency of a search engine. This article presents a spam host detection approach. We exploit both content and link features extracted from hosts to train a learning model based on the ant colony optimization algorithm. Experiments on the WEBSPAM-UK2006 dataset show that the proposed method achieves higher precision in detecting spam than the baseline C4.5 and SVM classifiers.
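The abstract does not describe the learning procedure in detail. As a rough illustration only, the sketch below shows one way an ant colony optimization search could discover an IF-THEN classification rule over discretized host features. The function name `discover_rule`, the feature names `links` and `content`, the quality measure (sensitivity times specificity), and all parameter values are assumptions made for this example, not the authors' method or data.

```python
import random

def discover_rule(rows, labels, target=1, n_ants=20, n_iters=30,
                  max_terms=2, rho=0.1, seed=0):
    """Search for one rule (dict: feature -> value) that predicts `target`.

    Hypothetical sketch: pheromone sits on (feature, value) terms, ants
    sample terms in proportion to pheromone, and the best rule so far
    is reinforced after each iteration.
    """
    rng = random.Random(seed)
    terms = sorted({(f, v) for row in rows for f, v in row.items()})
    tau = {t: 1.0 for t in terms}  # initial pheromone per term

    def quality(rule):
        # Sensitivity * specificity of the rule on the training data.
        tp = fp = fn = tn = 0
        for row, y in zip(rows, labels):
            covered = all(row.get(f) == v for f, v in rule.items())
            if covered and y == target:
                tp += 1
            elif covered:
                fp += 1
            elif y == target:
                fn += 1
            else:
                tn += 1
        if tp + fn == 0 or fp + tn == 0:
            return 0.0
        return (tp / (tp + fn)) * (tn / (fp + tn))

    best_rule, best_q = {}, 0.0
    for _ in range(n_iters):
        for _ in range(n_ants):
            rule = {}
            # Each ant builds a rule with at most one term per feature.
            for _ in range(rng.randint(1, max_terms)):
                candidates = [t for t in terms if t[0] not in rule]
                if not candidates:
                    break
                weights = [tau[t] for t in candidates]
                f, v = rng.choices(candidates, weights=weights, k=1)[0]
                rule[f] = v
            q = quality(rule)
            if q > best_q:
                best_rule, best_q = dict(rule), q
        # Evaporate all pheromone, then reinforce the best rule's terms.
        for t in tau:
            tau[t] *= (1.0 - rho)
        for t in best_rule.items():
            tau[t] += best_q
    return best_rule, best_q

# Toy, made-up hosts: label 1 = spam, 0 = normal.
rows = [
    {"links": "high", "content": "low"},
    {"links": "high", "content": "high"},
    {"links": "low", "content": "low"},
    {"links": "low", "content": "high"},
]
labels = [1, 1, 0, 0]
rule, q = discover_rule(rows, labels)
print(rule, q)
```

In this toy data the label depends only on the `links` feature, so a single-term rule separates the two classes. A real detector would need many rules, discretization of continuous content and link features, and evaluation on held-out hosts.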

Author information

Corresponding author

Correspondence to Arnon Rungsawang.

Copyright information

© 2011 Springer Science+Business Media B.V.

About this paper

Cite this paper

Rungsawang, A., Taweesiriwate, A., Manaskasemsak, B. (2011). Spam Host Detection Using Ant Colony Optimization. In: Park, J., Arabnia, H., Chang, HB., Shon, T. (eds) IT Convergence and Services. Lecture Notes in Electrical Engineering, vol 107. Springer, Dordrecht. https://doi.org/10.1007/978-94-007-2598-0_2

  • DOI: https://doi.org/10.1007/978-94-007-2598-0_2

  • Publisher Name: Springer, Dordrecht

  • Print ISBN: 978-94-007-2597-3

  • Online ISBN: 978-94-007-2598-0

  • eBook Packages: Engineering, Engineering (R0)
