Abstract
Web spamming, the deliberate manipulation of web pages to push them into the top ranks of search results, is an important problem that affects the efficiency of search engines. This article presents an approach to detecting spam hosts. We exploit both content and link features extracted from hosts to train a learning model based on the ant colony optimization algorithm. Experiments on the WEBSPAM-UK2006 dataset show that the proposed method achieves higher precision in detecting spam than the C4.5 and SVM baselines.
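The abstract does not spell out how the ant colony learner is built. One well-known way to turn ant colony optimization into a classifier is rule induction in the style of Ant-Miner (Parpinelli, Lopes and Freitas): each ant builds an IF-THEN rule term by term, choosing terms with probability proportional to pheromone times a heuristic value, and terms used in good rules receive more pheromone. The sketch below illustrates that idea for spam host detection under our own assumptions and is not the authors' implementation. Host features are assumed to be already discretized, and the feature names `link_ratio` and `title_len` are hypothetical. Only rules for the spam class are learned, with "normal" as the default. The heuristic (the share of spam among hosts that have a term) is a simplification of Ant-Miner's entropy-based heuristic. The values of `n_ants` and `min_cases` are illustrative.

```python
import random

# Assumption: each host is a dict of *discretized* feature values plus a label
# ("spam" or "normal"); the feature names used below are hypothetical.
Example = tuple[dict[str, str], str]
Rule = dict[str, str]  # conjunction of (attribute == value) terms


def covers(rule: Rule, feats: dict[str, str]) -> bool:
    """True if every term of the rule holds for this example."""
    return all(feats.get(a) == v for a, v in rule.items())


def quality(rule: Rule, data: list[Example], target: str) -> float:
    """Ant-Miner style rule quality: sensitivity * specificity."""
    tp = fp = fn = tn = 0
    for feats, label in data:
        hit, pos = covers(rule, feats), label == target
        tp += hit and pos
        fp += hit and not pos
        fn += (not hit) and pos
        tn += (not hit) and (not pos)
    sens = tp / (tp + fn) if tp + fn else 0.0
    spec = tn / (tn + fp) if tn + fp else 0.0
    return sens * spec


def heuristic(terms, data, target):
    """Simplified heuristic (assumption): share of target-class examples having the term."""
    eta = {}
    for a, v in terms:
        labels = [lab for f, lab in data if f.get(a) == v]
        eta[(a, v)] = labels.count(target) / len(labels) if labels else 0.0
    return eta


def build_rule(terms, tau, eta, data, min_cases, rng):
    """One ant adds terms with probability proportional to tau * eta."""
    rule: Rule = {}
    while True:
        options = [
            (a, v) for a, v in terms
            if a not in rule
            and sum(covers({**rule, a: v}, f) for f, _ in data) >= min_cases
        ]
        weights = [tau[t] * eta[t] for t in options]
        if sum(weights) == 0:  # also true when no option is left
            return rule
        a, v = rng.choices(options, weights=weights)[0]
        rule[a] = v


def prune(rule: Rule, data, target):
    """Drop terms while doing so does not lower rule quality."""
    best_q = quality(rule, data, target)
    while len(rule) > 1:
        shorter = [{k: v for k, v in rule.items() if k != a} for a in rule]
        cand = max(shorter, key=lambda r: quality(r, data, target))
        q = quality(cand, data, target)
        if q < best_q:
            break
        rule, best_q = cand, q
    return rule, best_q


def find_rule(data, target, n_ants=20, min_cases=2, seed=0):
    rng = random.Random(seed)
    terms = sorted({(a, v) for f, _ in data for a, v in f.items()})
    tau = {t: 1.0 / len(terms) for t in terms}  # uniform initial pheromone
    eta = heuristic(terms, data, target)
    best_rule, best_q = {}, -1.0
    for _ in range(n_ants):
        rule, q = prune(build_rule(terms, tau, eta, data, min_cases, rng), data, target)
        for t in rule.items():
            tau[t] += tau[t] * q  # reinforce the terms this ant used
        total = sum(tau.values())
        for t in tau:
            tau[t] /= total  # normalization acts as evaporation for unused terms
        if q > best_q:
            best_rule, best_q = rule, q
    return best_rule


def learn_rules(data, target="spam", max_uncovered=0, **kw):
    """Sequential covering: keep finding rules until few target examples remain."""
    rules, remaining = [], list(data)
    while sum(lab == target for _, lab in remaining) > max_uncovered:
        rule = find_rule(remaining, target, **kw)
        covered = [ex for ex in remaining if covers(rule, ex[0])]
        if not rule or not any(lab == target for _, lab in covered):
            break  # no useful rule found
        rules.append(rule)
        remaining = [ex for ex in remaining if not covers(rule, ex[0])]
    return rules


def predict(rules, feats, target="spam", default="normal"):
    return target if any(covers(r, feats) for r in rules) else default


# Toy data (hypothetical): spam hosts all have a high link ratio.
data: list[Example] = [
    ({"link_ratio": "high", "title_len": t}, "spam")
    for t in ("long", "long", "short", "short")
] + [
    ({"link_ratio": "low", "title_len": t}, "normal")
    for t in ("long", "short", "long", "short")
]
rules = learn_rules(data)
preds = [predict(rules, f) for f, _ in data]
tp = sum(p == "spam" and y == "spam" for p, (_, y) in zip(preds, data))
fp = sum(p == "spam" and y != "spam" for p, (_, y) in zip(preds, data))
print(rules)           # [{'link_ratio': 'high'}]
print(tp / (tp + fp))  # 1.0 (precision on the toy training data only)
```

Renormalizing the pheromone after each ant keeps the values summing to one, so terms that are never reinforced gradually lose probability mass. On real host features, which are largely numeric, a discretization step would be needed before a learner of this kind could be applied.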
Copyright information
© 2011 Springer Science+Business Media B.V.
About this paper
Cite this paper
Rungsawang, A., Taweesiriwate, A., Manaskasemsak, B. (2011). Spam Host Detection Using Ant Colony Optimization. In: Park, J., Arabnia, H., Chang, HB., Shon, T. (eds) IT Convergence and Services. Lecture Notes in Electrical Engineering, vol 107. Springer, Dordrecht. https://doi.org/10.1007/978-94-007-2598-0_2
DOI: https://doi.org/10.1007/978-94-007-2598-0_2
Publisher Name: Springer, Dordrecht
Print ISBN: 978-94-007-2597-3
Online ISBN: 978-94-007-2598-0
eBook Packages: Engineering, Engineering (R0)