Artificial Immune System Based Web Page Classification

  • Aytuğ Onan
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 349)

Abstract

Automated classification of web pages is an important research direction in web mining. It aims to construct a classification model, trained on labeled web documents, that can classify new instances. Machine learning algorithms have been adapted to text classification problems, including web document classification. Artificial immune systems are a branch of computational intelligence inspired by biological immune systems and are used to solve a variety of computational problems, including classification. This paper examines the effectiveness and suitability of artificial immune system based approaches for web page classification. To this end, two artificial immune system based classification algorithms, Immunos-1 and Immunos-99, are compared with two standard machine learning techniques, the C4.5 decision tree classifier and the Naïve Bayes classifier. The algorithms are experimentally evaluated on 50 data sets obtained from DMOZ (Open Directory Project). The experimental results indicate that artificial immune system based approaches achieve higher predictive performance for web page classification.
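
The abstract does not describe how the Immunos classifiers work internally, so the following is only a minimal sketch of an Immunos-1-style classifier under stated assumptions: every training vector is kept as one "cell" grouped by class, affinity is taken to be Euclidean distance, and avidity is assumed to be the group size divided by the summed distance to the query. The function names, the toy term-count vectors, and the category labels are all hypothetical and are not taken from the paper or from the WEKA implementation.

```python
import math
from collections import defaultdict


def train(instances):
    """Group training vectors by label; each vector acts as one 'cell'.

    Assumption: no data reduction is applied, as in an Immunos-1-style scheme.
    """
    groups = defaultdict(list)
    for features, label in instances:
        groups[label].append(features)
    return dict(groups)


def affinity(a, b):
    """Euclidean distance between two vectors; smaller means a closer match."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def classify(groups, query):
    """Return the label whose group has the highest avidity for the query.

    Assumption: avidity = (number of cells in group) / (summed distance).
    """
    best_label, best_avidity = None, -1.0
    for label, cells in groups.items():
        total = sum(affinity(cell, query) for cell in cells)
        avidity = len(cells) / total if total > 0 else float("inf")
        if avidity > best_avidity:
            best_label, best_avidity = label, avidity
    return best_label


# Hypothetical term-count vectors for two made-up page categories.
training = [
    ([3, 0, 1], "sports"),
    ([2, 0, 0], "sports"),
    ([0, 3, 1], "science"),
    ([0, 2, 2], "science"),
]
model = train(training)
prediction = classify(model, [2, 1, 0])
```

In an evaluation like the one the abstract reports, a classifier of this kind would be scored on held-out documents and its accuracy compared with that of the baseline learners.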

Keywords

Artificial immune systems, Immunos-1, Immunos-99, Web document classification

Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  1. Faculty of Engineering, Department of Computer Engineering, Celal Bayar University, Manisa, Turkey
