Enhanced Web Log Cleaning Algorithm for Web Intrusion Detection

  • Yew Chuan Ong
  • Zuraini Ismail
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 265)


Web logs play a crucial role in detecting web attacks. However, analyzing web logs has become a challenge because of the huge volume of log data. The objective of this research is to create a web log cleaning algorithm for web intrusion detection. A review of previous work showed that a web log cleaning algorithm for intrusion detection needs five major web log attributes, namely multimedia files, web robot requests, HTTP status codes, HTTP methods and other files. The enhanced algorithm is based on these five attributes along with a set of rules and conditions. Our experiment shows that the proposed algorithm cleans noisy data effectively, achieving a 40.41% reduction, while maintaining readiness for web intrusion detection with a low false negative rate (0.00531). Future work may address the web intrusion detection mechanism.
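The abstract names the five attribute groups but not the rules and conditions applied to them. As a rough illustration only, the Python sketch below shows one way a filter over those attributes could look. The extension lists, robot markers, the choice to treat unusual methods, error statuses and parameterised requests as reasons to keep a line, and the Apache Combined Log Format regex are all assumptions, not the authors' algorithm. The two metric helpers assume the usual definitions: percentage of reduction = (original − cleaned) / original × 100, and false negative rate = FN / (FN + TP).

```python
import posixpath
import re
from typing import Iterable, List

# Hypothetical rule tables. The paper names five attribute groups (multimedia
# files, web robot requests, HTTP status code, HTTP method, other files) but
# not the exact rules, so every value below is an illustrative assumption.
MULTIMEDIA_EXT = {".jpg", ".jpeg", ".png", ".gif", ".bmp", ".ico",
                  ".mp3", ".mp4", ".avi", ".swf"}
OTHER_FILE_EXT = {".css", ".js", ".woff", ".ttf"}
ROBOT_MARKERS = ("bot", "crawler", "spider", "slurp")
DISCARDABLE_METHODS = {"GET", "HEAD"}  # lines with any other method are kept

# Assumed input: Apache/NCSA Combined Log Format
# host ident user [time] "request" status bytes "referer" "user-agent"
LINE_RE = re.compile(
    r'\S+ \S+ \S+ \[[^\]]+\] '
    r'"(?P<method>[A-Z]+) (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def is_noise(line: str) -> bool:
    """Return True if a log line looks safe to drop before intrusion detection."""
    m = LINE_RE.match(line)
    if m is None:
        return False  # keep malformed lines: they may be crafted requests
    method, path, agent = m["method"], m["path"], m["agent"]
    status = int(m["status"])

    # HTTP method and status code act as keep-signals in this sketch:
    # unusual methods and error responses are typical of probing.
    if method not in DISCARDABLE_METHODS or status >= 400:
        return False
    # Requests carrying parameters or percent-encoding may hold payloads.
    if "?" in path or "%" in path:
        return False

    # Multimedia and other static files are dropped.
    ext = posixpath.splitext(path)[1].lower()
    if ext in MULTIMEDIA_EXT or ext in OTHER_FILE_EXT:
        return True
    # Web robot requests: robots.txt fetches or crawler-like user agents.
    agent_lower = agent.lower()
    if path == "/robots.txt" or any(k in agent_lower for k in ROBOT_MARKERS):
        return True
    return False


def clean(lines: Iterable[str]) -> List[str]:
    """Keep only the lines that are not classified as noise."""
    return [line for line in lines if not is_noise(line)]


def reduction_percentage(original: int, cleaned: int) -> float:
    """Share of log entries removed by cleaning, in percent."""
    return (original - cleaned) / original * 100


def false_negative_rate(fn: int, tp: int) -> float:
    """Missed attacks (FN) over all actual attacks (FN + TP)."""
    return fn / (fn + tp)
```

Matching robots by a user-agent substring is easy to evade, since that header is client-controlled. For that reason the sketch checks the robot rule last: a request is only dropped as robot traffic if it is also an ordinary GET/HEAD request with no parameters and no error status, so a spoofed agent alone cannot hide a parameterised attack.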


Keywords: web log, data cleaning, preprocessing, intrusion detection





Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  1. Advanced Informatics School, Universiti Teknologi Malaysia, Johor Bahru, Malaysia
