Detection of Web site visitors based on fuzzy rough sets

  • Javad Hamidzadeh
  • Mahdieh Zabihimayvan
  • Reza Sadeghi
Methodologies and Application

Abstract

Although the emergence of Web 2.0 applications has increased the demand for well-behaved Web robots, malicious robots can pose irreparable risks to Web sites. Regardless of their behavior, Web robots may occupy bandwidth and reduce the performance of Web servers. Although many notable studies have tried to characterize and classify Web visitors, little attention has been paid to feature selection that dynamically chooses the attributes used to describe Web sessions. In addition, it is practically important to rely on an accurate clustering technique that can handle a huge number of samples in a reasonable amount of time. Therefore, this paper proposes a new algorithm, fuzzy rough set–Web robot detection (FRS-WRD), based on fuzzy rough set theory, to better characterize and cluster the Web visitors of three real Web sites. External evaluations show that, compared with state-of-the-art algorithms, FRS-WRD achieves better results: a G-mean of 95%, a Jaccard index of 88%, an entropy of 0.36, and a purity of 96%. Moreover, according to the confusion matrices, it can better detect malicious Web visitors.
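As a rough illustration of the four external measures named in the abstract, the sketch below computes purity, entropy, the pair-counting Jaccard index, and G-mean for a toy clustering. It uses common textbook definitions, which are an assumption here: the abstract does not state the exact formulas used for FRS-WRD. The function names, the majority-class mapping used for G-mean, and the toy labels are all hypothetical.

```python
import math
from collections import Counter, defaultdict
from itertools import combinations


def _class_counts(labels, clusters):
    """Map each cluster id to a Counter of the true classes it contains."""
    counts = defaultdict(Counter)
    for y, c in zip(labels, clusters):
        counts[c][y] += 1
    return counts


def purity(labels, clusters):
    """Fraction of samples that fall in the majority class of their cluster."""
    counts = _class_counts(labels, clusters)
    return sum(max(c.values()) for c in counts.values()) / len(labels)


def entropy(labels, clusters):
    """Size-weighted mean of per-cluster class entropy (base 2); lower is better."""
    n = len(labels)
    total = 0.0
    for c in _class_counts(labels, clusters).values():
        size = sum(c.values())
        h = -sum((k / size) * math.log2(k / size) for k in c.values())
        total += (size / n) * h
    return total


def jaccard(labels, clusters):
    """Pairs sharing class AND cluster, divided by pairs sharing class OR cluster."""
    both = either = 0
    for i, j in combinations(range(len(labels)), 2):
        same_class = labels[i] == labels[j]
        same_cluster = clusters[i] == clusters[j]
        both += same_class and same_cluster
        either += same_class or same_cluster
    return both / either if either else 0.0


def g_mean(labels, clusters, positive):
    """Geometric mean of sensitivity and specificity for the `positive` class.

    Assumption: each cluster is labeled with its majority true class.
    """
    counts = _class_counts(labels, clusters)
    majority = {c: cnt.most_common(1)[0][0] for c, cnt in counts.items()}
    tp = fn = tn = fp = 0
    for y, c in zip(labels, clusters):
        predicted_positive = majority[c] == positive
        if y == positive:
            tp += predicted_positive
            fn += not predicted_positive
        else:
            fp += predicted_positive
            tn += not predicted_positive
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return math.sqrt(sensitivity * specificity)


# Toy data (hypothetical): six sessions, two clusters.
labels = ["robot", "robot", "robot", "human", "human", "human"]
clusters = [0, 0, 1, 1, 1, 1]
print(purity(labels, clusters), entropy(labels, clusters),
      jaccard(labels, clusters), g_mean(labels, clusters, "robot"))
```

Purity, Jaccard, and G-mean are better when higher, while entropy is better when lower, which is why the abstract reports a low entropy next to high values for the other three.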

Keywords

Web robot detection; Fuzzy rough set; Clustering

Copyright information

© Springer-Verlag Berlin Heidelberg 2016

Authors and Affiliations

  • Javad Hamidzadeh (1)
  • Mahdieh Zabihimayvan (2)
  • Reza Sadeghi (2)
  1. Faculty of Computer Engineering and Information Technology, Sadjad University of Technology, Mashhad, Iran
  2. Department of Computer Engineering, Imam Reza International University, Mashhad, Iran
