Despite the emergence of Web 2.0 applications and the growing need for well-behaved Web robots, malicious robots can pose serious, sometimes irreparable, risks to Web sites. Regardless of their behavior, Web robots may consume bandwidth and degrade the performance of Web servers. Although many notable studies have tried to characterize and classify Web visitors, little attention has been paid to feature selection, that is, to dynamically choosing the attributes used to describe Web sessions. In addition, an accurate clustering technique that can handle a huge number of samples in a reasonable amount of time is of practical importance. Therefore, this paper proposes a new algorithm, fuzzy rough set–Web robot detection (FRS-WRD), based on fuzzy rough set theory, to better characterize and cluster the visitors of three real Web sites. External evaluations show that, compared with state-of-the-art algorithms, FRS-WRD achieves better results: a G-mean of 95%, a Jaccard coefficient of 88%, an entropy of 0.36, and a purity of 96%. Moreover, according to the confusion matrices, it can better detect malicious Web visitors.
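The four external measures named above can be illustrated with a short sketch. The abstract does not give the paper's exact formulas, so the code below assumes common textbook definitions (majority-class purity, size-weighted cluster entropy, pair-counting Jaccard, and G-mean as the square root of sensitivity times specificity). The function names, the `positive="robot"` label, and the toy data are hypothetical and serve only as an illustration.

```python
from collections import Counter
from itertools import combinations
from math import log2, sqrt


def _class_counts(labels, clusters):
    """Group true class labels by cluster id: {cluster: Counter(class -> count)}."""
    counts = {}
    for y, c in zip(labels, clusters):
        counts.setdefault(c, Counter())[y] += 1
    return counts


def purity(labels, clusters):
    """Fraction of samples that belong to the majority class of their cluster."""
    counts = _class_counts(labels, clusters)
    return sum(max(cnt.values()) for cnt in counts.values()) / len(labels)


def entropy(labels, clusters):
    """Size-weighted mean of the class entropy inside each cluster (lower is better)."""
    n = len(labels)
    total = 0.0
    for cnt in _class_counts(labels, clusters).values():
        size = sum(cnt.values())
        h = -sum((k / size) * log2(k / size) for k in cnt.values())
        total += (size / n) * h
    return total


def jaccard(labels, clusters):
    """Pair-counting Jaccard: pairs grouped together in both partitions
    divided by pairs grouped together in at least one of them."""
    both = either = 0
    for i, j in combinations(range(len(labels)), 2):
        same_class = labels[i] == labels[j]
        same_cluster = clusters[i] == clusters[j]
        both += same_class and same_cluster
        either += same_class or same_cluster
    return both / either if either else 1.0


def g_mean(labels, predicted, positive="robot"):
    """Geometric mean of sensitivity and specificity for a two-class problem."""
    tp = sum(y == positive and p == positive for y, p in zip(labels, predicted))
    tn = sum(y != positive and p != positive for y, p in zip(labels, predicted))
    pos = sum(y == positive for y in labels)
    neg = len(labels) - pos
    return sqrt((tp / pos) * (tn / neg))


# Toy example (assumed data): six sessions, two clusters.
labels = ["robot", "robot", "robot", "human", "human", "human"]
clusters = [0, 0, 1, 1, 1, 1]
# Assumption: each cluster is labeled with its majority class to obtain predictions.
predicted = ["robot" if c == 0 else "human" for c in clusters]

print(purity(labels, clusters), entropy(labels, clusters))
print(jaccard(labels, clusters), g_mean(labels, predicted))
```

Higher purity, Jaccard, and G-mean indicate better agreement with the ground truth, whereas lower entropy is better, which matches how the abstract reports the four figures.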
Conflict of interest
The authors declare that they have no conflict of interest.
This article does not contain any studies with animals performed by any of the authors.
Communicated by V. Loia.
This section summarizes all primary features used in this paper. These attributes were proposed in related work and reported to be helpful in separating humans from Web robots. The index column of Table 5 indicates whether the corresponding attribute takes higher values for Web robots (R) or human users (H).
Cite this article
Hamidzadeh, J., Zabihimayvan, M. & Sadeghi, R. Detection of Web site visitors based on fuzzy rough sets. Soft Comput 22, 2175–2188 (2018). https://doi.org/10.1007/s00500-016-2476-4
- Web robot detection
- Fuzzy rough set