Abstract
Web scanners not only consume server bandwidth but also collect sensitive information from websites and probe systems for vulnerabilities, which seriously threatens website security. Accurate detection of Web scanners can effectively mitigate this kind of threat. Existing scanner detection methods extract features from logs and use machine learning to distinguish scanners from legitimate users. However, these methods cannot block scanning because they lack behavior information about clients. To solve this problem, a Web scanner detection method based on behavioral differences is proposed. It collects request information and behavior information of clients through three modules: Passive Detection, Active Injection and Active Detection. Then, six kinds of features, including scanner fingerprints and the ability to execute JavaScript code, are extracted to determine whether a client is a scanner. This method makes full use of the behavior characteristics of clients and the behavioral differences between scanners and legitimate users. The experimental results show that the method detects scanners efficiently and quickly.
Acknowledgements
We sincerely thank the SocialSec anonymous reviewers for their valuable feedback. This research was supported in part by the National Natural Science Foundation of China (U1636107, 61373168).
Appendices
A Fingerprint Information of Common Scanners
Name | Fingerprint | Location |
---|---|---|
wpscan | wpscan | User-Agent |
SQLmap | Sqlmap/version/#stable(http://sqlmap.org) | User-Agent |
AppScan | APPSCAN | Request parameter, URL |
AWVS | Acunetix-Aspect | Request header |
W3af | w3af.org | User-Agent |
Burpsuite | burpcollaborator.net | Request header, parameter, URL |
WebCruiser | WebCruiser, HEAD method | User-Agent |
Netsparker | X-Scanner:Netsparker or Netsparker | Request header, parameter |
FileSensor | Scrapy/1.4.0 (+http://scrapy.org) | User-Agent |
Yujian | HEAD method, User-Agent:- | User-Agent, request method |
BBScan | BBScan/version | User-Agent |
DirBrute | whoami=wyscan_dirfuzz | Cookie |
Nikto | (Nikto/version) (Evasions:None) (Test:map_codes) | User-Agent |
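
The fingerprint table above can be read as a set of substring rules over parts of an HTTP request. The following is a minimal sketch of such a matcher, not the authors' implementation: the `Request` class, the `match_fingerprint` function, the case-insensitive substring comparison, and the choice of which rows to include are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Request:
    """Hypothetical, simplified view of an HTTP request."""
    method: str = "GET"
    url: str = "/"
    headers: dict = field(default_factory=dict)


# (scanner name, lowercase fingerprint substring, locations to search).
# Transcribed from a subset of the table; version strings are omitted.
FINGERPRINTS = [
    ("wpscan", "wpscan", ("user-agent",)),
    ("SQLmap", "sqlmap", ("user-agent",)),
    ("W3af", "w3af.org", ("user-agent",)),
    ("BBScan", "bbscan", ("user-agent",)),
    ("Nikto", "nikto", ("user-agent",)),
    ("AWVS", "acunetix-aspect", ("header",)),
    ("AppScan", "appscan", ("url",)),
    ("DirBrute", "whoami=wyscan_dirfuzz", ("cookie",)),
]


def match_fingerprint(req: Request) -> Optional[str]:
    """Return the first scanner whose fingerprint appears in the request, else None."""
    headers = {k.lower(): v for k, v in req.headers.items()}
    fields = {
        "user-agent": headers.get("user-agent", ""),
        "cookie": headers.get("cookie", ""),
        "url": req.url,
        # All header names and values joined, for fingerprints that are header names.
        "header": " ".join(f"{k}: {v}" for k, v in headers.items()),
    }
    for name, needle, locations in FINGERPRINTS:
        if any(needle in fields[loc].lower() for loc in locations):
            return name
    return None
```

For example, a request whose User-Agent is `sqlmap/1.3#stable (http://sqlmap.org)` matches the SQLmap rule, while an ordinary browser User-Agent matches nothing. Rules that depend on the request method (such as the HEAD method entries) are left out of this sketch.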
B Testing Results of Scanners
Scanners | Visiting frequency | Fingerprint | Resource file | Carrying cookie | JS execution | Mouse click | All methods |
---|---|---|---|---|---|---|---|
AppScan | ✓ | ✓ | ✓ | × | × | ✓ | ✓ |
w3af | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
Burpsuite | ✓ | ✓ | ✓ | × | ✓ | ✓ | ✓ |
AWVS | ✓ | ✓ | ✓ | × | ✓ | ✓ | ✓ |
Netsparker | ✓ | ✓ | ✓ | × | ✓ | ✓ | ✓ |
htpwdScan | ✓ | × | ✓ | ✓ | ✓ | ✓ | ✓ |
NagaScan | ✓ | × | × | × | × | × | ✓ |
WebCruiser | ✓ | ✓ | × | ✓ | ✓ | ✓ | ✓ |
Yujian directory scanning | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
Yujian website identification | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
SQLmap | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
BBScan | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
Cangibrina | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
BruteXSS | × | ✓ | × | × | ✓ | × | ✓ |
Shuriken | × | × | × | × | ✓ | × | ✓ |
Weakfilescan | ✓ | × | ✓ | × | ✓ | ✓ | ✓ |
Dirsearch | ✓ | × | × | ✓ | ✓ | ✓ | ✓ |
Pentestdb | × | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
Lcyscan | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
DirBrute | ✓ | ✓ | ✓ | × | ✓ | ✓ | ✓ |
wpscan | × | ✓ | × | × | ✓ | ✓ | ✓ |
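
In the table above, every scanner is marked as detected in the "All methods" column, and every row has at least one individual check marked as successful. One reading consistent with these results is that a client is flagged when any single check fires. The sketch below illustrates that reading on four rows transcribed from the table. The OR-combination rule and all names (`FEATURES`, `ROWS`, `flagged_by_any`, `firing_checks`) are assumptions for illustration, and this preview does not state how the paper actually combines the six features.

```python
# Column order of the six individual checks in the results table.
FEATURES = (
    "visiting_frequency",
    "fingerprint",
    "resource_file",
    "carrying_cookie",
    "js_execution",
    "mouse_click",
)

# Four rows transcribed from the table: True where the check is marked as successful.
ROWS = {
    "NagaScan": (True, False, False, False, False, False),
    "BruteXSS": (False, True, False, False, True, False),
    "Shuriken": (False, False, False, False, True, False),
    "wpscan": (False, True, False, False, True, True),
}


def flagged_by_any(checks):
    """Assumed combination rule: flag the client if any individual check fires."""
    return any(checks)


def firing_checks(checks):
    """Names of the individual checks that fired for one row."""
    return [name for name, hit in zip(FEATURES, checks) if hit]


for scanner, checks in ROWS.items():
    print(scanner, flagged_by_any(checks), firing_checks(checks))
```

Under this assumed rule, even the hardest rows (NagaScan and Shuriken, each caught by a single check) are flagged, which matches the "All methods" column.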
Copyright information
© 2019 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Fu, J., Li, L., Wang, Y., Huang, J., Peng, G. (2019). Web Scanner Detection Based on Behavioral Differences. In: Meng, W., Furnell, S. (eds) Security and Privacy in Social Networks and Big Data. SocialSec 2019. Communications in Computer and Information Science, vol 1095. Springer, Singapore. https://doi.org/10.1007/978-981-15-0758-8_1
DOI: https://doi.org/10.1007/978-981-15-0758-8_1
Publisher Name: Springer, Singapore
Print ISBN: 978-981-15-0757-1
Online ISBN: 978-981-15-0758-8
eBook Packages: Computer Science (R0)