Abstract
This work investigates the use of deep learning techniques to improve the performance of web application firewalls (WAFs), systems used to detect and prevent attacks on web applications. Typically, a WAF inspects the HTTP requests exchanged between client and server to spot attacks and block potential threats. We model the problem as a one-class supervised case and build a feature extractor using deep learning techniques. We treat HTTP requests as text and train a deep language model with a transformer encoder architecture, a self-attention-based neural network. Pre-trained language models have yielded significant improvements on a diverse set of NLP tasks because they enable transfer learning. We use the pre-trained model as a feature extractor that maps an HTTP request to a feature vector. These vectors are then used to train a one-class classifier. We also use a performance metric to automatically define an operational point for the one-class model. The experimental results show that the proposed approach outperforms the classic rule-based ModSecurity configured with a vanilla OWASP Core Rule Set (CRS), and that it does not require a security expert to define the features.
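The pipeline summarized above (request text, then feature vector, then one-class model, then an operating threshold) can be sketched as follows. This is a minimal, self-contained illustration under stated assumptions, not the authors' implementation: the pre-trained transformer is replaced by a hypothetical `embed_request` stand-in that hashes character trigrams into a fixed-size vector; the one-class classifier is a toy distance-to-centroid scorer (`CentroidOneClass`, hypothetical) rather than, for example, a one-class SVM in the style of Schölkopf et al.; and the operational point is chosen by maximizing Youden's J (TPR minus FPR) on a small labeled validation set, which is our assumption about the performance metric. The requests are made up.

```python
import math
import zlib
from typing import List, Sequence, Tuple

DIM = 64  # hypothetical feature size for the stand-in extractor


def embed_request(request: str, dim: int = DIM) -> List[float]:
    """Hypothetical stand-in for the pre-trained transformer feature extractor.

    Hashes character trigrams of the raw request text into a fixed-size,
    L2-normalised count vector. The paper instead obtains this vector from a
    pre-trained language model.
    """
    vec = [0.0] * dim
    text = request.lower()
    for i in range(len(text) - 2):
        vec[zlib.crc32(text[i:i + 3].encode("utf-8")) % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


class CentroidOneClass:
    """Toy one-class model fitted on normal traffic only.

    The anomaly score is the Euclidean distance to the mean normal vector.
    """

    def fit(self, normal: Sequence[List[float]]) -> "CentroidOneClass":
        n = len(normal)
        self.center = [sum(col) / n for col in zip(*normal)]
        return self

    def score(self, x: List[float]) -> float:
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, self.center)))


def pick_threshold(scores: Sequence[float], is_attack: Sequence[bool]) -> Tuple[float, float]:
    """Return (threshold, J) maximising Youden's J = TPR - FPR (assumed metric).

    A request is flagged as an attack when its score is >= threshold.
    """
    pos = sum(is_attack)
    neg = len(is_attack) - pos
    if pos == 0 or neg == 0:
        raise ValueError("validation set needs both normal and attack requests")
    best_t, best_j = float("inf"), -1.0
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, is_attack) if s >= t and y)
        fp = sum(1 for s, y in zip(scores, is_attack) if s >= t and not y)
        j = tp / pos - fp / neg
        if j > best_j:
            best_t, best_j = t, j
    return best_t, best_j


# Usage with made-up requests (illustrative only, not the paper's dataset).
normal_train = [
    "GET /index.php?page=home HTTP/1.1",
    "GET /index.php?page=about HTTP/1.1",
    "GET /products.php?id=12 HTTP/1.1",
]
model = CentroidOneClass().fit([embed_request(r) for r in normal_train])

validation = [
    ("GET /index.php?page=news HTTP/1.1", False),
    ("GET /products.php?id=7 HTTP/1.1", False),
    ("GET /products.php?id=1' OR '1'='1 HTTP/1.1", True),
    ("GET /index.php?page=<script>alert(1)</script> HTTP/1.1", True),
]
scores = [model.score(embed_request(r)) for r, _ in validation]
threshold, j = pick_threshold(scores, [y for _, y in validation])
print(f"threshold={threshold:.3f}  J={j:.2f}")
```

Because the thresholding step only consumes one anomaly score per request, it is independent of how the feature vectors are produced; swapping the stand-in extractor for a language model's pooled output, or the centroid scorer for a stronger one-class classifier, leaves that step unchanged.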
This research was partially supported by a grant given to Nicolás Montes from ANII (http://anii.org.uy) and was done in the context of projects FMV_1_2017_136337 (Fondo María Viñas, ANII) and WAFINTL from ICT4V center (http://ict4v.org).
Notes
1. We could have chosen another pre-trained BPE tokenizer instead of the one proposed in [19]. The key point is to use a BPE tokenizer trained on a huge corpus (40 GB of text), because such a tokenizer can segment any word (and any character) in any language without resorting to the unknown token.
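Why a byte-level BPE vocabulary never needs an unknown token can be seen in a toy sketch (our own minimal illustration, not the GPT-2 tokenizer of [19]; the two-request training corpus and the merge count are illustrative assumptions). Every string is first split into its UTF-8 bytes, which always belong to the base vocabulary of 256 symbols, and learned merges only fuse adjacent symbols, so any input, including characters never seen during training, remains representable.

```python
from collections import Counter
from typing import List, Tuple

Pair = Tuple[bytes, bytes]


def to_bytes(text: str) -> List[bytes]:
    # Base vocabulary: every possible byte, so any UTF-8 string is representable.
    return [bytes([b]) for b in text.encode("utf-8")]


def merge_pair(symbols: List[bytes], pair: Pair) -> List[bytes]:
    # Replace every adjacent occurrence of `pair` with its concatenation.
    out, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return out


def train_bpe(corpus: List[str], num_merges: int) -> List[Pair]:
    # Greedily learn merges by repeatedly fusing the most frequent adjacent pair.
    seqs = [to_bytes(t) for t in corpus]
    merges: List[Pair] = []
    for _ in range(num_merges):
        counts = Counter(p for s in seqs for p in zip(s, s[1:]))
        if not counts:
            break
        best = max(counts, key=counts.get)
        merges.append(best)
        seqs = [merge_pair(s, best) for s in seqs]
    return merges


def tokenize(text: str, merges: List[Pair]) -> List[bytes]:
    # Apply learned merges in training order; unseen characters stay as raw bytes.
    symbols = to_bytes(text)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return symbols


merges = train_bpe(["GET /index.php HTTP/1.1", "GET /login.php HTTP/1.1"], num_merges=10)
text = "GET /ñandú.php?q=<script>"
tokens = tokenize(text, merges)
print(tokens)
```

The characters "ñ" and "ú" never occur in the toy corpus, yet they are still tokenized, simply as their raw UTF-8 bytes; a tokenizer trained on a very large corpus learns many more merges, so common strings become short token sequences while rare ones degrade gracefully to bytes.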
References
1. Alammar, J.: The Illustrated Transformer. jalammar.github.io/illustrated-transformer/. Accessed 14 Feb 2021
2. Bengio, Y., Ducharme, R., Vincent, P., Janvin, C.: A neural probabilistic language model. J. Mach. Learn. Res. 3, 1137–1155 (2003)
3. Betarte, G., Giménez, E., Martínez, R., Pardo, Á.: Improving web application firewalls through anomaly detection. In: 2018 17th IEEE International Conference on Machine Learning and Applications (ICMLA), pp. 779–784. IEEE (2018)
4. Betarte, G., Martínez, R., Pardo, Á.: Web application attacks detection using machine learning techniques. In: 2018 17th IEEE International Conference on Machine Learning and Applications (ICMLA), pp. 1065–1072. IEEE (2018)
5. Corona, I., Ariu, D., Giacinto, G.: HMM-Web: a framework for the detection of attacks against web applications. In: Proceedings of ICC 2009, pp. 1–6 (2009)
6. Devlin, J., Chang, M.-W., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805 (2018)
7. Ethayarajh, K.: How contextual are contextualized word representations? Comparing the geometry of BERT, ELMo, and GPT-2 embeddings. arXiv preprint arXiv:1909.00512 (2019)
8. Folini, C.: Handling false positives with the OWASP ModSecurity Core Rule Set (2016)
9. Hacker, A.J.: Importance of web application firewall technology for protecting web-based resources. ICSA Labs, an Independent Verizon Business (2008)
10. Kruegel, C., Vigna, G.: Anomaly detection of web-based attacks. In: Proceedings of CCS 2003, pp. 251–261. ACM (2003)
11. Lee, W.S., Liu, B.: Learning with positive and unlabeled examples using weighted logistic regression. In: ICML, vol. 3, pp. 448–455 (2003)
12. Liu, Y., et al.: RoBERTa: a robustly optimized BERT pretraining approach. arXiv preprint arXiv:1907.11692 (2019)
13. Martínez, R.: Enhancing web application attack detection using machine learning. Master's thesis, Facultad de Ingeniería, UdelaR - Área Informática del Pedeciba, Uruguay (2019)
14. Mikolov, T., Chen, K., Corrado, G., Dean, J.: Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781 (2013)
15. OWASP: OWASP ModSecurity Core Rule Set project. coreruleset.org. Accessed 14 Feb 2021
16. OWASP: OWASP Top Ten project. https://www.owasp.org/index.php/Category:OWASP/Top/Ten/Project. Accessed 14 Feb 2021
17. Peters, M.E., et al.: Deep contextualized word representations. arXiv preprint arXiv:1802.05365 (2018)
18. Qin, Z.Q., Ma, X.K., Wang, Y.J.: Attentional payload anomaly detector for web applications. In: Cheng, L., Leung, A., Ozawa, S. (eds.) Neural Information Processing. ICONIP 2018. LNCS, vol. 11304. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-04212-7_52
19. Radford, A., Wu, J., Child, R., Luan, D., Amodei, D., Sutskever, I.: Language models are unsupervised multitask learners. OpenAI blog 1(8), 9 (2019)
20. Schölkopf, B., Platt, J.C., Shawe-Taylor, J., Smola, A.J., Williamson, R.C.: Estimating the support of a high-dimensional distribution. Neural Comput. 13(7), 1443–1471 (2001)
21. Sennrich, R., Haddow, B., Birch, A.: Neural machine translation of rare words with subword units. arXiv preprint arXiv:1508.07909 (2015)
22. Sureda Riera, T., Bermejo Higuera, J.-R., Bermejo Higuera, J., Martínez Herraiz, J.-J., Sicilia Montalvo, J.-A.: Prevention and fighting against web attacks through anomaly detection technology. A systematic review. Sustainability 12(12) (2020)
23. Torrano-Gimenez, C., Perez-Villegas, A., Marañón, G.Á., et al.: An anomaly-based approach for intrusion detection in web traffic. J. Inf. Assurance Secur. 5(4), 446–454 (2010)
24. Trustwave Holdings, Inc.: ModSecurity: open source web application firewall
25. Vartouni, A.M., Teshnehlab, M., Kashi, S.S.: Leveraging deep neural networks for anomaly-based web application firewall. IET Inf. Secur. 13(4), 352–361 (2019)
26. Vaswani, A., et al.: Attention is all you need. arXiv preprint arXiv:1706.03762 (2017)
27. Yu, Y., Yan, H., Guan, H., Zhou, H.: DeepHTTP: semantics-structure model with attention for anomalous HTTP traffic detection and pattern mining. arXiv preprint arXiv:1810.12751 (2018)
28. Yuan, G., Li, B., Yao, Y., Zhang, S.: A deep learning enabled subspace spectral ensemble clustering approach for web anomaly detection. In: 2017 International Joint Conference on Neural Networks (IJCNN), pp. 3896–3903. IEEE (2017)
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Montes, N., Betarte, G., Martínez, R., Pardo, A. (2021). Web Application Attacks Detection Using Deep Learning. In: Tavares, J.M.R.S., Papa, J.P., González Hidalgo, M. (eds) Progress in Pattern Recognition, Image Analysis, Computer Vision, and Applications. CIARP 2021. Lecture Notes in Computer Science, vol 12702. Springer, Cham. https://doi.org/10.1007/978-3-030-93420-0_22
DOI: https://doi.org/10.1007/978-3-030-93420-0_22
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-93419-4
Online ISBN: 978-3-030-93420-0
eBook Packages: Computer Science, Computer Science (R0)