Web Application Attacks Detection Using Deep Learning

  • Conference paper
Progress in Pattern Recognition, Image Analysis, Computer Vision, and Applications (CIARP 2021)

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 12702)

Abstract

This work investigates the use of deep learning techniques to improve the performance of web application firewalls (WAFs), systems that are used to detect and prevent attacks on web applications. Typically, a WAF inspects the HTTP requests exchanged between client and server to spot attacks and block potential threats. We model the problem as a one-class supervised case and build a feature extractor using deep learning techniques. We treat the HTTP requests as text and train a deep language model with a transformer encoder architecture, which is a self-attention-based neural network. The use of pre-trained language models has yielded significant improvements on a diverse set of NLP tasks because they are capable of transfer learning. We use the pre-trained model as a feature extractor to map an HTTP request into a feature vector. These vectors are then used to train a one-class classifier. We also use a performance metric to automatically define an operational point for the one-class model. The experimental results show that the proposed approach outperforms the classic rule-based ModSecurity configured with a vanilla OWASP CRS, and it does not require the participation of a security expert to define the features.
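
The downstream half of this pipeline (token-level encoder outputs pooled into one request vector, a one-class model fitted on valid traffic only, and an automatically chosen operating threshold) can be sketched in Python under several stated assumptions. The abstract does not name the pooling strategy, the one-class classifier, or the performance metric, so every choice below is hypothetical: `mean_pool` averages per-token vectors, `CentroidOneClass` scores a request by its distance to the centroid of valid requests (a simple stand-in for a one-class classifier such as the one-class SVM of [20]), and `calibrate` picks a threshold so that at most `max_fpr` of held-out valid requests are flagged. Real inputs would be the pre-trained transformer's outputs; the 2-D vectors here are toy placeholders.

```python
import math
from typing import List, Sequence

Vector = List[float]


def mean_pool(token_vectors: Sequence[Vector]) -> Vector:
    """Average per-token encoder outputs into one fixed-length request vector.

    Assumption: mean pooling is illustrative only; the abstract does not say
    how token-level features are aggregated.
    """
    n = len(token_vectors)
    dim = len(token_vectors[0])
    return [sum(v[i] for v in token_vectors) / n for i in range(dim)]


class CentroidOneClass:
    """Hypothetical one-class model: anomaly score = Euclidean distance to the
    centroid of valid request vectors (a stand-in for e.g. a one-class SVM)."""

    def __init__(self) -> None:
        self.centroid: Vector = []
        self.threshold = math.inf  # flags nothing until calibrated

    def fit(self, normal: Sequence[Vector]) -> "CentroidOneClass":
        # One-class setting: only valid traffic is used for training.
        self.centroid = mean_pool(normal)
        return self

    def score(self, x: Vector) -> float:
        # Larger score = farther from valid traffic = more suspicious.
        return math.dist(x, self.centroid)

    def calibrate(self, held_out_normal: Sequence[Vector], max_fpr: float) -> float:
        """Hypothetical operating point: choose a threshold so that at most
        `max_fpr` of held-out valid requests score strictly above it."""
        scores = sorted(self.score(v) for v in held_out_normal)
        k = math.ceil(len(scores) * (1.0 - max_fpr)) - 1
        self.threshold = scores[max(k, 0)]
        return self.threshold

    def is_attack(self, x: Vector) -> bool:
        return self.score(x) > self.threshold


if __name__ == "__main__":
    # Toy 2-D vectors standing in for pooled transformer features.
    model = CentroidOneClass().fit(
        [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]
    )
    held_out = [[0.5, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 2.0]]
    print(model.calibrate(held_out, max_fpr=0.25))  # 1.0
    print(model.is_attack([5.0, 5.0]))  # True
    print(model.is_attack([0.5, 0.5]))  # False
```

The centroid scorer is the simplest possible one-class model and ignores correlations between features; replacing it with a one-class SVM or another density estimator would only change `fit` and `score`, while the threshold calibration step stays the same.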

This research was partially supported by a grant given to Nicolás Montes from ANII (http://anii.org.uy) and was done in the context of projects FMV_1_2017_136337 (Fondo María Viñas, ANII) and WAFINTL from ICT4V center (http://ict4v.org).

Notes

  1.

    We could have chosen another pre-trained BPE tokenizer instead of the one proposed in [19]. The key point is to use a BPE tokenizer trained on a huge corpus (40 GB of text), because such a tokenizer can tokenize any word (and any character) of any language without resorting to the unknown token.
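
    The property this note relies on, that a byte-level BPE vocabulary never needs an unknown token, can be illustrated with a toy tokenizer. This is a minimal sketch, not the GPT-2 tokenizer of [19]: it omits GPT-2's pre-tokenization and byte-to-Unicode mapping, learns merges from a tiny hypothetical corpus of HTTP request lines, and uses illustrative function names (`train_byte_bpe`, `encode`, `decode`). Because the base vocabulary is the 256 possible byte values and every input is first encoded as UTF-8, any string, including characters never seen in training, maps to valid token ids.

```python
from collections import Counter
from typing import Dict, List, Tuple

Pair = Tuple[int, int]


def merge(ids: List[int], pair: Pair, new_id: int) -> List[int]:
    """Replace every left-to-right occurrence of `pair` with `new_id`."""
    out: List[int] = []
    i = 0
    while i < len(ids):
        if i + 1 < len(ids) and (ids[i], ids[i + 1]) == pair:
            out.append(new_id)
            i += 2
        else:
            out.append(ids[i])
            i += 1
    return out


def train_byte_bpe(corpus: str, num_merges: int) -> Dict[Pair, int]:
    """Learn up to `num_merges` merges; ids 0..255 are the raw UTF-8 bytes."""
    ids = list(corpus.encode("utf-8"))
    merges: Dict[Pair, int] = {}
    for step in range(num_merges):
        counts = Counter(zip(ids, ids[1:]))
        if not counts:
            break
        best = max(counts, key=counts.__getitem__)  # most frequent adjacent pair
        merges[best] = 256 + step
        ids = merge(ids, best, 256 + step)
    return merges


def encode(text: str, merges: Dict[Pair, int]) -> List[int]:
    # Every character becomes 1 to 4 bytes, so no input ever needs an <unk> id.
    ids = list(text.encode("utf-8"))
    for pair, new_id in sorted(merges.items(), key=lambda kv: kv[1]):
        ids = merge(ids, pair, new_id)  # apply merges in the order learned
    return ids


def decode(ids: List[int], merges: Dict[Pair, int]) -> str:
    vocab = {i: bytes([i]) for i in range(256)}
    for (a, b), new_id in sorted(merges.items(), key=lambda kv: kv[1]):
        vocab[new_id] = vocab[a] + vocab[b]
    return b"".join(vocab[i] for i in ids).decode("utf-8")


if __name__ == "__main__":
    corpus = "GET /index.php?id=1 HTTP/1.1\nGET /index.php?id=2 HTTP/1.1\n"
    merges = train_byte_bpe(corpus, num_merges=20)
    request = "GET /admin.php?q=<script>alert('ñ')</script>"  # unseen characters
    ids = encode(request, merges)
    print(decode(ids, merges) == request)  # True
```

    A word-level vocabulary would have to map unseen tokens such as `<script>` or `ñ` to an unknown id and lose them; the byte fallback keeps every character of a request, which matters when the suspicious part of an attack is exactly the unusual text.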

References

  1. Alammar, J.: The Illustrated Transformer. jalammar.github.io/illustrated-transformer/. Accessed 14 Feb 2021

  2. Bengio, Y., Ducharme, R., Vincent, P., Janvin, C.: A neural probabilistic language model. J. Mach. Learn. Res. 3, 1137–1155 (2003)

  3. Betarte, G., Giménez, E., Martinez, R., Pardo, Á.: Improving web application firewalls through anomaly detection. In: 2018 17th IEEE International Conference on Machine Learning and Applications (ICMLA), pp. 779–784. IEEE (2018)

  4. Betarte, G., Martínez, R., Pardo, Á.: Web application attacks detection using machine learning techniques. In: 2018 17th IEEE International Conference on Machine Learning and Applications (ICMLA), pp. 1065–1072. IEEE (2018)

  5. Corona, I., Ariu, D., Giacinto, G.: HMM-Web: a framework for the detection of attacks against web applications. In: Proceedings of ICC 2009, pp. 1–6 (2009)

  6. Devlin, J., Chang, M.-W., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805 (2018)

  7. Ethayarajh, K.: How contextual are contextualized word representations? Comparing the geometry of BERT, ELMo, and GPT-2 embeddings. arXiv preprint arXiv:1909.00512 (2019)

  8. Folini, C.: Handling false positives with the OWASP ModSecurity Core Rule Set (2016)

  9. Hacker, A.J.: Importance of web application firewall technology for protecting web-based resources. ICSA Labs, an Independent Verizon Business (2008)

  10. Kruegel, C., Vigna, G.: Anomaly detection of web-based attacks. In: Proceedings of CCS 2003, pp. 251–261. ACM (2003)

  11. Lee, W.S., Liu, B.: Learning with positive and unlabeled examples using weighted logistic regression. In: ICML, vol. 3, pp. 448–455 (2003)

  12. Liu, Y., et al.: RoBERTa: a robustly optimized BERT pretraining approach. arXiv preprint arXiv:1907.11692 (2019)

  13. Martínez, R.: Enhancing web application attack detection using machine learning. Master thesis, Facultad de Ingeniería, UdelaR - Área Informática del Pedeciba, Uruguay (2019)

  14. Mikolov, T., Chen, K., Corrado, G., Dean, J.: Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781 (2013)

  15. OWASP: OWASP ModSecurity Core Rule Set project. coreruleset.org. Accessed 14 Feb 2021

  16. OWASP: OWASP Top Ten project. https://www.owasp.org/index.php/Category:OWASP/Top/Ten/Project. Accessed 14 Feb 2021

  17. Peters, M.E., et al.: Deep contextualized word representations. arXiv preprint arXiv:1802.05365 (2018)

  18. Qin, Z.Q., Ma, X.K., Wang, Y.J.: Attentional payload anomaly detector for web applications. In: Cheng, L., Leung, A., Ozawa, S. (eds.) Neural Information Processing. ICONIP 2018. LNCS, vol. 11304. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-04212-7_52

  19. Radford, A., Wu, J., Child, R., Luan, D., Amodei, D., Sutskever, I.: Language models are unsupervised multitask learners. OpenAI blog 1(8), 9 (2019)

  20. Schölkopf, B., Platt, J.C., Shawe-Taylor, J., Smola, A.J., Williamson, R.C.: Estimating the support of a high-dimensional distribution. Neural Comput. 13(7), 1443–1471 (2001)

  21. Sennrich, R., Haddow, B., Birch, A.: Neural machine translation of rare words with subword units. arXiv preprint arXiv:1508.07909 (2015)

  22. Sureda Riera, T., Bermejo Higuera, J.-R., Bermejo Higuera, J., Martínez Herraiz, J.-J., Sicilia Montalvo, J.-A.: Prevention and fighting against web attacks through anomaly detection technology. A systematic review. Sustainability, 12(12) (2020)

  23. Torrano-Gimenez, C., Perez-Villegas, A., Marañón, G.Á., et al.: An anomaly-based approach for intrusion detection in web traffic. J. Inf. Assurance Secur. 5(4), 446–454 (2010)

  24. Trustwave Holdings, Inc.: ModSecurity: open source web application firewall

  25. Vartouni, A.M., Teshnehlab, M., Kashi, S.S.: Leveraging deep neural networks for anomaly-based web application firewall. IET Inf. Secur. 13(4), 352–361 (2019)

  26. Vaswani, A., et al.: Attention is all you need. arXiv preprint arXiv:1706.03762 (2017)

  27. Yu, Y., Yan, H., Guan, H., Zhou, H.: DeepHTTP: semantics-structure model with attention for anomalous HTTP traffic detection and pattern mining. arXiv preprint arXiv:1810.12751 (2018)

  28. Yuan, G., Li, B., Yao, Y., Zhang, S.: A deep learning enabled subspace spectral ensemble clustering approach for web anomaly detection. In: 2017 International Joint Conference on Neural Networks (IJCNN), pp. 3896–3903. IEEE (2017)

Author information

Corresponding authors

Correspondence to Gustavo Betarte, Rodrigo Martínez or Alvaro Pardo.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Montes, N., Betarte, G., Martínez, R., Pardo, A. (2021). Web Application Attacks Detection Using Deep Learning. In: Tavares, J.M.R.S., Papa, J.P., González Hidalgo, M. (eds) Progress in Pattern Recognition, Image Analysis, Computer Vision, and Applications. CIARP 2021. Lecture Notes in Computer Science, vol 12702. Springer, Cham. https://doi.org/10.1007/978-3-030-93420-0_22

  • DOI: https://doi.org/10.1007/978-3-030-93420-0_22

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-93419-4

  • Online ISBN: 978-3-030-93420-0

  • eBook Packages: Computer Science, Computer Science (R0)
