An adaptive system for detecting malicious queries in web attacks

  • Ying Dong
  • Yuqing Zhang (corresponding author)
  • Hua Ma
  • Qianru Wu
  • Qixu Liu
  • Kai Wang
  • Wenjie Wang
Research Paper


Web request query strings (queries), which pass parameters to a referenced resource, are frequently manipulated by attackers to retrieve sensitive data and even to take full control of victim web servers and web applications. However, existing malicious query detection approaches in the literature cannot cope with evolving web attacks. In this paper, we introduce a novel adaptive system (AMOD) that detects web-based code injection attacks, which account for the majority of web attacks, by analyzing queries. We also present a new adaptive learning strategy, called SVM HYBRID, which our system uses to minimize manual work. In the evaluation, an up-to-date detection model is trained on a ten-day query dataset collected from an academic institute’s web server logs. The evaluation shows that our approach outperforms existing approaches in two respects. First, AMOD outperforms existing web attack detection methods, with an F-value of 99.50% and a false positive (FP) rate of 0.001%. Second, the total number of malicious queries obtained by SVM HYBRID is 3.07 times that obtained by the popular support vector machine adaptive learning (SVM AL) method. The malicious queries obtained can be used to update the web application firewall (WAF) signature library.
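The two metrics reported above can be illustrated with a minimal sketch. It assumes that "F-value" denotes the standard F1 score (the harmonic mean of precision and recall) and that the FP rate is the fraction of benign queries flagged as malicious; the abstract does not spell out either definition. The function names and the counts in the usage lines are hypothetical and are not taken from the paper's evaluation.

```python
def f_value(tp: int, fp: int, fn: int) -> float:
    """F1 score: harmonic mean of precision and recall (assumed definition)."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def fp_rate(fp: int, tn: int) -> float:
    """False positive rate: share of benign queries flagged as malicious."""
    return fp / (fp + tn) if (fp + tn) else 0.0


# Hypothetical confusion-matrix counts, for illustration only.
tp, fp, fn, tn = 995, 5, 5, 99995
print(f"F-value: {f_value(tp, fp, fn):.4f}")
print(f"FP rate: {fp_rate(fp, tn):.6f}")
```

Because benign queries typically far outnumber malicious ones in web server logs, the FP rate is computed over the benign class only, which is why it can be very small even when the F-value is below 100%.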


web attacks, adaptive learning, intrusion detection, anomaly detection, SVM



This work was supported in part by the National Key Research and Development Program of China (Grant No. 2016YFB0800703), in part by the National Natural Science Foundation of China (Grant Nos. 61272481, 61572460), in part by the Open Project Program of the State Key Laboratory of Information Security (Grant Nos. 2017-ZD-01, 2016-MS-02), and in part by the National Information Security Special Project of the National Development and Reform Commission of China (Grant No. (2012)1424). We would also like to thank Dr. Xinyu Xing of Pennsylvania State University for his help with this work.



Copyright information

© Science China Press and Springer-Verlag GmbH Germany, part of Springer Nature 2018

Authors and Affiliations

  • Ying Dong (1)
  • Yuqing Zhang (1, 2), corresponding author
  • Hua Ma (2, 3)
  • Qianru Wu (4)
  • Qixu Liu (1, 2)
  • Kai Wang (5)
  • Wenjie Wang (1)

  1. National Computer Network Intrusion Protection Center, University of Chinese Academy of Sciences, Beijing, China
  2. State Key Laboratory of Information Security, Institute of Information Engineering, Chinese Academy of Sciences, Beijing, China
  3. School of Mathematics and Statistics, Xidian University, Xi’an, China
  4. Security Department, Alibaba Group, Beijing, China
  5. Zhanlu Laboratory, Tencent Incorporation, Beijing, China
