A set of features to detect web security threats

  • Gerardo Canfora
  • Corrado Aaron Visaggio
Original Paper


The growing number of malicious websites, and of systems that distribute malware through websites, makes the adoption of effective techniques for the timely detection of web security threats urgent. Current mechanisms may exhibit some limitations, mainly the amount of resources they require and a low true positive rate for zero-day attacks. In this paper, we propose and validate a set of features, extracted from the content and the structure of webpages, that can be used as indicators of web security threats. The features are used to build a predictor, based on five machine learning algorithms, which is applied to classify unknown web applications. The experimentation showed that the proposed set of features can correctly classify malicious websites with high precision (0.84 in the best case) and high recall (0.89 in the best case). The classifiers also prove successful against zero-day attacks.
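For readers less familiar with the two metrics reported above, the following is a minimal sketch of how precision and recall are computed for the "malicious" class from a classifier's confusion counts. The function name and the counts are illustrative assumptions; they are not taken from the paper's experiments.

```python
from typing import Tuple


def precision_recall(tp: int, fp: int, fn: int) -> Tuple[float, float]:
    """Return (precision, recall) for the positive ('malicious') class.

    tp: malicious pages correctly flagged as malicious
    fp: benign pages wrongly flagged as malicious
    fn: malicious pages wrongly classified as benign
    """
    # Precision: share of flagged pages that really are malicious.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: share of malicious pages that were actually flagged.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical counts chosen only for illustration.
p, r = precision_recall(tp=8, fp=2, fn=2)
print(f"precision={p:.2f} recall={r:.2f}")
```

High precision means few benign sites are flagged by mistake, while high recall means few malicious sites go undetected; a detector generally has to trade one against the other.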


Web application security · Static analysis · Features extraction · Features classification



Copyright information

© Springer-Verlag France 2016

Authors and Affiliations

  1. Department of Engineering, University of Sannio, Benevento, Italy
