
A set of features to detect web security threats

  • Original Paper
Journal of Computer Virology and Hacking Techniques

Abstract

The growth of malicious websites and of systems that distribute malware through websites makes it urgent to adopt effective techniques for the timely detection of web security threats. Current mechanisms may exhibit limitations, mainly the amount of resources they require and a low true positive rate for zero-day attacks. In this paper, we propose and validate a set of features, extracted from the content and structure of webpages, that could be used as indicators of web security threats. The features are used to build a predictor, based on five machine learning algorithms, which is applied to classify unknown web applications. The experiments showed that the proposed set of features classifies malicious websites with high precision (0.84 in the best case) and high recall (0.89 in the best case). The classifiers also prove successful against zero-day attacks.
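For readers unfamiliar with the two metrics reported above, the following is a minimal sketch of how precision and recall are conventionally computed for the malicious class. It is not the authors' evaluation code; the function name `precision_recall` and the counts in the usage line are hypothetical and chosen only for illustration.

```python
def precision_recall(tp: int, fp: int, fn: int) -> tuple[float, float]:
    """Return (precision, recall) for the positive (malicious) class.

    tp: malicious pages correctly flagged as malicious
    fp: benign pages wrongly flagged as malicious
    fn: malicious pages that the classifier missed
    """
    # Precision: share of flagged pages that are actually malicious.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: share of malicious pages that were flagged.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical counts, not taken from the paper.
p, r = precision_recall(tp=8, fp=2, fn=2)
print(f"precision={p:.2f} recall={r:.2f}")
```

High precision means few benign pages are flagged by mistake, while high recall means few malicious pages slip through; the abstract reports the best-case value of each.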



Author information

Corresponding author

Correspondence to Corrado Aaron Visaggio.

About this article

Cite this article

Canfora, G., Visaggio, C.A. A set of features to detect web security threats. J Comput Virol Hack Tech 12, 243–261 (2016). https://doi.org/10.1007/s11416-016-0266-2

  • DOI: https://doi.org/10.1007/s11416-016-0266-2
