Abstract
The rapid growth of malicious websites, and of systems that distribute malware through websites, makes it urgent to adopt effective techniques for the timely detection of web security threats. Current mechanisms may exhibit some limitations, mainly in the amount of resources they require and in their low true positive rate for zero-day attacks. In this paper, we propose and validate a set of features, extracted from the content and structure of webpages, that can be used as indicators of web security threats. The features are used to build a predictor, based on five machine learning algorithms, which is applied to classify unknown web applications. The experiments show that the proposed set of features correctly classifies malicious websites with high precision (0.84 in the best case) and high recall (0.89 in the best case). The classifiers also prove effective against zero-day attacks.
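As a reminder of how the two reported metrics are defined, the following minimal sketch computes precision and recall for the "malicious" class from true and predicted labels. The function name `precision_recall` and the toy labels are illustrative assumptions. They are not taken from the paper and do not reproduce its experiments.

```python
def precision_recall(y_true, y_pred, positive=1):
    """Return (precision, recall) for the given positive class label."""
    pairs = list(zip(y_true, y_pred))
    # True positives: malicious pages predicted as malicious.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    # False positives: benign pages predicted as malicious.
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    # False negatives: malicious pages predicted as benign.
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical toy labels (1 = malicious, 0 = benign); not data from the paper.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]
precision, recall = precision_recall(y_true, y_pred)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Precision is the fraction of pages flagged as malicious that really are malicious. Recall is the fraction of malicious pages that the classifier flags.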
About this article
Cite this article
Canfora, G., Visaggio, C.A. A set of features to detect web security threats. J Comput Virol Hack Tech 12, 243–261 (2016). https://doi.org/10.1007/s11416-016-0266-2
DOI: https://doi.org/10.1007/s11416-016-0266-2