An Efficient Filtering Method for Detecting Malicious Web Pages

  • Jaeun Choi
  • Gisung Kim
  • Tae Ghyoon Kim
  • Sehun Kim
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7690)


There are several ways to detect malicious web pages, two of which are dynamic detection and static detection. Dynamic detection has a high detection rate but consumes considerable resources and takes a long time, whereas static detection uses few resources but has a low detection rate. To offset the weaknesses of both approaches, a filtering method has been proposed: static analysis is applied first to filter out normal web pages, and dynamic analysis is then applied only to the remaining suspicious pages. In this scheme, a page classified as normal at the filtering stage is not examined any further, so a malicious page misclassified as normal (a false negative) goes undetected. However, the existing filtering method does not take this problem into account. To address it, our proposed filtering method uses a cost-sensitive method. In addition, to improve the efficiency of the filter, features are grouped into three subsets according to how difficult they are to extract. Because our method uses only the feature subsets that the characteristics of each web page require, the efficiency of the filter increases. An experiment showed that the proposed method produces fewer false negatives and is more efficient than an existing filtering method.
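The abstract describes the filter only at a high level. The following is a minimal Python sketch of the general idea under stated assumptions, not the authors' implementation. It applies the standard expected-cost decision rule (forward a page to dynamic analysis when p·C_FN > (1 − p)·C_FP, where p is the estimated probability that the page is malicious) after each of three feature subsets, ordered from cheapest to most expensive to extract. The subset names (`lexical`, `structural`, `javascript`), the `Stage`, `filter_page` and `TOY_STAGES` names, the toy feature extractors and the fixed probabilities returned by the stage models are all hypothetical placeholders; in practice each stage model would be a classifier trained on labelled pages.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical interfaces: an extractor maps raw page content to features,
# and a stage model maps all features gathered so far to P(malicious).
Features = Dict[str, float]
Extractor = Callable[[str], Features]
Model = Callable[[Features], float]


@dataclass
class Stage:
    name: str
    extract: Extractor  # feature extractor; callers order stages cheapest first
    model: Model        # stand-in for a classifier trained on features up to this stage


def cost_sensitive_threshold(cost_fp: float, cost_fn: float) -> float:
    """Minimum P(malicious) above which forwarding a page is the cheaper choice.

    Labelling a page normal costs p * cost_fn in expectation (a missed
    malicious page); forwarding it costs (1 - p) * cost_fp (dynamic analysis
    spent on a benign page). Forwarding wins when p > cost_fp / (cost_fp + cost_fn).
    """
    return cost_fp / (cost_fp + cost_fn)


def filter_page(page: str, stages: Sequence[Stage],
                cost_fp: float, cost_fn: float) -> Tuple[str, List[str]]:
    """Return the filter decision and the names of the feature subsets used."""
    threshold = cost_sensitive_threshold(cost_fp, cost_fn)
    features: Features = {}
    used: List[str] = []
    for stage in stages:
        features.update(stage.extract(page))
        used.append(stage.name)
        if stage.model(features) <= threshold:
            # Labelling the page normal has the lower expected cost:
            # stop here and skip the costlier feature subsets.
            return "normal", used
    # Still suspicious after every static stage: hand off to dynamic analysis.
    return "suspicious", used


# Toy stages with made-up features and fixed probabilities, for illustration only.
TOY_STAGES = [
    Stage("lexical", lambda p: {"scripts": float(p.count("<script"))},
          lambda f: 0.5 if f["scripts"] > 0 else 0.05),
    Stage("structural", lambda p: {"iframes": float(p.count("<iframe"))},
          lambda f: 0.6 if f["iframes"] > 0 else 0.05),
    Stage("javascript", lambda p: {"evals": float(p.count("eval("))},
          lambda f: 0.95 if f["evals"] > 0 else 0.05),
]

if __name__ == "__main__":
    for page in ["<html><p>hello</p></html>",
                 '<script>eval(x)</script><iframe src="x">']:
        # Prints ('normal', ['lexical']) then
        # ('suspicious', ['lexical', 'structural', 'javascript'])
        print(filter_page(page, TOY_STAGES, cost_fp=1.0, cost_fn=9.0))
```

Benign-looking pages leave the filter after the cheap stages, so the costlier features are extracted only for pages that stay suspicious. Weighting false negatives more heavily (here C_FN = 9, C_FP = 1, giving a threshold of 0.1) makes the filter more conservative: a page containing only a harmless script is kept for the structural check, whereas with equal costs (threshold 0.5) the same page would be labelled normal after the lexical stage.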


Keywords: Internet security, Malicious web page, Filtering method, Cost-sensitive analysis, Machine learning




Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Jaeun Choi (1)
  • Gisung Kim (2)
  • Tae Ghyoon Kim (1)
  • Sehun Kim (3)

  1. Department of Industrial and Systems Engineering, KAIST, Daejeon, Republic of Korea
  2. Institute for Information Technology Convergence, KAIST, Daejeon, Republic of Korea
  3. Department of Industrial and Systems Engineering and Graduate School of Information Security, KAIST, Daejeon, Republic of Korea