Journal in Computer Virology

Volume 4, Issue 4, pp 323–334

An intelligent PE-malware detection system based on association mining

  • Yanfang Ye
  • Dingding Wang
  • Tao Li
  • Dongyi Ye
  • Qingshan Jiang
Original Paper


The proliferation of malware has presented a serious threat to the security of computer systems. Traditional signature-based anti-virus systems fail to detect polymorphic/metamorphic and new, previously unseen malicious executables. Data mining methods such as Naive Bayes and Decision Tree have been studied on small collections of executables. In this paper, resting on the analysis of Windows APIs called by PE files, we develop the Intelligent Malware Detection System (IMDS) using Objective-Oriented Association (OOA) mining based classification. IMDS is an integrated system consisting of three major modules: PE parser, OOA rule generator, and rule based classifier. An OOA_Fast_FP-Growth algorithm is adapted to efficiently generate OOA rules for classification. A comprehensive experimental study on a large collection of PE files obtained from the anti-virus laboratory of KingSoft Corporation is performed to compare various malware detection approaches. Promising experimental results demonstrate that the accuracy and efficiency of our IMDS system outperform popular anti-virus software such as Norton AntiVirus and McAfee VirusScan, as well as previous data mining based detection systems which employed Naive Bayes, Support Vector Machine (SVM) and Decision Tree techniques. Our system has already been incorporated into the scanning tool of KingSoft’s Anti-Virus software.
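The rule-generation and classification steps named in the abstract can be illustrated with a small sketch. This is not the authors' OOA_Fast_FP-Growth algorithm: it enumerates short itemsets by brute force to stay brief, and it assumes one common definition of objective-oriented support (fraction of all files that contain the itemset and carry the objective label) and confidence (fraction of files containing the itemset that carry the objective label). The API names, labels, thresholds, and the function names `mine_rules` and `classify` are all hypothetical and chosen only for illustration.

```python
from itertools import combinations


def mine_rules(samples, objective, min_support, min_confidence, max_len=2):
    """Brute-force rule mining toward one objective label (illustrative only).

    samples: list of (frozenset of API-call names, label) pairs.
    Returns (itemset, support, confidence) tuples, best rules first.
    """
    n = len(samples)
    apis = sorted(set().union(*(calls for calls, _ in samples)))
    rules = []
    for size in range(1, max_len + 1):
        for combo in combinations(apis, size):
            items = frozenset(combo)
            # Labels of every file whose API set contains this itemset.
            covered = [label for calls, label in samples if items <= calls]
            if not covered:
                continue
            hits = sum(1 for label in covered if label == objective)
            support = hits / n                # assumed definition, see lead-in
            confidence = hits / len(covered)  # assumed definition, see lead-in
            if support >= min_support and confidence >= min_confidence:
                rules.append((items, support, confidence))
    # Higher confidence first, then higher support, then a stable name order.
    rules.sort(key=lambda r: (-r[2], -r[1], sorted(r[0])))
    return rules


def classify(calls, rules, objective, default):
    """Return the objective label if any mined rule matches, else the default."""
    for items, _, _ in rules:
        if items <= calls:
            return objective
    return default


# Toy data: hypothetical API-call sets standing in for PE parser output.
samples = [
    (frozenset({"CreateFileA", "WriteFile", "RegSetValueExA"}), "malware"),
    (frozenset({"CreateFileA", "WriteFile", "RegSetValueExA", "WinExec"}), "malware"),
    (frozenset({"CreateFileA", "ReadFile"}), "benign"),
    (frozenset({"CreateFileA", "WriteFile"}), "benign"),
]
rules = mine_rules(samples, "malware", min_support=0.5, min_confidence=1.0)
verdict = classify(frozenset({"CreateFileA", "RegSetValueExA"}), rules, "malware", "benign")
```

On a realistic collection the exhaustive enumeration above would be far too slow, which is why a frequent-pattern algorithm such as the FP-Growth variant mentioned in the abstract is needed in practice.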


Keywords: Support Vector Machine, Association Rule, Application Program Interface, Frequent Itemsets, Malicious Code
These keywords were added by machine and not by the authors.





Copyright information

© Springer-Verlag France 2008

Authors and Affiliations

  • Yanfang Ye (1)
  • Dingding Wang (2)
  • Tao Li (2)
  • Dongyi Ye (3)
  • Qingshan Jiang (4)

  1. Department of Computer Science, Xiamen University, Xiamen, People’s Republic of China
  2. School of Computer Science, Florida International University, Miami, USA
  3. College of Maths and Computer Science, Fuzhou University, Fuzhou, People’s Republic of China
  4. Software School, Xiamen University, Xiamen, People’s Republic of China
