Research on classification of malware source code

Journal of Shanghai Jiaotong University (Science)

Abstract

Given the threat of Internet attacks, malware classification is one of the promising solutions in the fields of intrusion detection and digital forensics. Previous work performed dynamic analysis, or static analysis after reverse engineering. However, malware developers use anti-virtual-machine (anti-VM) and obfuscation techniques to evade such classifiers. By deploying honeypots, malware source code can be collected and analyzed. Source code analysis provides a better basis for classification, both for understanding the purpose of attackers and for forensics. In this paper, a novel classification approach based on content similarity and directory structure similarity is proposed. Such classification avoids re-analyzing known malware and allows resources to be allocated to new malware. It also lets network administrators know the purpose of attackers. The experimental results demonstrate that the proposed system classifies malware efficiently with a small misclassification ratio and that its performance is better than that of VirusTotal.
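The abstract does not give the similarity measures themselves, so the following is only an illustrative sketch of how content similarity and directory structure similarity might be combined. Everything in it is an assumption rather than the authors' method: the Jaccard measure, the token shingles, the equal weighting, the threshold, and all function names are hypothetical. A malware package is modelled as a mapping from relative file path to file text.

```python
import re


def jaccard(a, b):
    """Jaccard similarity of two sets; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def token_shingles(text, k=3):
    """Set of k-token shingles from one file (assumed content feature)."""
    tokens = re.findall(r"\w+", text.lower())
    if len(tokens) < k:
        return {tuple(tokens)} if tokens else set()
    return {tuple(tokens[i:i + k]) for i in range(len(tokens) - k + 1)}


def content_similarity(pkg_a, pkg_b, k=3):
    """Jaccard similarity over the pooled shingles of each package."""
    sh_a = set().union(*(token_shingles(t, k) for t in pkg_a.values()))
    sh_b = set().union(*(token_shingles(t, k) for t in pkg_b.values()))
    return jaccard(sh_a, sh_b)


def directory_similarity(pkg_a, pkg_b):
    """Jaccard similarity over the sets of relative file paths."""
    return jaccard(set(pkg_a), set(pkg_b))


def combined_similarity(pkg_a, pkg_b, weight=0.5):
    """Weighted sum of both scores; the weight is an assumed parameter."""
    return (weight * content_similarity(pkg_a, pkg_b)
            + (1 - weight) * directory_similarity(pkg_a, pkg_b))


def classify(sample, known_families, threshold=0.5):
    """Return the best-matching family name, or None if the sample looks new."""
    best_name, best_score = None, 0.0
    for name, reference in known_families.items():
        score = combined_similarity(sample, reference)
        if score > best_score:
            best_name, best_score = name, score
    return best_name if best_score >= threshold else None
```

Under these assumptions, a sample that `classify` maps to a known family would not need to be analyzed again, while a `None` result would mark it as a candidate for full analysis.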

Author information

Corresponding author

Correspondence to Lai Gu-hsin (赖谷鑫).

Additional information

Foundation item: the Project of the Ministry of Science and Technology, Taiwan, China (Nos. NSC 100-2218-E-110-004-MY3 and NSC 100-2218-E-110-011)

About this article

Cite this article

Chia-mei, C., Gu-hsin, L. Research on classification of malware source code. J. Shanghai Jiaotong Univ. (Sci.) 19, 425–430 (2014). https://doi.org/10.1007/s12204-014-1519-1

  • DOI: https://doi.org/10.1007/s12204-014-1519-1
