Research on classification of malware source code

Chia-mei, Chen; Gu-hsin, Lai

doi:10.1007/s12204-014-1519-1

Published: 05 August 2014

Volume 19, pages 425–430, (2014)

Journal of Shanghai Jiaotong University (Science)

Chen Chia-mei (陈嘉玫)¹ &
Lai Gu-hsin (赖谷鑫)²

Abstract

In the face of Internet attacks, malware classification is one of the promising solutions in the fields of intrusion detection and digital forensics. In previous work, researchers performed dynamic analysis, or static analysis after reverse engineering. However, malware developers use anti-virtual-machine (anti-VM) and obfuscation techniques to evade malware classifiers. By deploying honeypots, malware source code can be collected and analyzed. Source code analysis provides a better basis for classification, both for understanding the purpose of attackers and for forensics. In this paper, a novel classification approach is proposed, based on content similarity and directory structure similarity. Such a classification avoids re-analyzing known malware and frees resources for new malware. Malware classification also lets network administrators know the purpose of attackers. The experimental results demonstrate that the proposed system can classify malware efficiently with a small misclassification ratio, and that its performance is better than that of VirusTotal.
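The abstract names two signals, content similarity and directory structure similarity, but this preview does not say how either is measured or combined. The sketch below is only an illustration under stated assumptions: a sample is modeled as a dictionary from relative file path to source text, both similarities use the Jaccard index (content over 3-token shingles), and the two scores are mixed with an equal weight. The function names, the shingle size `k`, and the `weight` parameter are hypothetical and are not taken from the paper.

```python
def jaccard(a: set, b: set) -> float:
    """Jaccard index |a & b| / |a | b|; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def shingles(text: str, k: int = 3) -> set:
    """All runs of k consecutive whitespace-separated tokens in text."""
    tokens = text.split()
    return {tuple(tokens[i:i + k]) for i in range(len(tokens) - k + 1)}


def directory_similarity(s1: dict, s2: dict) -> float:
    """Compare the sets of relative file paths of two samples."""
    return jaccard(set(s1), set(s2))


def content_similarity(s1: dict, s2: dict, k: int = 3) -> float:
    """Compare the pooled token shingles of all files in two samples."""
    a = set().union(*(shingles(t, k) for t in s1.values()))
    b = set().union(*(shingles(t, k) for t in s2.values()))
    return jaccard(a, b)


def combined_similarity(s1: dict, s2: dict, weight: float = 0.5) -> float:
    """Weighted mix of the two scores; the weight is an assumed parameter."""
    return (weight * content_similarity(s1, s2)
            + (1 - weight) * directory_similarity(s1, s2))


# Hypothetical samples: b is a with one file renamed, c is unrelated.
sample_a = {"bot/main.c": "int main ( ) { connect ( ) ; }", "bot/README": "irc bot"}
sample_b = {"bot/main.c": "int main ( ) { connect ( ) ; }", "bot/NOTES": "irc bot"}
sample_c = {"x.py": "print hello world now"}

score_ab = combined_similarity(sample_a, sample_b)
score_ac = combined_similarity(sample_a, sample_c)
```

A new sample could then be assigned to the known family with the highest score when that score exceeds some threshold, and otherwise queued for full analysis, which matches the stated goal of not re-analyzing known malware. The threshold itself would have to be tuned on labeled data.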

Author information

Authors and Affiliations

Department of Information Management, National Sun Yat-Sen University, Kaohsiung 804, Taiwan, China
Chen Chia-mei (陈嘉玫)
Department of Information Management, Chinese Culture University, Taipei, 111, China
Lai Gu-hsin (赖谷鑫)

Authors

Chen Chia-mei (陈嘉玫)
Lai Gu-hsin (赖谷鑫)

Corresponding author

Correspondence to Lai Gu-hsin (赖谷鑫).

Additional information

Foundation item: the Project of the Ministry of Science and Technology, Taiwan, China (Nos. NSC 100-2218-E-110-004-MY3 and NSC 100-2218-E-110-011)

About this article

Cite this article

Chia-mei, C., Gu-hsin, L. Research on classification of malware source code. J. Shanghai Jiaotong Univ. (Sci.) 19, 425–430 (2014). https://doi.org/10.1007/s12204-014-1519-1

Received: 28 October 2013
Published: 05 August 2014
Issue Date: August 2014
DOI: https://doi.org/10.1007/s12204-014-1519-1

Key words

CLC number

TP 393.08
