A software classification scheme using binary-level characteristics for efficient software filtering

Kim, Yesol; Cho, Seong-je; Han, Sangchul; You, Ilsun

doi:10.1007/s00500-016-2357-x


Methodologies and Application
Published: 23 September 2016

Soft Computing, Volume 22, pages 595–606 (2018)

Yesol Kim¹,
Seong-je Cho ORCID: orcid.org/0000-0001-9917-0429¹,
Sangchul Han² &
…
Ilsun You³


Abstract

Software filtering systems can be employed to detect and filter out pirated or counterfeit software on Web sites and peer-to-peer networks. They determine whether a suspicious program is legal by comparing it with original programs in a database or on the market. To identify pirated or counterfeit software, such systems must measure software similarity between the suspicious program and the original ones. The comparison overhead can be very high because, in the worst case, the suspicious program is compared with every program in the database or market. This paper proposes a software classification scheme for efficient software filtering systems. The scheme focuses specifically on Windows portable executable (PE) files, which have been prime targets for software pirates. It extracts software characteristics from a suspicious program and, based on those characteristics, quickly classifies the program into one of several pre-defined categories. In most cases the suspicious program is then compared only with the programs in that category, which reduces the comparison overhead. We propose two classification methods. The first extracts strings from the GUI-related resources of a program and computes the relevance of the program to each category based on pre-computed scores of the strings. The second extracts API call frequencies from a program's executable code and uses the Random Forest technique to classify the program. Experimental results show that the proposed scheme can classify programs effectively and can reduce the comparison overhead significantly.
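
As a rough illustration of the first method and of how classification narrows the comparison set, the following Python sketch scores GUI strings per category and then selects only the original programs in the predicted category. The abstract does not give the paper's scoring formula, so the weighting used here (the fraction of a category's programs containing a string, divided by the number of categories in which the string occurs) is an assumption, and the category names, sample strings, and database entries are all hypothetical.

```python
from collections import Counter, defaultdict


def build_scores(training):
    """Pre-compute a score for each (string, category) pair.

    `training` is a list of (category, set_of_gui_strings) pairs.
    ASSUMED weighting, not the paper's formula: the share of the category's
    programs that contain the string, divided by the number of categories
    in which the string appears at all.
    """
    programs_per_category = Counter(category for category, _ in training)
    doc_freq = defaultdict(Counter)  # string -> category -> program count
    for category, strings in training:
        for s in set(strings):
            doc_freq[s][category] += 1
    scores = {}
    for s, per_category in doc_freq.items():
        spread = len(per_category)  # categories in which the string occurs
        scores[s] = {
            category: (count / programs_per_category[category]) / spread
            for category, count in per_category.items()
        }
    return scores


def relevance(strings, scores, categories):
    """Sum the pre-computed scores of a program's strings for each category."""
    totals = {category: 0.0 for category in categories}
    for s in set(strings):
        for category, score in scores.get(s, {}).items():
            totals[category] += score
    return totals


def classify(strings, scores, categories):
    """Return the category with the highest relevance."""
    totals = relevance(strings, scores, categories)
    return max(totals, key=totals.get)


def candidates(category, database):
    """Keep only the original programs registered under `category`."""
    return [name for name, cat in database if cat == category]


# Hypothetical training data and database of original programs.
TRAINING = [
    ("media_player", {"Play", "Pause", "Playlist", "Open"}),
    ("media_player", {"Play", "Volume", "Open"}),
    ("text_editor", {"Find", "Replace", "Open", "Word wrap"}),
    ("text_editor", {"Find", "Font", "Open"}),
]
DATABASE = [
    ("PlayerA", "media_player"),
    ("PlayerB", "media_player"),
    ("EditorA", "text_editor"),
    ("EditorB", "text_editor"),
]
CATEGORIES = ["media_player", "text_editor"]

SCORES = build_scores(TRAINING)
suspicious_strings = {"Play", "Pause", "Open", "Settings"}
predicted = classify(suspicious_strings, SCORES, CATEGORIES)
to_compare = candidates(predicted, DATABASE)
print(predicted, to_compare)
```

In this toy setup the suspicious program is compared with the originals of one category instead of the whole database, which is the source of the overhead reduction the abstract describes. The second method (API call frequencies with Random Forest) would replace `classify` with a trained classifier while leaving the candidate selection step unchanged.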



Acknowledgments

The present research was funded by (1) Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (No. NRF-2015R1D1A1A02061946) and (2) the research fund of Dankook University (BK21 Plus) in 2014.

Author information

Authors and Affiliations

Department of Computer Science and Engineering, Dankook University, Yongin, 16890, Korea
Yesol Kim & Seong-je Cho
Department of Computer Engineering, Konkuk University, Chungju, 27478, Korea
Sangchul Han
Department of Information Security, Soonchunhyang University, Asan, 31538, Korea
Ilsun You


Corresponding author

Correspondence to Seong-je Cho.

Ethics declarations

Conflict of interest

The authors declare that there is no conflict of interest regarding the publication of this paper.

Additional information

Communicated by V. Loia.


About this article

Cite this article

Kim, Y., Cho, Sj., Han, S. et al. A software classification scheme using binary-level characteristics for efficient software filtering. Soft Comput 22, 595–606 (2018). https://doi.org/10.1007/s00500-016-2357-x

Issue Date: January 2018
DOI: https://doi.org/10.1007/s00500-016-2357-x
