Feature Selection for Malware Detection on the Android Platform Based on Differences of IDF Values

  • Regular Paper
  • Journal of Computer Science and Technology

Abstract

Android is the mobile operating system most frequently targeted by malware in the smartphone ecosystem; its market share is significantly higher than that of its competitors, and its total number of applications is much larger. Detecting malware before it is published on official or unofficial application markets is critically important because inadequate security practices are widespread among typical end users. In this paper, a novel feature selection method is proposed along with an Android malware detection approach. The feature selection method uses permissions, API calls, and strings as features, all of which are statically extractable from Android executables (APK files), and it can be used in a machine learning process with different algorithms to detect malware on the Android platform. A novel document frequency-based approach, namely Delta IDF, was designed and implemented for feature selection. Delta IDF was tested on three universal benchmark datasets that contain Android malware samples, and highly promising results were obtained with several binary classification algorithms.
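
As a rough illustration of the idea named in the abstract, the sketch below scores each feature by the difference between its IDF in benign apps and its IDF in malicious apps, then keeps the top-scoring features. The abstract does not give the Delta IDF formula, so the smoothed IDF, the absolute difference, the tie-breaking rule, and all function names and toy feature sets here are assumptions made for illustration, not the authors' implementation.

```python
import math
from typing import Dict, List, Set


def idf(feature: str, docs: List[Set[str]]) -> float:
    """Smoothed inverse document frequency of a feature within one class.

    Assumption: add-one smoothing so that unseen features do not divide by zero.
    """
    df = sum(1 for doc in docs if feature in doc)
    return math.log((1 + len(docs)) / (1 + df))


def delta_idf_scores(benign: List[Set[str]], malware: List[Set[str]]) -> Dict[str, float]:
    """Score every feature by |IDF_benign - IDF_malware| (assumed formula)."""
    vocab = set().union(*benign, *malware)
    return {f: abs(idf(f, benign) - idf(f, malware)) for f in vocab}


def select_top_k(benign: List[Set[str]], malware: List[Set[str]], k: int) -> List[str]:
    """Return the k features with the largest score; ties break alphabetically."""
    scores = delta_idf_scores(benign, malware)
    return sorted(scores, key=lambda f: (-scores[f], f))[:k]


# Hypothetical toy data: each app is the set of statically extracted features
# (permissions, API calls, strings) found in its APK.
benign_apps = [{"INTERNET", "getDeviceId"}, {"INTERNET"}, {"INTERNET", "CAMERA"}]
malware_apps = [{"INTERNET", "SEND_SMS"}, {"INTERNET", "SEND_SMS", "getDeviceId"}, {"SEND_SMS"}]

selected = select_top_k(benign_apps, malware_apps, k=2)
```

In this toy data, a feature that appears equally often in both classes (`getDeviceId`) scores zero, while one that appears in only one class (`SEND_SMS`) scores highest. The selected features could then be turned into binary vectors and passed to any binary classifier.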

Author information

Correspondence to Gökçer Peynirci.

Cite this article

Peynirci, G., Eminağaoğlu, M. & Karabulut, K. Feature Selection for Malware Detection on the Android Platform Based on Differences of IDF Values. J. Comput. Sci. Technol. 35, 946–962 (2020). https://doi.org/10.1007/s11390-020-9323-x

  • DOI: https://doi.org/10.1007/s11390-020-9323-x
