SysDroid: a dynamic ML-based android malware analyzer using system call traces

Abstract

Android is a popular open-source operating system that is highly susceptible to malware attacks. Researchers have developed machine learning models, trained on attributes extracted using static or dynamic approaches, to identify malicious applications. However, such models suffer from low detection accuracy because of noisy attributes selected by conventional feature selection algorithms. Hence, this paper proposes a new feature selection mechanism known as selection of relevant attributes for improving locally extracted features using classical feature selectors (SAILS). SAILS aims to discover prominent system calls from applications and is built on top of conventional feature selection methods such as mutual information, distinguishing feature selector and Galavotti–Sebastiani–Simi. These classical attribute selection methods are used as local feature selectors. In addition, a novel global feature selection method known as weighted feature selection is proposed. The proposed feature selectors are comprehensively compared with the traditional methods. SAILS yields improved values for the evaluation metrics compared with the conventional feature selection algorithms across distinct machine learning models built using Logistic Regression, CART, Random Forest, XGBoost and Deep Neural Networks. Our evaluations show accuracies ranging between 95 and 99% for dropout rates in the range 0.1–0.8 and learning rates in the range 0.001–0.2. Finally, the security of the malware classifiers against adversarial examples is thoroughly investigated. A decline in accuracy on adversarial examples is observed: the recall of classifiers using SAILS subjected to such examples is estimated to lie in the range 24.79–92.2%, whereas prior to the attack the true positive rate obtained by the classifier is between 95.2 and 99.79%. The results suggest that attackers can bypass detection by discovering the classifier's blind spots and augmenting a sample with a small number of legitimate attributes.
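The three local selectors named in the abstract can be illustrated on a toy dataset. The sketch below is not the authors' implementation: it assumes binary "system call present in trace" features, uses the standard text-categorization definitions of the Galavotti–Sebastiani–Simi (GSS) coefficient, mutual information (in its expected, two-by-two form) and the distinguishing feature selector (DFS), and the system call names and sample data are hypothetical. The paper's exact formulations may differ.

```python
from math import log2


def joint_probs(present, is_malware):
    """Estimate P(t,c), P(t,~c), P(~t,c), P(~t,~c) for a binary feature t
    and the malware class c from per-application observations."""
    n = len(is_malware)
    tc = sum(1 for t, c in zip(present, is_malware) if t and c)
    tn = sum(1 for t, c in zip(present, is_malware) if t and not c)
    nc = sum(1 for t, c in zip(present, is_malware) if not t and c)
    nn = n - tc - tn - nc
    return tc / n, tn / n, nc / n, nn / n


def gss(present, is_malware):
    """GSS coefficient: P(t,c)P(~t,~c) - P(t,~c)P(~t,c)."""
    p_tc, p_tn, p_nc, p_nn = joint_probs(present, is_malware)
    return p_tc * p_nn - p_tn * p_nc


def mutual_information(present, is_malware):
    """Expected mutual information (in bits) between feature and class."""
    p_tc, p_tn, p_nc, p_nn = joint_probs(present, is_malware)
    p_t, p_c = p_tc + p_tn, p_tc + p_nc
    cells = [(p_tc, p_t, p_c), (p_tn, p_t, 1 - p_c),
             (p_nc, 1 - p_t, p_c), (p_nn, 1 - p_t, 1 - p_c)]
    # Empty cells contribute nothing (0 * log 0 is taken as 0).
    return sum(p * log2(p / (pf * pc)) for p, pf, pc in cells if p > 0)


def dfs(present, is_malware):
    """DFS: sum over classes C of P(C|t) / (P(~t|C) + P(t|~C) + 1).
    Assumes both classes occur in the data."""
    p_tc, p_tn, p_nc, p_nn = joint_probs(present, is_malware)
    p_t = p_tc + p_tn
    if p_t == 0:
        return 0.0
    p_mal, p_ben = p_tc + p_nc, p_tn + p_nn
    # Tuples: P(t,C), P(~t,C), P(t,~C), P(C), P(~C) for malware, then benign.
    per_class = [(p_tc, p_nc, p_tn, p_mal, p_ben),
                 (p_tn, p_nn, p_tc, p_ben, p_mal)]
    return sum((t_c / p_t) / (nt_c / p_c + t_nc / p_notc + 1)
               for t_c, nt_c, t_nc, p_c, p_notc in per_class)


# Hypothetical toy data: 4 malware apps followed by 4 benign apps.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
features = {
    "ptrace": [1, 1, 1, 1, 0, 0, 0, 0],  # only in malware traces
    "sendto": [1, 1, 1, 0, 1, 0, 0, 0],  # mostly in malware traces
    "read":   [1, 1, 1, 1, 1, 1, 1, 1],  # in every trace, uninformative
}

ranked = sorted(features, key=lambda name: gss(features[name], labels),
                reverse=True)
print(ranked)
```

On this toy data all three scores agree that the call seen only in malware traces is the most useful and the call seen in every trace is the least useful, which is the kind of ranking a local selector feeds into the later stages described above.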


Author information

Correspondence to Shojafar Mohammad.

Ethics declarations

Conflicts of interest

There is no conflict of interest for the paper.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

This appendix presents the N-gram scores obtained with GSS, WFS, and MI for malware and benign samples across various features (see Figs. 11, 12 and 13).

Fig. 11. Score difference of N-grams for GSS\(^*\)

Fig. 12. Score difference of N-grams for WFS\(^*\)

Fig. 13. Score difference of N-grams for MI\(^*\)
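The figures above score N-grams of system calls. This page does not say how the N-grams are built, so the following is only a minimal sketch, assuming an N-gram is a contiguous window of N calls taken from one application's trace. The function name and the sample trace are hypothetical.

```python
from collections import Counter


def syscall_ngrams(trace, n):
    """Return every contiguous window of n system calls in the trace."""
    if n <= 0:
        raise ValueError("n must be positive")
    return [tuple(trace[i:i + n]) for i in range(len(trace) - n + 1)]


# Hypothetical trace recorded from a single application run.
trace = ["open", "read", "write", "read", "write", "close"]

# Count how often each bigram (2-gram) of calls occurs in the trace.
bigram_counts = Counter(syscall_ngrams(trace, 2))
print(bigram_counts.most_common(1))
```

Counts like these, gathered separately over malware and benign traces, are the kind of per-class statistic a selector such as GSS or MI would be computed from.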

About this article

Cite this article

Ananya, A., Aswathy, A., Amal, T.R. et al. SysDroid: a dynamic ML-based android malware analyzer using system call traces. Cluster Comput (2020). https://doi.org/10.1007/s10586-019-03045-6

Keywords

  • Android malware
  • Machine learning (ML)
  • Deep learning (DL)
  • Feature selection
  • Adversarial machine learning (AML)
  • Attacks