Skip to main content
Log in

Effective and Efficient Hybrid Android Malware Classification Using Pseudo-Label Stacked Auto-Encoder

  • Published:
Journal of Network and Systems Management Aims and scope Submit manuscript

Abstract

Android has become the target of attackers because of its popularity. The detection of Android mobile malware has become increasingly important due to its significant threat. Supervised machine learning, which has been used to detect Android malware is far from perfect because it requires a significant amount of labeled data. Since labeled data is expensive and difficult to get while unlabeled data is abundant and cheap in this context, we resort to a semi-supervised learning technique, namely pseudo-label stacked auto-encoder (PLSAE), which involves training using a set of labeled and unlabeled instances. We use a hybrid approach of dynamic analysis and static analysis to craft feature vectors. We evaluate our proposed model on CICMalDroid2020, which includes 17,341 most recent samples of five different Android apps categories. After that, we compare the results with state-of-the-art techniques in terms of accuracy and efficiency. Experimental results show that our proposed framework outperforms other semi-supervised approaches and common machine learning algorithms.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6
Fig. 7
Fig. 8
Fig. 9

Similar content being viewed by others

Notes

  1. https://www.unb.ca/cic/datasets/maldroid-2020.html

  2. https://www.unb.ca/cic/datasets/maldroid-2020.html

References

  1. “Mobile OS market share \(\mid\) Statista ,” https://www.statista.com/statistics/266136/global-market-share-held-by-smartphone-operating-systems/, online; accessed 30 April 2019

  2. Otoum, Y., Nayak, A.: As-ids: Anomaly and signature based ids for the internet of things. J. Netw. Syst. Manag. 29, 07 (2021)

    Article  Google Scholar 

  3. Afzal, S., Asim, M., Javed, A.R., Beg, M.O., Baker, T.: Urldeepdetect: a deep learning approach for detecting malicious urls using semantic vector models. J. Netw. Syst. Manag. 29(3), 21 (2021). https://doi.org/10.1007/s10922-021-09587-8

    Article  Google Scholar 

  4. Arp, D., Spreitzenbarth, M., Hubner, M., Gascon, H., Rieck, K., Siemens, C.: DREBIN: effective and explainable detection of Android malware in your pocket. In: Network and Distributed System Security Symposium (NDSS) (2014)

  5. Zhang, M., Duan, Y., Yin, H., Zhao, Z.: Semantics-aware Android malware classification using weighted contextual API dependency graphs. In: Proceedings of the 2014 ACM SIGSAC Conference on Computer and Communications Security. ACM, pp. 1105–1116 (2014)

  6. Wei, F., Li, Y., Roy, S., Ou, X., Zhou, W.: Deep ground truth analysis of current Android malware. In: International Conference on Detection of Intrusions and Malware, and Vulnerability Assessment. Springer, pp. 252–276 (2017)

  7. Kang, H., Jang, J.-W., Mohaisen, A., Kim, H.K.: Detecting and classifying Android malware using static analysis along with creator information. Int. J. Distrib. Sens. N. 11(6), 479174 (2015)

    Article  Google Scholar 

  8. Kim, T., Kang, B., Rho, M., Sezer, S., Im, E.G.: A multimodal deep learning method for Android malware detection using various features. IEEE Trans. Inf. Forensics Secur. 14(3), 773–788 (2019)

    Article  Google Scholar 

  9. Hou, S., Saas, A., Ye, Y., Chen, L.: DroidDelver: an Android malware detection system using Deep Belief Network based on API call blocks. In: International Conference on Web-age Information Management. Springer, pp. 54–66 (2016)

  10. Karbab, E.B., Debbabi, M., Derhab, A., Mouheb, D.: MalDozer: automatic framework for Android malware detection using deep learning. Digit. Invest. 24, S48–S59 (2018)

    Article  Google Scholar 

  11. Mahdavifar, S., Abdul Kadir, A.F., Fatemi, R., Alhadidi, D., Ghorbani, A.A.: Dynamic android malware category classification using semi-supervised deep learning. In: 2020 IEEE International Conference on Dependable, Autonomic and Secure Computing, International Conference on Pervasive Intelligence and Computing. International Conference on Cloud and Big Data Computing, International Conference on Cyber Science and Technology Congress (DASC/PiCom/CBDCom/CyberSciTech), pp. 515–522 (2020)

  12. Tam, K., Khan, S.J., Fattori, A., Cavallaro, L.: CopperDroid: automatic reconstruction of Android malware behaviors. In: Network and Distributed System Security Symposium (NDSS) (2015)

  13. Yuan, Z., Lu, Y., Wang, Z., Xue, Y.: Droid-Sec: deep learning in Android malware detection. In: ACM SIGCOMM Comput. Commun. Rev., vol. 44, no. 4. ACM, pp. 371–372 (2014)

  14. Su, X., Zhang, D., Li, W., Zhao, K.: A deep learning approach to Android malware feature learning and detection. In: Trustcom/BigDataSE/ISPA, 2016 IEEE. IEEE, pp. 244–251 (2016)

  15. Nix, R., Zhang, J.: Classification of Android apps and malware using deep neural networks. IEEE International Joint Conference on Neural Networks (IJCNN). IEEE, pp. 1871–1878 (2017)

    Google Scholar 

  16. Hsien-De Huang, T., Kao, H.-Y.: R2-d2: color-inspired Convolutional Neural Network (CNN)-based Android malware detections. In: 2018 IEEE International Conference on Big Data. IEEE, pp. 2633–2642 (2018)

  17. Wang, W., Zhao, M., Wang, J.: Effective Android malware detection with a hybrid model based on deep autoencoder and convolutional neural network. J. Amb. Intel. Hum. Comp. 10(8), 3035–3043 (2018)

    Article  Google Scholar 

  18. Xiao, X., Zhang, S., Mercaldo, F., Hu, G., Sangaiah, A.K.: Android malware detection based on system call sequences and LSTM. Multimed. Tools Appl. 78(4), 3979–3999 (2019)

    Article  Google Scholar 

  19. Yen, Y.-S., Sun, H.-M.: An Android mutation malware detection based on deep learning using visualization of importance from codes. Microelectron. Reliab. 93, 109–114 (2019)

    Article  Google Scholar 

  20. Lu, T., Du, Y., Ouyang, L., Chen, Q., Wang, X.: Android malware detection based on a hybrid deep learning model. In: Secur. Commun. Netw., vol. 2020, pp. 1–11, 08 (2020)

  21. Ma, S., Wang, S., Lo, D., Deng, R.H., Sun, C.: Active semi-supervised approach for checking app behavior against its description. In: IEEE 39th Annual Computer Software and Applications Conference, vol. 2. IEEE, pp. 179–184 (2015)

  22. Chen, L., Zhang, M., Yang, C.-Y., Sahita, R.: Semi-supervised classification for dynamic Android malware detection. arXiv preprint arXiv:1704.05948 (2017)

  23. Karbab, E.B., Debbabi, M., Alrabaee, S., Mouheb, D.: Dysign: dynamic fingerprinting for the automatic detection of android malware. In: Proceedings of the 11th International Conference on Malicious and Unwanted Software (MALWARE), pp. 1–8 (2016)

  24. Alrabaee, S., Shirani, P., Wang, L., Debbabi, M.: Fossil: a resilient and efficient system for identifying foss functions in malware binaries. ACM Trans. Priv. Secur. 21(2), 1–34 (2018)

    Article  Google Scholar 

  25. Cai, H., Meng, N., Ryder, B., Yao, D.: DroidCat: effective android malware detection and categorization via app-level profiling. IEEE Trans. Inf. Forensics Secur. 14(6), 1455–1470 (2018)

    Article  Google Scholar 

  26. Mahdavifar, S., Ghorbani, A.A.: Application of deep learning to cybersecurity: a survey. Neurocomputing 347, 149–176 (2019)

    Article  Google Scholar 

  27. Voulodimos, A., Doulamis, N., Doulamis, A., Protopapadakis, E.: Deep learning for computer vision: a brief review. In: Comput. Intel. Neurosc., Vol. 2018 (2018)

  28. Donahue, J., Anne Hendricks, L., Guadarrama, S., Rohrbach, M., Venugopalan, S., Saenko, K., Darrell, T.: Long-term recurrent convolutional networks for visual recognition and description. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2625–2634 (2015)

  29. Yang, W., Liu, Q., Wang, S., Cui, Z., Chen, X., Chen, L., Zhang, N.: Down image recognition based on deep convolutional neural network. Inf. Process. Agric. 5(2), 246–252 (2018)

    Google Scholar 

  30. Fitriah Abdul Kadir, A.: A detection framework for android financial malware. Ph.D. Dissertation, University of New Brunswick (2018)

  31. LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. Nature 521(7553), 436–444 (2015)

    Article  Google Scholar 

  32. Collobert, R., Weston, J.: A unified architecture for natural language processing: deep neural networks with multitask learning. In: Proceedings of the 25th International Conference on Machine Learning. ACM, pp. 160–167 (2008)

  33. Min, S., Lee, B., Yoon, S.: Deep learning in bioinformatics. Brief Bioinform. 18(5), 851–869 (2017)

    Google Scholar 

  34. Noda, K., Yamaguchi, Y., Nakadai, K., Okuno, H.G., Ogata, T.: Audio-visual speech recognition using deep learning. Appl. Intell. 42(4), 722–737 (2015)

    Article  Google Scholar 

  35. Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556 (2014)

  36. Mahdavifar, S., Ghorbani, A.A.: Dennes: deep embedded neural network expert system for detecting cyber attacks. In: Neural Computing and Applications, pp. 1–28

  37. “Introduction to semi-supervised learning with ladder networks,” http://rinuboney.github.io/2016/01/19/ladder-network.html/ (2016)

  38. Nigam, K., Ghani, R.: Analyzing the effectiveness and applicability of co-training. Cikm 5, 3 (2000)

    Google Scholar 

  39. Blum, A., Mitchell, T.: Combining labeled and unlabeled data with co-training. In: Proceedings of the 11th Annual Conference on Computational Learning Theory, ser. COLT’ 98. New York, NY, USA: ACM, pp. 92–100 (1998). http://doi.acm.org/10.1145/279943.279962

  40. Rosenberg, C., Hebert, M., Schneiderman, H.: Semi-supervised self-training of object detection models (2005)

  41. Joachims, T.: Transductive inference for text classification using support vector machines. In: Proceedings of the 16th International Conference on Machine Learning, ser. ICML ’99. San Francisco, CA, USA. Morgan Kaufmann Publishers Inc., pp. 200–209 (1999)

  42. Chapelle, O., Zien, A.: Semi-supervised classification by low density separation. In: AISTATS 2005. Max-Planck-Gesellschaft, pp. 57–64 (2005)

  43. Blum, A., Lafferty, J., Rwebangira, M.R., Reddy, R.: Semi-supervised learning using randomized mincuts. In: Proceedings of the 21st International Conference on Machine Learning, ser. ICML ’04. ACM, New York, NY, p. 13 (2004)

  44. Zhu, X., Ghahramani, Z., Lafferty, J.: Semi-supervised learning using Gaussian fields and harmonic functions. In: Proceedings of the 20th International Conference on Machine Learning, ser. ICML’03. AAAI Press, pp. 912–919 (2003)

  45. Ranzato, M.A., Szummer, M.: Semi-supervised learning of compact document representations with deep networks. In: Proceedings of the 25th International Conference on Machine Learning, ser. ICML ’08. ACM, New York, NY, pp. 792–799 (2008)

  46. Lee, D.-H.: Pseudo-label: The simple and efficient semi-supervised learning method for deep neural networks. In: Workshop on challenges in representation learning. ICML Vol. 3, p. 2 (2013)

  47. Rasmus, A., Berglund, M., Honkala, M., Valpola, H., Raiko, T.: Semi-supervised learning with ladder networks. Adv. Neural. Inf. Process. Syst. 28, 3546–3554 (2015)

    Google Scholar 

  48. Sajjadi, M., Javanmardi, M., Tasdizen, T.: Regularization with stochastic transformations and perturbations for deep semi-supervised learning. CoRR, vol. abs/1606.04586 (2016)

  49. Wu, W., Yu, Z., He, J.: A semi-supervised deep network embedding approach based on the neighborhood structure. Big Data Min. Anal. 2(3), 205–216 (2019)

    Article  Google Scholar 

  50. Contagio Mobile Malware Mini Dump (2019). http://contagiominidump.blogspot.ca/ online. Accessed 6 May 2019

  51. Kadir, A.F.A., Stakhanova, N., Ghorbani, A.A.: An empirical analysis of Android banking malware. In: Protecting Mobile Networks and Devices: Challenges and Solutions, p. 209 (2016)

  52. Abdul Kadir, A.F., Stakhanova, N., Ghorbani, A.: Android botnets: what URLs are telling us. In: Qiu, M., Xu, S., Yung, M., Zhang, H. (eds.) Network and System Security, pp. 78–91. Springer, Cham (2015)

    Chapter  Google Scholar 

  53. Kadir, A.F.A., Stakhanova, N., Ghorbani, A.A.: Understanding Android financial malware attacks: taxonomy, characterization, and challenges. J. Cybersecur. Mobil. 7(3), 1–52 (2018)

    Google Scholar 

  54. Enck, W., Ongtang, M., McDaniel, P.: Understanding Android security. IEEE Secur. Priv. 7(1), 50–57 (2009)

    Article  Google Scholar 

  55. Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., Vanderplas, J., Passos, A., Cournapeau, D., Brucher, M., Perrot, M., Duchesnay, E.: Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011)

    MathSciNet  MATH  Google Scholar 

  56. Surendran, R., Thomas, T., Emmanuel, S.: On existence of common malicious system call codes in android malware families. IEEE Trans. Reliab. 70(1), 248–260 (2020)

    Article  Google Scholar 

  57. Malik, S., Khatter, K.: System call analysis of android malware families. Indian J. Sci. Technol. 9(21), 1–13 (2016)

    Article  Google Scholar 

  58. Vinod, P., Zemmari, A., Conti, M.: A machine learning based approach to detect malicious android apps using discriminant system calls. Futur. Gener. Comput. Syst. 94, 333–350 (2019)

    Article  Google Scholar 

Download references

Acknowledgements

The authors would like to express their gratitude toward Dr. Lorenzo Cavallaro and Feargus Pendlebury (Systems Security Research Lab, King’s College London) for generously analyzing a large number of Android APKs in CopperDroid.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Samaneh Mahdavifar.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

Appendix

figure a
figure b
figure c
figure d

Rights and permissions

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Mahdavifar, S., Alhadidi, D. & Ghorbani, A.A. Effective and Efficient Hybrid Android Malware Classification Using Pseudo-Label Stacked Auto-Encoder. J Netw Syst Manage 30, 22 (2022). https://doi.org/10.1007/s10922-021-09634-4

Download citation

  • Received:

  • Revised:

  • Accepted:

  • Published:

  • DOI: https://doi.org/10.1007/s10922-021-09634-4

Keywords

Navigation