Effective and Efficient Hybrid Android Malware Classification Using Pseudo-Label Stacked Auto-Encoder

Mahdavifar, Samaneh; Alhadidi, Dima; Ghorbani, Ali. A.

doi:10.1007/s10922-021-09634-4

Published: 02 November 2021

Volume 30, article number 22 (2022)

Journal of Network and Systems Management

Abstract

Android's popularity has made it a prime target for attackers, and detecting Android malware has become increasingly important given the threat it poses. Supervised machine learning, which has been used to detect Android malware, is far from perfect because it requires a significant amount of labeled data. Since labeled data is expensive and difficult to obtain while unlabeled data is abundant and cheap in this context, we resort to a semi-supervised learning technique, namely the pseudo-label stacked auto-encoder (PLSAE), which is trained on a set of labeled and unlabeled instances. We use a hybrid approach that combines dynamic and static analysis to craft feature vectors. We evaluate our proposed model on CICMalDroid2020, which includes 17,341 recent samples from five different categories of Android apps. We then compare the results with state-of-the-art techniques in terms of accuracy and efficiency. Experimental results show that our proposed framework outperforms other semi-supervised approaches and common machine learning algorithms.
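
The pseudo-labeling idea named in the abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: it assumes the general pseudo-label formulation of Lee (2013), in which each unlabeled instance is assigned its most confident predicted class and the resulting loss term is weighted by a ramp-up coefficient. The stacked auto-encoder pretraining is omitted, and all function names and schedule values (t1, t2, alpha_f) are hypothetical.

```python
import math


def alpha_schedule(epoch, t1=100, t2=600, alpha_f=3.0):
    """Weight of the unlabeled loss term; ramps linearly from 0 to alpha_f.

    t1, t2 and alpha_f are illustrative values, not taken from the article.
    """
    if epoch < t1:
        return 0.0
    if epoch < t2:
        return alpha_f * (epoch - t1) / (t2 - t1)
    return alpha_f


def pseudo_label(probs):
    """Return the index of the most confident class for one instance."""
    return max(range(len(probs)), key=lambda k: probs[k])


def cross_entropy(probs, label):
    """Negative log-likelihood of `label` under the predicted distribution."""
    return -math.log(max(probs[label], 1e-12))


def combined_loss(labeled, unlabeled_probs, epoch):
    """Supervised loss plus weighted pseudo-label loss.

    labeled: list of (predicted_probs, true_label) pairs
    unlabeled_probs: list of predicted_probs for unlabeled instances
    """
    sup = sum(cross_entropy(p, y) for p, y in labeled) / len(labeled)
    unsup = sum(
        cross_entropy(p, pseudo_label(p)) for p in unlabeled_probs
    ) / len(unlabeled_probs)
    return sup + alpha_schedule(epoch) * unsup


# Toy predictions over five classes (the dataset has five app categories).
labeled = [([0.6, 0.1, 0.1, 0.1, 0.1], 0)]
unlabeled = [[0.1, 0.5, 0.2, 0.1, 0.1]]
early = combined_loss(labeled, unlabeled, epoch=0)    # unlabeled term ignored
late = combined_loss(labeled, unlabeled, epoch=1000)  # unlabeled term at full weight
```

Ramping the weight up gradually is meant to keep early, unreliable pseudo-labels from dominating training before the network has learned from the labeled instances.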

Acknowledgements

The authors would like to express their gratitude toward Dr. Lorenzo Cavallaro and Feargus Pendlebury (Systems Security Research Lab, King’s College London) for generously analyzing a large number of Android APKs in CopperDroid.

Author information

Authors and Affiliations

Canadian Institute for Cybersecurity (CIC), Faculty of Computer Science, University of New Brunswick, Fredericton, NB, Canada
Samaneh Mahdavifar & Ali. A. Ghorbani
School of Computer Science, University of Windsor, Windsor, ON, Canada
Dima Alhadidi

Corresponding author

Correspondence to Samaneh Mahdavifar.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Mahdavifar, S., Alhadidi, D. & Ghorbani, A.A. Effective and Efficient Hybrid Android Malware Classification Using Pseudo-Label Stacked Auto-Encoder. J Netw Syst Manage 30, 22 (2022). https://doi.org/10.1007/s10922-021-09634-4

Received: 02 January 2021
Revised: 30 August 2021
Accepted: 29 September 2021
Published: 02 November 2021
DOI: https://doi.org/10.1007/s10922-021-09634-4
