Effective android malware detection with a hybrid model based on deep autoencoder and convolutional neural network

  • Wei WangEmail author
  • Mengxue Zhao
  • Jigang Wang
Original Research


Android security incidents occurred frequently in recent years. To improve the accuracy and efficiency of large-scale Android malware detection, in this work, we propose a hybrid model based on deep autoencoder (DAE) and convolutional neural network (CNN). First, to improve the accuracy of malware detection, we reconstruct the high-dimensional features of Android applications (apps) and employ multiple CNN to detect Android malware. In the serial convolutional neural network architecture (CNN-S), we use Relu, a non-linear function, as the activation function to increase sparseness and “dropout” to prevent over-fitting. The convolutional layer and pooling layer are combined with the full-connection layer to enhance feature extraction capability. Under these conditions, CNN-S shows powerful ability in feature extraction and malware detection. Second, to reduce the training time, we use deep autoencoder as a pre-training method of CNN. With the combination, deep autoencoder and CNN model (DAE-CNN) can learn more flexible patterns in a short time. We conduct experiments on 10,000 benign apps and 13,000 malicious apps. CNN-S demonstrates a significant improvement compared with traditional machine learning methods in Android malware detection. In details, compared with SVM, the accuracy with the CNN-S model is improved by 5%, while the training time using DAE-CNN model is reduced by 83% compared with CNN-S model.


Deep learning Convolutional neural network Autoencoder Malware detection Android applications 



The work reported in this paper was supported in part by National Key R&D Program of China, under Grant 2017YFB0802805, in part by Shanghai Key Laboratory of Integrated Administration Technologies for Information Security, under Grant AGK2015002, in part by ZTE Corporation Foundation, under Grant K17L00190, in part by funds of Science and Technology on Electronic Information Control Laboratory, under Grant K16GY00040, in part by the Fundamental Research funds for the central Universities of China, under Grant K17JB00060 and K17JB00020, and in part by Natural Science Foundation of China, under Grant U1736114 and 61672092.


  1. Amos B, Turner H, White J (2013) Applying machine learning classifiers to dynamic android malware detection at scale. In: 9th international wireless communications and mobile computing conference (IWCMC), July 1–5, 2013, Sardinia, Italy, pp 1666–1671Google Scholar
  2. Atat R, Liu L, Chen H, Wu J, Li H, Yi Y (2017) Enabling cyber-physical communication in 5G cellular networks: challenges, spatial spectrum sensing, and cyber-security. IET Cyber-Phys Syst Theory Appl 2(1):49–54Google Scholar
  3. Bengio Y (2009) Learning deep architectures for AI. Foundations & Trends®. Mach Learn 2(1):1–127MathSciNetzbMATHGoogle Scholar
  4. Chang X, Yang Y (2017) Semisupervised feature analysis by mining correlations among multiple tasks. IEEE Trans Neural Netw Learn Syst 28(10):2294–2305MathSciNetGoogle Scholar
  5. Chang X, Ma Z, Yang Y, Zeng Z, Hauptmann AG (2017a) Bi-level semantic representation analysis for multimedia event detection. IEEE Trans Cybern 47(5):1180–1197Google Scholar
  6. Chang X, Yu YL, Yang Y, Xing EP (2017b) Semantic pooling for complex event analysis in untrimmed videos. IEEE Trans Pattern Anal Mach Intell 39(8):1617–1632Google Scholar
  7. Chen X, Li J, Huang X, Ma J, Lou W (2015) New publicly verifiable databases with efficient updates. IEEE Trans Dependable Secure Comput 12(5):546–556Google Scholar
  8. China Internet Security (2016) Research report. Accessed Dec 2017
  9. Enck W, Gilbert P, Han S, Tendulkar V, Chun B, Cox LP, Jung J, McDaniel P, Sheth AN (2014) TaintDroid: an information-flow tracking system for realtime privacy monitoring on smartphones. Acm Trans Comput Syst (TOCS) 32(2):1–29Google Scholar
  10. Glorot X, Bordes A, Bengio Y (2011) Deep sparse rectifier neural networks. In: Proceedings of the 14th international conference on artificial intelligence and statistics (AISTATS), April 11–13, 2011, Ft. Lauderdale, FL, USA, pp 315–323Google Scholar
  11. Guan X, Wang X, Zhang X (2009) Fast intrusion detection based on a non-negative matrix factorization model. J Netw Comput Appl 32(1):31–44Google Scholar
  12. Gupta BB, Agrawal DP, Yamaguchi S (2016) Handbook of research on modern cryptographic solutions for computer and cyber Security, 1st edn. IGI Publishing, Hershey, PA. ISBN:1522501053 9781522501053Google Scholar
  13. Hamedani K, Liu L, Atat R, Wu J, Yi Y (2018) Reservoir computing meets smart grids: attack detection using delayed feedback networks. IEEE Trans Ind Inf 14(2):734–743Google Scholar
  14. Hinton GE, Zemel RS (1994) Autoencoders, minimum description length and Helmholtz free energy. Adv Neural Inf Process Syst (NIPS) 6:3–10Google Scholar
  15. Hinton GE, Srivastava N, Krizhevsky A, Sutskever I, Salakhutdinov RR (2012) Improving neural networks by preventing co-adaptation of feature detectors. CoRR 1207:0580Google Scholar
  16. Hinton GE, Osindero S, Teh YW (2014) A fast learning algorithm for deep belief nets. Neural Comput 18(7):1527–1554MathSciNetzbMATHGoogle Scholar
  17. Huang HD, Yu CM, Kao HY (2017) R2-D2: color-inspired convolutional neural network (cnn)-based android malware detections. CoRR 1705:04448Google Scholar
  18. Ibtihal M, Driss EO, Hassan N (2017) Homomorphic encryption as a service for outsourced images in mobile cloud computing environment. Int J Cloud Appl Comput 7(2):27–40Google Scholar
  19. Kim Y (2014) Convolutional neural networks for sentence classification. In: Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP), October 25–29, 2014 Doha, Qatar, pp 1746–1751Google Scholar
  20. Klieber W, Flynn L, Bhosale A, Jia L, Bauer L (2014) Android taint flow analysis for app sets. In: Proceedings of the 3rd ACM SIGPLAN international workshop on the state of the art in Java program analysis, June 9–11, 2014, Edinburgh, United Kingdom, pp 1–6Google Scholar
  21. LeCun Y, Bottou L, Bengio Y, Haffner P (1998) Gradient-based learning applied to document recognition. Proc IEEE 86(11):2278–2324Google Scholar
  22. Lee K, Choi HO, Min SD, Lee J, Gupta B, Nam Y (2017) A comparative evaluation of atrial fibrillation detection methods in koreans based on optical recordings using a smartphone. IEEE Access 5:11437–11443Google Scholar
  23. Li Q, Li X (2015) Android malware detection based on static analysis of characteristic tree. In: 2015 International conference on cyber-enabled distributed computing and knowledge discovery (CyberC), September 17–19, 2015, Xi’an, China, pp 84–91Google Scholar
  24. Li P, Li J, Huang Z, Gao CZ, Chen WB, Chen K (2017a) Privacy-preserving outsourced classification in cloud computing. Cluster Comput.
  25. Li Z, Nie F, Chang X, Yang Y (2017b) Beyond trace ratio: weighted harmonic mean of trace ratios for multiclass discriminant analysis. IEEE Trans Knowl Data Eng 29(10):2100–2110Google Scholar
  26. Li J, Sun L, Yan Q, Li Z, Srisa-an W, Ye H (2018a) Significant permission identification for machine learning based android malware detection. IEEE Trans Industr Inf PP(99):1–1Google Scholar
  27. Li Y, Wang G, Nie L, Wang Q (2018b) Distance metric optimization driven convolutional neural network for age invariant face recognition. Pattern Recogn 75C:51–62Google Scholar
  28. Liu X, Liu J, Wang W, He Y, Zhang X (2018) Discovering and understanding android sensor usage behaviors with data flow analysis. World Wide Web 21(1):105–126Google Scholar
  29. Lu L, Li Z, Wu Z, Lee W, Jiang G (2012) CHEX: statically vetting android apps for component hijacking vulnerabilities. In: ACM conference on computer and communications security, October 16–18, 2012, Raleigh, NC, USA, pp 229–240Google Scholar
  30. Memos VA, Psannis KE, Ishibashi Y, Kim BG, Gupta B (2018) An efficient algorithm for media-based surveillance system (EAMSuS) in IoT smart city framework. Future Gener Comput Syst 83:619–628Google Scholar
  31. Nair V, Hinton GE (2010) Rectified linear units improve restricted Boltzmann machines. In: Proceedings of the 27th international conference on machine learning (ICML-10), June 21–24, 2010, Haifa, Israel, pp 807–814Google Scholar
  32. Nix R, Zhang J (2017) Classification of Android apps and malware using deep neural networks. Int Jt Conf Neural Netw (IJCNN) May 14–19, 2017, Alaska, USA, pp 1871–1878Google Scholar
  33. Pandita R, Xiao X, Yang W, Enck W, Xie T (2013) WHYPER: towards automating risk assessment of mobile applications. Usenix Conf Secur 2013:527–542Google Scholar
  34. Pedregosa F, Gramfort A, Michel V, Thirion B, Grisel O, Blondel M, Prettenhofer P, Weiss R, Dubourg V (2011) Scikit-learn: machine learning in Python. Mach Learn Res 12(10):2825–2830MathSciNetzbMATHGoogle Scholar
  35. Rastogi S, Bhushan K, Gupta BB (2016) Android applications repackaging detection techniques for smartphone devices. Procedia Comput Sci 78:26–32Google Scholar
  36. Sarma BP, Li N, Gates C, Potharaju R, Nita-Rotaru C, Molloy I (2012) Android permissions: a perspective combining risks and benefits. In: ACM symposium on access control models and technologies. June 21–23, 2017, Indianapolis, IN, USA, pp 13–22Google Scholar
  37. Shabtai A, Fledel Y, Elovici Y (2010) Automated static code analysis for classifying android applications using machine learning. In: 2010 international conference on computational intelligence and security (CIS), December 11–14, 2010, Nanning, Guangxi, China, pp 329–333Google Scholar
  38. Shen C, Chen Y, Guan X, Maxion Y (2017) Pattern-growth based mining mouse-interaction behavior for an active user authentication system. IEEE Trans Dependable Secur Comput PP(99):1–1Google Scholar
  39. Shen J, Zhou T, Chen X, Li J, Susilo W (2018a) Anonymous and traceable group data sharing in cloud computing. IEEE Trans Inf Forensics Secur 13(4):912–925Google Scholar
  40. Shen C, Li Y, Chen Y, Guan X, Maxion R (2018b) Performance analysis of multi-motion sensor behavior for active smartphone authentication. IEEE Trans Inf Forensics Secur 13(1):48–62Google Scholar
  41. Shen J, Gui Z, Ji S, Shen J, Tan H, Tang Y (2018c) Cloud-aided lightweight certificateless authentication protocol with anonymity for wireless body area networks. Netw Comput Appl 106:117–123Google Scholar
  42. Shen C, Chen Y, Guan X (2018d) Performance evaluation of implicit smartphones authentication via sensor-behavior analysis. Inf Sci 430:538–553Google Scholar
  43. Wang W, Battiti R (2006) Identifying intrusions in computer networks with principal component analysis. ARES 2006:270–279Google Scholar
  44. Wang W, Guan X, Zhang X (2008) Processing of massive audit data streams for real-time anomaly intrusion detection. Comput Commun 31(1):58–72Google Scholar
  45. Wang W, Guyet T, Quiniou R, Cordier M, Masseglia F, Zhang X (2014a) Autonomic intrusion detection: adaptively detecting anomalies over unlabeled audit data streams in computer networks. Knowl Based Syst 70:103–117Google Scholar
  46. Wang W, Wang X, Feng D, Liu J, Han Z, Zhang X (2014b) Exploring permission-induced risk in android applications for malicious application detection. IEEE Trans Inf Forensics Secur 9(11):1869–1882Google Scholar
  47. Wang W, He Y, Liu J, Gombault S (2015) Constructing important features from massive network traffic for lightweight intrusion detection. IET Inf Secur 9(6):374–379Google Scholar
  48. Wang X, Wang W, He Y, Liu J, Han Z, Zhang X (2017) Characterizing android apps’ behavior for effective detection of malapps at large scale. Future Gener Comput Syst 75:30–45Google Scholar
  49. Wang Y, Zhu G, Shi Y (2018a) Transportation spherical watermarking. IEEE Trans Image Process 27(4):2063–2077MathSciNetzbMATHGoogle Scholar
  50. Wang H, Wang W, Cui Z, Zhou X, Zhao J, Li Y (2018b) A new dynamic firefly algorithm for demand estimation of water resources. Inf Sci 438:95–106MathSciNetGoogle Scholar
  51. Wang W, Li Y, Wang X, Liu J, Zhang X (2018c) Detecting android malicious apps and categorizing benign apps with ensemble of classifiers. Future Gener Comput Syst 78:987–994Google Scholar
  52. Wang W, Liu J, Pitsilis G, Zhang X (2018d) Abstracting massive data for lightweight intrusion detection in computer networks. Inf Sci 433–434:417–430MathSciNetGoogle Scholar
  53. Wu WC, Hung SH (2014) DroidDolphin: a dynamic android malware detection framework using big data and machine learning. In: Proceedings of the 2014 conference on research in adaptive and convergent systems (RACS), pp 247–252Google Scholar
  54. Wu J, Guo S, Li J, Zeng D (2016a) Big data meet green challenges: greening big data. IEEE Syst J 10(3):873–887Google Scholar
  55. Wu J, Guo S, Li J, Zeng D (2016b) Big data meet green challenges: big data toward green applications. IEEE Syst J 10(3):888–900Google Scholar
  56. Xie D, Lai X, Lei X, Fan L (2018) Cognitive multiuser energy harvesting decode-and-forward relaying system with direct links. IEEE Access 6:5596–5606Google Scholar
  57. Yerima SY, Sezer S, McWilliams G (2014) Analysis of Bayesian classification-based approaches for android malware detection. IET Inf Secur 8(1):25–36Google Scholar
  58. Yuan Z, Lu Y, Xue Y (2016) Droiddetector: android malware characterization and detection using deep learning. Tsinghua Sci Technol 21(1):114–123Google Scholar
  59. Zhang C, Liu C, Zhang X Almpanidis G (2017) An up-to-date comparison of state-of-the-art classification algorithms. Expert Syst Appl 82:128–150Google Scholar
  60. Zhou Y, Jiang X (2012) Dissecting android malware: characterization and evolution. In: IEEE symposium on security and privacy (SP), May 20–23, 2012, San Francisco, USA, pp 95–109Google Scholar

Copyright information

© Springer-Verlag GmbH Germany, part of Springer Nature 2018

Authors and Affiliations

  1. 1.Beijing Key Laboratory of Security and Privacy in Intelligent TransportationBeijing Jiaotong UniversityBeijingChina
  2. 2.Science and Technology on Electronic Information Control LaboratoryChengduChina
  3. 3.ZTE CorporationChengduChina

Personalised recommendations