SeqDroid: Obfuscated Android Malware Detection Using Stacked Convolutional and Recurrent Neural Networks

  • William Younghoo LeeEmail author
  • Joshua Saxe
  • Richard Harang
Part of the Advanced Sciences and Technologies for Security Applications book series (ASTSA)


To evade detection, attackers usually obfuscate malicious Android applications. These malicious applications often have randomly generated application IDs or package names, and they are also often signed with randomly created certificates. Conventional machine learning models for detecting such malware are neither robust enough nor scalable to the volume of Android applications that are being produced on a daily basis. Recurrent neural networks (RNN) and convolutional neural networks (CNN) have been applied to identify malware by learning patterns in sequence data. We propose a novel malware classification method for malicious Android applications using stacked RNNs and CNNs so that our model learns the generalized correlation between obfuscated string patterns from an application’s package name and the certificate owner name. The model extracts machine learning features using embedding and gated recurrent units (GRU), and an additional CNN unit further optimizes the feature extraction process. Our experiments demonstrate that our approach outperforms Ngram-based models and that our feature extraction method is robust to obfuscation and sufficiently lightweight for Android devices.


Android Malware classification Convolutional neural network Recurrent neural network 


  1. 1.
    Arp D, Spreitzenbarth M, Hubner M, Gascon H, Rieck K (2014) Drebin: effective and explainable detection of Android malware in your pocket. In: Network and distributed system security symposiumGoogle Scholar
  2. 2.
    Zhou Y, Jiang X (2016) Dissecting android malware: characterization and evolution. In: Proceedings of the 2012 IEEE symposium on security and privacyGoogle Scholar
  3. 3.
    Billah Karbab E, Debbabi M, Derhab A, Mouheb D, Android malware detection using deep learning on API method sequences. arXiv preprint, arXiv:1712.08996Google Scholar
  4. 4.
    Peiravian N, Zhu X (2013) Machine learning for android malware detection using permission and API calls. Int Conf Tools Artif Intell:300–305Google Scholar
  5. 5.
    Chen W, Aspinall D, Gordon A, Sutton C, Muttik I (2016) More semantics more robust: improving android malware classifiers. In: Proceedings of the 9th ACM conference on security & privacy in wireless and mobile networks, pp 18–20Google Scholar
  6. 6.
    WangEmail W, Wang Z (2018) Effective android malware detection with a hybrid model based on deep autoencoder and convolutional neural network. J Ambient Intell Humaniz Comput: 1–19Google Scholar
  7. 7.
  8. 8.
  9. 9.
  10. 10.
    Suarez-Tangil G, Kumar Dash S, Ahmadi M, Kinder J, Giacinto G, Cavallaro L (2017) DroidSieve: fast and accurate classification of obfuscated android malware. In: Proceedings of the seventh ACM on conference on data and application security and privacy. ACM, New York, pp 309–320Google Scholar
  11. 11.
  12. 12.
    Aafer Y, Du W, Yin H (2013) Droidapiminer: mining api-level features for robust malware detection in android. In: International conference on security and privacy in communication systems, pp 86–103Google Scholar
  13. 13.
    Huda S, Abawajy J, Alazab M, Abdollalihian M, Islam R, Yearwood J (2016) Hybrids of support vector machine wrapper and filter based framework for malware detection. Futur Gener Comput Syst 55:376–390CrossRefGoogle Scholar
  14. 14.
    Kumar Dash S, Suarez-Tangil G, Khan S (2016) DroidScribe: classifying android malware based on runtime behavior. In: IEEE security and privacy workshops. Conference Publishing Services, IEEE Computer Society, Los Alamitos, pp 252–261Google Scholar
  15. 15.
    Chaba S, Kumar R, Pant R, Dave M, Malware detection approach for android systems using system call logs. arXiv preprint, arXiv:1709.08805Google Scholar
  16. 16.
    Rhodea M, Burnapa P, Jonesb K, Early-stage malware prediction using recurrent neural networks. arXiv preprint, arXiv:1708.03513Google Scholar
  17. 17.
    Malik S, Khatter K (2016) System call analysis of android malware families. Indian J Sci Technol 9Google Scholar
  18. 18.
    Ahmed F, Hameed H, Shafiq MZ, Farooq M (2009) Using spatio-temporal information in API calls with machine learning algorithms for malware detection. In: Proceedings of the 2nd ACM workshop on security and artificial intelligence. ACM, New York, pp 55–62Google Scholar
  19. 19.
    Tian R, Islam R, Batten L, Versteeg S (2010) Differentiating malware from cleanware using behavioural analysis. In: International conference on malicious and unwanted softwareGoogle Scholar
  20. 20.
    Saxe J, Berlin K, eXpose: a character-level convolutional neural network with embeddings for detecting malicious URLs, file paths and registry keys. arXiv preprint, arXiv:1702.08568Google Scholar
  21. 21.
    Hochreiter S, Schmidhuber J (1997) Long short-term memory. Neural Comput 9(8):1735–1780CrossRefGoogle Scholar
  22. 22.
    Cho K, Merrienboer Bv, Gulcehre C, Bahdanau D, Bougares F, Schwenk H, Bengio Y, Learning phrase representations using RNN encoder-decoder for statistical machine translation. arXiv preprint, arXiv:1406.1078Google Scholar
  23. 23.
    Vinayakumar R, Soman KP, Poornachandran P (2017) Deep android malware detection and classification. In: International conference on advances in computing, communications and informaticsGoogle Scholar
  24. 24.
    Ioffe S, Szegedy C, Batch normalization: accelerating deep network training by reducing internal covariate shift. arXiv preprint, arXiv:1502.03167Google Scholar
  25. 25.
    Srivastava N, Hinton G, Krizhevsky A, Sutskever I, Salakhutdinov R (2014) Dropout: a simple way to prevent neural networks from overfitting. J Mach Learn Res 15(1):1929–1958MathSciNetzbMATHGoogle Scholar
  26. 26.
  27. 27.
    Kingma DP, Ba J, Adam: a method for stochastic optimization. arXiv preprint, arXiv:1412.6980Google Scholar
  28. 28.
  29. 29.
    Liu K, Li Y, Xu N, Learn to combine modalities in multimodal deep learning. arXiv preprint, arXiv:1805.11730Google Scholar
  30. 30.
    Rieck K, Trinius P, Willems C, Holz T (2011) Automatic analysis of malware behavior using machine learning. J Comp Secur 19:639–668CrossRefGoogle Scholar
  31. 31.
    Alazab M (2015) Profiling and classifying the behaviour of malicious codes. J Syst Softw, Elsevier 100:91–102CrossRefGoogle Scholar

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  • William Younghoo Lee
    • 1
    Email author
  • Joshua Saxe
    • 1
  • Richard Harang
    • 1
  1. 1.Sophos Group PLCAbingdonUK

Personalised recommendations