Cluster Computing, Volume 21, Issue 1, pp 569–588

Deep neural architectures for large scale android malware analysis

  • Mohammad Nauman
  • Tamleek Ali Tanveer
  • Sohail Khan
  • Toqeer Ali Syed


Android is arguably the most widely used mobile operating system in the world. Due to its widespread use and huge user base, it has attracted a lot of attention from the unsavory crowd of malware writers. Traditionally, techniques to counter such malicious software involved manually analyzing code to determine whether it was malicious or benign. However, given the immense pace at which new malware families are surfacing, such an approach is no longer feasible. Machine learning offers a way to tackle this issue of speed by automating the classification task. While several efforts have been made to apply traditional machine learning techniques to Android malware detection, no reasonable effort has been made to utilize the newer deep learning models in this domain. In this paper, we apply several deep learning models, including fully connected, convolutional and recurrent neural networks as well as autoencoders and deep belief networks, to detect Android malware in a large-scale dataset of more than 55 GB of Android malware. Further, we apply Bayesian machine learning to this problem domain to see how it fares against the deep learning based models while also providing insights into the dataset. We show that these models achieve better results than the state-of-the-art approaches. Our best model achieves an F1 score of 0.986 with an AUC of 0.983, compared to the existing best F1 score of 0.875 and AUC of 0.953.


Android, Malware analysis, Machine learning, Deep neural networks, Bayesian machine learning



We would like to thank the maintainers of Drebin [2] and the VirusShare site [58] for making their datasets available to us. The computation-intensive MCMC sampling and neural network training were made possible by the generous contribution of the Tesla K40c GPU by NVIDIA Corporation. The content of this paper is not necessarily endorsed by any of the funding agencies.


  1. Abadi, M., Agarwal, A., Barham, P., Brevdo, E., Chen, Z., Citro, C., Corrado, G.S., Davis, A., Dean, J., Devin, M., Ghemawat, S., Goodfellow, I., Harp, A., Irving, G., Isard, M., Jia, Y., Jozefowicz, R., Kaiser, L., Kudlur, M., Levenberg, J., Mané, D., Monga, R., Moore, S., Murray, D., Olah, C., Schuster, M., Shlens, J., Steiner, B., Sutskever, I., Talwar, K., Tucker, P., Vanhoucke, V., Vasudevan, V., Viégas, F., Vinyals, O., Warden, P., Wattenberg, M., Wicke, M., Yu, Y., Zheng, X.: TensorFlow: large-scale machine learning on heterogeneous systems (2015).
  2. Arp, D., Spreitzenbarth, M., Hübner, M., Gascon, H., Rieck, K., Siemens, C.: Drebin: effective and explainable detection of android malware in your pocket. In: Proceedings of the Annual Symposium on Network and Distributed System Security (NDSS) (2014)
  3. Arzt, S., Rasthofer, S., Fritz, C., Bodden, E., Bartel, A., Klein, J., Le Traon, Y., Octeau, D., McDaniel, P.: FlowDroid: precise context, flow, field, object-sensitive and lifecycle-aware taint analysis for android apps. In: ACM SIGPLAN Notices, vol. 49, pp. 259–269, ACM (2014)
  4. Barber, D.: Bayesian Reasoning and Machine Learning. Cambridge University Press, Cambridge (2012)
  5. Barrera, D., Van Oorschot, P.: Secure software installation on smartphones. Secur. Priv. IEEE 9(3), 42–48 (2011)
  6. Bastien, F., Lamblin, P., Pascanu, R., Bergstra, J., Goodfellow, I., Bergeron, A., Bouchard, N., Warde-Farley, D., Bengio, Y.: Theano: new features and speed improvements. arXiv preprint arXiv:1211.5590 (2012)
  7. Bedini, A.: HDF5 for Python.
  8. Bergstra, J., Bengio, Y.: Random search for hyper-parameter optimization. J. Mach. Learn. Res. 13, 281–305 (2012)
  9. Biswas, A., Shapiro, V.: Approximate distance fields with non-vanishing gradients. Graph. Models 66(3), 133–159 (2004)
  10. Bottou, L.: Large-scale machine learning with stochastic gradient descent. In: Proceedings of COMPSTAT’2010, pp. 177–186. Springer (2010)
  11. Box, G.E., Tiao, G.C.: Bayesian Inference in Statistical Analysis, vol. 40. Wiley, New York (2011)
  12. Chen, L.C., Papandreou, G., Kokkinos, I., Murphy, K., Yuille, A.L.: Semantic image segmentation with deep convolutional nets and fully connected crfs. arXiv preprint arXiv:1412.7062 (2014)
  13. Chollet, F.: Keras: deep learning library for theano and tensorflow. (2015)
  14. Chung, J., Gulcehre, C., Cho, K., Bengio, Y.: Gated feedback recurrent neural networks. arXiv preprint arXiv:1502.02367 (2015)
  15. Dash, S.K., Suarez-Tangil, G., Khan, S., Tam, K., Ahmadi, M., Kinder, J., Cavallaro, L.: Droidscribe: classifying android malware based on runtime behavior. Mob. Secur. Technol. (MoST 2016) 7148, 1–12 (2016)
  16. Date, P., Hendler, J.A., Carothers, C.D.: Design index for deep neural networks. Proc. Comput. Sci. 88, 131–138 (2016)
  17. Davis, B., Chen, H.: RetroSkeleton: retrofitting android apps. In: Proceedings of the 11th International Conference on Mobile Systems, Applications and Services (MobiSys’13), pp. 25–28 (2013)
  18. Enck, W., Gilbert, P., Chun, B.G., Cox, L.P., Jung, J., McDaniel, P., Sheth, A.N.: TaintDroid: an information-flow tracking system for realtime privacy monitoring on smartphones. In: Proceedings of the 9th USENIX Conference on Operating Systems Design and Implementation (OSDI’10), pp. 1–6 (2010)
  19. Enck, W., Ongtang, M., McDaniel, P.: On lightweight mobile phone application certification. In: Proceedings of the 16th ACM Conference on Computer and Communications Security (CCS’09), pp. 235–245. ACM (2009)
  20. Erhan, D., Bengio, Y., Courville, A., Manzagol, P.A., Vincent, P., Bengio, S.: Why does unsupervised pre-training help deep learning? J. Mach. Learn. Res. 11, 625–660 (2010)
  21. Franzke, B., Kosko, B.: Using noise to speed up markov chain monte carlo estimation. Proc. Comput. Sci. 53, 113–120 (2015)
  22. Fuchs, A., Chaudhuri, A., Foster, J.: SCanDroid: automated security certification of Android applications. Technical reports (2009)
  23. Funahashi, K.I., Nakamura, Y.: Approximation of dynamical systems by continuous time recurrent neural networks. Neural Netw. 6(6), 801–806 (1993)
  24. Garcia, J., Hammad, M., Pedrood, B., Bagheri-Khaligh, A., Malek, S.: Obfuscation-resilient, efficient, and accurate detection and family identification of android malware. George Mason University, Technical reports (2015)
  25.
  26. Gelman, A., Carlin, J.B., Stern, H.S., Rubin, D.B.: Bayesian Data Analysis, vol. 2. Taylor & Francis, New York (2014)
  27. Hastings, W.K.: Monte carlo sampling methods using markov chains and their applications. Biometrika 57(1), 97–109 (1970)
  28. Hernández-Lobato, J.M., Adams, R.P.: Probabilistic backpropagation for scalable learning of bayesian neural networks. arXiv preprint arXiv:1502.05336 (2015)
  29. Hinton, G.: A practical guide to training restricted boltzmann machines. Momentum 9(1), 926 (2010)
  30. Hinton, G.E., Dayan, P., Frey, B.J., Neal, R.M.: The wake-sleep algorithm for unsupervised neural networks. Science 268(5214), 1158 (1995)
  31. Hinton, G.E., Salakhutdinov, R.R.: Reducing the dimensionality of data with neural networks. Science 313(5786), 504–507 (2006)
  32. Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural Comput. 9(8), 1735–1780 (1997)
  33. Homan, M.D., Gelman, A.: The no-U-turn sampler: adaptively setting path lengths in Hamiltonian Monte Carlo. J. Mach. Learn. Res. 15(1), 1593–1623 (2014)
  34. Hornik, K., Stinchcombe, M., White, H.: Multilayer feedforward networks are universal approximators. Neural Netw. 2(5), 359–366 (1989)
  35. Indyk, P., Motwani, R.: Approximate nearest neighbors: towards removing the curse of dimensionality. In: Proceedings of the 13th Annual ACM Symposium on Theory of Computing, pp. 604–613. ACM (1998)
  36. Jolliffe, I.: Principal Component Analysis. Wiley Online Library (2002)
  37. Karakida, R., Okada, M., Amari, S.I.: Dynamical analysis of contrastive divergence learning. Neural Netw. 79, 78–87 (2016)
  38. Kohonen, T.: Self-Organizing Maps, vol. 30. Springer, New York (2001)
  39. LeCun, Y., Bengio, Y.: Convolutional networks for images, speech, and time series. The Handb. Brain Theory Neural Netw. 3361(10), 1995 (1995)
  40. LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. Nature 521(7553), 436–444 (2015)
  41. Long, M., Wang, J., Jordan, M.I.: Deep transfer learning with joint adaptation networks. arXiv preprint arXiv:1605.06636 (2016)
  42. Mansfield-Devine, S.: Android architecture: attacking the weak points. Netw. Secur. 2012(10), 5–12 (2012)
  43. Mnih, V., Heess, N., Graves, A., et al.: Recurrent models of visual attention. In: Advances in Neural Information Processing Systems, pp. 2204–2212 (2014)
  44. Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 807–814 (2010)
  45. Ongtang, M., McLaughlin, S., Enck, W., McDaniel, P.: Semantically rich application-centric security in android. In: Proceedings of the Annual Computer Security Applications Conference (ACSAC’09), pp. 340–349. IEEE (2009)
  46. Patil, A., Huard, D., Fonnesbeck, C.J.: PyMC: Bayesian stochastic modelling in python. J. Stat. Softw. 35(4), 1 (2010)
  47. Peng, H., Gates, C., Sarma, B., Li, N., Qi, Y., Potharaju, R., Nita-Rotaru, C., Molloy, I.: Using probabilistic generative models for ranking risks of android apps. In: Proceedings of the 2012 ACM Conference on Computer and Communications Security, pp. 241–252. ACM (2012)
  48. Powell, M.J.: A fast algorithm for nonlinearly constrained optimization calculations. In: Numerical analysis, pp. 144–157. Springer (1978)
  49. Salakhutdinov, R., Murray, I.: On the quantitative analysis of deep belief networks. In: Proceedings of the 25th International Conference on Machine Learning, pp. 872–879. ACM (2008)
  50. Sarma, B.P., Li, N., Gates, C., Potharaju, R., Nita-Rotaru, C., Molloy, I.: Android permissions: a perspective combining risks and benefits. In: Proceedings of the 17th ACM Symposium on Access Control Models and Technologies, pp. 13–22. ACM (2012)
  51. Sermanet, P., Frome, A., Real, E.: Attention for fine-grained categorization. arXiv preprint arXiv:1412.7054 (2014)
  52. Shabtai, A., Fledel, Y., Elovici, Y.: Securing android-powered mobile devices using selinux. Secur. Priv. IEEE 8(3), 36–44 (2010)
  53. Suykens, J.A., Vandewalle, J.: Least squares support vector machine classifiers. Neural Process. Lett. 9(3), 293–300 (1999)
  54. Symantec: Internet security threat report, volume 20. Accessed 15 July 2016
  55. Szegedy, C., Ioffe, S., Vanhoucke, V.: Inception-v4, inception-resnet and the impact of residual connections on learning. arXiv preprint arXiv:1602.07261 (2016)
  56. Tripp, O., Rubin, J.: A bayesian approach to privacy enforcement in smartphones. In: USENIX Security (2014)
  57. Vincent, P., Larochelle, H., Bengio, Y., Manzagol, P.A.: Extracting and composing robust features with denoising autoencoders. In: Proceedings of the 25th International Conference on Machine Learning, pp. 1096–1103. ACM (2008)
  58. VXShare: VirusShare. Accessed 3 Jan 2017
  59. Yan, L.K., Yin, H.: DroidScope: Seamlessly reconstructing the os and dalvik semantic views for dynamic android malware analysis. In: USENIX security symposium, pp. 569–584 (2012)
  60. Yang, Z., Hu, Z., Deng, Y., Dyer, C., Smola, A.: Neural machine translation with recurrent attention modeling. arXiv preprint arXiv:1607.05108 (2016)
  61. Zhou, Y., Jiang, X.: Dissecting android malware: Characterization and evolution. In: Security and Privacy (SP), 2012 IEEE Symposium on, pp. 95–109. IEEE (2012)

Copyright information

© Springer Science+Business Media New York 2017

Authors and Affiliations

  1. National University of Computer and Emerging Sciences, Peshawar, Pakistan
  2. Institute of Management Sciences, Peshawar, Pakistan
  3. Deanship of Preparatory Year and Supporting Studies, Computer Science Department, Imam Abdulrahman Bin Faisal University, Dammam, Kingdom of Saudi Arabia
  4. Faculty of Computer and Information System, Islamic University of Madinah, Madinah, Kingdom of Saudi Arabia
