Android Application Analysis Using Machine Learning Techniques

  • Takeshi Takahashi
  • Tao Ban
Part of the Intelligent Systems Reference Library book series (ISRL, volume 151)


The amount of malware targeting Android terminals is growing. Like other Android applications, malware is distributed to Android terminals in the form of Android Packages (APKs), so analyzing APKs can help identify malware. In this chapter, we describe how machine learning techniques can be used to identify Android malware. We begin by looking at the structure of an APK file and introduce techniques for identifying malware. We then describe how data can be collected, analyzed, and used to prepare a dataset, drawing not only on permission requests and API calls but also on application clusters and descriptions. To demonstrate the effectiveness of machine learning techniques for analyzing Android applications, we evaluate support vector machine classification on our dataset and compare its performance with that of a scheme that does not use machine learning. We also evaluate the effectiveness of the features used and further improve classification performance by removing irrelevant features. Finally, we address several issues and limitations of using machine learning techniques to analyze Android applications.
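As a rough illustration of how permission requests can become classifier input, the following Python sketch reads the `<uses-permission>` entries of a manifest and encodes them as a binary feature vector. It is not the chapter's actual pipeline. It assumes the manifest has already been decoded to text XML (inside an APK, `AndroidManifest.xml` is stored as binary XML), and the names `VOCABULARY`, `requested_permissions`, `to_feature_vector`, and `SAMPLE_MANIFEST` are hypothetical, chosen only for this example.

```python
import xml.etree.ElementTree as ET

# Namespace Android uses for manifest attributes such as android:name.
ANDROID_NS = "{http://schemas.android.com/apk/res/android}"

# Hypothetical feature vocabulary; in practice it would be built from
# every permission observed across the whole dataset.
VOCABULARY = [
    "android.permission.INTERNET",
    "android.permission.SEND_SMS",
    "android.permission.READ_CONTACTS",
    "android.permission.CAMERA",
]


def requested_permissions(manifest_xml):
    """Return the set of permission names requested in a decoded manifest."""
    root = ET.fromstring(manifest_xml)
    name_key = ANDROID_NS + "name"
    return {
        elem.attrib[name_key]
        for elem in root.iter("uses-permission")
        if name_key in elem.attrib
    }


def to_feature_vector(permissions, vocabulary=VOCABULARY):
    """Encode a permission set as a 0/1 vector, one slot per vocabulary entry."""
    return [1 if name in permissions else 0 for name in vocabulary]


# Assumed sample input: a minimal manifest already decoded to text XML.
SAMPLE_MANIFEST = """<manifest
    xmlns:android="http://schemas.android.com/apk/res/android"
    package="com.example.sample">
    <uses-permission android:name="android.permission.INTERNET" />
    <uses-permission android:name="android.permission.SEND_SMS" />
    <application android:label="Sample" />
</manifest>
"""

if __name__ == "__main__":
    perms = requested_permissions(SAMPLE_MANIFEST)
    print(sorted(perms))
    print(to_feature_vector(perms))
```

Vectors of this form, one per application, could then be stacked into a matrix and passed to a linear classifier such as a support vector machine. API calls or description-derived features would add further columns in the same way.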


Analyze Android Applications, Android Malware, Request Permission, Android Software (APK), Android Terminals



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. National Institute of Information and Communications Technology, Tokyo, Japan
