Android Application Analysis Using Machine Learning Techniques

  • Takeshi Takahashi
  • Tao Ban
Chapter
Part of the Intelligent Systems Reference Library book series (ISRL, volume 151)

Abstract

The amount of malware targeting Android terminals is growing. Like any other Android application, malware is distributed to Android terminals in the form of Android Packages (APKs), so analyzing APKs may help identify it. In this chapter, we describe how machine learning techniques can be used to identify Android malware. We begin by looking at the structure of an APK file and introduce techniques for identifying malware. We then describe how data can be collected, analyzed, and used to prepare a dataset. The dataset draws not only on permission requests and API calls but also on application clusters and descriptions. To demonstrate the effectiveness of machine learning techniques for analyzing Android applications, we analyze the performance of support vector machine classification on our dataset and compare it to that of a scheme that does not use machine learning. We also evaluate the effectiveness of the features used and further improve the classification performance by removing irrelevant features. Finally, we address several issues and limitations in the use of machine learning techniques for analyzing Android applications.
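
As a rough illustration of the classification step summarized above, the following sketch encodes requested permissions as binary features and trains a linear support vector machine by subgradient descent on the hinge loss. It is a minimal sketch under stated assumptions, not the chapter's actual pipeline: the four sample apps, the names `APPS`, `to_vector`, `train_linear_svm`, and `predict`, and the hyperparameters are all hypothetical, and the real dataset also includes API calls, application clusters, and descriptions.

```python
# Toy data (hypothetical): each app is the set of permissions it requests.
APPS = [
    ({"INTERNET", "SEND_SMS"}, 1),        # +1 = malware
    ({"INTERNET", "CAMERA"}, -1),         # -1 = benign
    ({"READ_CONTACTS", "SEND_SMS"}, 1),
    ({"READ_CONTACTS", "CAMERA"}, -1),
]

# Feature vocabulary: every permission seen in the toy data, in sorted order.
VOCAB = sorted(set().union(*(perms for perms, _ in APPS)))


def to_vector(perms):
    """Binary feature vector: 1.0 if the app requests the permission."""
    return [1.0 if p in perms else 0.0 for p in VOCAB]


def train_linear_svm(samples, lr=0.1, lam=0.01, epochs=200):
    """Full-batch subgradient descent on the L2-regularized hinge loss."""
    n = len(samples)
    w = [0.0] * len(VOCAB)
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for x, y in samples:
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            if margin < 1.0:  # only margin violators contribute to the hinge term
                for j, xj in enumerate(x):
                    grad_w[j] -= y * xj
                grad_b -= y
        # Regularize the weights (not the bias) and take one gradient step.
        w = [wi - lr * (lam * wi + gj / n) for wi, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w, b, perms):
    """Return +1 (malware) or -1 (benign) from the sign of the decision score."""
    score = sum(wi * xi for wi, xi in zip(w, to_vector(perms))) + b
    return 1 if score > 0 else -1


samples = [(to_vector(perms), label) for perms, label in APPS]
w, b = train_linear_svm(samples)
```

On this toy data, the learned weight for `SEND_SMS` is positive and the one for `CAMERA` is negative, while `INTERNET` and `READ_CONTACTS`, which appear equally in both classes, stay at zero. Features with near-zero weight are the kind of irrelevant features that a feature-removal step would discard.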

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. National Institute of Information and Communications Technology, Tokyo, Japan