AndroParse - An Android Feature Extraction Framework and Dataset

  • Robert Schmicker
  • Frank Breitinger
  • Ibrahim Baggili
Conference paper
Part of the Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering book series (LNICST, volume 259)


Android malware has become a major challenge. As a consequence, practitioners and researchers spend a significant amount of time analyzing Android applications (APKs). A common procedure, especially for data scientists, is to extract features such as permissions, APIs, or strings, which can then be analyzed. Current state-of-the-art tools have three major issues: (1) no single tool can extract all the significant features used by scientists and practitioners, (2) current tools are not designed to be extensible, and (3) existing parsers can be time-consuming because they are neither runtime efficient nor scalable. Therefore, this work presents AndroParse, an open-source Android parser written in Golang that currently extracts the four most common features: permissions, APIs, strings, and intents. AndroParse outputs JSON files, as they can easily be used by most major programming languages. Constructing the parser allowed us to create an extensive feature dataset, which can be accessed through our independent REST API. Our dataset currently has 67,703 benign and 46,683 malicious APK samples.
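To illustrate why JSON output is convenient for downstream analysis, the following minimal Python sketch loads one feature record and encodes its permissions as a binary vector, a common input format for classifiers. The record layout and the field names (`sha256`, `permissions`, `apis`, `strings`, `intents`) are assumptions made for this example and are not taken from AndroParse's actual output schema; the helper functions `load_features` and `to_binary_vector` are likewise hypothetical.

```python
import json

# Hypothetical record: the field names below are assumptions for illustration,
# not AndroParse's documented output schema.
SAMPLE_RECORD = """
{
  "sha256": "example-hash",
  "permissions": ["android.permission.INTERNET", "android.permission.SEND_SMS"],
  "apis": ["Landroid/telephony/SmsManager;->sendTextMessage"],
  "strings": ["http://example.com"],
  "intents": ["android.intent.action.BOOT_COMPLETED"]
}
"""

# The four feature types named in the abstract, used here as assumed JSON keys.
FEATURE_KEYS = ("permissions", "apis", "strings", "intents")


def load_features(raw):
    """Parse one JSON record and return a dict mapping each feature type to a set."""
    record = json.loads(raw)
    # Missing keys are treated as empty feature sets.
    return {key: set(record.get(key, [])) for key in FEATURE_KEYS}


def to_binary_vector(values, vocabulary):
    """Encode a set of feature values as 0/1 flags over a fixed, ordered vocabulary."""
    return [1 if item in values else 0 for item in vocabulary]


features = load_features(SAMPLE_RECORD)
vocab = [
    "android.permission.INTERNET",
    "android.permission.READ_CONTACTS",
    "android.permission.SEND_SMS",
]
vector = to_binary_vector(features["permissions"], vocab)
print(vector)  # [1, 0, 1]
```

The same pattern would apply to the other three feature types, given a suitable vocabulary for each.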


Keywords: AndroParse, Android, Malware, Dataset, Features, Framework



We would like to thank the University of New Haven’s Summer Undergraduate Research Fellowship (SURF) program, which supported this research.


Copyright information

© ICST Institute for Computer Sciences, Social Informatics and Telecommunications Engineering 2019

Authors and Affiliations

  • Robert Schmicker (1)
  • Frank Breitinger (1), email author
  • Ibrahim Baggili (1)
  1. Cyber Forensics Research and Education Group (UNHcFREG), Tagliatela College of Engineering, University of New Haven, West Haven, USA
