
DexRay: A Simple, yet Effective Deep Learning Approach to Android Malware Detection Based on Image Representation of Bytecode

  • Conference paper
Deployable Machine Learning for Security Defense (MLHat 2021)

Abstract

Computer vision has seen major advances in recent years, with unprecedented performance driven by deep representation learning research. Image formats therefore look attractive to other fields such as malware detection, where deep learning on images reduces the need for extensive hand-crafted features that must generalise across malware variants. We postulate that this research direction could become the next frontier in Android malware detection, and that it therefore requires a clear roadmap to ensure that new approaches genuinely bring novel contributions. As a first building block, we develop and assess a baseline pipeline for image-based malware detection built from straightforward steps.

We propose DexRay, which converts the bytecode of an app's DEX files into grey-scale "vector" images and feeds them to a one-dimensional Convolutional Neural Network (1-D CNN). We view DexRay as foundational because its design choices are deliberately basic, which lets us estimate the minimal performance that image-based learning can be expected to achieve in malware detection.
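The two stages described above can be illustrated with a toy, dependency-free sketch. It assumes that each byte of the DEX bytecode becomes one grey-scale pixel scaled to [0, 1], that the resulting vector image is resized to a fixed length by linear interpolation, and that a single 1-D convolution with ReLU followed by max pooling stands in for the CNN. The names `TARGET_LEN`, `bytes_to_vector_image`, `resize_linear`, `conv1d` and `max_pool1d`, the 128 x 128 target length and the interpolation method are illustrative assumptions, not DexRay's actual code or hyperparameters; the authors' published artefacts are the reference for those.

```python
import math

# Hypothetical fixed image length (128 x 128 pixels flattened); not taken from the paper.
TARGET_LEN = 128 * 128


def bytes_to_vector_image(dex_bytes: bytes) -> list[float]:
    """Map every byte of the DEX bytecode to one grey-scale pixel in [0, 1]."""
    return [b / 255.0 for b in dex_bytes]


def resize_linear(pixels: list[float], target_len: int) -> list[float]:
    """Resize a 1-D vector image to target_len pixels by linear interpolation (assumed method)."""
    n = len(pixels)
    if n == 0 or target_len < 1:
        raise ValueError("need a non-empty image and a positive target length")
    if n == 1 or target_len == 1:
        return [pixels[0]] * target_len
    scale = (n - 1) / (target_len - 1)
    out = []
    for i in range(target_len):
        pos = i * scale
        lo = math.floor(pos)
        hi = min(lo + 1, n - 1)
        frac = pos - lo
        # Weighted average of the two nearest source pixels.
        out.append(pixels[lo] * (1 - frac) + pixels[hi] * frac)
    return out


def conv1d(signal: list[float], kernel: list[float], bias: float = 0.0) -> list[float]:
    """'Valid' 1-D convolution (cross-correlation, as in CNN layers) followed by ReLU."""
    k = len(kernel)
    return [
        max(0.0, sum(signal[i + j] * kernel[j] for j in range(k)) + bias)
        for i in range(len(signal) - k + 1)
    ]


def max_pool1d(signal: list[float], size: int) -> list[float]:
    """Non-overlapping max pooling; a trailing incomplete window is dropped."""
    return [max(signal[i:i + size]) for i in range(0, len(signal) - size + 1, size)]


if __name__ == "__main__":
    # The 8-byte DEX magic header "dex\n035\0" as a stand-in for a real DEX file.
    raw = bytes([0x64, 0x65, 0x78, 0x0A, 0x30, 0x33, 0x35, 0x00])
    img = resize_linear(bytes_to_vector_image(raw), 16)
    # Hand-set kernel for illustration; a trained network learns these weights.
    features = max_pool1d(conv1d(img, [1.0, -1.0]), 2)
    print(len(features), features)
```

In practice the input would be the full bytes of the app's DEX files rather than a header, and a deep learning framework would learn many kernels per layer instead of the single hand-set one shown here.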

Evaluated on over 158k apps, DexRay achieves a high detection rate (F1-score = 0.96), showing that, while simple, the approach is effective. Finally, we investigate the impact of time decay and image resizing on DexRay's performance and assess its resilience to obfuscation.
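The F1-score reported above is the harmonic mean of precision P and recall R, F1 = 2PR / (P + R). The sketch below computes it from confusion-matrix counts, treating malware as the positive class (an assumed convention), and adds one common way to probe time decay: train only on apps dated before a cutoff and test on newer ones. The function names, the `(sha256, date, label)` tuple layout and the cutoff are hypothetical; the paper's exact time-decay protocol may differ.

```python
from datetime import date


def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 = harmonic mean of precision and recall (malware = positive class)."""
    if tp == 0:
        # Precision or recall is zero (or undefined); report 0 by convention.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def temporal_split(apps, cutoff: date):
    """Split hypothetical (sha256, date, label) records so all training apps predate all test apps."""
    train = [a for a in apps if a[1] < cutoff]
    test = [a for a in apps if a[1] >= cutoff]
    return train, test


if __name__ == "__main__":
    print(f1_score(tp=8, fp=2, fn=2))
    apps = [("a", date(2019, 1, 1), 1), ("b", date(2020, 6, 1), 0)]
    print(temporal_split(apps, date(2020, 1, 1)))
```

Keeping training data strictly older than test data avoids letting the model "see the future", which a random split across all dates would allow.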

This work-in-progress paper contributes to the domain of deep-learning-based malware detection by providing a sound, simple, yet effective approach (with available artefacts) that can serve as a basis for scoping the many profound questions that must be investigated to fully develop this domain.


Data Availability Statement

All artefacts are available online at: https://github.com/Trustworthy-Software/DexRay.


Author information

Corresponding author

Correspondence to Nadia Daoudi.


Copyright information

© 2021 Springer Nature Switzerland AG

About this paper


Cite this paper

Daoudi, N., Samhi, J., Kabore, A.K., Allix, K., Bissyandé, T.F., Klein, J. (2021). DexRay: A Simple, yet Effective Deep Learning Approach to Android Malware Detection Based on Image Representation of Bytecode. In: Wang, G., Ciptadi, A., Ahmadzadeh, A. (eds) Deployable Machine Learning for Security Defense. MLHat 2021. Communications in Computer and Information Science, vol 1482. Springer, Cham. https://doi.org/10.1007/978-3-030-87839-9_4


  • DOI: https://doi.org/10.1007/978-3-030-87839-9_4


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-87838-2

  • Online ISBN: 978-3-030-87839-9

  • eBook Packages: Computer Science, Computer Science (R0)
