Abstract
Computer vision has witnessed several advances in recent years, with unprecedented performance provided by deep representation learning research. Image formats thus appear attractive to other fields such as malware detection, where deep learning on images alleviates the need for comprehensively hand-crafted features generalising to different malware variants. We postulate that this research direction could become the next frontier in Android malware detection, and therefore requires a clear roadmap to ensure that new approaches indeed bring novel contributions. We contribute with a first building block by developing and assessing a baseline pipeline for image-based malware detection with straightforward steps.
We propose DexRay, which converts the bytecode of the app DEX files into grey-scale “vector” images and feeds them to a 1-dimensional Convolutional Neural Network model. We view DexRay as foundational due to the exceedingly basic nature of the design choices, allowing to infer what could be a minimal performance that can be obtained with image-based learning in malware detection.
The performance of DexRay evaluated on over 158k apps demonstrates that, while simple, our approach is effective with a high detection rate (F1-score \(=0.96\)). Finally, we investigate the impact of time decay and image-resizing on the performance of DexRay and assess its resilience to obfuscation.
This work-in-progress paper contributes to the domain of Deep Learning based Malware detection by providing a sound, simple, yet effective approach (with available artefacts) that can be the basis to scope the many profound questions that will need to be investigated to fully develop this domain.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Similar content being viewed by others
Data Availability Statement
All artefacts are available online at: https://github.com/Trustworthy-Software/DexRay.
Notes
- 1.
- 2.
- 3.
- 4.
- 5.
- 6.
- 7.
- 8.
References
Kang, H., Jang, J.-W., Mohaisen, A., Kim, H.K.: Detecting and classifying android malware using static analysis along with creator information. Int. J. Distrib. Sens. Netw. 11(6), 479174 (2015)
Petsas, T., Voyatzis, G., Athanasopoulos, E., Polychronakis, M., Ioannidis, S.: Rage against the virtual machine: hindering dynamic analysis of android malware. In: Proceedings of the Seventh European Workshop on System Security, ser. EuroSec 2014. Association for Computing Machinery, New York (2014). https://doi.org/10.1145/2592791.2592796
Zheng, M., Sun, M., Lui, J.C.S.: Droid analytics: a signature based analytic system to collect, extract, analyze and associate android malware. In: 2013 12th IEEE International Conference on Trust, Security and Privacy in Computing and Communications, pp. 163–171 (2013)
Faruki, P., Ganmoor, V., Laxmi, V., Gaur, M.S., Bharmal, A.: Androsimilar: robust statistical feature signature for android malware detection. In: Proceedings of the 6th International Conference on Security of Information and Networks, ser. SIN 2013, pp. 152–159. Association for Computing Machinery, New York (2013). https://doi.org/10.1145/2523514.2523539
McAfee: Mcafee labs threats report (2020). https://www.mcafee.com/enterprise/en-us/assets/reports/rp-quarterly-threats-nov-2020.pdf. Accessed 22 Feb 2021
Google: Android security & privacy 2018 year in review (2018). https://source.android.com/security/reports/Google_Android_Security_2018_Report_Final.pdf. Accessed 22 Feb 2021
Malwarebytes Lab: 2020 state of malware report (2020). https://resources.malwarebytes.com/files/2020/02/2020_State-of-Malware-Report-1.pdf. Accessed 22 Feb 2021
Kaspersky Lab: Kaspersky security network (2017). https://media.kaspersky.com/pdf/KESB_Whitepaper_KSN_ENG_final.pdf. Accessed 22 Feb 2021
Arp, D., Spreitzenbarth, M., Hübner, M., Gascon, H., Rieck, K.: Drebin: efficient and explainable detection of Android malware in your pocket. In: Proceedings of the ISOC Network and Distributed System Security Symposium (NDSS), San Diego, CA (2014)
Garcia, J., Hammad, M., Malek, S.: [journal first] Lightweight, obfuscation-resilient detection and family identification of android malware. In: 2018 IEEE/ACM 40th International Conference on Software Engineering (ICSE), p. 497 (2018)
Onwuzurike, L., Mariconti, E., Andriotis, P., Cristofaro, E.D., Ross, G., Stringhini, G.: MaMaDroid: detecting android malware by building Markov chains of behavioral models (extended version). ACM Trans. Priv. Secur. 22(2) (2019). https://doi.org/10.1145/3313391
Fereidooni, H., Conti, M., Yao, D., Sperduti, A.: Anastasia: Android malware detection using static analysis of applications. In: 2016 8th IFIP International Conference on New Technologies, Mobility and Security (NTMS), pp. 1–5 (2016)
Cai, H., Meng, N., Ryder, B., Yao, D.: Droidcat: effective android malware detection and categorization via app-level profiling. IEEE Trans. Inf. Forensics Secur. 14(6), 1455–1470 (2019)
Wu, W.-C., Hung, S.-H.: DroidDolphin: a dynamic android malware detection framework using big data and machine learning. In: Proceedings of the 2014 Conference on Research in Adaptive and Convergent Systems, ser. RACS 2014, pp. 247–252. Association for Computing Machinery, New York (2014). https://doi.org/10.1145/2663761.2664223
Martinelli, F., Mercaldo, F., Saracino, A.: Bridemaid: an hybrid tool for accurate detection of android malware. In: Proceedings of the 2017 ACM on Asia Conference on Computer and Communications Security, ser. ASIA CCS 2017, pp. 899–901. Association for Computing Machinery, New York (2017). https://doi.org/10.1145/3052973.3055156
Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z.: Rethinking the inception architecture for computer vision. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2818–2826 (2016)
Allix, K., Bissyandé, T.F., Klein, J., Le Traon, Y.: AndroZoo: collecting millions of Android apps for the research community. In: Proceedings of the 13th International Conference on Mining Software Repositories, ser. MSR 2016, pp. 468–471. ACM, New York (2016). http://doi.acm.org/10.1145/2901739.2903508
LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. Nature 521(7553), 436–444 (2015). https://doi.org/10.1038/nature14539
Yamashita, R., Nishio, M., Do, R.K.G., Togashi, K.: Convolutional neural networks: an overview and application in radiology. Insights Imaging 9(4), 611–629 (2018). https://doi.org/10.1007/s13244-018-0639-9
Zhiqiang, W., Jun, L.: A review of object detection based on convolutional neural network. In: 2017 36th Chinese Control Conference (CCC), pp. 11 104–11 109 (2017)
Aloysius, N., Geetha, M.: A review on deep convolutional neural networks. In: 2017 International Conference on Communication and Signal Processing (ICCSP), pp. 0588–0592 (2017)
Ke, Q., Liu, J., Bennamoun, M., An, S., Sohel, F., Boussaid, F.: Computer vision for human-machine interaction. In: Computer Vision for Assistive Healthcare, pp. 127–145. Elsevier (2018)
Yu, D., Wang, H., Chen, P., Wei, Z.: Mixed pooling for convolutional neural networks. In: Miao, D., Pedrycz, W., Ślȩzak, D., Peters, G., Hu, Q., Wang, R. (eds.) RSKT 2014. LNCS (LNAI), vol. 8818, pp. 364–375. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-11740-9_34
Aonzo, S., Georgiu, G.C., Verderame, L., Merlo, A.: Obfuscapk: an open-source black-box obfuscation tool for Android apps. SoftwareX 11, 100403 (2020). http://www.sciencedirect.com/science/article/pii/S2352711019302791
Raschka, S.: Model evaluation, model selection, and algorithm selection in machine learning, arXiv preprint arXiv:1811.12808 (2018)
Daoudi, N., Allix, K., Bissyandé, T.F., Klein, J.: Lessons learnt on reproducibility in machine learning based Android malware detection. Empir. Softw. Eng. 26(4), 1–53 (2021). https://doi.org/10.1007/s10664-021-09955-7
Huang, T.H., Kao, H.: R2-D2: color-inspired convolutional neural network (CNN)-based Android malware detections. In: 2018 IEEE International Conference on Big Data (Big Data), pp. 2633–2642 (2018)
Ding, Y., Zhang, X., Hu, J., Xu, W.: Android malware detection method based on bytecode image. J. Ambient Intell. Human. Comput., 1–10 (2020). https://link.springer.com/article/10.1007%2Fs12652-020-02196-4
Pendlebury, F., Pierazzi, F., Jordaney, R., Kinder, J., Cavallaro, L.: TESSERACT: eliminating experimental bias in malware classification across space and time. In: 28th USENIX Security Symposium (USENIX Security 19), pp. 729–746. USENIX Association, Santa Clara, August 2019. https://www.usenix.org/conference/usenixsecurity19/presentation/pendlebury
Xu, K., Li, Y., Deng, R., Chen, K., Xu, J.: DroidEvolver: self-evolving android malware detection system. In: 2019 IEEE European Symposium on Security and Privacy (EuroS P), pp. 47–62 (2019)
Zhang, X., et al.: Enhancing state-of-the-art classifiers with API semantics to detect evolved android malware. In: Proceedings of the 2020 ACM SIGSAC Conference on Computer and Communications Security, ser. CCS 2020, pp. 757–770. Association for Computing Machinery, New York (2020). https://doi.org/10.1145/3372297.3417291
Ribeiro, M.T., Singh, S., Guestrin, C.: “Why should i trust you?”: explaining the predictions of any classifier. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ser. KDD 2016, pp. 1135–1144. Association for Computing Machinery, New York (2016). https://doi.org/10.1145/2939672.2939778
Guo, W., Mu, D., Xu, J., Su, P., Wang, G., Xing, X.: LEMNA: explaining deep learning based security applications. In: Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security, ser. CCS 2018, pp. 364–379. Association for Computing Machinery, New York (2018). https://doi.org/10.1145/3243734.3243792
Palumbo, P., Sayfullina, L., Komashinskiy, D., Eirola, E., Karhunen, J.: A pragmatic Android malware detection procedure. Comput. Secur. 70, 689–701 (2017)
Liu, K., Xu, S., Xu, G., Zhang, M., Sun, D., Liu, H.: A review of android malware detection approaches based on machine learning. IEEE Access 8, 124 579–124 607 (2020)
Sharma, T., Rattan, D.: Malicious application detection in Android - a systematic literature review. Comput. Sci. Rev. 40, 100373 (2021). https://www.sciencedirect.com/science/article/pii/S1574013721000137
Wu, D., Mao, C., Wei, T., Lee, H., Wu, K.: DroidMat: Android malware detection through manifest and API calls tracing. In: 2012 Seventh Asia Joint Conference on Information Security, pp. 62–69 (2012)
Burguera, I., Zurutuza, U., Nadjm-Tehrani, S.: Crowdroid: behavior-based malware detection system for Android. In: Proceedings of the 1st ACM Workshop on Security and Privacy in Smartphones and Mobile Devices, ser. SPSM 2011, pp. 15–26. Association for Computing Machinery, New York (2011). https://doi.org/10.1145/2046614.2046619
Kouliaridis, V., Kambourakis, G., Geneiatakis, D., Potha, N.: Two anatomists are better than one-dual-level android malware detection. Symmetry 12(7), 1128 (2020)
Arshad, S., Shah, M.A., Wahid, A., Mehmood, A., Song, H., Yu, H.: SAMADroid: a novel 3-level hybrid malware detection model for Android operating system. IEEE Access 6, 4321–4339 (2018)
Wang, Z., Cai, J., Cheng, S., Li, W.: DroidDeepLearner: identifying android malware using deep learning. In: 2016 IEEE 37th Sarnoff Symposium, pp. 160–165 (2016)
Qiu, J., Zhang, J., Luo, W., Pan, L., Nepal, S., Xiang, Y.: A survey of Android malware detection with deep neural models. ACM Comput. Surv. 53(6) (2020). https://doi.org/10.1145/3417978
Karbab, E.B., Debbabi, M., Derhab, A., Mouheb, D.: MalDozer: automatic framework for Android malware detection using deep learning. Digit. Investig. 24, S48–S59 (2018)
Kim, T., Kang, B., Rho, M., Sezer, S., Im, E.G.: A multimodal deep learning method for android malware detection using various features. IEEE Trans. Inf. Forensics Secur. 14(3), 773–788 (2018)
Yuan, Z., Lu, Y., Xue, Y.: Droiddetector: android malware characterization and detection using deep learning. Tsinghua Sci. Technol. 21(1), 114–123 (2016)
Alzaylaee, M.K., Yerima, S.Y., Sezer, S.: DL-Droid: deep learning based Android malware detection using real devices. Comput. Secur. 89, 101663 (2020)
Hou, S., Saas, A., Chen, L., Ye, Y.: Deep4MalDroid: a deep learning framework for Android malware detection based on Linux kernel system call graphs. In: 2016 IEEE/WIC/ACM International Conference on Web Intelligence Workshops (WIW), pp. 104–111 (2016)
Wang, W., Zhao, M., Wang, J.: Effective android malware detection with a hybrid model based on deep autoencoder and convolutional neural network. J. Ambient. Intell. Human. Comput. 10(8), 3035–3043 (2018). https://doi.org/10.1007/s12652-018-0803-6
Mikolov, T., Sutskever, I., Chen, K., Corrado, G.S., Dean, J.: Distributed representations of words and phrases and their compositionality. In: Burges, C.J.C., Bottou, L., Welling, M., Ghahramani, Z., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems, vol. 26, pp. 3111–3119. Curran Associates Inc. (2013). https://proceedings.neurips.cc/paper/2013/file/9aa42b31882ec039965f3c4923ce901b-Paper.pdf
McLaughlin, N., et al.: Deep android malware detection. In: CODASPY 2017 - Proceedings of the 7th ACM Conference on Data and Application Security and Privacy, ser. CODASPY 2017 - Proceedings of the 7th ACM Conference on Data and Application Security and Privacy, pp. 301–308. Association for Computing Machinery Inc., March 2017. Funding Information: This work was partially supported by the grants from Global Research Laboratory Project through National Research Foundation (NRF-2014K1A1A2043029) and the Center for Cybersecurity and Digital Forensics at Arizona State University. This work was also partially supported by Engineering and Physical Sciences Research Council (EPSRC) grant EP/N508664/1.; 7th ACM Conference on Data and Application Security and Privacy, CODASPY 2017; Conference date: 22–03-2017 Through 24–03-2017
Nataraj, L., Karthikeyan, S., Jacob, G., Manjunath, B.S.: Malware images: visualization and automatic classification. In: Proceedings of the 8th International Symposium on Visualization for Cyber Security, pp. 1–7 (2011)
Oliva, A., Torralba, A.: Modeling the shape of the scene: a holistic representation of the spatial envelope. Int. J. Comput. Vision 42(3), 145–175 (2001). https://doi.org/10.1023/A:1011139631724
Darus, F.M., Salleh, N.A.A., Mohd Ariffin, A.F.: Android malware detection using machine learning on image patterns. In: 2018 Cyber Resilience Conference (CRC), pp. 1–2 (2018)
Yadav, B., Tokekar, S.: Deep learning in malware identification and classification. In: Stamp, M., Alazab, M., Shalaginov, A. (eds.) Malware Analysis Using Artificial Intelligence and Deep Learning, pp. 163–205. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-62582-5_6
Ünver, H.M., Bakour, K.: Android malware detection based on image-based features and machine learning techniques. SN Appl. Sci. 2(7) (2020). https://doi.org/10.1007/s42452-020-3132-2
Mercaldo, F., Santone, A.: Deep learning for image-based mobile malware detection. J. Comput. Virol. Hacking Tech. 16(2), 157–171 (2020). https://doi.org/10.1007/s11416-019-00346-7
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Daoudi, N., Samhi, J., Kabore, A.K., Allix, K., Bissyandé, T.F., Klein, J. (2021). DexRay: A Simple, yet Effective Deep Learning Approach to Android Malware Detection Based on Image Representation of Bytecode. In: Wang, G., Ciptadi, A., Ahmadzadeh, A. (eds) Deployable Machine Learning for Security Defense. MLHat 2021. Communications in Computer and Information Science, vol 1482. Springer, Cham. https://doi.org/10.1007/978-3-030-87839-9_4
Download citation
DOI: https://doi.org/10.1007/978-3-030-87839-9_4
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-87838-2
Online ISBN: 978-3-030-87839-9
eBook Packages: Computer ScienceComputer Science (R0)