MLDroid—framework for Android malware detection using machine learning techniques

  • Original Article
  • Published in: Neural Computing and Applications

Abstract

This paper presents MLDroid, a web-based framework for detecting malware on Android devices. As Android devices have grown in popularity, malware developers release new malware daily, threatening system integrity and user privacy. The proposed framework detects malware in Android apps through dynamic analysis. To detect malware in real-world apps, we train the framework on features chosen by feature selection approaches, and then build models on the selected features using different machine learning algorithms. The experiment was performed on more than 500,000 Android apps. Empirical results show that the model built by running four distinct machine learning algorithms in parallel (deep learning, farthest-first clustering, Y-MLP and a nonlinear ensemble decision tree forest), with rough set analysis as the feature subset selection algorithm, achieved the highest detection rate, 98.8%, on real-world apps.
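To make the feature subset selection step concrete, the following is a minimal sketch of rough-set-based selection using a greedy QuickReduct-style search over the dependency degree. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not say which rough set procedure MLDroid uses, and the feature names, sample rows and labels below are hypothetical toy data.

```python
from collections import defaultdict

def dependency(rows, labels, features):
    """Rough-set dependency degree: the fraction of samples whose
    equivalence class (identical values on `features`) carries one label."""
    seen_labels = defaultdict(set)   # feature-value tuple -> labels observed
    counts = defaultdict(int)        # feature-value tuple -> sample count
    for row, label in zip(rows, labels):
        key = tuple(row[f] for f in features)
        seen_labels[key].add(label)
        counts[key] += 1
    consistent = sum(n for key, n in counts.items() if len(seen_labels[key]) == 1)
    return consistent / len(rows)

def quick_reduct(rows, labels):
    """Greedily add the feature that most raises the dependency degree
    until the subset discriminates as well as the full feature set."""
    all_features = list(range(len(rows[0])))
    target = dependency(rows, labels, all_features)
    selected = []
    while dependency(rows, labels, selected) < target:
        best = max(
            (f for f in all_features if f not in selected),
            key=lambda f: dependency(rows, labels, selected + [f]),
        )
        selected.append(best)
    return selected

# Hypothetical binary features (1 = the app exhibited the behaviour).
NAMES = ["SEND_SMS", "INTERNET", "READ_CONTACTS", "VIBRATE"]
ROWS = [
    (1, 1, 1, 0),
    (1, 1, 0, 0),
    (0, 1, 1, 1),
    (0, 1, 0, 1),
    (1, 1, 1, 1),
    (0, 1, 1, 0),
]
LABELS = ["malware", "benign", "benign", "benign", "malware", "benign"]

selected = quick_reduct(ROWS, LABELS)
print([NAMES[f] for f in selected])
```

On this toy data the search keeps SEND_SMS and READ_CONTACTS, because together they separate the two classes as well as all four features do. In a full pipeline the reduced feature set would then be passed to the learning algorithms named in the abstract.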

Notes

  1. VirusTotal was used to identify the class.

  2. https://play.google.com/store?hl=en.

  3. http://www.hiapk.com/.

  4. http://www.appchina.com/.

  5. http://andrdoid.d.cn/.

  6. http://www.mumayi.com/

  7. http://www.gfan.com/.

  8. http://slideme.org/.

  9. http://android.pandaapp.com/.

  10. https://www.virustotal.com/.

  11. https://www.microsoft.com/en-in/windows/comprehensive-security.

  12. Class categories are identified using the VirusTotal antivirus scanner.

  13. https://data.mendeley.com/datasets/9b45k4hkdf/1.

  14. https://github.com/ArvindMahindru66/Computer-and-security-dataset.

  15. https://developer.android.com/guide/topics/permissions/overview.

  16. In our study, we do not use all extracted features when training supervised machine learning algorithms, because doing so gives lower accuracy at test time.

  17. On the basis of Tables 6–31 and the box-plot diagram, we select the best machine learning algorithm to build our web-based malware detection model. Detection of malware families using MLDroid is shown in "Appendix B."

  18. In our study, we consider samples from 81 different malware families; the families and their samples are listed in "Appendix C."

Author information

Corresponding author

Correspondence to Arvind Mahindru.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix A

See Fig. 12.

Fig. 12 Unsupervised machine learning algorithms

Appendix B

See Fig. 13.

Fig. 13 Performance of MLDroid for detecting malware families

Appendix C

See Table 37.

Table 37 Top malware families used in our data set

About this article

Cite this article

Mahindru, A., Sangal, A.L. MLDroid—framework for Android malware detection using machine learning techniques. Neural Comput & Applic 33, 5183–5240 (2021). https://doi.org/10.1007/s00521-020-05309-4
