Knowledge and Information Systems, Volume 49, Issue 3, pp 795–833

ALDROID: efficient update of Android anti-virus software using designated active learning methods

  • Nir Nissim
  • Robert Moskovitch
  • Oren BarAd
  • Lior Rokach
  • Yuval Elovici
Regular Paper


New, previously unknown malware aimed at compromising smartphones is created constantly. Because of their limited resources, these widely used devices depend heavily on anti-virus solutions. To update the anti-virus signature repository, anti-virus vendors must process vast quantities of new applications daily in order to identify new unknown malware. Machine learning algorithms have been used for this task, yet they too must be updated efficiently on a daily basis. To improve detection and updatability, we introduce a new framework, “ALDROID”, and the active learning (AL) methods on which it is based. Our methods aim to select only new informative applications (benign and especially malicious), thereby reducing the labeling effort of security experts and enabling frequent and efficient enhancement of the framework’s detection model and of Android anti-virus software. Results indicate that our AL methods outperformed other solutions, including the existing AL method and the heuristic engine. Our AL methods acquired the largest number and percentage of new malware, while preserving the detection models’ capabilities (high TPR and low FPR). Specifically, our methods acquired more than double the amount of new malware acquired by the heuristic engine and 6.5 times more malware than the existing AL method.
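
The selection idea described above can be illustrated with a minimal sketch. It is not the ALDROID methods themselves, which the abstract does not specify. It assumes a hypothetical classifier that gives each unlabeled application a signed score (positive meaning "predicted malicious") and a daily labeling budget. Under those assumptions it sends to the expert a mix of the most uncertain applications and the most confidently malicious ones. The function names, the `explore_fraction` parameter, and the scoring convention are all illustrative assumptions. A small helper also shows how TPR and FPR are computed from labels and predictions.

```python
from typing import List, Sequence, Tuple


def select_for_labeling(scores: Sequence[float], budget: int,
                        explore_fraction: float = 0.5) -> List[int]:
    """Pick indices of unlabeled apps to send to a human expert.

    Assumption: scores[i] is a signed classifier score for app i,
    where 0 is the decision boundary and positive means malicious.
    """
    if budget <= 0:
        return []
    n_explore = int(budget * explore_fraction)

    # Uncertainty part: apps whose score lies closest to the boundary.
    by_uncertainty = sorted(range(len(scores)), key=lambda i: abs(scores[i]))
    chosen = by_uncertainty[:n_explore]
    chosen_set = set(chosen)

    # Confidence part: apps scored most strongly malicious,
    # skipping any that were already chosen above.
    by_malicious = sorted(range(len(scores)), key=lambda i: -scores[i])
    for i in by_malicious:
        if len(chosen) >= budget:
            break
        if i not in chosen_set:
            chosen.append(i)
            chosen_set.add(i)
    return chosen


def tpr_fpr(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Return (TPR, FPR) for binary labels where 1 = malicious, 0 = benign."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return tpr, fpr


if __name__ == "__main__":
    example_scores = [-2.0, -0.1, 0.05, 1.5, 3.0, -0.6]
    print(select_for_labeling(example_scores, budget=4))
    print(tpr_fpr([1, 1, 0, 0, 1], [1, 0, 0, 1, 1]))
```

In this sketch, half of the budget goes to applications near the decision boundary, which are the ones most likely to improve the model, and the rest goes to applications the model already rates as malicious, which are the ones most likely to yield new signatures. The split is an arbitrary choice made for illustration.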


Detection, Acquisition, Malware, Android, Active learning, Anti-virus, Application



This research was partly supported by the National Cyber Bureau of the Israeli Ministry of Science, Technology and Space.



Copyright information

© Springer-Verlag London 2016

Authors and Affiliations

  • Nir Nissim
    • 1
  • Robert Moskovitch
    • 2
  • Oren BarAd
    • 1
  • Lior Rokach
    • 1
  • Yuval Elovici
    • 1
  1. Ben Gurion University of the Negev, Beersheba, Israel
  2. Columbia University, New York, USA
