Application Marketplace Malware Detection by User Feedback Analysis

  • Tal Hadad
  • Rami Puzis
  • Bronislav Sidik
  • Nir Ofek
  • Lior Rokach
Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 867)


Smartphones are becoming increasingly ubiquitous. Like recommended best practices for personal computers, users are encouraged to install antivirus and intrusion detection software on their mobile devices. However, even with such software these devises are far from being fully protected. Given that application stores are the source of most applications, malware detection on these platforms is an important issue. Based on our intuition, which suggests that an application’s suspicious behavior will be noticed by some users and influence their feedback, we present an approach for analyzing user reviews in mobile application stores for the purpose of detecting malicious apps. The proposed method transfers an application’s text reviews to numerical features in two main steps: (1) extract domain-phrases based on external domain-specific textual corpus on computer and network security, and (2) compute three statistical features based on domain-phrases occurrences. We evaluated the proposed methods on 2,506 applications along with their 128,863 reviews collected from “Amazon AppStore”. The results show that proposed method yields an AUC of 86% in the detection of malicious applications.


Mobile malware Malware detection User feedback analysis Text mining Review mining 


  1. 1.
    Statista: Number of smartphones sold to end users worldwide from 2007 to 2016 (2017). Accessed Aug 2017
  2. 2.
    Statista: Number of available apps in the Apple App Store from July 2008 to January 2017 (2017). Accessed Aug 2017
  3. 3.
    Statista: Number of available applications in the Google Play Store from December 2009 to June 2017 (2017). Accessed Aug 2017
  4. 4.
    Statista: Number of available apps in the Amazon Appstore from March 2011 to April 2016 (2017). Accessed Aug 2017
  5. 5.
    Check Point: Viking horde: a new type of android malware on Google Play (2017). Accessed Aug 2017
  6. 6.
    Check Point: Dresscode android malware discovered on Google Play (2017). Accessed Aug 2017
  7. 7.
    Wang, K., Stolfo, S.J.: Anomalous payload-based network intrusion detection. In: Jonsson, E., Valdes, A., Almgren, M. (eds.) RAID 2004. LNCS, vol. 3224, pp. 203–222. Springer, Heidelberg (2004). Scholar
  8. 8.
    Blair-Goldensohn, S., Hannan, K., McDonald, R., Neylon, T., Reis, G.A., Reynar, J.: Building a sentiment summarizer for local service reviews. In: WWW Workshop on NLP in the Information Explosion Era, vol. 14, pp. 339–348 (2008)Google Scholar
  9. 9.
    Portier, K., Greer, G.E., Rokach, L., Ofek, N., Wang, Y., Biyani, P., Yu, M., Banerjee, S., Zhao, K., Mitra, P., et al.: Understanding topics and sentiment in an online cancer survivor community. JNCI Monogr. 47, 195–198 (2013)CrossRefGoogle Scholar
  10. 10.
    Hadad, T., Sidik, B., Ofek, N., Puzis, R., Rokach, L.: User feedback analysis for mobile malware detection. In: ICISSP, pp. 83–94 (2017)Google Scholar
  11. 11.
    Xie, L., Zhang, X., Seifert, J.P., Zhu, S.: pBMDS: a behavior-based malware detection system for cellphone devices. In: Proceedings of the Third ACM Conference on Wireless Network Security, pp. 37–48. ACM (2010)Google Scholar
  12. 12.
    Burguera, I., Zurutuza, U., Nadjm-Tehrani, S.: Crowdroid: behavior-based malware detection system for Android. In: Proceedings of the 1st ACM Workshop on Security and Privacy in Smartphones and Mobile Devices, pp. 15–26. ACM (2011)Google Scholar
  13. 13.
    Shabtai, A., Tenenboim-Chekina, L., Mimran, D., Rokach, L., Shapira, B., Elovici, Y.: Mobile malware detection through analysis of deviations in application network behavior. Comput. Secur. 43, 1–18 (2014)CrossRefGoogle Scholar
  14. 14.
    Aung, Z., Zaw, W.: Permission-based Android malware detection. Int. J. Sci. Technol. Res. 2, 228–234 (2013)Google Scholar
  15. 15.
    Zhang, Y., Yang, M., Xu, B., Yang, Z., Gu, G., Ning, P., Wang, X.S., Zang, B.: Vetting undesirable behaviors in Android apps with permission use analysis. In: Proceedings of the 2013 ACM SIGSAC Conference on Computer and Communications Security, pp. 611–622. ACM (2013)Google Scholar
  16. 16.
    Rastogi, V., Chen, Y., Jiang, X.: Droidchameleon: evaluating android anti-malware against transformation attacks. In: Proceedings of the 8th ACM SIGSAC Symposium on Information, Computer and Communications Security, pp. 329–334. ACM (2013)Google Scholar
  17. 17.
    Yang, Z., Yang, M., Zhang, Y., Gu, G., Ning, P., Wang, X.S.: Appintent: analyzing sensitive data transmission in Android for privacy leakage detection. In: Proceedings of the 2013 ACM SIGSAC Conference on Computer and Communications Security, pp. 1043–1054. ACM (2013)Google Scholar
  18. 18.
    Zheng, M., Sun, M., Lui, J.C.: Droid analytics: a signature based analytic system to collect, extract, analyze and associate Android malware. In: 2013 12th IEEE International Conference on Trust, Security and Privacy in Computing and Communications, pp. 163–171. IEEE (2013)Google Scholar
  19. 19.
    Moser, A., Kruegel, C., Kirda, E.: Limits of static analysis for malware detection. In: 2007 Twenty-Third Annual Computer Security Applications Conference, ACSAC 2007, pp. 421–430. IEEE (2007)Google Scholar
  20. 20.
    Ofek, N., Rokach, L., Mitra, P.: Methodology for connecting nouns to their modifying adjectives. In: Gelbukh, A. (ed.) CICLing 2014. LNCS, vol. 8403, pp. 271–284. Springer, Heidelberg (2014). Scholar
  21. 21.
    Katz, G., Ofek, N., Shapira, B.: Consent: context-based sentiment analysis. Knowl.-Based Syst. 84, 162–178 (2015)CrossRefGoogle Scholar
  22. 22.
    Hu, M., Liu, B.: Mining and summarizing customer reviews. In: Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 168–177. ACM (2004)Google Scholar
  23. 23.
    Ofek, N., Shabtai, A.: Dynamic latent expertise mining in social networks. IEEE Internet Comput. 18, 20–27 (2014)CrossRefGoogle Scholar
  24. 24.
    Choi, Y., Cardie, C.: Learning with compositional semantics as structural inference for subsentential sentiment analysis. In: Proceedings of the Conference on Empirical Methods in Natural Language Processing, pp. 793–801. Association for Computational Linguistics (2008)Google Scholar
  25. 25.
    Balahur, A., Steinberger, R., Kabadjov, M., Zavarella, V., Van Der Goot, E., Halkia, M., Pouliquen, B., Belyaeva, J.: Sentiment analysis in the news. arXiv preprint arXiv:1309.6202 (2013)
  26. 26.
    Ye, Q., Zhang, Z., Law, R.: Sentiment classification of online reviews to travel destinations by supervised machine learning approaches. Expert Syst. Appl. 36, 6527–6535 (2009)CrossRefGoogle Scholar
  27. 27.
    Mullen, T., Collier, N.: Sentiment analysis using support vector machines with diverse information sources. In: EMNLP, vol. 4, pp. 412–418 (2004)Google Scholar
  28. 28.
    Ofek, N., Katz, G., Shapira, B., Bar-Zev, Y.: Sentiment analysis in transcribed utterances. In: Cao, T., Lim, E.-P., Zhou, Z.-H., Ho, T.-B., Cheung, D., Motoda, H. (eds.) PAKDD 2015. LNCS (LNAI), vol. 9078, pp. 27–38. Springer, Cham (2015). Scholar
  29. 29.
    Blei, D.M., Ng, A.Y., Jordan, M.I.: Latent Dirichlet allocation. J. Mach. Learn. Res. 3, 993–1022 (2003)zbMATHGoogle Scholar
  30. 30.
    Baron, N.S.: Language of the internet. In: The Stanford Handbook for Language Engineers, pp. 59–127 (2003)Google Scholar
  31. 31.
    Onix: Onix text retrieval toolkit (2016). Accessed Apr 2016
  32. 32.
    Twitter: Twitter dictionary: a guide to understanding twitter lingo (2016). Accessed Apr 2016
  33. 33.
    Porter, M.F.: An algorithm for suffix stripping. Program 14, 130–137 (1980)CrossRefGoogle Scholar
  34. 34.
    Dunham, K.: Mobile Malware Attacks and Defense. Syngress, Maryland Heights (2008)Google Scholar
  35. 35.
    Syngress, E.O., O’Farrell, N.: Hackproofing Your Wireless Network (2002)Google Scholar
  36. 36.
    Shostack, A.: Threat Modeling: Designing for Ssecurity. Wiley, Hoboken (2014)Google Scholar
  37. 37.
    Nayak, U., Rao, U.H.: The InfoSec Handbook: An Introduction to Information Security. Apress, New York City (2014)Google Scholar
  38. 38.
    Bosworth, S., Kabay, M.E.: Computer Security Handbook. Wiley, Hoboken (2002)Google Scholar
  39. 39.
    Pang, B., Lee, L., Vaithyanathan, S.: Thumbs up? Sentiment classification using machine learning techniques. In: Proceedings of the ACL-2002 Conference on Empirical Methods in Natural Language Processing, vol. 10, pp. 79–86. Association for Computational Linguistics (2002)Google Scholar
  40. 40.
    Dave, K., Lawrence, S., Pennock, D.M.: Mining the peanut gallery: opinion extraction and semantic classification of product reviews. In: Proceedings of the 12th International Conference on World Wide Web, pp. 519–528. ACM (2003)Google Scholar
  41. 41.
    De Marneffe, M.C., MacCartney, B., Manning, C.D., et al.: Generating typed dependency parses from phrase structure parses. In: Proceedings of LREC, vol. 6, pp. 449–454 (2006)Google Scholar
  42. 42.
    Ranveer, S., Hiray, S.: Comparative analysis of feature extraction methods of malware detection. Int. J. Comput. Appl. 120, 1–7 (2015)Google Scholar
  43. 43.
    VirusTotal: Virustotal, a free online service that analyzes files and urls enabling the identification of viruses, worms, trojans and other kinds of malicious content. Accessed Aug 2017
  44. 44.
    Šrndic, N., Laskov, P.: Detection of malicious pdf files based on hierarchical document structure. In: Proceedings of the 20th Annual Network & Distributed System Security Symposium (2013)Google Scholar
  45. 45.
    Nissim, N., Cohen, A., Moskovitch, R., Shabtai, A., Edry, M., Bar-Ad, O., Elovici, Y.: ALPD: Active learning framework for enhancing the detection of malicious pdf files. In: 2014 IEEE Joint Conference on Intelligence and Security Informatics Conference (JISIC), pp. 91–98. IEEE (2014)Google Scholar
  46. 46.
    Quinlan, J.R.: C4. 5: Programming for Machine Learning, p. 38. Morgan Kauffmann, Burlington (1993)Google Scholar
  47. 47.
    Ho, T.K.: The random subspace method for constructing decision forests. IEEE Trans. Pattern Anal. Mach. Intell. 20, 832–844 (1998)CrossRefGoogle Scholar
  48. 48.
    Walker, S.H., Duncan, D.B.: Estimation of the probability of an event as a function of several independent variables. Biometrika 54, 167–179 (1967)MathSciNetCrossRefGoogle Scholar
  49. 49.
    Harris, Z.S.: Distributional structure. Word 10, 146–162 (1954)CrossRefGoogle Scholar
  50. 50.
    Amazon: Amazon appstore (2017). Accessed Aug 2017
  51. 51.
    WEKA. Accessed Aug 2017
  52. 52.
    Bishop, C.M.: Pattern recognition. Mach. Learn. 128, 1–58 (2006)Google Scholar
  53. 53.
    Kohavi, R., et al.: A study of cross-validation and bootstrap for accuracy estimation and model selection. In: IJCAI, vol. 14, pp. 1137–1145 (1995)Google Scholar
  54. 54.
    Singh, Y., Kaur, A., Malhotra, R.: Comparative analysis of regression and machine learning methods for predicting fault proneness models. Int. J. Comput. Appl. Technol. 35, 183–193 (2009)CrossRefGoogle Scholar
  55. 55.
    Oommen, T., Baise, L.G., Vogel, R.M.: Sampling bias and class imbalance in maximum-likelihood logistic regression. Math. Geosci. 43, 99–120 (2011)CrossRefGoogle Scholar
  56. 56.
    Hong, L., Davison, B.D.: Empirical study of topic modeling in Twitter. In: Proceedings of the First Workshop on Social Media Analytics, pp. 80–88. ACM (2010)Google Scholar

Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  1. 1.Department of Software and Information Systems EngineeringBen-Gurion University of the NegevBeershebaIsrael

Personalised recommendations