Integrating a Framework for Discovering Alternative App Stores in a Mobile App Monitoring Platform

  • Massimo Guarascio
  • Ettore Ritacco
  • Daniele Biondo
  • Rocco Mammoliti
  • Alessandra Toma
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10785)

Abstract

Nowadays, implementing brand protection strategies has become a necessity for enterprises delivering services through dedicated apps. Increasingly, malicious developers spread unauthorized (fake, malicious, obsolete or deprecated) mobile apps through alternative distribution channels and marketplaces. In this work, we propose a framework for the early detection of these alternative markets, whether advertised through social media such as Twitter or Facebook or hosted in the Dark Web. Specifically, it combines a data modeling approach with an ensemble learning technique to recommend web pages that are likely to represent alternative marketplaces. The framework has been implemented in a prototype system called Unauthorized App Store Discovery (UASD) and integrated into an enterprise security platform for monitoring malicious/unauthorized mobile apps. UASD analyzes web pages extracted from the Web and exploits a classification model to distinguish real app stores from similar pages (e.g. blogs and forums) that a common search engine can erroneously return. An experimental evaluation on a real dataset confirms the validity of the approach in terms of accuracy.
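
To make the idea of ensemble-based page classification concrete, the following is a minimal sketch and not the UASD model itself. The abstract does not specify the features or the ensemble scheme, so plain majority voting is used here as a stand-in, and the three keyword rules, their names, and the sample page texts are all hypothetical.

```python
from collections import Counter
from typing import Callable, Sequence

# Assumption: each weak classifier maps raw page text to True ("app store")
# or False ("other page"). None of these rules comes from the paper.
Classifier = Callable[[str], bool]


def has_download_terms(text: str) -> bool:
    """Hypothetical rule: the page mentions APK downloads."""
    t = text.lower()
    return "apk" in t and "download" in t


def has_catalog_terms(text: str) -> bool:
    """Hypothetical rule: the page uses at least two catalog-style terms."""
    t = text.lower()
    terms = ("apps", "games", "categories", "top charts")
    return sum(term in t for term in terms) >= 2


def lacks_discussion_terms(text: str) -> bool:
    """Hypothetical rule: the page shows no blog or forum vocabulary."""
    t = text.lower()
    return not any(term in t for term in ("posted by", "reply", "comments", "thread"))


def majority_vote(classifiers: Sequence[Classifier], text: str) -> bool:
    """Combine the weak classifiers by simple majority (ties count as False)."""
    votes = Counter(clf(text) for clf in classifiers)
    return votes[True] > votes[False]


ENSEMBLE = (has_download_terms, has_catalog_terms, lacks_discussion_terms)

# Illustrative inputs only; these are not taken from the paper's dataset.
store_page = "Download APK files for free. Browse apps and games by categories."
forum_page = "Thread: where can I download an APK? Posted by user42. 12 comments, reply below."

if __name__ == "__main__":
    print(majority_vote(ENSEMBLE, store_page))
    print(majority_vote(ENSEMBLE, forum_page))
```

The forum page triggers the download rule on its own, which is the kind of false positive a single keyword search would return; combining several weak signals is what lets the ensemble reject it.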

Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  • Massimo Guarascio (1)
  • Ettore Ritacco (1)
  • Daniele Biondo (2)
  • Rocco Mammoliti (2)
  • Alessandra Toma (2)

  1. Institute for High Performance Computing and Networking of the Italian National Research Council (ICAR-CNR), Arcavacata, Italy
  2. Poste Italiane, Rome, Italy