Towards Accuracy in Similarity Analysis of Android Applications

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11281)


Android malware is most commonly delivered to users through the many open app marketplaces. Several recent attacks have shown that the same malware infects different apps in the app market. Automated triaging that computes the similarity of apps to known software components can help trace the evolution and propagation of malware. While existing research emphasizes detecting repackaged apps, a similarity analysis system that can identify similar portions of code in dissimilar apps is important. Only a few public tools furnish these details accurately. In this paper, we present a proof of concept of an analysis system that compares Android apps using a technique that combines class and method features of an app. We use a two-step process that first computes similar classes and then computes similar methods of those classes. To identify similar classes, we propose a novel set of software birthmarks. We use Normalized Compression Distance (NCD) to compute similar methods. The birthmarks are evaluated on a set of over 65,000 classes from 60 APKs. To evaluate the performance of our tool, we establish ground truth by manually reverse engineering each app. The proposed system is compared with Google’s androsim, the only open-source tool for similarity analysis that also uses NCD. Our approach shows an improvement in accuracy in the worst case when compared to androsim. Finally, we furnish a case study of our system to detect fake and repackaged apps by analyzing 1470 Android apps from various sources.
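
As a rough illustration of the method-level comparison, the following is a minimal sketch of Normalized Compression Distance using the standard formulation NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C is the compressed length. The abstract does not say which compressor or method representation the system uses, so the choice of zlib and the byte strings standing in for method bodies are assumptions made only for this example.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """Length of data after compression (zlib is an assumed compressor)."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized Compression Distance between two byte strings."""
    cx = compressed_size(x)
    cy = compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


# Hypothetical stand-ins for method bodies; a real system would derive
# these from the app's bytecode.
method_a = b"invoke-virtual getDeviceId; move-result v0; invoke-static sendSms; return-void; " * 20
method_b = method_a.replace(b"v0", b"v1")  # near-copy of method_a
method_c = bytes(range(256)) * 6           # unrelated content

if __name__ == "__main__":
    print("a vs b:", ncd(method_a, method_b))
    print("a vs c:", ncd(method_a, method_c))
```

A smaller value indicates that the two inputs share more compressible structure, so the near-copy scores closer to 0 than the unrelated input. With real compressors the value can slightly exceed 1, so it is best treated as an approximate distance.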


Keywords: Android, Similarity analysis, Normalized compression distance, Androguard



Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  1. Amrita Center for Cybersecurity Systems and Networks, Amrita School of Engineering, Amritapuri Campus, Amrita Vishwa Vidyapeetham, Kollam, India
