AnDarwin: Scalable Detection of Semantically Similar Android Applications

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8134)


The popularity and utility of smartphones rely on their vibrant application markets; however, plagiarism threatens the long-term health of these markets. We present a scalable approach to detecting similar Android apps based on their semantic information. We implement our approach in a tool called AnDarwin and evaluate it on 265,359 apps collected from 17 markets, including Google Play and numerous third-party markets. In contrast to earlier approaches, AnDarwin has four advantages: it avoids comparing apps pairwise, thus greatly improving its scalability; it analyzes only the app code and does not rely on other information (such as the app’s market, signature, or description), thus greatly increasing its reliability; it can detect both full and partial app similarity; and it can automatically detect library code and remove it from the similarity analysis. We present two use cases for AnDarwin: finding similar apps by different developers (“clones”) and similar apps from the same developer (“rebranded”). In ten hours, AnDarwin detected at least 4,295 apps that have been the victims of cloning and 36,106 apps that are rebranded. By analyzing the clusters found by AnDarwin, we found 88 new variants of malware and identified 169 malicious apps based on differences in the requested permissions. Our evaluation demonstrates AnDarwin’s ability to accurately detect similar apps on a large scale.
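
The abstract does not say how pairwise comparison is avoided. One standard way to group similar sets without comparing every pair is MinHash with banding, a form of locality-sensitive hashing. The sketch below illustrates that general technique only; it is not AnDarwin's implementation. The feature strings, function names, and the parameters num_hashes and bands are all hypothetical, and each app is assumed to be already reduced to a non-empty set of code features.

```python
import hashlib
from collections import defaultdict


def minhash_signature(features, num_hashes=32):
    """MinHash signature: for each seeded hash, the minimum value over the set.

    `features` must be a non-empty collection of strings (assumption).
    """
    signature = []
    for seed in range(num_hashes):
        signature.append(min(
            int.from_bytes(
                hashlib.blake2b(f"{seed}:{feat}".encode(), digest_size=8).digest(),
                "big",
            )
            for feat in features
        ))
    return tuple(signature)


def cluster_candidates(apps, num_hashes=32, bands=8):
    """Group apps whose signatures agree on at least one band.

    Each app is hashed once per band, so the work grows with the number
    of apps rather than with the number of app pairs.
    """
    rows = num_hashes // bands
    buckets = defaultdict(set)
    for name, features in apps.items():
        sig = minhash_signature(features, num_hashes)
        for b in range(bands):
            band = sig[b * rows:(b + 1) * rows]
            buckets[(b, band)].add(name)
    # Keep only buckets that contain more than one app.
    return {frozenset(members) for members in buckets.values() if len(members) > 1}


# Hypothetical feature sets standing in for per-app code features.
apps = {
    "original": {"feat:a", "feat:b", "feat:c", "feat:d"},
    "clone": {"feat:a", "feat:b", "feat:c", "feat:d"},
    "unrelated": {"feat:w", "feat:x", "feat:y", "feat:z"},
}
clusters = cluster_candidates(apps)
```

Raising bands (with fewer rows per band) makes the grouping more permissive, so partially similar apps are more likely to share a bucket; lowering it makes the grouping stricter.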



Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  1. University of California, Davis, USA
