
AnDarwin: Scalable Detection of Semantically Similar Android Applications

  • Conference paper

Part of the Lecture Notes in Computer Science book series (LNCS, volume 8134)


The popularity and utility of smartphones rely on their vibrant application markets; however, plagiarism threatens the long-term health of these markets. We present a scalable approach to detecting similar Android apps based on their semantic information. We implement our approach in a tool called AnDarwin and evaluate it on 265,359 apps collected from 17 markets, including Google Play and numerous third-party markets. In contrast to earlier approaches, AnDarwin has four advantages: it avoids comparing apps pairwise, thus greatly improving its scalability; it analyzes only the app code and does not rely on other information (such as the app’s market, signature, or description), thus greatly increasing its reliability; it can detect both full and partial app similarity; and it can automatically detect library code and remove it from the similarity analysis. We present two use cases for AnDarwin: finding similar apps by different developers (“clones”) and similar apps from the same developer (“rebranded”). In ten hours, AnDarwin detected at least 4,295 apps that have been the victims of cloning and 36,106 apps that are rebranded. By analyzing the clusters found by AnDarwin, we found 88 new variants of malware and identified 169 malicious apps based on differences in the requested permissions. Our evaluation demonstrates AnDarwin’s ability to accurately detect similar apps on a large scale.
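The abstract mentions identifying malicious apps from differences in the requested permissions within a cluster of similar apps, but does not spell out the procedure. The following is a minimal illustrative sketch of one way such a check could look, not AnDarwin's actual implementation. The function name `permission_outliers`, the `min_share` threshold, and the sample cluster are all assumptions made for illustration.

```python
from collections import Counter


def permission_outliers(cluster, min_share=0.5):
    """Flag apps that request permissions rare within their cluster.

    `cluster` maps an app identifier to the set of permissions it requests.
    A permission is treated as rare when fewer than `min_share` of the apps
    in the cluster request it (hypothetical threshold, not from the paper).
    Returns a dict mapping each flagged app to its rare permissions.
    """
    n = len(cluster)
    # Count how many apps in the cluster request each permission.
    counts = Counter(p for perms in cluster.values() for p in set(perms))
    rare = {p for p, c in counts.items() if c / n < min_share}

    flagged = {}
    for app, perms in cluster.items():
        extra = set(perms) & rare
        if extra:
            flagged[app] = extra
    return flagged


# Hypothetical cluster of three similar apps; one requests extra permissions.
cluster = {
    "app_a": {"INTERNET"},
    "app_b": {"INTERNET"},
    "app_c": {"INTERNET", "SEND_SMS", "READ_CONTACTS"},
}
suspicious = permission_outliers(cluster)
```

Here only `app_c` is flagged, because `SEND_SMS` and `READ_CONTACTS` are requested by one of the three apps. A flagged app is only a candidate for manual inspection; extra permissions alone do not establish that an app is malicious.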




Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Crussell, J., Gibler, C., Chen, H. (2013). AnDarwin: Scalable Detection of Semantically Similar Android Applications. In: Crampton, J., Jajodia, S., Mayes, K. (eds) Computer Security – ESORICS 2013. ESORICS 2013. Lecture Notes in Computer Science, vol 8134. Springer, Berlin, Heidelberg.


  • DOI:

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-40202-9

  • Online ISBN: 978-3-642-40203-6

  • eBook Packages: Computer Science, Computer Science (R0)