Abstract
The popularity and utility of smartphones rely on their vibrant application markets; however, plagiarism threatens the long-term health of these markets. We present a scalable approach to detecting similar Android apps based on their semantic information. We implement our approach in a tool called AnDarwin and evaluate it on 265,359 apps collected from 17 markets including Google Play and numerous thirdparty markets. In contrast to earlier approaches, AnDarwin has four advantages: it avoids comparing apps pairwise, thus greatly improving its scalability; it analyzes only the app code and does not rely on other information - such as the app’s market, signature, or description - thus greatly increasing its reliability; it can detect both full and partial app similarity; and it can automatically detect library code and remove it from the similarity analysis. We present two use cases for AnDarwin: finding similar apps by different developers (“clones”) and similar apps from the same developer (“rebranded”). In ten hours, AnDarwin detected at least 4,295 apps that have been the victims of cloning and 36,106 apps that are rebranded. By analyzing the clusters found by AnDarwin, we found 88 new variants of malware and identified 169 malicious apps based on differences in the requested permissions. Our evaluation demonstrates AnDarwin’s ability to accurately detect similar apps on a large scale.
Chapter PDF
References
Goapk market (April 2012), http://market.goapk.com
Google play (April 2012), https://play.google.com/store/apps
Slideme: Android community and application marketplace (April 2012), http://slideme.org/
Virus total (June 2012), https://www.virustotal.com
Andoni, A., Indyk, P.: Near-optimal hashing algorithms for approximate nearest neighbor in high dimensions. In: 47th Annual IEEE Symposium on Foundations of Computer Science, FOCS 2006, pp. 459–468. IEEE (2006)
Androguard. Androguard: Manipulation and protection of android apps and more... (April 2012), http://code.google.com/p/androguard/
AppBrain. Number of available android applications (November 2012), http://www.appbrain.com/stats/number-of-android-apps
BajaBob. Smalihook.java found on my hacked application (May 2012), http://stackoverflow.com/questions/5600143/android-game-keeps-getting-hacked
Baker, B.S.: On finding duplication and near-duplication in large software systems. In: Proceedings of 2nd Working Conference on Reverse Engineering 1995, pp. 86–95. IEEE (1995)
Scott Beard. Market shocker! iron soldiers xda beta published by alleged thief (May 2012), http://androidheadlines.com/2011/01/market-shocker-iron-soldiers-xda-beta-published-by-alleged-thief.html
The Lookout Blog. Security alert: Gamex trojan hides in root-required apps - tricking users into downloads (April 2012), http://blog.mylookout.com/blog/2012/04/27/security-alert-gamex-trojans-hides-in-root-required-apps-tricking-users-into-downloads/
Broder, A.Z.: On the resemblance and containment of documents. In: Proceedings of the Compression and Complexity of Sequences 1997, pp. 21–29. IEEE (1997)
Broder, A.Z., Charikar, M., Frieze, A.M., Mitzenmacher, M.: Min-wise independent permutations. In: Proceedings of the Thirtieth Annual ACM Symposium on Theory of Computing, pp. 327–336. ACM (1998)
IBM T.J. Watson Research Center. T.j. watson libraries for analysis (wala) (April 2012), http://wala.sourceforge.net
comScore. comscore reports march 2012 u.s. mobile subscriber market share (May 2012), http://www.comscore.com/Press_Events/Press_Releases/2012/4/comScore_Reports_March_2012_U.S._Mobile_Subscriber_Market_Share
Crussell, J., Gibler, C., Chen, H.: Attack of the clones: Detecting cloned applications on android markets. In: Foresti, S., Yung, M., Martinelli, F. (eds.) ESORICS 2012. LNCS, vol. 7459, pp. 37–54. Springer, Heidelberg (2012)
Davis, I.: Dexcd (April 2012), http://www.swag.uwaterloo.ca/dexcd/index.html
Doherty, S., Krysiuk, P.: Android.basebridge (November 2012), http://www.symantec.com/security_response/writeup.jsp?docid=2011-060915-4938-99
Gabel, M., Jiang, L., Su, Z.: Scalable detection of semantic clones. In: ACM/IEEE 30th International Conference on Software Engineering, ICSE 2008, pp. 321–330. IEEE (2008)
Gibler, C., Stevens, R., Crussell, J., Chen, H., Zang, H., Choi, H.: Adrob: Examining the landscape and impact of android application plagiarism. To Appear in the Proceedings of 11th International Conference on Mobile Systems, Applications and Services (2013)
Hanna, S., Huang, L., Wu, E., Li, S., Chen, C., Song, D.: Juxtapp: A scalable system for detecting code reuse among android applications. In: Flegel, U., Markatos, E., Robertson, W. (eds.) DIMVA 2012. LNCS, vol. 7591, pp. 62–81. Springer, Heidelberg (2013)
Jiang, L., Misherghi, G., Su, Z., Glondu, S.: Deckard: Scalable and accurate tree-based detection of code clones. In: Proceedings of the 29th International Conference on Software Engineering, pp. 96–105. IEEE Computer Society (2007)
Jiang, X.: Droidkungfu (November 2012), http://www.csc.ncsu.edu/faculty/jiang/DroidKungFu.html
Komondoor, R., Horwitz, S.: Using slicing to identify duplication in source code. In: Cousot, P. (ed.) SAS 2001. LNCS, vol. 2126, pp. 40–56. Springer, Heidelberg (2001)
Li, Z., Lu, S., Myagmar, S., Zhou, Y.: Cp-miner: Finding copy-paste and related bugs in large-scale software code. IEEE Transactions on Software Engineering 32(3), 176–192 (2006)
Lockheimer, H.: Android and security (April 2012), http://googlemobile.blogspot.com/2012/02/android-and-security.html
OGorman, G., Honda, H.: Android.geinimi (November 2012), http://www.symantec.com/security_response/writeup.jsp?docid=2011-010111-5403-99
pxb1988. dex2jar: A tool for converting android’s .dex format to java’s .class format (April 2012), https://code.google.com/p/dex2jar/
Rajaraman, A., Leskovec, J., Ullman, J.: Mining of massive datasets (2012), http://infolab.stanford.edu/~ullman/mmds/book.pdf
Spring, T.: Sneaky mobile ads invade android phones (June 2012), http://www.pcworld.com/article/245305/sneaky_mobile_ads_invade_android_phones.html
Android Threats. Android/adwo (February 2013), http://android-threats.org/androidadwo/
Zhou, W., Zhou, Y., Jiang, X., Ning, P.: Detecting repackaged smartphone applications in third-party android marketplaces. In: Proceedings of 2nd ACM Conference on Data and Application Security and Privacy, CODASPY 2012 (2012)
Zhou, W., Zhou, Y., Grace, M., Jiang, X., Zou, S.: Fast, scalable detection of piggybacked mobile applications. In: Proceedings of the Third ACM Conference on Data and Application Security and Privacy, pp. 185–196. ACM (2013)
Zhou, Y., Jiang, X.: Dissecting android malware: Characterization and evolution. In: Proceedings of 33rd Symposium on Security and Privacy. IEEE (2012)
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Crussell, J., Gibler, C., Chen, H. (2013). AnDarwin: Scalable Detection of Semantically Similar Android Applications. In: Crampton, J., Jajodia, S., Mayes, K. (eds) Computer Security – ESORICS 2013. ESORICS 2013. Lecture Notes in Computer Science, vol 8134. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-40203-6_11
Download citation
DOI: https://doi.org/10.1007/978-3-642-40203-6_11
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-40202-9
Online ISBN: 978-3-642-40203-6
eBook Packages: Computer ScienceComputer Science (R0)