Abstract
In the Android app ecosystem, third-party libraries play a crucial role in providing common services and features. However, these libraries introduce complex dependencies that can impact stability, performance, and security. Detecting the libraries used in Android apps is therefore critical for understanding functionality, compliance, and security risks. Existing library identification approaches degrade in performance when apps are obfuscated. In this study, we propose Libra, a novel solution for library identification in obfuscated Android apps. Libra leverages method headers and bodies, encodes instructions compactly, and employs piecewise fuzzy hashing to detect libraries effectively in obfuscated apps. Our two-phase approach achieves F1 scores of 88% for non-obfuscated apps and 50–87% for obfuscated apps, surpassing previous works by significant margins. Extensive evaluations demonstrate Libra’s effectiveness and robustness against various obfuscation techniques.
K. Nwodo—This work was done as part of an internship at Quokka.
Notes
1. The process of identifying the components used in a piece of software is generally known as creating a Software Bill of Materials (SBOM). See https://www.cisa.gov/sbom for more information about the SBOM concept and standards.
2. At the time of this writing, the Maven Central repository [5] had over 11 million indexed library packages.
3. We exclude the instance initializer method (\(\texttt{<init>}\)), the class initializer method (\(\texttt{<clinit>}\)), and the resources class (R) since these tend to be highly similar amongst apps and libraries, which may lead to spurious matches.
4. CTPH offers advantages over other hashing methods in this setup as it employs a recursive rolling hash where each piece of the hash is computed based on parts of the data and is not influenced by previously processed data. Consequently, if there are changes to the sequences being hashed, only a small portion of the hash is affected. This is a desirable property for library identification in obfuscated apps since changes to the library bytecode packed in the app are expected.
5. Note that the Android SDK Support Library [7] was excluded from the counts for consistency with all evaluated tools.
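The locality property described in note 4 can be illustrated with a toy piecewise hash. The sketch below is not Libra's implementation and not the exact scheme of Kornblum's CTPH: the window size, the additive rolling hash, the trigger rule, and the names `trigger_points` and `piecewise_hash` are all assumptions chosen for brevity. It only shows why a change to one part of the input leaves the signature characters of untouched pieces intact.

```python
import hashlib
import string
from typing import List

# 64-character alphabet: one printable character is emitted per piece.
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"
WINDOW = 7  # number of trailing bytes the rolling hash covers (assumed value)


def trigger_points(data: bytes, block_size: int = 16) -> List[int]:
    """Return the indexes at which the rolling hash closes a piece."""
    points = []
    rolling = 0
    for i, byte in enumerate(data):
        rolling += byte
        if i >= WINDOW:
            rolling -= data[i - WINDOW]  # drop the byte that left the window
        # The trigger depends only on the last WINDOW bytes (the "context").
        if rolling % block_size == block_size - 1:
            points.append(i)
    return points


def _piece_char(piece: bytes) -> str:
    """Map one piece to a single signature character (assumed piece hash)."""
    return ALPHABET[hashlib.sha256(piece).digest()[0] % 64]


def piecewise_hash(data: bytes, block_size: int = 16) -> str:
    """Toy context-triggered piecewise hash: one character per piece."""
    signature = []
    start = 0
    for end in trigger_points(data, block_size):
        signature.append(_piece_char(data[start:end + 1]))
        start = end + 1
    if start < len(data):  # trailing bytes after the last trigger
        signature.append(_piece_char(data[start:]))
    return "".join(signature)
```

Because each trigger depends only on the bytes inside the window, two inputs that share a prefix produce the same pieces, and therefore the same leading signature characters, up to the last trigger inside that prefix. A cryptographic hash of the whole input would instead change completely.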
References
Allatori. https://allatori.com/
Get started with the NDK. https://developer.android.com/ndk/guides
Libdetect dataset. https://sites.google.com/view/libdetect/home/dataset
Maven repository: Central. https://mvnrepository.com/repos/central
Proguard. https://www.guardsquare.com/proguard
Support Library | Android Developers. https://developer.android.com/topic/libraries/support-library
SolarWinds attack explained: And why it was so hard to detect (2020). https://www.csoonline.com/article/3601508/solarwinds-supply-chain-attack-explained-why-organizations-were-not-prepared.html
Synopsys research reveals significant security concerns in popular mobile apps amid pandemic (2021). https://news.synopsys.com/2021-03-25-Synopsys-Research-Reveals-Significant-Security-Concerns-in-Popular-Mobile-Apps-Amid-Pandemic
Number of apps available in leading app stores as of 3rd quarter 2022 (2021). https://www.statista.com/statistics/276623/number-of-apps-available-in-leading-app-stores/
Numbers from Google I/O: 3 billion active Android devices (2022). https://9to5google.com/2022/05/11/google-io-2022-numbers/
Shrink, obfuscate, and optimize your app (2023). https://developer.android.com/studio/build/shrink-code.html
Ali, M.: Sensors Sandbox. https://github.com/mustafa01ali/SensorsSandbox
Almanee, S., Ünal, A., Payer, M., Garcia, J.: Too quiet in the library: an empirical study of security updates in Android apps’ native code. In: 2021 IEEE/ACM 43rd International Conference on Software Engineering (ICSE). IEEE (2021)
Backes, M., Bugiel, S., Derr, E.: Reliable third-party library detection in Android and its security applications. In: Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security (2016)
Derr, E., Bugiel, S., Fahl, S., Acar, Y., Backes, M.: Keep me updated: an empirical study of third-party library updatability on Android. In: Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security (2017)
Duan, R., Bijlani, A., Xu, M., Kim, T., Lee, W.: Identifying open-source license violation and 1-day security risk at large scale. In: Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security (2017)
Fisher, R.A.: The use of multiple measurements in taxonomic problems. Ann. Eugen. 7(2), 179–188 (1936)
Glanz, L., et al.: CodeMatch: obfuscation won’t conceal your repackaged app. In: Proceedings of the 2017 11th Joint Meeting on Foundations of Software Engineering (2017)
Han, H., Li, R., Tang, J.: Identify and inspect libraries in Android applications. Wirel. Pers. Commun. 103(1), 491–503 (2018)
Huang, J., et al.: Scalably detecting third-party Android libraries with two-stage bloom filtering. IEEE Trans. Softw. Eng. (2022)
Kornblum, J.: Identifying almost identical files using context triggered piecewise hashing. Digit. Investig. 3, 91–97 (2006)
Levenshtein, V.I., et al.: Binary codes capable of correcting deletions, insertions, and reversals. In: Soviet Physics Doklady (1966)
Li, M., et al.: LIBD: scalable and precise third-party library detection in Android markets. In: 2017 IEEE/ACM 39th International Conference on Software Engineering (ICSE) (2017)
Liu, B., Liu, B., Jin, H., Govindan, R.: Efficient privilege de-escalation for ad libraries in mobile apps. In: Proceedings of the 13th Annual International Conference on Mobile Systems, Applications, and Services, pp. 89–103 (2015)
Ma, Z., Wang, H., Guo, Y., Chen, X.: LibRadar: fast and accurate detection of third-party libraries in Android apps. In: Proceedings of the 38th International Conference on Software Engineering Companion (2016)
Narayanan, A., Chen, L., Chan, C.K.: AdDetect: automated detection of Android ad libraries using semantic analysis. In: 2014 IEEE Conference on Intelligent Sensors, Sensor Networks and Information Processing (ISSNIP) (2014)
Sihag, V., Vardhan, M., Singh, P.: A survey of Android application and malware hardening. Comput. Sci. Rev. 39, 100365 (2021)
Soh, C., Tan, H.B.K., Arnatovich, Y.L., Narayanan, A., Wang, L.: LibSift: automated detection of third-party libraries in Android applications. In: 2016 23rd Asia-Pacific Software Engineering Conference (APSEC) (2016)
Tang, W., Luo, P., Fu, J., Zhang, D.: LibDX: a cross-platform and accurate system to detect third-party libraries in binary code. In: 2020 IEEE 27th International Conference on Software Analysis, Evolution and Reengineering (SANER) (2020)
Tang, Z., et al.: Securing Android applications via edge assistant third-party library detection. Comput. Secur. 80 (2019)
Wang, H., Guo, Y., Ma, Z., Chen, X.: Wukong: a scalable and accurate two-phase approach to Android app clone detection. In: Proceedings of the 2015 International Symposium on Software Testing and Analysis (2015)
Wang, Y., Rountev, A.: Who changed you? Obfuscator identification for Android. In: 2017 IEEE/ACM 4th International Conference on Mobile Software Engineering and Systems (MOBILESoft), pp. 154–164. IEEE (2017)
Wang, Y., Wu, H., Zhang, H., Rountev, A.: ORLIS: obfuscation-resilient library detection for Android. In: 2018 IEEE/ACM 5th International Conference on Mobile Software Engineering and Systems (MOBILESoft) (2018)
Wang, Y., et al.: An empirical study of usages, updates and risks of third-party libraries in Java projects. In: 2020 IEEE International Conference on Software Maintenance and Evolution (ICSME), pp. 35–45. IEEE (2020)
Xu, J., Yuan, Q.: LibRoad: rapid, online, and accurate detection of TPLs on Android. IEEE Trans. Mob. Comput. 21(1) (2020)
Zhan, X., et al.: ATVHunter: reliable version detection of third-party libraries for vulnerability identification in Android applications. In: 43rd International Conference on Software Engineering (2021)
Zhan, X., et al.: Automated third-party library detection for Android applications: are we there yet? In: 2020 35th IEEE/ACM International Conference on Automated Software Engineering (ASE), pp. 919–930. IEEE (2020)
Zhan, X., et al.: Research on third-party libraries in Android apps: a taxonomy and systematic literature review. IEEE Trans. Softw. Eng. 48(10) (2022)
Zhang, F., Huang, H., Zhu, S., Wu, D., Liu, P.: ViewDroid: towards obfuscation-resilient mobile application repackaging detection. In: Proceedings of the 2014 ACM Conference on Security and Privacy in Wireless & Mobile Networks (2014)
Zhang, J., Beresford, A.R., Kollmann, S.A.: LibID: reliable identification of obfuscated third-party Android libraries. In: Proceedings of the 28th ACM SIGSOFT International Symposium on Software Testing and Analysis, pp. 55–65 (2019)
Zhang, Y., Wang, J., Huang, H., Zhang, Y., Liu, P.: Understanding and conquering the difficulties in identifying third-party libraries from millions of Android apps. IEEE Trans. Big Data (2021)
Zhang, Y., et al.: Detecting third-party libraries in Android applications with high precision and recall. In: IEEE 25th Conference on Software Analysis, Evolution and Reengineering (2018)
Zhang, Z., Diao, W., Hu, C., Guo, S., Zuo, C., Li, L.: An empirical study of potentially malicious third-party libraries in Android apps. In: 13th ACM Conference on Security and Privacy in Wireless and Mobile Networks (2020)
Acknowledgment
We thank the anonymous reviewers for their insightful feedback. Opinions expressed in this article are those of the authors and do not necessarily reflect the official policy or position of their respective institutions.
Appendices
A Method Encoding Codebook
Table 10 shows the codebook used by Libra to encode method instructions. To determine the best mapping, we conducted feature selection using Fisher’s score [18], which indicates the most discriminative instructions. Our analysis revealed that field getters, setters, and arithmetic operators exhibited low variance, making them less useful for discrimination. Consequently, we combined these arithmetic instructions into a single move instruction.
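As an illustration of the kind of ranking involved, the sketch below computes one common formulation of the Fisher score for a single feature: the between-class scatter of the class means divided by the within-class variance. The setup is an assumption made for the example, not taken from the paper: each value is a hypothetical per-method count of some instruction type, each label is a hypothetical library name, and `fisher_score` is an illustrative helper.

```python
from statistics import fmean, pvariance
from typing import Dict, Hashable, List, Sequence


def fisher_score(values: Sequence[float], labels: Sequence[Hashable]) -> float:
    """Fisher score of one feature: between-class over within-class scatter."""
    overall_mean = fmean(values)

    # Group the feature values by class label.
    groups: Dict[Hashable, List[float]] = {}
    for value, label in zip(values, labels):
        groups.setdefault(label, []).append(value)

    # Between-class scatter: how far each class mean lies from the overall mean.
    between = sum(len(g) * (fmean(g) - overall_mean) ** 2 for g in groups.values())
    # Within-class scatter: population variance inside each class, size-weighted.
    within = sum(len(g) * pvariance(g) for g in groups.values())

    if within == 0:
        return float("inf") if between > 0 else 0.0
    return between / within


# Hypothetical counts of one instruction type in four methods of two libraries.
labels = ["libA", "libA", "libB", "libB"]
separating = fisher_score([1.0, 3.0, 7.0, 9.0], labels)      # class means differ
uninformative = fisher_score([1.0, 3.0, 1.0, 3.0], labels)   # class means equal
```

In this formulation, a feature whose distribution is the same in every class scores near zero and is a candidate for merging into a coarser code, while a feature whose class means are well separated scores high.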
B Search Space Reduction from Library Pairing
The pairing size complexity for pairs that satisfy condition one is \(O(k)\), where \(n\) is the number of libraries in the database, and \(k \ll n\) represents the group size. On the other hand, the pairing size complexity for condition two is \(O(|P_{C2}|)\), where \(P_{C2}\) is defined as:
where C is the library candidate, L is the library, A is the app, and D is the database. If no conditions are met, the library candidate is paired with the entire database, resulting in a pairing size complexity of \(O(n)\). Note that this is unlikely: library sizes span a wide range, from the order of \(10^0\) to \(10^3\), so condition two is likely to be met.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Tomassi, D.A., Nwodo, K., Elsabagh, M. (2023). Libra: Library Identification in Obfuscated Android Apps. In: Athanasopoulos, E., Mennink, B. (eds) Information Security. ISC 2023. Lecture Notes in Computer Science, vol 14411. Springer, Cham. https://doi.org/10.1007/978-3-031-49187-0_11
DOI: https://doi.org/10.1007/978-3-031-49187-0_11
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-49186-3
Online ISBN: 978-3-031-49187-0
eBook Packages: Computer Science (R0)