Abstract
To help programmers find suitable API methods and learn API usages, researchers have proposed various code search engines. Given an API of interest, a code search engine retrieves code samples of that API from online software repositories. Through such tools, Internet code has become a major resource for learning API usages. Besides Internet code, local library code also contains API usages, and researchers have found that many API usages in library code are more concise than those in client code. As samples from a code search engine are typically client code, it is interesting to explore the API usages inside library code. However, the samples inside library code contain internal method invocations, and these internal usages introduce compilation errors when they are called directly from the client side. If an empirical study does not remove them from API usages, it can significantly overestimate the API usages from library code. Due to this challenge, no prior study has analyzed API usages inside libraries, and many research questions are still open. For example, how many API usages are there inside libraries? The answers are useful to motivate future research on APIs and code search engines. To support the exploration of such questions, in this paper we propose CodeEx, which extracts Internet code samples from a popular code search engine and extracts local code samples by removing internal usages from library code. With the support of CodeEx, we conduct the first empirical study on the API usages of five libraries, and summarize our results into six findings that answer five research questions. Our results can help researchers motivate their future research. For example, our results show that although code samples from library code are only half as many as those from the code search engine, they cover 4.0 times more API classes, 4.7 times more API methods, and 3.0 times more call sequences.
Meanwhile, in a controlled experiment, we compare the effectiveness of the two sources of code samples in assisting programming. We find that more API usages do not lead to more completed tasks, which highlights the importance of code recommendation approaches.
Acknowledgments
We thank the reviewers for their insightful comments. Hao Zhong is sponsored by the National Natural Science Foundation of China under Grants No. 62232003 and 62272295. Xiaoyin Wang is supported in part by NSF Grant CCF-1846467.
Additional information
Communicated by: Ali Ouni
About this article
Cite this article
Zhong, H., Wang, X. An empirical study on API usages from code search engine and local library. Empir Software Eng 28, 63 (2023). https://doi.org/10.1007/s10664-023-10304-z
DOI: https://doi.org/10.1007/s10664-023-10304-z