Semantic Code Search in Software Repositories using Neural Machine Translation

Papathomas, Evangelos; Diamantopoulos, Themistoklis; Symeonidis, Andreas

doi:10.1007/978-3-030-99429-7_13

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13241)

Included in the following conference series:

International Conference on Fundamental Approaches to Software Engineering

Abstract

Software development is increasingly accelerated by reusing code snippets found online, in question-answering platforms and software repositories. To be efficient, this process requires forming an appropriate query and identifying the most suitable code snippet, both of which can be challenging and time-consuming. In recent years, several code recommendation systems have been developed to address this problem. Nevertheless, most of them recommend API calls or call sequences rather than reusable code snippets. Furthermore, they do not employ architectures advanced enough to exploit the semantics of natural language and code in order to form the optimal query from the question posed. To overcome these issues, we propose CodeTransformer, a code recommendation system that provides useful, reusable code snippets extracted from open-source GitHub repositories. By employing a neural network architecture that comprises advanced attention mechanisms, our system models natural language queries and code snippets in a joint vector space. Upon evaluating CodeTransformer quantitatively against a similar system and qualitatively using a dataset from Stack Overflow, we conclude that our approach can recommend useful and reusable snippets to developers.
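
To make the idea of a joint vector space concrete, the following minimal sketch ranks code snippets by cosine similarity to a query vector. It is an illustration only, not the CodeTransformer implementation: the function names, the snippet identifiers, and the three-dimensional vectors are all hypothetical, and in a real system the vectors would come from trained query and code encoders.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_snippets(query_vec, snippet_vecs, top_k=3):
    """Return up to top_k (snippet_id, score) pairs, most similar first."""
    scored = [(sid, cosine(query_vec, vec)) for sid, vec in snippet_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical embeddings: assumed values chosen by hand for illustration,
# not the output of any trained model.
SNIPPETS = {
    "read_file": [0.9, 0.1, 0.0],
    "sort_list": [0.1, 0.9, 0.1],
    "http_get": [0.0, 0.2, 0.9],
}
# Stands in for the encoded form of a query such as "how to read a text file".
QUERY = [0.8, 0.2, 0.1]

if __name__ == "__main__":
    for snippet_id, score in rank_snippets(QUERY, SNIPPETS):
        print(f"{snippet_id}: {score:.3f}")
```

Because queries and snippets live in the same space, retrieval reduces to a nearest-neighbour lookup. The exhaustive scan above is only practical for small collections; larger ones would typically use an approximate nearest-neighbour index instead.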

Author information

Authors and Affiliations

Electrical and Computer Engineering Dept., Aristotle University of Thessaloniki, Thessaloniki, Greece
Evangelos Papathomas, Themistoklis Diamantopoulos & Andreas Symeonidis

Corresponding author

Correspondence to Evangelos Papathomas.

Editor information

Editors and Affiliations

University of Oslo, Oslo, Norway
Einar Broch Johnsen
Johannes Kepler University of Linz, Linz, Austria
Manuel Wimmer

Rights and permissions

Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

About this paper

Cite this paper

Papathomas, E., Diamantopoulos, T., Symeonidis, A. (2022). Semantic Code Search in Software Repositories using Neural Machine Translation. In: Johnsen, E.B., Wimmer, M. (eds) Fundamental Approaches to Software Engineering. FASE 2022. Lecture Notes in Computer Science, vol 13241. Springer, Cham. https://doi.org/10.1007/978-3-030-99429-7_13

DOI: https://doi.org/10.1007/978-3-030-99429-7_13
Published: 29 March 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-99428-0
Online ISBN: 978-3-030-99429-7
eBook Packages: Computer Science, Computer Science (R0)

Societies and partnerships

The European Joint Conferences on Theory and Practice of Software
