International Conference on Fundamental Approaches to Software Engineering

FASE 2022: Fundamental Approaches to Software Engineering, pp 225–244

Semantic Code Search in Software Repositories using Neural Machine Translation

  • Evangelos Papathomas (ORCID: orcid.org/0000-0001-8373-5165)
  • Themistoklis Diamantopoulos (ORCID: orcid.org/0000-0002-0520-7225)
  • Andreas Symeonidis (ORCID: orcid.org/0000-0003-0235-6046)
  • Conference paper
  • Open Access
  • First Online: 29 March 2022

Part of the Lecture Notes in Computer Science book series (LNCS, volume 13241)

Abstract

Software development is increasingly accelerated by reusing code snippets found online, in question-answering platforms and software repositories. To be efficient, this process requires forming an appropriate query and identifying the most suitable code snippet, which can be challenging and time-consuming. In recent years, several code recommendation systems have been developed to address this problem. Nevertheless, most of them recommend API calls or sequences rather than reusable code snippets. Furthermore, they do not employ architectures advanced enough to exploit the semantics of natural language and code in order to form the optimal query from the question posed. To overcome these issues, we propose CodeTransformer, a code recommendation system that provides useful, reusable code snippets extracted from open-source GitHub repositories. By employing a neural network architecture that comprises advanced attention mechanisms, our system effectively understands and models natural language queries and code snippets in a joint vector space. Having evaluated CodeTransformer quantitatively against a similar system and qualitatively using a dataset from Stack Overflow, we conclude that our approach can recommend useful and reusable snippets to developers.
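
To make the idea of a joint vector space concrete, the following is a minimal sketch and not the authors' implementation. It assumes that some encoder (not shown) has already mapped both the natural language query and each code snippet to vectors of the same dimension, and that candidates are ranked by cosine similarity, which is a common choice but an assumption here. The function names, snippet names, and the three-dimensional toy vectors are all hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_snippets(query_vec, snippet_vecs, top_k=3):
    """Return the top_k (snippet_name, score) pairs, most similar to the query first."""
    scored = [
        (name, cosine_similarity(query_vec, vec))
        for name, vec in snippet_vecs.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical, hand-made embeddings; a real system would obtain these
# from trained query and code encoders that share one vector space.
SNIPPET_VECS = {
    "read_file_lines": [0.9, 0.1, 0.0],
    "sort_list_desc": [0.0, 0.8, 0.6],
    "http_get_json": [0.1, 0.0, 0.95],
}
QUERY_VEC = [0.85, 0.2, 0.05]  # stands in for an embedded query such as "read a file line by line"

ranking = rank_snippets(QUERY_VEC, SNIPPET_VECS, top_k=2)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

With these toy vectors the snippet whose embedding points in nearly the same direction as the query is ranked first. For a large snippet collection, the exhaustive comparison in `rank_snippets` would typically be replaced by an approximate nearest-neighbour index.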

Keywords

  • code reuse
  • semantic analysis
  • neural transformers

Author information

Authors and Affiliations

  1. Electrical and Computer Engineering Dept., Aristotle University of Thessaloniki, Thessaloniki, Greece

    Evangelos Papathomas, Themistoklis Diamantopoulos & Andreas Symeonidis

Corresponding author

Correspondence to Evangelos Papathomas.

Editor information

Editors and Affiliations

  1. University of Oslo, Oslo, Norway

    Prof. Einar Broch Johnsen

  2. Johannes Kepler University of Linz, Linz, Austria

    Prof. Manuel Wimmer

Rights and permissions

Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

Copyright information

© 2022 The Author(s)

About this paper

Cite this paper

Papathomas, E., Diamantopoulos, T., Symeonidis, A. (2022). Semantic Code Search in Software Repositories using Neural Machine Translation. In: Johnsen, E.B., Wimmer, M. (eds) Fundamental Approaches to Software Engineering. FASE 2022. Lecture Notes in Computer Science, vol 13241. Springer, Cham. https://doi.org/10.1007/978-3-030-99429-7_13

  • DOI: https://doi.org/10.1007/978-3-030-99429-7_13

  • Published: 29 March 2022

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-99428-0

  • Online ISBN: 978-3-030-99429-7

  • eBook Packages: Computer Science, Computer Science (R0)

  • Published in cooperation with the European Joint Conferences on Theory and Practice of Software (ETAPS): http://www.etaps.org/
