A Code Search Method Incorporating Code Annotations

  • Conference paper
Collaborative Computing: Networking, Applications and Worksharing (CollaborateCom 2023)

Abstract

Code search lets users retrieve code snippets from a code base with natural-language queries; its goal is to return the target code accurately and quickly, improving the efficiency of software development. Deep-learning-based code search greatly improves search accuracy by learning the relationship between code and query statements. Because it relies on extracted code features, acquiring more code features is key to quickly improving search performance. However, most previous work has not taken code annotations into consideration. In this paper, we treat code annotations as code features and apply them to code search, in a method named ICA-CS (Code Search that Incorporates Code Annotations). First, the code features are embedded to obtain their vector representations. These are then processed by a bidirectional LSTM (Long Short-Term Memory) network or by multi-head attention, respectively, followed by feature fusion. Finally, the model is trained by joint embedding, minimising a ranking loss function. Experimental results show that, on the MRR (mean reciprocal rank) metric, the proposed model improves on the state-of-the-art models DeepCS, SAN-CS, CARLCS-CNN and SelfAtt by 48.96%, 17.11%, 41.01% and 13.07%, respectively.
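
The two quantities named in the abstract, the ranking loss and MRR, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not give the exact loss, so the hinge form over cosine similarity, the margin value of 0.05, and the function names `cosine`, `ranking_loss` and `mean_reciprocal_rank` are all assumptions made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def ranking_loss(query, pos_code, neg_code, margin=0.05):
    """Hinge-style ranking loss (assumed form, assumed margin).

    The loss is zero once the matching code vector is at least `margin`
    more similar to the query vector than the non-matching code vector.
    """
    return max(0.0, margin - cosine(query, pos_code) + cosine(query, neg_code))


def mean_reciprocal_rank(ranks):
    """MRR over queries; ranks[i] is the 1-based rank of the correct
    snippet in the result list returned for query i."""
    return sum(1.0 / r for r in ranks) / len(ranks)
```

For example, `mean_reciprocal_rank([1, 2, 4])` averages 1, 1/2 and 1/4, so a model that ranks the correct snippet higher for more queries scores closer to 1.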

Author information

Corresponding author

Correspondence to Jianxun Liu.

Copyright information

© 2024 ICST Institute for Computer Sciences, Social Informatics and Telecommunications Engineering

About this paper

Cite this paper

Li, Q., Liu, J., Zhang, X. (2024). A Code Search Method Incorporating Code Annotations. In: Gao, H., Wang, X., Voros, N. (eds) Collaborative Computing: Networking, Applications and Worksharing. CollaborateCom 2023. Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, vol 561. Springer, Cham. https://doi.org/10.1007/978-3-031-54521-4_18

  • DOI: https://doi.org/10.1007/978-3-031-54521-4_18

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-54520-7

  • Online ISBN: 978-3-031-54521-4

  • eBook Packages: Computer Science (R0)
