A Code Search Method Incorporating Code Annotations

Li, Qi; Liu, Jianxun; Zhang, Xiangping

doi:10.1007/978-3-031-54521-4_18

Part of the book series: Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering (LNICST, volume 561)

Included in the following conference series:

International Conference on Collaborative Computing: Networking, Applications and Worksharing

Abstract

Code search lets users retrieve code snippets from a codebase using natural language queries; its goal is to find the target code accurately and quickly so that software development becomes more efficient. Deep learning based code search greatly improves search accuracy by learning the relationship between code and query statements. Because such models rely on extracted code features, acquiring more code features is key to improving search performance quickly. However, most previous work has not taken code annotations into consideration. In this paper, we treat code annotations as code features and apply them to code search in a method named ICA-CS (Code Search that Incorporates Code Annotations). First, the code features are embedded to obtain their vector representations. Each representation is then processed by either a bidirectional LSTM (Long Short-Term Memory) network or multi-head attention, and the results are combined by feature fusion. Finally, the model is trained through joint embedding by minimising a ranking loss function. Experimental results show that, on the evaluation metric MRR (mean reciprocal rank), the proposed model improves on the state-of-the-art models DeepCS, SAN-CS, CARLCS-CNN and SelfAtt by 48.96%, 17.11%, 41.01% and 13.07%, respectively.
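The two quantities the abstract names, a ranking loss used for training and MRR used for evaluation, can be sketched as follows. This is an illustration only and not the authors' implementation: the abstract does not give the loss formula, so the sketch assumes the margin-based form over cosine similarity that is common in joint-embedding code search, and the function names and the margin value of 0.05 are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def ranking_loss(query, pos_code, neg_code, margin=0.05):
    """Margin ranking loss for one (query, matching code, non-matching code) triple.

    Assumed form: max(0, margin - cos(query, pos) + cos(query, neg)).
    The loss is zero once the matching snippet scores at least `margin`
    higher than the non-matching one.
    """
    return max(0.0, margin - cosine(query, pos_code) + cosine(query, neg_code))


def mean_reciprocal_rank(ranks):
    """MRR over a list of 1-based ranks of the correct snippet per query.

    A rank of None means the correct snippet was not retrieved and
    contributes 0 to the mean.
    """
    return sum(1.0 / r if r is not None else 0.0 for r in ranks) / len(ranks)


if __name__ == "__main__":
    # Toy vectors standing in for learned embeddings (hypothetical values).
    q, good, bad = [1.0, 0.0], [1.0, 0.0], [0.0, 1.0]
    print(ranking_loss(q, good, bad))          # correct ordering, no penalty
    print(ranking_loss(q, bad, good))          # wrong ordering, penalised
    print(mean_reciprocal_rank([1, 2, 4, None]))
```

Under this reading, a higher MRR means the correct snippet tends to appear nearer the top of the result list, which is the sense in which the reported percentage improvements over the baselines should be understood.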

Author information

Authors and Affiliations

School of Computer Science and Engineering, Hunan University of Science and Technology, Xiangtan, 411201, Hunan, China
Qi Li, Jianxun Liu & Xiangping Zhang
Hunan Key Lab for Services Computing and Novel Software Technology, Hunan University of Science and Technology, Xiangtan, 411201, Hunan, China
Qi Li, Jianxun Liu & Xiangping Zhang

Corresponding author

Correspondence to Jianxun Liu.

Editor information

Editors and Affiliations

Shanghai University, Shanghai, China
Honghao Gao
Xi’an Jiaotong-Liverpool, Suzhou, China
Xinheng Wang
University of Peloponnese, Patra, Greece
Nikolaos Voros

Cite this paper

Li, Q., Liu, J., Zhang, X. (2024). A Code Search Method Incorporating Code Annotations. In: Gao, H., Wang, X., Voros, N. (eds) Collaborative Computing: Networking, Applications and Worksharing. CollaborateCom 2023. Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, vol 561. Springer, Cham. https://doi.org/10.1007/978-3-031-54521-4_18

DOI: https://doi.org/10.1007/978-3-031-54521-4_18
Published: 23 February 2024
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-54520-7
Online ISBN: 978-3-031-54521-4
eBook Packages: Computer Science, Computer Science (R0)
