Automated Software Engineering, Volume 26, Issue 4, pp 705–732

Enhance code search via reformulating queries with evolving contexts

  • Qing Huang
  • Guoqing Wu

Abstract

To improve code search, many query expansion (QE) approaches expand a query using APIs or crowd knowledge. However, these approaches can sometimes degrade retrieval performance, because they cannot distinguish relevant terms from irrelevant ones in a large set of candidate expansion terms and so may expand a query with irrelevant terms. In this paper, we propose QREC, a query reformulation approach based on evolving contexts, that is, the new, deleted, and dependent terms that arise during code evolution. By treating new terms as relevant and deleted terms as irrelevant, QREC can reformulate a query with appropriate expansion terms. Experimental results show that QREC outperforms state-of-the-art QE approaches (e.g., CodeHow and QECK) by 9–11% and improves the precision of the code search algorithms IR, Portfolio and VF by up to 37–45%.
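
As a rough illustration of the idea of treating new terms as relevant and deleted terms as irrelevant, the sketch below applies a Rocchio-style reweighting, a standard relevance-feedback technique from information retrieval. It is not QREC's actual algorithm, which the abstract does not specify: the function name `reformulate`, the weights `alpha`, `beta` and `gamma`, and the `top_k` cutoff are all assumptions made for this example, and dependent terms are not modeled.

```python
from collections import Counter


def reformulate(query_terms, added_terms, deleted_terms,
                alpha=1.0, beta=0.75, gamma=0.25, top_k=3):
    """Rocchio-style sketch (hypothetical, not the paper's method).

    Terms added during code evolution are boosted, terms deleted during
    code evolution are penalized. All weights are illustrative assumptions.
    """
    weights = Counter()
    # Original query terms get the base weight alpha.
    for term in query_terms:
        weights[term] += alpha
    # Added terms act as positive feedback, normalized by list length.
    for term, count in Counter(added_terms).items():
        weights[term] += beta * count / len(added_terms)
    # Deleted terms act as negative feedback, normalized by list length.
    for term, count in Counter(deleted_terms).items():
        weights[term] -= gamma * count / len(deleted_terms)

    # Keep the original query (deduplicated, order preserved) and append
    # the highest-weighted new candidates with a positive score.
    original = list(dict.fromkeys(query_terms))
    candidates = [(t, w) for t, w in weights.items()
                  if t not in original and w > 0]
    candidates.sort(key=lambda tw: (-tw[1], tw[0]))
    return original + [t for t, _ in candidates[:top_k]]


if __name__ == "__main__":
    expanded = reformulate(
        query_terms=["read", "file"],
        added_terms=["buffered", "reader", "buffered", "stream"],
        deleted_terms=["stream", "scanner"],
        top_k=2,
    )
    print(expanded)
```

In this toy input, a term that appears only among the deleted terms ends up with a negative weight and is never used for expansion, while a term that is both added and deleted is ranked below terms that were only added.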

Keywords

Code search, Query reformulation, Evolving context, Code changes, Statistical learning

Notes

Acknowledgements

This work was supported in part by the National Natural Science Foundation of China (Grant Nos. 61902162, 61762049, 61872272, 61877031, 61802350, 61862033, 61772246, 61562042 and 61672470).

References

  1. Carpineto, C., Romano, G.: A survey of automatic query expansion in information retrieval. ACM Comput. Surv. 44(1), 1 (2012)
  2. Chaparro, O., Florez, J.M., Marcus, A.: Using observed behavior to reformulate queries during text retrieval-based bug localization. In: IEEE International Conference on Software Maintenance and Evolution (ICSME). IEEE (2017)
  3. Fischer, G., Henninger, S., Redmiles, D.: Cognitive tools for locating and comprehending software objects for reuse. In: Proceedings of the 13th International Conference on Software Engineering, pp. 318–328 (1991)
  4. Fluri, B., Würsch, M., Pinzger, M., Gall, H.C.: Change distilling: tree differencing for fine-grained source code change extraction. IEEE Trans. Softw. Eng. 33(11), 725–743 (2007)
  5. Haiduc, S., Bavota, G., Marcus, A., Oliveto, R., De Andrea, L., Menzies, T.: Automatic query reformulations for text retrieval in software engineering. In: Proceedings of the 35th International Conference on Software Engineering (ICSE), pp. 842–851 (2013)
  6. Howard, M.J., Gupta, S., Pollock, L., Vijay-Shanker, K.: Automatically mining software-based, semantically-similar words from comment code mappings. In: Proceedings of the 10th Working Conference on Mining Software Repositories, pp. 377–386 (2013)
  7. Keivanloo, I., Rilling, J., Zou, Y.: Spotting working code examples. In: Proceedings of the 36th International Conference on Software Engineering, pp. 664–675 (2014)
  8. Lemos, O., Bajracharya, S., Ossher, J., Morla, R., Masiero, P., Baldi, P., Lopes, C.: CodeGenie: using test-cases to search and reuse source code. In: Proceedings of the Twenty-Second IEEE/ACM International Conference on Automated Software Engineering, pp. 525–526 (2007)
  9. Lv, F., Zhang, H., Lou, J.-G., Wang, S., Zhang, D., Zhao, J.: CodeHow: effective code search based on API understanding and extended boolean model (E). In: Proceedings of the 30th IEEE/ACM International Conference on Automated Software Engineering (ASE), pp. 260–270 (2015)
  10. Manning, C.D., Raghavan, P., Schütze, H.: Introduction to Information Retrieval. Cambridge University Press, Cambridge (2008)
  11. McMillan, C., Grechanik, M., Poshyvanyk, D., Fu, C., Xie, Q.: Exemplar: a source code search engine for finding highly relevant applications. IEEE Trans. Softw. Eng. 38(5), 1069–1087 (2012)
  12. McMillan, C., Poshyvanyk, D., Grechanik, M., Xie, Q., Fu, C.: Portfolio: searching for relevant functions and their usages in millions of lines of code. ACM Trans. Softw. Eng. Methodol. 22(4), 1–30 (2013)
  13. Nguyen, A.T., Hilton, M., Codoban, M., Nguyen, H.A., Mast, L., Rademacher, E., Nguyen, T.N., Dig, D.: API code recommendation using statistical learning from fine-grained changes. In: Proceedings of the 24th ACM SIGSOFT International Symposium on Foundations of Software Engineering, pp. 511–522 (2016)
  14. Nie, L., Jiang, H., Ren, Z., Sun, Z., Li, X.: Query expansion based on crowd knowledge for code search. IEEE Trans. Serv. Comput. 9(5), 771–783 (2016)
  15. Proksch, S., Amann, S., Nadi, S., Mezini, M.: Evaluating the evaluations of code recommender systems: a reality check. In: Proceedings of the 31st IEEE/ACM International Conference on Automated Software Engineering, Singapore, pp. 111–121 (2016)
  16. Sadowski, C., Stolee, K.T., Elbaum, S.: How users search for code: a case study. In: Proceedings of the 10th Joint Meeting on Foundations of Software Engineering (2015)
  17. Salton, G., Fox, E.A., Wu, H.: Extended boolean information retrieval. Commun. ACM 26, 1022–1036 (1983)
  18. Sim, S.E., Clarke, C.L.A., Holt, R.C.: Archetypal source code searches: a survey of software users and maintainers. In: Proceedings of the International Workshop on Program Comprehension (IWPC ’98), pp. 180–187. IEEE (1998)
  19. Sridhara, G., Hill, E., Pollock, L.L., Vijay-Shanker, K.: Identifying word relations in software: a comparative study of semantic similarity tools. In: Proceedings of the 16th IEEE International Conference on Program Comprehension (ICPC 08), pp. 123–132 (2008)
  20. Stolee, K.T., Elbaum, S., Dobos, D.: Solving the search for source code. ACM Trans. Softw. Eng. Methodol. (TOSEM) 23(3), 26 (2014)
  21. Sun, X., Liu, X., Hu, J., Zhu, J.: Empirical studies on the NLP techniques for source code data preprocessing. In: Proceedings of the 3rd International Workshop on Evidential Assessment of Software Technologies, pp. 32–39 (2014)
  22. Tian, Y., Lo, D., Lawall, J.: SEWordSim: software-specific word similarity database. In: Companion Proceedings of the 36th International Conference on Software Engineering. ACM (2014)
  23. Xu, B., Lin, H., Lin, Y.: Assessment of learning to rank methods for query expansion. J. Assoc. Inf. Sci. Technol. (JASIST) 67(6), 1345–1357 (2016)
  24. Ye, X., Shen, H., Ma, X., Bunescu, R.C., Liu, C.: From word embeddings to document similarities for improved information retrieval in software engineering. In: Proceedings of the 38th International Conference on Software Engineering, pp. 404–415 (2016)
  25. Youm, K.C., Ahn, J., Lee, E.: Improved bug localization based on code change histories and bug reports. Inf. Softw. Technol. 82, 177–192 (2017)

Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2019

Authors and Affiliations

  1. School of Computer and Information Engineering, Jiangxi Normal University, Nanchang, China
  2. State Key Laboratory of Software Engineering, School of Computer, Wuhan University, Wuhan, China
