QE-integrating framework based on Github knowledge and SVM ranking

Abstract

Recent query expansion (QE) methods for code search use software development features to expand queries. However, each of these methods considers only one feature at a time. To consider several features simultaneously, we propose a QE method based on Github knowledge, a new comprehensive feature that covers both existing features: application program interface (API) information and crowd knowledge. Github knowledge is extracted from the “pull requests” of code repositories on Github, which contain the description of a request and its commits, the participants’ comments, and the API information of the changed files. In addition, we implement a black-box framework, called Github knowledge search repository (GKSR), that integrates multiple QE methods using support vector machine (SVM) ranking. Our empirical evaluation shows that GKSR outperforms the state-of-the-art QE methods CodeHow and QECK by 25%–32% in terms of precision.
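The idea of combining several QE methods through SVM ranking can be illustrated with a minimal sketch of the pairwise ranking approach. This is not the authors' GKSR implementation. The three feature columns (an API-based score, a crowd-knowledge score, and a Github-knowledge score), the toy relevance labels, and the function names are all hypothetical, and a plain subgradient-descent loop stands in for a dedicated ranking SVM solver.

```python
from itertools import combinations

def pairwise_differences(groups):
    """Turn per-query lists of (features, relevance) into difference vectors.

    Each vector is x_better - x_worse, so a good weight vector w
    should satisfy w . diff > 0 for every pair.
    """
    diffs = []
    for candidates in groups:
        for (xa, ra), (xb, rb) in combinations(candidates, 2):
            if ra == rb:
                continue  # ties carry no ordering information
            better, worse = (xa, xb) if ra > rb else (xb, xa)
            diffs.append([p - q for p, q in zip(better, worse)])
    return diffs

def train_rank_svm(diffs, epochs=200, lr=0.1, reg=0.01):
    """Subgradient descent on an L2-regularised hinge loss over the pairs."""
    w = [0.0] * len(diffs[0])
    for _ in range(epochs):
        for d in diffs:
            margin = sum(wi * di for wi, di in zip(w, d))
            # Shrink w (regulariser), then move toward d if the margin is violated.
            w = [wi * (1 - lr * reg) for wi in w]
            if margin < 1:
                w = [wi + lr * di for wi, di in zip(w, d)]
    return w

def rank_candidates(w, candidates):
    """Order feature vectors by their learned linear score, best first."""
    score = lambda x: sum(wi * xi for wi, xi in zip(w, x))
    return sorted(candidates, key=score, reverse=True)

# Hypothetical training data: one list per query, each item is
# ([api_score, crowd_score, github_score], relevance_label).
training_groups = [
    [([0.9, 0.2, 0.1], 0), ([0.3, 0.4, 0.8], 2), ([0.5, 0.6, 0.5], 1)],
    [([0.8, 0.1, 0.2], 0), ([0.2, 0.7, 0.9], 2), ([0.4, 0.3, 0.6], 1)],
]

weights = train_rank_svm(pairwise_differences(training_groups))
new_candidates = [[0.9, 0.1, 0.1], [0.1, 0.5, 0.9], [0.5, 0.3, 0.5]]
ranked = rank_candidates(weights, new_candidates)
print(weights)
print(ranked)
```

Because the ranker only sees the scores each QE method produces, the individual methods can be treated as black boxes; adding another method amounts to adding another feature column.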

References

  1. Haiduc S, Bavota G, Marcus A, et al. Automatic query reformulations for text retrieval in software engineering. In: Proceedings of the 35th International Conference on Software Engineering, San Francisco, 2013. 842–851

  2. Fischer G, Henninger S, Redmiles D. Cognitive tools for locating and comprehending software objects for reuse. In: Proceedings of the 13th International Conference on Software Engineering, Austin, 1991. 318–328

  3. Lv F, Zhang H Y, Lou J G, et al. CodeHow: effective code search based on API understanding and extended boolean model (E). In: Proceedings of the 30th IEEE/ACM International Conference on Automated Software Engineering (ASE), Lincoln, 2015. 260–270

  4. Nie L M, Jiang H, Ren Z L, et al. Query expansion based on crowd knowledge for code search. IEEE Trans Serv Comput, 2016, 9: 771–783

  5. Manning C D, Raghavan P, Schütze H. Introduction to Information Retrieval. Cambridge: Cambridge University Press, 2008

  6. de Souza L B L, Campos E, Maia M A. Ranking crowd knowledge to assist software development. In: Proceedings of the 22nd International Conference on Program Comprehension, Hyderabad, 2014. 72–82

  7. Nguyen A T, Hilton M, Codoban M, et al. API code recommendation using statistical learning from fine-grained changes. In: Proceedings of the 24th ACM SIGSOFT International Symposium on Foundations of Software Engineering, Seattle, 2016. 511–522

  8. Haiduc S, Bavota G, Marcus A, et al. Automatic query reformulations for text retrieval in software engineering. In: Proceedings of the 35th International Conference on Software Engineering, San Francisco, 2013. 842–851

  9. Gay G, Haiduc S, Marcus A, et al. On the use of relevance feedback in IR-based concept location. In: Proceedings of IEEE International Conference on Software Maintenance, Edmonton, 2009. 351–360

  10. McMillan C, Poshyvanyk D, Grechanik M, et al. Portfolio: searching for relevant functions and their usages in millions of lines of code. ACM Trans Softw Eng Methodol, 2013, 22: 1–30

  11. Joachims T. Optimizing search engines using clickthrough data. In: Proceedings of the 8th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Edmonton, 2002. 133–142

  12. Joachims T. Training linear SVMs in linear time. In: Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Philadelphia, 2006. 217–226

  13. Salton G, Fox E A, Wu H. Extended Boolean Information Retrieval. New York: Cornell University, 1983

  14. Niu H, Keivanloo I, Zou Y. Learning to rank code examples for code search engines. Empir Softw Eng, 2017, 22: 259–291

  15. Jiang H, Nie L M, Sun Z Y, et al. ROSF: leveraging information retrieval and supervised learning for recommending code snippets. IEEE Trans Serv Comput, 2017. doi: 10.1109/TSC.2016.2592909

  16. Xu K, Lin H F, Lin Y, et al. Patent retrieval based on multiple information resources. In: Proceedings of the 12th Asia Information Retrieval Societies Conference, Beijing, 2016

  17. Xu B, Lin H F, Lin Y. Assessment of learning to rank methods for query expansion. J Assoc Inf Sci Technol, 2016, 67: 1345–1357

Acknowledgements

This work was supported in part by the National Natural Science Foundation of China (Grant Nos. 61672470, 61640221, 61562026).

Author information

Correspondence to Qing Huang or Huaiguang Wu.

About this article

Cite this article

Huang, Q., Wu, H. QE-integrating framework based on Github knowledge and SVM ranking. Sci. China Inf. Sci. 62, 52102 (2019). https://doi.org/10.1007/s11432-017-9465-9

Keywords

  • code search
  • query expansion
  • Github knowledge
  • SVM ranking
  • crowd knowledge