QE-integrating framework based on Github knowledge and SVM ranking

Abstract

Recent query expansion (QE) methods use software development features to expand queries. However, each of these methods considers only one feature at a time. To consider several features simultaneously, we propose a QE method based on Github knowledge, a new comprehensive feature that covers both existing features: application program interface (API) information and crowd knowledge. Github knowledge is extracted from the “pull requests” of code repositories on Github, which contain the description of a request and its commits, the participants’ comments, and the API information of the changed files. In addition, we implement a black-box framework, called Github knowledge search repository (GKSR), that integrates multiple QE methods by means of support vector machine (SVM) ranking. Our empirical evaluation shows that GKSR outperforms the state-of-the-art QE methods CodeHow and QECK by 25%–32% in terms of precision.
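To make the idea of integrating several QE methods through SVM ranking more concrete, the following is a minimal sketch and not the GKSR implementation. It assumes each candidate result carries one score per QE method (API-based, crowd-knowledge-based, Github-knowledge-based) and learns linear weights with a pairwise hinge loss in the spirit of a ranking SVM. All function names, the toy scores, and the hyperparameters are hypothetical.

```python
import random

def pairwise_differences(candidates):
    """For one query, return feature differences (relevant minus irrelevant)."""
    diffs = []
    for feats_a, rel_a in candidates:
        for feats_b, rel_b in candidates:
            if rel_a > rel_b:
                diffs.append([a - b for a, b in zip(feats_a, feats_b)])
    return diffs

def train_rank_svm(queries, epochs=200, lr=0.1, reg=0.01, seed=0):
    """Minimise the regularised hinge loss max(0, 1 - w.d) over all
    pairwise differences d with plain subgradient descent."""
    rng = random.Random(seed)
    diffs = [d for q in queries for d in pairwise_differences(q)]
    w = [0.0] * len(diffs[0])
    for _ in range(epochs):
        rng.shuffle(diffs)
        for d in diffs:
            margin = sum(wi * di for wi, di in zip(w, d))
            for i in range(len(w)):
                # The hinge term contributes only while the margin is violated.
                hinge_grad = -d[i] if margin < 1.0 else 0.0
                w[i] -= lr * (reg * w[i] + hinge_grad)
    return w

def rank(w, candidates):
    """Sort candidates by the learned linear score, best first."""
    def score(item):
        return sum(wi * fi for wi, fi in zip(w, item[0]))
    return sorted(candidates, key=score, reverse=True)

# Hypothetical toy data: each candidate is
# ([api_qe_score, crowd_qe_score, github_qe_score], relevance_label).
queries = [
    [([0.9, 0.2, 0.8], 1), ([0.4, 0.7, 0.3], 0), ([0.1, 0.1, 0.2], 0)],
    [([0.3, 0.8, 0.9], 1), ([0.6, 0.3, 0.2], 0), ([0.2, 0.5, 0.1], 0)],
]

weights = train_rank_svm(queries)
for candidates in queries:
    best_features, best_label = rank(weights, candidates)[0]
    print(best_label, best_features)
```

The pairwise formulation treats each QE method as a black box: only its output score is needed, so further methods could be added as extra feature columns under the same assumptions.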

Acknowledgements

This work was supported in part by the National Natural Science Foundation of China (Grant Nos. 61672470, 61640221, 61562026).

Author information

Corresponding authors

Correspondence to Qing Huang or Huaiguang Wu.

About this article

Cite this article

Huang, Q., Wu, H. QE-integrating framework based on Github knowledge and SVM ranking. Sci. China Inf. Sci. 62, 52102 (2019). https://doi.org/10.1007/s11432-017-9465-9

Keywords

  • code search
  • query expansion
  • Github knowledge
  • SVM ranking
  • crowd knowledge