Recent query expansion (QE) methods for code search expand queries using features drawn from software development. However, each of these methods considers only one feature at a time. To consider multiple features simultaneously, we propose a QE method based on Github knowledge, a new comprehensive feature that covers both existing features, namely application programming interface (API) information and crowd knowledge. Github knowledge is extracted from the pull requests of code repositories on Github, which contain the description of each request and its commits, the participants’ comments, and the API information of the changed files. In addition, we implement a black-box framework, called Github knowledge search repository (GKSR), that integrates multiple QE methods by means of support vector machine (SVM) ranking. Our empirical evaluation shows that GKSR outperforms the state-of-the-art QE methods CodeHow and QECK by 25%–32% in terms of precision.
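Beyond naming SVM ranking, the abstract does not say how GKSR combines its component QE methods. The following is a minimal sketch of the general technique under stated assumptions, not the authors' implementation. It is a pairwise ranking SVM in the style of Joachims (2002, listed in the references) that learns one weight per QE method so that relevant code candidates score above irrelevant ones. Each candidate's feature vector holds the scores that hypothetical QE methods assign to it, for example a CodeHow-like score, a QECK-like score, and a Github-knowledge score. The feature layout, toy data, learning rate, and epoch count are all illustrative assumptions. The model is trained here by plain subgradient descent rather than by the cutting-plane solver used in SVM-rank.

```python
"""Hypothetical sketch: fuse several QE methods' scores with a pairwise ranking SVM."""
from itertools import combinations


def pairwise_differences(features, labels):
    """Build x_i - x_j for every pair where candidate i is more relevant than candidate j."""
    diffs = []
    for i, j in combinations(range(len(labels)), 2):
        if labels[i] == labels[j]:
            continue  # equal relevance gives no ordering constraint
        hi, lo = (i, j) if labels[i] > labels[j] else (j, i)
        diffs.append([a - b for a, b in zip(features[hi], features[lo])])
    return diffs


def train_rank_svm(queries, c=1.0, epochs=200, lr=0.01):
    """Minimise 0.5*||w||^2 + c * sum(max(0, 1 - w.d)) over all pairwise differences d,
    using full-batch subgradient descent (an assumption; SVM-rank uses a cutting-plane solver)."""
    diffs = [d for features, labels in queries for d in pairwise_differences(features, labels)]
    if not diffs:
        raise ValueError("need at least one pair of candidates with different relevance")
    dim = len(diffs[0])
    w = [0.0] * dim
    for _ in range(epochs):
        grad = list(w)  # gradient of the 0.5*||w||^2 regulariser
        for d in diffs:
            margin = sum(wk * dk for wk, dk in zip(w, d))
            if margin < 1.0:  # hinge loss is active for this pair
                for k in range(dim):
                    grad[k] -= c * d[k]
        w = [wk - lr * gk for wk, gk in zip(w, grad)]
    return w


def rank(w, features):
    """Return candidate indices sorted by the fused score w.x, best first."""
    scores = [sum(wk * xk for wk, xk in zip(w, x)) for x in features]
    return sorted(range(len(features)), key=lambda i: scores[i], reverse=True)


def precision_at_k(ranking, labels, k):
    """Fraction of the top-k ranked candidates that are relevant (label > 0)."""
    return sum(1 for i in ranking[:k] if labels[i] > 0) / k


# Hypothetical feature layout: [QE method A score, QE method B score, Github-knowledge score]
train_query = (
    [[0.9, 0.2, 0.8], [0.8, 0.9, 0.1], [0.1, 0.3, 0.9], [0.5, 0.5, 0.2]],
    [1, 0, 1, 0],  # 1 = relevant code candidate, 0 = irrelevant
)
w = train_rank_svm([train_query])

test_features = [[0.2, 0.1, 0.95], [0.9, 0.9, 0.05], [0.3, 0.2, 0.85], [0.6, 0.8, 0.1]]
test_labels = [1, 0, 1, 0]
ranking = rank(w, test_features)
print(ranking[:2], precision_at_k(ranking, test_labels, 2))  # → [0, 2] 1.0
```

In the toy training query, every preferred-over-less-preferred pair has a higher third score and a lower second score. As a result, the learned weights come out positive on the Github-knowledge feature and negative on the second feature, and both relevant test candidates rise to the top two positions. A linear model keeps the fusion a black box over the component methods: each method only has to emit a score per candidate.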
Haiduc S, Bavota G, Marcus A, et al. Automatic query reformulations for text retrieval in software engineering. In: Proceedings of the 35th International Conference on Software Engineering, San Francisco, 2013. 842–851
Fischer G, Henninger S, Redmiles D. Cognitive tools for locating and comprehending software objects for reuse. In: Proceedings of the 13th International Conference on Software Engineering, Austin, 1991. 318–328
Lv F, Zhang H Y, Lou J G, et al. CodeHow: effective code search based on API understanding and extended boolean model (E). In: Proceedings of the 30th IEEE/ACM International Conference on Automated Software Engineering (ASE), Lincoln, 2015. 260–270
Nie L M, Jiang H, Ren Z L, et al. Query expansion based on crowd knowledge for code search. IEEE Trans Serv Comput, 2016, 9: 771–783
Manning C D, Raghavan P, Schütze H. Introduction to Information Retrieval. Cambridge: Cambridge University Press, 2008
de Souza L B L, Campos E, Maia M A. Ranking crowd knowledge to assist software development. In: Proceedings of the 22nd International Conference on Program Comprehension, Hyderabad, 2014. 72–82
Nguyen A T, Hilton M, Codoban M, et al. API code recommendation using statistical learning from fine-grained changes. In: Proceedings of the 24th ACM SIGSOFT International Symposium on Foundations of Software Engineering, Seattle, 2016. 511–522
Gay G, Haiduc S, Marcus A, et al. On the use of relevance feedback in IR-based concept location. In: Proceedings of IEEE International Conference on Software Maintenance, Edmonton, 2009. 351–360
McMillan C, Poshyvanyk D, Grechanik M, et al. Portfolio: searching for relevant functions and their usages in millions of lines of code. ACM Trans Softw Eng Methodol, 2013, 22: 1–30
Joachims T. Optimizing search engines using clickthrough data. In: Proceedings of the 8th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Edmonton, 2002. 133–142
Joachims T. Training linear SVMs in linear time. In: Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Philadelphia, 2006. 217–226
Salton G, Fox E A, Wu H. Extended Boolean Information Retrieval. New York: Cornell University, 1983
Niu H, Keivanloo I, Zou Y. Learning to rank code examples for code search engines. Empir Softw Eng, 2017, 22: 259–291
Jiang H, Nie L M, Sun Z Y, et al. ROSF: leveraging information retrieval and supervised learning for recommending code snippets. IEEE Trans Serv Comput, 2017. doi: 10.1109/TSC.2016.2592909
Xu K, Lin H F, Lin Y, et al. Patent retrieval based on multiple information resources. In: Proceedings of the 12th Asia Information Retrieval Societies Conference, Beijing, 2016
Xu B, Lin H F, Lin Y. Assessment of learning to rank methods for query expansion. J Assoc Inf Sci Technol, 2016, 67: 1345–1357
This work was supported in part by National Natural Science Foundation of China (Grant Nos. 61672470, 61640221, 61562026).
Cite this article
Huang, Q., Wu, H. QE-integrating framework based on Github knowledge and SVM ranking. Sci. China Inf. Sci. 62, 52102 (2019). https://doi.org/10.1007/s11432-017-9465-9
Keywords
- code search
- query expansion
- Github knowledge
- SVM ranking
- crowd knowledge