Empirical Software Engineering, Volume 24, Issue 4, pp 1869–1924

Automatic query reformulation for code search using crowdsourced knowledge

  • Mohammad M. Rahman
  • Chanchal K. Roy
  • David Lo


Traditional code search engines (e.g., Krugle) often do not perform well with natural language queries. They mostly apply keyword matching between the query and the source code, and hence need carefully designed queries containing references to relevant APIs. Unfortunately, according to existing studies, preparing an effective search query is both challenging and time-consuming for developers. In this article, we propose a novel query reformulation technique, RACK, that suggests a list of relevant API classes for a natural language query intended for code search. Our technique offers such suggestions by exploiting keyword-API associations mined from the questions and answers of Stack Overflow (i.e., crowdsourced knowledge). We first motivate our idea using an exploratory study with 19 standard Java API packages and 344K Java-related posts from Stack Overflow. Experiments using 175 code search queries randomly chosen from three Java tutorial sites show that our technique recommends correct API classes within the Top-10 results for 83% of the queries, with 46% mean average precision and 54% recall, which are 66%, 79%, and 87% higher, respectively, than those of the state-of-the-art. Reformulations using our suggested API classes improve 64% of the natural language queries, and their overall accuracy improves by 19%. Comparisons with three state-of-the-art techniques demonstrate that RACK outperforms them in query reformulation by a statistically significant margin. An investigation using three web/code search engines shows that our technique can significantly improve their results in the context of code search.
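The core idea sketched in the abstract can be illustrated with a minimal example. The sketch below is only a simplification of the keyword-API association concept: it counts co-occurrences between question keywords and API classes (e.g., from Stack Overflow question titles and accepted-answer code) and ranks API classes for a new query by their aggregated association counts. The data, function names, and scoring are all illustrative assumptions, not RACK's actual heuristics.

```python
from collections import defaultdict

def build_associations(posts):
    """Count keyword-API co-occurrences from (keywords, api_classes) pairs,
    e.g., mined from question titles and accepted-answer code snippets."""
    assoc = defaultdict(lambda: defaultdict(int))
    for keywords, api_classes in posts:
        for kw in set(keywords):
            for api in set(api_classes):
                assoc[kw][api] += 1
    return assoc

def suggest_apis(query_keywords, assoc, top_k=10):
    """Rank API classes by their total association with the query keywords."""
    scores = defaultdict(int)
    for kw in query_keywords:
        for api, count in assoc.get(kw, {}).items():
            scores[api] += count
    ranked = sorted(scores.items(), key=lambda item: -item[1])
    return [api for api, _ in ranked[:top_k]]

# Toy corpus of (question keywords, API classes in the answer) pairs.
posts = [
    (["parse", "html"], ["Jsoup", "Document"]),
    (["parse", "xml"], ["DocumentBuilder", "Document"]),
    (["download", "html", "page"], ["URL", "Jsoup"]),
]
assoc = build_associations(posts)
# Suggests Jsoup and Document for this query (order of ties may vary).
print(suggest_apis(["parse", "html"], assoc, top_k=2))
```

In this toy setting, the natural language query "parse html" is mapped to the API classes Jsoup and Document, which could then be appended to the query before it is sent to a keyword-matching code search engine.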


Keywords: Code search, Query reformulation, Keyword-API association, Crowdsourced knowledge, Stack Overflow



This research was supported in part by the Natural Sciences and Engineering Research Council of Canada (NSERC) and the Singapore Ministry of Education (MOE) Academic Research Fund (AcRF) Tier 1 grant.



Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2019

Authors and Affiliations

  1. University of Saskatchewan, Saskatoon, Canada
  2. Singapore Management University, Singapore
