CROKAGE: effective solution recommendation for programming tasks by leveraging crowd knowledge

Empirical Software Engineering


Developers often search the web for code examples relevant to their programming tasks. Unfortunately, they face three major problems. First, they frequently need to read and analyse multiple search engine results to obtain a satisfactory solution. Second, the search is impaired by a lexical gap between the query (task description) and the information associated with the solution (e.g., code example). Third, the retrieved solution may not be comprehensible, i.e., the code segment may lack a succinct explanation. To address these three problems, we propose CROKAGE (Crowd Knowledge Answer Generator), a tool that takes the description of a programming task (the query) as input and delivers a comprehensible solution for the task. Our solutions contain not only relevant code examples but also succinct explanations written by human developers. The search for code examples is modeled as an Information Retrieval (IR) problem. We first leverage the crowd knowledge stored in Stack Overflow to retrieve candidate answers for a programming task. For this, we use a fine-tuned IR technique, chosen after comparing the performance of 11 IR techniques. Then we use a multi-factor relevance mechanism to mitigate the lexical gap problem and select the top-quality answers related to the task. Finally, we perform natural language processing on the top-quality answers and deliver comprehensible solutions that, unlike earlier studies, contain both code examples and code explanations. We evaluate and compare our approach against ten baselines, including the state of the art. We show that CROKAGE outperforms the ten baselines in suggesting relevant solutions for 902 programming tasks (i.e., queries) in three popular programming languages: Java, Python and PHP. Furthermore, we use 24 programming tasks (queries) to evaluate our solutions with 29 developers and confirm that CROKAGE outperforms the state-of-the-art tool in terms of relevance of the suggested code examples, benefit of the code explanations and overall solution quality (code + explanation).

  1. on July, 2019




  5. - dump published in March 2019




  9. the complete list of words is available at:


  11. Although this limitation was explicitly stated on the CROKAGE web page, a significant number of non-Java queries were found

  12. CROKAGE search requires the query to have a minimum of one character and a maximum of 70 characters to run

  13. We adopt the list provided by Stanford:

  14. Although we use a semi-automatic process to filter out queries not related to Java, several such queries still remained

  15. we append to the query “”

  16. Herein we test the IR techniques using their default parameters. In the case of BM25, the default parameters are k = 1.2 and b = 0.75

  17. The simplest form of language model, which disregards all conditioning context and estimates each term independently

  18. Although the title of the Q&A pair alone could represent the query intent, our output is the answer, thus we concatenate the title and body text of the answer in order to match the query with the answers of the candidate pairs


  20. Our general conclusions are not statistically confirmed for the PHP language, although they are supported by the four adopted metrics.

  21. except BIKER, whose behaviour we do not change
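
Note 16 gives the default BM25 parameters used when comparing IR techniques. As a rough illustration of what those two parameters control, the sketch below scores a few texts with a textbook BM25 formula. It is not the implementation used by CROKAGE or by any particular IR library: the function name `bm25_scores`, the whitespace tokenizer, the smoothed IDF variant, and the sample texts are all assumptions made for this example.

```python
import math
from collections import Counter

K1 = 1.2   # term-frequency saturation (the "k" of note 16)
B = 0.75   # strength of document-length normalization


def tokenize(text):
    """Hypothetical tokenizer: lowercase and split on whitespace."""
    return text.lower().split()


def bm25_scores(query, documents, k1=K1, b=B):
    """Return one BM25 score per document for the given query."""
    docs = [tokenize(d) for d in documents]
    if not docs:
        return []
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs

    # Document frequency: number of documents containing each term.
    df = Counter()
    for d in docs:
        df.update(set(d))

    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            # Smoothed IDF (one common variant; always non-negative).
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Larger b penalizes long documents more; larger k1 lets
            # repeated terms keep adding to the score for longer.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


if __name__ == "__main__":
    # Made-up answer texts, for illustration only.
    answers = [
        "read a file line by line in java",
        "parse json string in python",
        "sort a list",
    ]
    for text, s in zip(answers, bm25_scores("read file java", answers)):
        print(f"{s:.3f}  {text}")
```

Setting `b=0` in this sketch switches off length normalization entirely, which is a quick way to see how much of a ranking depends on document length rather than on term matches.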



We thank the authors of BIKER for sharing their tool. This research is supported in part by a Canada First Research Excellence Fund (CFREF) grant coordinated by the Global Institute for Food Security (GIFS). We also thank the Brazilian funding agencies CAPES, CNPq and FAPEMIG for supporting this research. Last but not least, we thank the participants who took part in the qualitative evaluation of this work.

Author information

Corresponding author

Correspondence to Marcelo de Almeida Maia.

Additional information

Communicated by: Tim Menzies

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

da Silva, R.F.G., Roy, C.K., Rahman, M.M. et al. CROKAGE: effective solution recommendation for programming tasks by leveraging crowd knowledge. Empir Software Eng 25, 4707–4758 (2020).
