Learning to rank code examples for code search engines

Empirical Software Engineering


Developers use source code examples to implement unfamiliar tasks by learning from existing solutions. To help developers find such solutions, code search engines locate and rank code examples relevant to users' queries. In essence, a code search engine provides a ranking schema, which combines a set of ranking features to calculate the relevance between a query and candidate code examples, so that relevant code examples appear at the top of the result list. However, it is difficult to determine the configuration of a ranking schema by subjective judgment. In this paper, we propose a code example search approach that applies a machine learning technique to automatically train a ranking schema. We use the trained ranking schema to rank candidate code examples for new queries at run-time. We evaluate the ranking performance of our approach using a corpus of over 360,000 code snippets crawled from 586 open-source Android projects. The performance evaluation study shows that the learning-to-rank approach can effectively rank code examples and outperforms the existing ranking schemas by about 35.65% and 48.42% in terms of the normalized discounted cumulative gain (NDCG) and expected reciprocal rank (ERR) measures, respectively.
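
To make the two evaluation measures named above concrete, the following minimal Python sketch computes NDCG and ERR for one ranked result list, together with a toy linear ranking schema. It is an illustration under stated assumptions, not the implementation used in the paper: the exponential gain `2**g - 1`, the `log2` discount, the feature values and weights, and all function names are assumptions chosen here, and the abstract does not say which learner or feature set the trained schema uses.

```python
import math
from typing import Sequence


def score(features: Sequence[float], weights: Sequence[float]) -> float:
    """Toy linear ranking schema (assumption): weighted sum of ranking features."""
    return sum(f * w for f, w in zip(features, weights))


def dcg(grades: Sequence[int], k: int) -> float:
    """Discounted cumulative gain over the top-k graded relevance labels."""
    return sum((2 ** g - 1) / math.log2(i + 2) for i, g in enumerate(grades[:k]))


def ndcg(grades: Sequence[int], k: int) -> float:
    """DCG normalized by the DCG of the ideal (descending) ordering."""
    ideal = dcg(sorted(grades, reverse=True), k)
    return dcg(grades, k) / ideal if ideal > 0 else 0.0


def err(grades: Sequence[int], k: int, max_grade: int) -> float:
    """Expected reciprocal rank under a cascade model of user browsing."""
    p_continue = 1.0  # probability the user reaches the current rank
    total = 0.0
    for rank, g in enumerate(grades[:k], start=1):
        p_stop = (2 ** g - 1) / 2 ** max_grade  # chance this result satisfies the user
        total += p_continue * p_stop / rank
        p_continue *= 1.0 - p_stop
    return total


# Hypothetical candidates: (feature vector, relevance grade from human labeling).
candidates = [([0.2, 0.1], 0), ([0.9, 0.7], 2), ([0.5, 0.4], 1)]
weights = [0.6, 0.4]  # in a learning-to-rank setting these would be learned

ranked = sorted(candidates, key=lambda c: score(c[0], weights), reverse=True)
ranked_grades = [grade for _, grade in ranked]
print(ndcg(ranked_grades, k=3), err(ranked_grades, k=3, max_grade=2))
```

Both measures reward placing highly relevant examples near the top of the list. NDCG compares the list against its ideal ordering, while ERR discounts each result by the probability that an earlier result already satisfied the user.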


  1. Codota:

  2. Google Code project hosting:

  3. We use the parser from Eclipse JDT:

  4. Igraph package:



Many thanks to Liam Gordon, Graem Daly, Bipin Upadhyaya, Ehsan Salamati, and Feng Zhang for their valuable help with relevance labeling for this work.

Author information

Corresponding author

Correspondence to Haoran Niu.

Additional information

Communicated by: Denys Poshyvanyk

About this article

Cite this article

Niu, H., Keivanloo, I. & Zou, Y. Learning to rank code examples for code search engines. Empir Software Eng 22, 259–291 (2017).
