Using bug descriptions to reformulate queries during text-retrieval-based bug localization

Empirical Software Engineering

Abstract

Text retrieval (TR)-based approaches for bug localization rely on formulating an initial query based on the full text of a bug report. When the query fails to retrieve the buggy code artifacts, developers can reformulate the query and retrieve more candidate code documents. Existing research on query reformulation focuses mostly on leveraging relevance feedback from the user or on expanding the original query with additional information. We hypothesize that the bug report title, the observed behavior, the expected behavior, the steps to reproduce, and the code snippets that users provide in bug descriptions contain the most relevant information for retrieving the buggy code artifacts, and that the other parts of the descriptions contain more irrelevant terms, which hinder retrieval. This paper proposes and evaluates a set of query reformulation strategies based on selecting existing information in bug descriptions and removing irrelevant parts from the original query. The results show that selecting the bug report title and the observed behavior is the strategy that performs best across various TR-based bug localization approaches and code granularities: it retrieves the buggy code artifacts within the top-N results for 25.6% more queries (on average) than without query reformulation. This strategy is highly applicable and consistent across different thresholds N. Selecting the steps to reproduce or the expected behavior (when provided in the bug reports) along with the bug title and the observed behavior leads to higher performance (i.e., between 31.4% and 41.7% more queries) and comparable consistency, yet it is applicable in fewer cases. These reformulation strategies are easy to use and are independent of the underlying retrieval technique.
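As a rough illustration of the selection-based strategies described above, the following Python sketch builds a reduced query from labeled parts of a bug description. It is a minimal sketch under stated assumptions, not the authors' implementation: the `BugReport` class, the label names (`TITLE`, `OB`, `EB`, `S2R`, `CODE`, `OTHER`), and the `reformulate` function are hypothetical, and the sentences are assumed to be labeled already.

```python
from dataclasses import dataclass, field

# Hypothetical labels for the parts of a bug description named in the abstract:
# title, observed behavior, expected behavior, steps to reproduce, code snippets.
TITLE, OB, EB, S2R, CODE, OTHER = "TITLE", "OB", "EB", "S2R", "CODE", "OTHER"


@dataclass
class BugReport:
    title: str
    # (sentence, label) pairs; the labeling is assumed to be done beforehand.
    sentences: list = field(default_factory=list)


def reformulate(report, keep=(TITLE, OB)):
    """Return a reduced query made of the selected parts only.

    Returns None when a requested part is missing from the report,
    i.e., when the strategy is not applicable to this bug report.
    """
    present = {label for _, label in report.sentences}
    required = set(keep) - {TITLE}
    if not required <= present:
        return None
    parts = [report.title] if TITLE in keep else []
    # Keep the selected sentences in their original order; drop everything else.
    parts += [text for text, label in report.sentences if label in required]
    return " ".join(parts)


# Made-up example report, for illustration only.
report = BugReport(
    title="Crash when opening file",
    sentences=[
        ("I was testing the new build.", OTHER),
        ("The app crashes with a NullPointerException.", OB),
        ("It should open the file.", EB),
    ],
)
query = reformulate(report)                                  # title + observed behavior
query_with_s2r = reformulate(report, keep=(TITLE, OB, S2R))  # no steps to reproduce given
```

Because the example report has no steps to reproduce, the second call returns `None`, which mirrors the observation that the richer strategies are applicable in fewer cases. The resulting query string can be passed unchanged to any retrieval engine, since the selection does not depend on the underlying technique.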


Notes

  1. See Section 6 for details.

  2. https://tinyurl.com/ybye2zhc

  3. This data set is called BRT in our prior work (Chaparro et al. 2017a).

  4. Code search is a task similar to, but more general than, TRBL.

  5. http://brat.nlplab.org/

  6. https://stormed.inf.usi.ch/

  7. See Table 11 and our replication package for more details.

  8. We changed the notation in the table for space reasons.

  9. HITS@N improvement cannot be measured for these two strategies because the HITS@N achieved by the initial queries (i.e., no reformulation) is zero; hence, the improvement is undefined (see Formula 3).

  10. See our replication package for the detailed MRR/MAP results (Chaparro et al. 2018).

  11. The number of queries for TITLE in Table 17 represents the average total number of queries for BRTracer.

  12. The query is low-quality for the other three file-level TRBL techniques as well.

  13. The position of the buggy file, after excluding the top-5 documents, would be 174.

  14. The reformulation results in the query: “too large first step ... (Dormand-Prince 8(5,3) ...) For embedded Runge-Kutta type, this step size ... and fails to stop).”

  15. Found at https://issues.apache.org/jira/browse/TIKA-1152

References

  • Ali N, Sabane A, Gueheneuc Y-G, Antoniol G (2012) Improving bug location using binary class relationships. In: Proceedings of the international working conference on source code analysis and manipulation (SCAM’12), pp 174–183

  • Bajracharya SK, Lopes CV (2012) Analyzing and mining a code search engine usage log. Empir Softw Eng 17(4-5):424–466

  • Bassett BR, Kraft NA (2013) Structural information based term weighting in text retrieval for feature location. In: Proceedings of the international conference on program comprehension (ICPC’13), pp 133–141

  • Carpineto C, Romano G (2012) A survey of automatic query expansion in information retrieval. Comput Surv 44(1):1

  • Chaparro O, Marcus A (2016) On the reduction of verbose queries in text retrieval based software maintenance. In: Proceedings of the international conference on software engineering (ICSE’16), pp 716–718

  • Chaparro O, Florez JM, Marcus A (2017a) Using observed behavior to reformulate queries during text retrieval-based bug localization. In: Proceedings of the 33rd international conference on software maintenance and evolution (ICSME’17), pp 376–387

  • Chaparro O, Lu J, Zampetti F, Moreno L, Di Penta M, Marcus A, Bavota G, Ng V (2017b) Detecting missing information in bug descriptions. In: Proceedings of the joint meeting on foundations of software engineering (ESEC/FSE’17), pp 396–407

  • Chaparro O, Florez JM, Marcus A (2018) Replication package. https://tinyurl.com/y7bzqnwc

  • Damevski K, Shepherd D, Pollock L (2016) A field study of how developers locate features in source code. Empir Softw Eng 21(2):724–747

  • Dao T, Zhang L, Na M (2017) How does execution information help with information-retrieval based bug localization? In: Proceedings of the international conference on program comprehension (ICPC’17), pp 241–250

  • Davies S, Roper M, Wood M (2012) Using bug report similarity to enhance bug localisation. In: Proceedings of the working conference on reverse engineering (WCRE’12), pp 125–134

  • Davies S, Roper M (2014) What’s in a bug report? In: Proceedings of the international symposium on empirical software engineering and measurement (ESEM’14), pp 26:1–26:10

  • De Lucia A, Marcus A, Oliveto R, Poshyvanyk D (2012) Information retrieval methods for automated traceability recovery. In: Cleland-Huang J, Gotel O, Zisman A (eds) Software and systems traceability. Springer, pp 71–98

  • Dietrich T, Cleland-Huang J, Shin Y (2013) Learning effective query transformations for enhanced requirements trace retrieval. In: Proceedings of the international conference on automated software engineering (ASE’13), pp 586–591

  • Dilshener T, Wermelinger M, Yu Y (2016) Locating bugs without looking back. In: Proceedings of the international conference on mining software repositories (MSR’16), pp 286–290

  • Dit B, Revelle M, Gethers M, Poshyvanyk D (2012) Feature location in source code: a taxonomy and survey. J Softw Evol Process 25(1):53–95

  • Eddy BP, Kraft NA, Gray J (2018) Impact of structural weighting on a latent dirichlet allocation–based feature location technique. J Softw Evol Process 30(1):e1892

  • Gay G, Haiduc S, Marcus A, Menzies T (2009) On the use of relevance feedback in ir-based concept location. In: Proceedings of the international conference on software maintenance (ICSM’09), pp 351–360

  • Ge X, Shepherd DC, Damevski K, Murphy-Hill E (2017) Design and evaluation of a multi-recommendation system for local code search. J Vis Lang Comput 39:1–9

  • Gibiec M, Czauderna A, Cleland-Huang J (2010) Towards mining replacement queries for hard-to-retrieve traces. In: Proceedings of the international conference on automated software engineering (ASE’10), pp 245–254

  • Guo J, Gibiec M, Cleland-Huang J (2017) Tackling the term-mismatch problem in automated trace retrieval. Empir Softw Eng 22(3):1103–1142

  • Haiduc S, Bavota G, Marcus A, Oliveto R, De Lucia A, Menzies T (2013) Automatic query reformulations for text retrieval in software engineering. In: Proceedings of the international conference on software engineering (ICSE’13), pp 842–851

  • Hatcher E, Gospodnetic O (2004) Lucene in action. Manning Publications

  • Hill E, Roldan-Vega M, Fails JA, Mallet G (2014) Nl-based query refinement and contextualized code search results: A user study. In: Proceedings of the conference on software maintenance, reengineering, and reverse engineering (CSMR-WCRE’14), pp 34–43

  • Hoang TV, Oentaryo RJ, Le TB, Lo D (2018) Network-clustered multi-modal bug localization. IEEE Transactions on Software Engineering. (to appear)

  • Hollander M, Wolfe DA, Chicken E (2013) Nonparametric statistical methods, vol 751. Wiley, New York

  • Just R, Jalali D, Ernst MD (2014) Defects4j: a database of existing faults to enable controlled testing studies for java programs. In: Proceedings of the international symposium on software testing and analysis (ISSTA’14). ACM, pp 437–440

  • Kevic K, Fritz T (2014) Automatic search term identification for change tasks. In: Proceedings of the international conference on software engineering (ICSE’14), pp 468–471

  • Lemos OAL, de Paula AC, Sajnani H, Lopes CV (2015) Can the use of types and query expansion help improve large-scale code search? In: Proceedings of the international working conference on source code analysis and manipulation (SCAM’15), pp 41–50

  • Le T-DB, Thung F, Lo D (2014) Predicting effectiveness of ir-based bug localization techniques. In: Proceedings of the 25th international symposium on software reliability engineering (ISSRE’14), pp 335–345

  • Le T-DB, Oentaryo RJ, Lo D (2015) Information retrieval and spectrum based bug localization: better together. In: Proceedings of the joint meeting on foundations of software engineering (ESEC/FSE’15), pp 579–590

  • Lee J, Kim D, Bissyandé TF, Jung W, Le Traon Y (2018) Bench4bl: reproducibility study on the performance of ir-based bug localization. In: Proceedings of the 27th international symposium on software testing and analysis (ISSTA’18), pp 61–72

  • Li Z, Wang T, Zhang Y, Zhan Y, Yin G (2016) Query reformulation by leveraging crowd wisdom for scenario-based software search. In: Proceedings of the Asia-Pacific symposium on internetware (Internetware’16), pp 36–44

  • Lu XA, Keefer RB (1995) Query expansion/reduction and its impact on retrieval effectiveness. NIST Special Publication, pp 231–231

  • Apache Lucene (2017) https://lucene.apache.org/

  • Lv F, Zhang H, Lou J-G, Wang S, Zhang D, Zhao J (2015) Codehow: effective code search based on api understanding and extended boolean model. In: Proceedings of the international conference on automated software engineering (ASE’15), pp 260–270

  • Manning CD, Surdeanu M, Bauer J, Finkel JR, Bethard S, McClosky D (2014) The stanford corenlp natural language processing toolkit. In: Proceedings of the annual meeting of the association for computational linguistics (ACL’14), pp 55–60

  • Marcus A, Sergeyev A, Rajlich V, Maletic JI (2004) An information retrieval approach to concept location in source code. In: Proceedings of the working conference on reverse engineering (WCRE’04), pp 214–223

  • Marcus A, Haiduc S (2013) Text retrieval approaches for concept location in source code. In: Software Engineering: International Summer Schools, ISSSE 2009-2011, Salerno, Italy. Revised Tutorial Lectures, volume 7171 of Lecture Notes in Computer Science. Springer, pp 126–158

  • Mills C, Bavota G, Haiduc S, Oliveto R, Marcus A, De Lucia A (2017) Predicting query quality for applications of text retrieval to software engineering tasks. Trans Softw Eng Methodol 26(1):3:1–3:45

  • Mills C, Pantiuchina J, Parra E, Bavota G, Haiduc S (2018) Are bug reports enough for text retrieval-based bug localization? In: Proceedings of the 34th IEEE international conference on software maintenance and evolution (ICSME’18), pp 410–421

  • Moreno L, Treadway JJ, Marcus A, Shen W (2014) On the use of stack traces to improve text retrieval-based bug localization. In: Proceedings of the conference on software maintenance and evolution (ICSME’14), pp 151–160

  • Nguyen AT, Nguyen TT, Al-Kofahi J, Nguyen HV, Nguyen TN (2011) A topic-based approach for narrowing the search space of buggy files from a bug report. In: Proceedings of the international conference on automated software engineering (ASE’11), pp 263–272

  • Nichols BD (2010) Augmented bug localization using past bug information. In: Proceedings of the annual southeast regional conference (ACMSE’10), pp 1–6

  • Nie L, He J, Ren Z, Sun Z, Li X (2016) Query expansion based on crowd knowledge for code search. IEEE Trans Serv Comput 9(5):771–783

  • Ponzanelli L, Mocci A, Lanza M (2015) Stormed: stack overflow ready made data. In: Proceedings of 12th working conference on mining software repositories (MSR’15), pp 474–477

  • Porter MF (1980) An algorithm for suffix stripping. Program 14(3):130–137

  • Rahman MM, Roy CK (2016) Quickar: automatic query reformulation for concept location using crowdsourced knowledge. In: Proceedings of the international conference on automated software engineering (ASE’16), pp 220–225

  • Rahman MM, Roy CK (2017a) Strict: information retrieval based search term identification for concept location. In: Proceedings of the conference on software analysis, evolution, and reengineering (SANER’17), pp 79–90

  • Rahman MM, Roy CK (2017b) Improved query reformulation for concept location using coderank and document structures. In: Proceedings of the international conference on automated software engineering (ASE’17). IEEE Press, pp 428–439

  • Rahman Md M, Barson J, Paul S, Kayani J, Lois FA, Quezada SF, Parnin C, Stolee KT, Ray B (2018a) Evaluating how developers use general-purpose web-search for code retrieval. In: Proceedings of the 15th international conference on mining software repositories (MSR’18), pp 465–475

  • Rahman MM, Roy CK (2018b) Improving ir-based bug localization with context-aware query reformulation. In: Proceedings of the 26th joint meeting on foundations of software engineering (ESEC/FSE’18). (to appear)

  • Rao S, Kak A (2011) Retrieval from software libraries for bug localization: a comparative study of generic and composite text models. In: Proceedings of the working conference on mining software repositories (MSR’11), pp 43–52

  • Rath M, Lo D, Mäder P (2018) Analyzing requirements and traceability information to improve bug localization. In: Proceedings of the working conference on mining software repositories (MSR’18). ACM

  • Roldan-Vega M, Mallet G, Hill E, Fails JA (2013) Conquer: a tool for nl-based query refinement and contextualizing code search results. In: Proceedings of the international conference on software maintenance (ICSM’13), pp 512–515

  • Saha RK, Lease M, Khurshid S, Perry DE (2013) Improving bug localization using structured information retrieval. In: Proceedings of the international conference on automated software engineering (ASE’13), pp 345–355

  • Salton G, Wong A, Yang CS (1975) A vector space model for automatic indexing. Commun ACM 18(11):613–620

  • Seaman CB (1999) Qualitative methods in empirical studies of software engineering. IEEE Trans Softw Eng 25(4):557–572

  • Shepherd D, Fry ZP, Hill E, Pollock L, Vijay-Shanker K (2007) Using natural language program analysis to locate and understand action-oriented concerns. In: Proceedings of the international conference on aspect-oriented software development (AOSD’07), pp 212–224

  • Shi Z, Keung J, Bennin KE, Zhang X (2018) Comparing learning to rank techniques in hybrid bug localization. Appl Soft Comput 62:636–648

  • Sim SE, Umarji M, Ratanotayanon S, Lopes CV (2011) How well do search engines support code retrieval on the web? ACM Trans Softw Eng Methodol 21(1):4

  • Sisman B, Kak AC (2012) Incorporating version histories in information retrieval based bug localization. In: Proceedings of the working conference on mining software repositories (MSR’12), pp 50–59

  • Sisman B, Kak AC (2013) Assisting code search with automatic query reformulation for bug localization. In: Proceedings of the working conference on mining software repositories (MSR’13), pp 309–318

  • Sisman B, Akbar SA, Kak AC (2016) Exploiting spatial code proximity and order for improved source code retrieval for bug localization. J Softw Evol Process 29(1):e1805

  • Starke J, Luce C, Sillito J (2009) Searching and skimming: an exploratory study. In: Proceedings of the international conference on software maintenance (ICSM’09), pp 157–166

  • Takahashi A, Sae-Lim N, Hayashi S, Saeki M (2018) Preliminary study on using code smells to improve bug localization. In: Proceedings of the international conference on program comprehension (ICPC’18). ACM, p 4

  • Wang S, Lo D (2014) Version history, similar report, and structure: putting them together for improved bug localization. In: Proceedings of the 22nd international conference on program comprehension (ICPC’14), pp 53–63

  • Wang S, Lo D, Lawall J (2014a) Compositional vector space models for improved bug localization. In: Proceedings of the conference on software maintenance and evolution (ICSME’14), pp 171–180

  • Wang S, Lo D, Jiang L (2014b) Active code search: incorporating user feedback to improve code search relevance. In: Proceedings of the 29th ACM/IEEE international conference on automated software engineering (ASE’14), pp 677–682

  • Wang S, Lo D (2016) Amalgam+: composing rich information sources for accurate bug localization. J Softw Evol Process 28(10):921–942

  • Wen M, Wu R, Cheung S (2016) Locus: locating bugs from software changes. In: Proceedings of the 31st international conference on automated software engineering (ASE’16), pp 262–273

  • Wong C-P, Xiong Y, Zhang H, Hao D, Lu Z, Mei H (2014) Boosting bug-report-oriented fault localization with segmentation and stack-trace analysis. In: Proceedings of the conference on software maintenance and evolution (ICSME’14), pp 181–190

  • Xiao Y, Keung J, Bennin KE, Mi Q (2018) Improving bug localization with word embedding and enhanced convolutional neural networks. Information and Software Technology

  • Ye X, Bunescu R, Liu C (2016a) Mapping bug reports to relevant files: a ranking model, a fine-grained benchmark, and feature evaluation. IEEE Trans Softw Eng 42(4):379–402

  • Ye X, Shen H, Ma X, Bunescu R, Liu C (2016b) From word embeddings to document similarities for improved information retrieval in software engineering. In: Proceedings of the international conference on software engineering (ICSE’16), pp 404–415

  • Youm KC, Ahn J, Lee E (2017) Improved bug localization based on code change histories and bug reports. Inf Softw Technol 82:177–192

  • Zhang Y, Lo D, Xia X, Le TDB, Scanniello G, Sun J (2016) Inferring links between concerns and methods with multi-abstraction vector space model. In: Proceedings of the international conference on software maintenance and evolution (ICSME’16), pp 110–121

  • Zhou J, Zhang H, Lo D (2012) Where should the bugs be fixed? More accurate information retrieval-based bug localization based on bug reports. In: Proceedings of the international conference on software engineering (ICSE’12), pp 14–24

  • Yu Z, Tong Y, Chen T, Han J (2017) Augmenting bug localization with part-of-speech and invocation. Int J Softw Eng Knowl Eng 27(6):925–949

  • Zimmermann T, Premraj R, Bettenburg N, Just S, Schröter A, Weiss C (2010) What makes a good bug report? IEEE Trans Softw Eng 36(5):618–643


Acknowledgments

This research was supported in part by the grants CCF-1848608 and CCF-1526118 from the US National Science Foundation.

Author information

Corresponding author

Correspondence to Oscar Chaparro.

Additional information

Communicated by: Lu Zhang, Thomas Zimmermann, Xin Peng and Hong Mei

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Chaparro, O., Florez, J.M. & Marcus, A. Using bug descriptions to reformulate queries during text-retrieval-based bug localization. Empir Software Eng 24, 2947–3007 (2019). https://doi.org/10.1007/s10664-018-9672-z

  • DOI: https://doi.org/10.1007/s10664-018-9672-z
