On the relationship between bug reports and queries for text retrieval-based bug localization

Empirical Software Engineering

Abstract

As societal dependence on software continues to grow, bugs are becoming increasingly costly in terms of both financial resources and human safety. Bug localization is the process by which a developer identifies the buggy code that must be fixed to make a system safer and more reliable. Unfortunately, manually locating bugs solely from the information in a bug report requires advanced knowledge of how a system is constructed and how its constituent pieces interact. Previous work has therefore investigated numerous techniques for reducing the human effort spent on bug localization. One of the most common approaches is Text Retrieval (TR), in which a system’s source code is indexed into a search space that is then queried for code relevant to a given bug report. In the last decade, dozens of papers have proposed improvements to bug localization using TR, with largely positive results. However, several other studies have called the technique into question. According to these studies, evaluations of TR-based approaches often lack sufficient controls for biases that artificially inflate the results, namely misclassified bugs, tangled commits, and localization hints. Here we argue that contemporary evaluations of TR approaches also include a negative bias that outweighs the previously identified positive biases: while TR approaches expect a natural language query, most evaluations simply formulate this query as the full text of a bug report. In this study, we show that high-performing queries can be extracted from the bug report text, making TR effective even without the aforementioned positive biases. Further, we analyze the provenance of terms in these high-performing queries to guide future work on automatic query extraction from bug reports.
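The indexing and querying step described in the abstract can be sketched as follows. This is a minimal illustration, assuming a toy TF-IDF vector space model with cosine similarity rather than the retrieval engine used in the study. The corpus, file names, bug report text, and helper names (`tokenize`, `build_index`, `cosine`, `rank`) are all hypothetical. The toy data is deliberately constructed so that the full report contains terms that point away from the buggy file, so it shows the mechanism only and is not evidence for the paper's findings.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split on non-letters (no stemming, no stop-word list)."""
    return re.findall(r"[a-z]+", text.lower())


def build_index(corpus):
    """Index documents as TF-IDF vectors; returns (vectors, idf)."""
    n_docs = len(corpus)
    term_counts = {name: Counter(tokenize(text)) for name, text in corpus.items()}
    # Document frequency: number of documents containing each term.
    doc_freq = Counter(t for counts in term_counts.values() for t in counts)
    idf = {t: math.log(n_docs / df) + 1.0 for t, df in doc_freq.items()}
    vectors = {
        name: {t: tf * idf[t] for t, tf in counts.items()}
        for name, counts in term_counts.items()
    }
    return vectors, idf


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank(query, vectors, idf):
    """Return document names ordered by decreasing similarity to the query."""
    counts = Counter(t for t in tokenize(query) if t in idf)  # drop unseen terms
    qvec = {t: tf * idf[t] for t, tf in counts.items()}
    scored = [(cosine(qvec, dvec), name) for name, dvec in vectors.items()]
    # Sort by score (descending), breaking ties by file name.
    return [name for _, name in sorted(scored, key=lambda p: (-p[0], p[1]))]


# Hypothetical corpus: each "file" is reduced to a bag of identifier terms.
corpus = {
    "Parser.java": "parse token stream syntax error parse tree",
    "Cache.java": "cache entry evict expire cache size limit",
    "Logger.java": "log message file write log level error",
}
vectors, idf = build_index(corpus)

# Hypothetical bug whose fix is assumed to be in Cache.java.
full_report = ("The application writes an error message to the log file "
               "when a cache entry should expire")
reduced_query = "cache entry expire"  # a query extracted from the report text

full_ranking = rank(full_report, vectors, idf)
reduced_ranking = rank(reduced_query, vectors, idf)
```

In this toy setup, the full report text ranks `Logger.java` above `Cache.java`, because the report's logging vocabulary matches the logger's terms, while the extracted three-term query ranks `Cache.java` first.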

Notes

  1. https://lucene.apache.org/core/2_9_4/api/core/org/apache/lucene/search/Similarity.html

  2. https://lucene.apache.org/core/8_2_0/core/org/apache/lucene/search/similarities/BM25Similarity.html

  3. https://lucene.apache.org/

  4. We use the term “near-optimal” because the genetic algorithm (GA) ultimately converges to a locally optimal solution, which may or may not be the global optimum.

  5. http://jmetal.sourceforge.net

Acknowledgements

Sonia Haiduc and Esteban Parra were supported in part by National Science Foundation grants CCF-1846142 and CCF-1644285. Gabriele Bavota and Jevgenija Pantiuchina acknowledge the support of the Swiss National Science Foundation through the JITRA project, No. 172479.

Author information

Corresponding author

Correspondence to Chris Mills.

Additional information

Communicated by: David Lo and Foutse Khomh

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This article belongs to the Topical Collection: Software Maintenance and Evolution (ICSME)

About this article

Cite this article

Mills, C., Parra, E., Pantiuchina, J. et al. On the relationship between bug reports and queries for text retrieval-based bug localization. Empir Software Eng 25, 3086–3127 (2020). https://doi.org/10.1007/s10664-020-09823-w

  • DOI: https://doi.org/10.1007/s10664-020-09823-w
