On the relationship between bug reports and queries for text retrieval-based bug localization

Mills, Chris; Parra, Esteban; Pantiuchina, Jevgenija; Bavota, Gabriele; Haiduc, Sonia

doi:10.1007/s10664-020-09823-w

Published: 13 July 2020

Empirical Software Engineering, volume 25, pages 3086–3127 (2020)

Chris Mills¹, Esteban Parra¹ (ORCID: orcid.org/0000-0001-9813-9518), Jevgenija Pantiuchina², Gabriele Bavota² & Sonia Haiduc¹

Abstract

As societal dependence on software continues to grow, bugs are becoming increasingly costly in terms of financial resources as well as human safety. Bug localization is the process by which a developer identifies buggy code that needs to be fixed to make a system safer and more reliable. Unfortunately, manually attempting to locate bugs solely from the information in a bug report requires advanced knowledge of how a system is constructed and the way its constituent pieces interact. Therefore, previous work has investigated numerous techniques for reducing the human effort spent in bug localization. One of the most common approaches is Text Retrieval (TR) in which a system’s source code is indexed into a search space that is then queried for code relevant to a given bug report. In the last decade, dozens of papers have proposed improvements to bug localization using TR with largely positive results. However, several other studies have called the technique into question. According to these studies, evaluations of TR-based approaches often lack sufficient controls on biases that artificially inflate the results, namely: misclassified bugs, tangled commits, and localization hints. Here we argue that contemporary evaluations of TR approaches also include a negative bias that outweighs the previously identified positive biases: while TR approaches expect a natural language query, most evaluations simply formulate this query as the full text of a bug report. In this study we show that highly performing queries can be extracted from the bug report text, in order to make TR effective even without the aforementioned positive biases. Further, we analyze the provenance of terms in these highly performing queries to drive future work in automatic query extraction from bug reports.
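To make the TR setup described in the abstract concrete, here is a minimal sketch of indexing source files and ranking them against a query. It is not the authors' implementation: it assumes a plain TF-IDF vector space model with cosine similarity, and the file names, file contents, bug report text, and reduced query are all invented for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Split camelCase identifiers, lowercase, and keep alphabetic tokens only."""
    text = re.sub(r"([a-z])([A-Z])", r"\1 \2", text)
    return [t for t in re.split(r"[^A-Za-z]+", text.lower()) if t]


def build_index(corpus):
    """Build TF-IDF vectors for a {file_name: text} corpus (hypothetical layout)."""
    counts = {name: Counter(tokenize(text)) for name, text in corpus.items()}
    n_docs = len(corpus)
    doc_freq = Counter(term for c in counts.values() for term in c)
    idf = {term: math.log(1 + n_docs / df) for term, df in doc_freq.items()}
    vectors = {
        name: {term: tf * idf[term] for term, tf in c.items()}
        for name, c in counts.items()
    }
    return vectors, idf


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank(query, vectors, idf):
    """Return file names ordered from most to least similar to the query."""
    q_counts = Counter(t for t in tokenize(query) if t in idf)
    q_vec = {t: tf * idf[t] for t, tf in q_counts.items()}
    scores = {name: cosine(q_vec, vec) for name, vec in vectors.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Toy corpus: invented file names and contents, for illustration only.
corpus = {
    "CacheManager.java": "class CacheManager evict entry cache expire timeout",
    "LoginDialog.java": "class LoginDialog show dialog user password button click",
    "ReportWriter.java": "class ReportWriter write report file user export",
}
vectors, idf = build_index(corpus)

# The whole bug report used as the query, as in most evaluations.
full_report = (
    "When the user clicks the export button the report shows stale data "
    "because the cache never expires"
)
# A shorter query made only of terms that occur in the report text.
reduced_query = "cache expire"

print(rank(full_report, vectors, idf))
print(rank(reduced_query, vectors, idf))
```

In this toy setting, the incidental terms in the full report ("user", "export", "report") pull an unrelated file to the top, while the two-term query ranks the cache file first. This only illustrates the kind of negative bias the abstract argues for; it is not evidence for it.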

Notes

https://lucene.apache.org/core/2_9_4/api/core/org/apache/lucene/search/Similarity.html
https://lucene.apache.org/core/8_2_0/core/org/apache/lucene/search/similarities/BM25Similarity.html
https://lucene.apache.org/
We use the term “near-optimal” because the GA will ultimately converge to a local optimum, which may or may not be the global optimum.
http://jmetal.sourceforge.net

Acknowledgements

Sonia Haiduc and Esteban Parra were supported in part by the National Science Foundation grants CCF-1846142 and CCF-1644285. Gabriele Bavota and Jevgenija Pantiuchina acknowledge the support by the Swiss National Science Foundation through the JITRA project, No. 172479.

Author information

Authors and Affiliations

Florida State University, 600 W College Ave, Tallahassee, FL, 32306, USA
Chris Mills, Esteban Parra & Sonia Haiduc
Università della Svizzera italiana, Via Giuseppe Buffi 13, 6900, Lugano, Switzerland
Jevgenija Pantiuchina & Gabriele Bavota

Corresponding author

Correspondence to Chris Mills.

Additional information

Communicated by: David Lo and Foutse Khomh

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This article belongs to the Topical Collection: Software Maintenance and Evolution (ICSME)

About this article

Cite this article

Mills, C., Parra, E., Pantiuchina, J. et al. On the relationship between bug reports and queries for text retrieval-based bug localization. Empir Software Eng 25, 3086–3127 (2020). https://doi.org/10.1007/s10664-020-09823-w

Issue Date: September 2020