Abstract
As societal dependence on software continues to grow, bugs are becoming increasingly costly in terms of financial resources as well as human safety. Bug localization is the process by which a developer identifies buggy code that needs to be fixed to make a system safer and more reliable. Unfortunately, manually locating bugs solely from the information in a bug report requires advanced knowledge of how a system is constructed and how its constituent pieces interact. Previous work has therefore investigated numerous techniques for reducing the human effort spent on bug localization. One of the most common approaches is Text Retrieval (TR), in which a system’s source code is indexed into a search space that is then queried for code relevant to a given bug report. In the last decade, dozens of papers have proposed improvements to TR-based bug localization, with largely positive results. However, several other studies have called the technique into question. According to these studies, evaluations of TR-based approaches often lack sufficient controls on biases that artificially inflate the results, namely misclassified bugs, tangled commits, and localization hints. Here we argue that contemporary evaluations of TR approaches also include a negative bias that outweighs the previously identified positive biases: while TR approaches expect a natural language query, most evaluations simply formulate this query as the full text of a bug report. In this study we show that high-performing queries can be extracted from the bug report text, making TR effective even without the aforementioned positive biases. Further, we analyze the provenance of terms in these high-performing queries to drive future work on automatic query extraction from bug reports.
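The TR workflow described above (index the source code, then query the index with text taken from a bug report) can be illustrated with a minimal sketch. It assumes a TF-IDF vector space model with cosine similarity, which is one common TR engine and not necessarily the configuration used in this study. The corpus, file names, and function names below are hypothetical and exist only for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Split camelCase identifiers, lowercase, and keep alphabetic tokens only."""
    text = re.sub(r"([a-z])([A-Z])", r"\1 \2", text)
    return [t for t in re.split(r"[^a-z]+", text.lower()) if t]


def build_index(corpus):
    """Index a corpus (file name -> source text) as TF-IDF vectors.

    Returns the per-file vectors and the IDF table used to weight queries.
    """
    counts = {name: Counter(tokenize(src)) for name, src in corpus.items()}
    n_docs = len(corpus)
    doc_freq = Counter(term for c in counts.values() for term in c)
    idf = {term: math.log(n_docs / df) + 1.0 for term, df in doc_freq.items()}
    vectors = {
        name: {term: tf * idf[term] for term, tf in c.items()}
        for name, c in counts.items()
    }
    return vectors, idf


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank(query, vectors, idf):
    """Rank indexed files by similarity to the query, best match first."""
    q_counts = Counter(t for t in tokenize(query) if t in idf)
    q_vec = {term: tf * idf[term] for term, tf in q_counts.items()}
    scored = [(cosine(q_vec, vec), name) for name, vec in vectors.items()]
    # Sort by descending score; break ties by file name for determinism.
    return [name for _, name in sorted(scored, key=lambda s: (-s[0], s[1]))]


# Hypothetical three-file corpus standing in for a system's source code.
CORPUS = {
    "LoginController.java": "class LoginController { void checkPassword(String password) { } }",
    "ReportExporter.java": "class ReportExporter { void exportPdf(Report report) { } }",
    "CacheManager.java": "class CacheManager { void evictEntry(String key) { } }",
}

VECTORS, IDF = build_index(CORPUS)
```

Calling `rank("export report pdf", VECTORS, IDF)` ranks the hypothetical `ReportExporter.java` first, because it is the only file that shares terms with the query. Passing the full text of a bug report to `rank` instead corresponds to the query formulation that most evaluations use, and a shorter query extracted from that text can be passed in exactly the same way.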
Notes
We use the term “near-optimal” because the genetic algorithm (GA) ultimately converges to a locally optimal solution, which may or may not be the global optimum.
Acknowledgements
Sonia Haiduc and Esteban Parra were supported in part by National Science Foundation grants CCF-1846142 and CCF-1644285. Gabriele Bavota and Jevgenija Pantiuchina acknowledge the support of the Swiss National Science Foundation through the JITRA project, No. 172479.
Additional information
Communicated by: David Lo and Foutse Khomh
This article belongs to the Topical Collection: Software Maintenance and Evolution (ICSME)
About this article
Cite this article
Mills, C., Parra, E., Pantiuchina, J. et al. On the relationship between bug reports and queries for text retrieval-based bug localization. Empir Software Eng 25, 3086–3127 (2020). https://doi.org/10.1007/s10664-020-09823-w
DOI: https://doi.org/10.1007/s10664-020-09823-w