The forgotten role of search queries in IR-based bug localization: an empirical study

Empirical Software Engineering

Abstract

Being lightweight and cost-effective, IR-based approaches for bug localization have shown promise in finding software bugs. However, the accuracy of these approaches depends heavily on the bug reports they use. A significant number of bug reports contain only plain natural language text. According to existing studies, IR-based approaches cannot perform well when they use these bug reports as search queries. Yet recent evidence suggests that even these natural language-only reports contain enough good keywords to localize the bugs successfully. On one hand, these findings suggest that natural language-only bug reports might be a sufficient source of good query keywords. On the other hand, they cast serious doubt on the query selection practices in IR-based bug localization. In this article, we attempt to clarify this issue by conducting an in-depth empirical study that critically examines the state-of-the-art query selection practices in IR-based bug localization. In particular, we use a dataset of 2,320 bug reports, employ ten existing approaches from the literature, exploit a Genetic Algorithm-based approach to construct optimal and near-optimal search queries from these bug reports, and then answer three research questions. We confirm that the state-of-the-art query construction approaches are indeed not sufficient for constructing appropriate queries (for bug localization) from certain natural language-only bug reports. However, these bug reports do contain high-quality search keywords in their texts even though they might not contain explicit hints for localizing bugs (e.g., stack traces). We also demonstrate that optimal queries and non-optimal queries chosen from bug report texts are significantly different in terms of several keyword characteristics (e.g., frequency, entropy, position, part of speech). Such an analysis has led us to four actionable insights on how to choose appropriate keywords from a bug report. Furthermore, we demonstrate a 27%–34% improvement in the performance of non-optimal queries through the application of our actionable insights to them. Finally, we summarize our study findings with future research directions (e.g., machine intelligence in keyword selection).
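The abstract does not describe how the Genetic Algorithm is configured, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes a candidate query is a keep/drop mask over the keywords of a bug report, and that fitness is the reciprocal rank of the known buggy file. The toy corpus, the file names, the TF-IDF-style scorer, and all parameter values are hypothetical.

```python
import math
import random
from collections import Counter


def rank_of_target(query, corpus, target):
    """Rank (1 = best) of `target` under a simple TF-IDF overlap score."""
    n_docs = len(corpus)
    df = Counter(term for doc in corpus.values() for term in set(doc))
    scores = {}
    for name, doc in corpus.items():
        tf = Counter(doc)
        scores[name] = sum(
            tf[t] * math.log(1 + n_docs / df[t]) for t in query if t in tf
        )
    # Ties are broken by file name so that the ranking is deterministic.
    ranked = sorted(corpus, key=lambda name: (-scores[name], name))
    return ranked.index(target) + 1


def evolve_query(keywords, corpus, target, pop_size=20, generations=30,
                 mutation_rate=0.1, seed=0):
    """Search keyword subsets with a small GA (assumes at least 2 keywords)."""
    rng = random.Random(seed)

    def fitness(mask):
        query = [k for k, keep in zip(keywords, mask) if keep]
        return 1.0 / rank_of_target(query, corpus, target) if query else 0.0

    # Seed the population with the baseline (all keywords) plus random masks.
    population = [[True] * len(keywords)]
    population += [[rng.random() < 0.5 for _ in keywords]
                   for _ in range(pop_size - 1)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        parents = population[: pop_size // 2]  # elitism: keep the top half
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, len(keywords))  # one-point crossover
            child = a[:cut] + b[cut:]
            child = [(not g) if rng.random() < mutation_rate else g
                     for g in child]  # bit-flip mutation
            children.append(child)
        population = parents + children
    best = max(population, key=fitness)
    return [k for k, keep in zip(keywords, best) if keep], fitness(best)


# Hypothetical toy data: three "source files" and one natural language report.
corpus = {
    "Parser.java": "parse token syntax error error error".split(),
    "Renderer.java": "render view paint canvas".split(),
    "Cache.java": "cache entry evict memory window window".split(),
}
report = "error in window when cache paint fails error".split()
keywords = list(dict.fromkeys(report))  # unique terms, original order

baseline_fit = 1.0 / rank_of_target(keywords, corpus, "Renderer.java")
best_query, best_fit = evolve_query(keywords, corpus, "Renderer.java")
print("baseline fitness:", baseline_fit)
print("best query:", best_query, "fitness:", best_fit)
```

In this toy setting the full report text is a poor query because noisy terms such as "error" and "window" match the wrong files more strongly. Because the baseline mask is part of the initial population and the top half survives each generation, the evolved query can never score below the baseline.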

Acknowledgements

This research was supported by a tenure-track startup grant from Dalhousie University, the Saskatchewan Innovation & Opportunity Scholarship (2017–2018), and the Natural Sciences and Engineering Research Council of Canada (NSERC).

Author information

Corresponding author

Correspondence to Mohammad Masudur Rahman.

Additional information

Communicated by: Andrian Marcus

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix A: Query attributes for the comparative analysis

Table 23 Query attributes used for the comparative analysis

Appendix B: Comparison of Query Feature Distribution

Fig. 9 Comparison between optimal and non-optimal keywords (from bug reports leading to poor baseline queries) using their distributions (e.g., PMF = probability mass function) of (a) frequency, (b) entropy, (c) percentage from the body/description section of a bug report, and (d) percentage of nouns

Appendix C: Comparison of Query Features using Box plots

Fig. 10 Comparison between optimal and non-optimal keywords (from bug reports leading to poor baseline queries) using box plots of their (a) frequency, (b) entropy, (c) percentage from the body/description section of a bug report, and (d) percentage of nouns

About this article

Cite this article

Rahman, M.M., Khomh, F., Yeasmin, S. et al. The forgotten role of search queries in IR-based bug localization: an empirical study. Empir Software Eng 26, 116 (2021). https://doi.org/10.1007/s10664-021-10022-4

  • DOI: https://doi.org/10.1007/s10664-021-10022-4
