An empirical study of the effectiveness of IR-based bug localization for large-scale industrial projects

Empirical Software Engineering

Abstract

Bug localization, the task of finding the buggy files for a given bug report, is tedious and time-consuming in practical projects with tens of millions of lines of code. Recently, many information retrieval (IR)-based bug localization (IRBL) approaches have been proposed that formulate this task as a search problem. Despite the excellent performance claimed in the literature, to the best of our knowledge hardly any of these approaches has been adopted in industry. The challenge in adapting IRBL to industrial projects is that they have different characteristics from the open-source projects used in the literature, and previous studies have not taken these differences into account. In this paper, we re-implement six state-of-the-art IRBL techniques and evaluate their effectiveness on 10 Huawei projects consisting of 161,967 source code files and 24,437 bug reports in total. Localizing bugs in these projects faces several challenges, including the software product line, the bilingual issue, and the quality of bug reports. We conduct comprehensive experiments to reveal how these factors affect IRBL effectiveness, and we modify the data set to test whether some factors can be overcome when additional information or hints are given. Based on the insights from our work, we suggest potential improvements to IRBL techniques. This study is also expected to provide empirical evidence for other software tasks that face the same fundamental challenges.
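
To make the "search problem" framing concrete, the following is a minimal sketch of a generic IR-style ranker: the bug report is treated as a query, each source file as a document, and files are ranked by TF-IDF cosine similarity. It is an illustration only and is not one of the six techniques evaluated in the study; the function names `tokenize` and `rank_files`, the smoothed IDF formula, and the toy file contents are all assumptions made for this example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Split camelCase identifiers, lowercase, and split on non-alphanumerics."""
    text = re.sub(r"([a-z0-9])([A-Z])", r"\1 \2", text)
    return [t for t in re.split(r"[^A-Za-z0-9]+", text.lower()) if t]


def rank_files(bug_report, files):
    """Return file names sorted by TF-IDF cosine similarity to the bug report.

    `files` maps a file name to its source text (hypothetical input format).
    """
    docs = {name: Counter(tokenize(src)) for name, src in files.items()}
    n = len(docs)
    # Document frequency: the number of files in which each term occurs.
    df = Counter(term for counts in docs.values() for term in counts)

    def weights(counts):
        # Smoothed IDF (an assumption) keeps every weight positive.
        return {t: c * (math.log((1 + n) / (1 + df[t])) + 1.0)
                for t, c in counts.items()}

    def cosine(a, b):
        dot = sum(w * b.get(t, 0.0) for t, w in a.items())
        norm_a = math.sqrt(sum(w * w for w in a.values()))
        norm_b = math.sqrt(sum(w * w for w in b.values()))
        return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

    query = weights(Counter(tokenize(bug_report)))
    scores = {name: cosine(query, weights(counts))
              for name, counts in docs.items()}
    return sorted(scores, key=scores.get, reverse=True)


if __name__ == "__main__":
    # Toy corpus, invented for illustration.
    toy_files = {
        "SessionManager.java": "class SessionManager { void closeSession() { session.close(); } }",
        "ReportPrinter.java": "class ReportPrinter { void printReport() { printer.print(report); } }",
    }
    print(rank_files("Crash when closing a session twice", toy_files))
```

In this toy corpus only `SessionManager.java` shares a term (`session`) with the report, so it is ranked first. A ranker of this kind depends entirely on vocabulary overlap between report and code, which is why factors such as the bilingual issue and bug report quality can be expected to matter.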

Notes

  1. https://nlp.stanford.edu/software/segmenter.shtml

Acknowledgements

This work is partially supported by the Natural Science Foundation of China (Nos. 61872272, 61640221) and the Strategic Priority Research Program of the Chinese Academy of Sciences (No. XDC05040100).

Author information

Corresponding authors

Correspondence to Qingan Li or Mengting Yuan.

Additional information

Communicated by: Aldeida Aleti, Annibale Panichella, and Shin Yoo

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This article belongs to the Topical Collection: Advances in Search-Based Software Engineering (SSBSE)

The authors have no relevant financial or non-financial interests to disclose.

About this article

Cite this article

Li, W., Li, Q., Ming, Y. et al. An empirical study of the effectiveness of IR-based bug localization for large-scale industrial projects. Empir Software Eng 27, 47 (2022). https://doi.org/10.1007/s10664-021-10082-6

  • DOI: https://doi.org/10.1007/s10664-021-10082-6
