Abstract
Bug localization, the task of finding the buggy files for a given bug report, is tedious and time-consuming in practical projects with tens of millions of lines of code. Recently, many information retrieval (IR)-based bug localization (IRBL) approaches have been proposed that formulate this task as a search problem. Despite the excellent performance claimed in the literature, to the best of our knowledge hardly any of these approaches has been adopted in industry. The challenge in adapting IRBL to industrial projects is that such projects differ in their characteristics from the open-source projects used in the literature, and previous studies have not taken these differences into consideration. In this paper, we re-implement six state-of-the-art IRBL techniques and evaluate their effectiveness on 10 Huawei projects comprising 161,967 source code files and 24,437 bug reports in total. Localizing bugs in these projects faces several challenges, including the software product line, the bilingual issue, and the quality of bug reports. We conduct comprehensive experiments to reveal how these factors affect IRBL effectiveness, and we modify the data set to test whether some of the factors can be overcome when additional information or hints are given. Based on the insights from our work, we suggest potential improvements to IRBL techniques. This study is also expected to provide empirical evidence for other software tasks that face the same fundamental challenges.
Acknowledgements
This work is partially supported by the Natural Science Foundation of China (Nos. 61872272, 61640221) and the Strategic Priority Research Program of the Chinese Academy of Sciences (No. XDC05040100).
Additional information
Communicated by: Aldeida Aleti, Annibale Panichella, and Shin Yoo
This article belongs to the Topical Collection: Advances in Search-Based Software Engineering (SSBSE)
The authors have no relevant financial or non-financial interests to disclose.
Cite this article
Li, W., Li, Q., Ming, Y. et al. An empirical study of the effectiveness of IR-based bug localization for large-scale industrial projects. Empir Software Eng 27, 47 (2022). https://doi.org/10.1007/s10664-021-10082-6