Improving bug management using correlations in crash reports

Wang, Shaohua; Khomh, Foutse; Zou, Ying

doi:10.1007/s10664-014-9333-9

Published: 10 October 2014

Volume 21, pages 337–367 (2016)
Empirical Software Engineering

Abstract

Many software organizations now rely on automatic problem reporting tools to collect crash reports directly from users' environments. These crash reports are then grouped into crash types. Developers usually prioritize crash types by the number of crash reports they contain and file bug reports for the top crash types. Because a bug can trigger a crash in different usage scenarios, different crash types are sometimes related to the same bug. Two bugs are correlated when the occurrence of one bug causes the other bug to occur. We refer to a group of crash types related to identical or correlated bug reports as a crash correlation group. In this paper, we propose five rules to identify correlated crash types automatically. We also propose an algorithm that locates and ranks buggy files using crash correlation groups, and a method to identify duplicate and related bug reports. Through an empirical study on Firefox and Eclipse, we show that the first three rules can identify crash correlation groups using stack trace information, with a precision of 91% and a recall of 87% for Firefox, and a precision of 76% and a recall of 61% for Eclipse. On the top three buggy file candidates, the proposed bug localization algorithm achieves a recall of 62% and a precision of 42% for Firefox, and a recall of 52% and a precision of 50% for Eclipse. On the top 10 buggy file candidates, the recall increases to 92% for Firefox and 90% for Eclipse. The proposed duplicate bug report identification method achieves a recall of 50% and a precision of 55% on Firefox, and a recall of 47% and a precision of 35% on Eclipse. Developers can combine the proposed crash correlation rules with the new bug localization algorithm to identify and fix correlated crash types together. Triagers can use the duplicate bug report identification method to reduce their workload by filtering duplicate bug reports automatically.
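The abstract reports precision and recall over the top k buggy file candidates. The block below is a minimal sketch of how such top-k figures can be computed, assuming a deliberately simple ranking heuristic introduced here only for illustration: files are ranked by how many stack traces in a crash correlation group contain them, and precision and recall at k use the usual set-based definitions. It is not the authors' bug localization algorithm, and the function names, file names, and data are all hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Sequence, Set, Tuple


def rank_candidate_files(stack_traces: Iterable[Sequence[str]]) -> List[str]:
    """Rank source files by how many stack traces in a group mention them.

    Hypothetical heuristic (not the paper's algorithm): each stack trace is a
    sequence of file names, one per frame; a file counts at most once per
    trace. Ties are broken alphabetically so the ranking is deterministic.
    """
    counts: Counter = Counter()
    for trace in stack_traces:
        counts.update(set(trace))
    return sorted(counts, key=lambda f: (-counts[f], f))


def precision_recall_at_k(
    ranked: Sequence[str], buggy: Set[str], k: int
) -> Tuple[float, float]:
    """Set-based precision and recall over the top-k ranked candidates."""
    top_k = set(ranked[:k])
    hits = len(top_k & buggy)
    precision = hits / len(top_k) if top_k else 0.0
    recall = hits / len(buggy) if buggy else 0.0
    return precision, recall


if __name__ == "__main__":
    # Toy crash correlation group: stack traces of three crash types (made-up data).
    group = [
        ["nsDocShell.cpp", "nsFrame.cpp", "main.cpp"],
        ["nsFrame.cpp", "nsLayout.cpp", "main.cpp"],
        ["nsFrame.cpp", "main.cpp"],
    ]
    ranked = rank_candidate_files(group)
    print(ranked)  # → ['main.cpp', 'nsFrame.cpp', 'nsDocShell.cpp', 'nsLayout.cpp']
    print(precision_recall_at_k(ranked, {"nsFrame.cpp", "nsLayout.cpp"}, k=3))
    # → (0.3333333333333333, 0.5)
```

In this toy example a raw frequency count puts the ubiquitous entry-point file `main.cpp` at the top, so a practical ranking would likely need to down-weight frames that appear in almost every trace.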

Acknowledgements

The authors would like to thank Tejinder Dhaliwal and Feng Zhang, of Queen’s University, for their help during data collection and for their many useful comments on this work.

Author information

Authors and Affiliations

Shaohua Wang: School of Computing, Queen's University, Kingston, ON, Canada
Foutse Khomh: SWAT Lab, DGIGL, Polytechnique Montréal, Montréal, QC, Canada
Ying Zou: Electrical and Computer Engineering, Queen's University, Kingston, ON, Canada

Corresponding author

Correspondence to Shaohua Wang.

Additional information

Communicated by Massimiliano Di Penta and Sung Kim

About this article

Cite this article

Wang, S., Khomh, F. & Zou, Y. Improving bug management using correlations in crash reports. Empir Software Eng 21, 337–367 (2016). https://doi.org/10.1007/s10664-014-9333-9

Issue Date: April 2016