Empirical Software Engineering, Volume 21, Issue 2, pp 337–367

Improving bug management using correlations in crash reports

  • Shaohua Wang
  • Foutse Khomh
  • Ying Zou


Nowadays, many software organizations rely on automatic problem reporting tools to collect crash reports directly from users’ environments. These crash reports are later grouped into crash types. Developers usually prioritize crash types by the number of crash reports and file bug reports for the top crash types. Because a bug can trigger a crash in different usage scenarios, different crash types are sometimes related to the same bug. Two bugs are correlated when the occurrence of one bug causes the other bug to occur. We refer to a group of crash types related to identical or correlated bug reports as a crash correlation group. In this paper, we propose five rules to identify correlated crash types automatically. We propose an algorithm to locate and rank buggy files using crash correlation groups. We also propose a method to identify duplicate and related bug reports. Through an empirical study on Firefox and Eclipse, we show that the first three rules can identify crash correlation groups using stack trace information, with a precision of 91 % and a recall of 87 % for Firefox and a precision of 76 % and a recall of 61 % for Eclipse. On the top three buggy file candidates, the proposed bug localization algorithm achieves a recall of 62 % and a precision of 42 % for Firefox, and a recall of 52 % and a precision of 50 % for Eclipse. On the top 10 buggy file candidates, the recall increases to 92 % for Firefox and 90 % for Eclipse. The proposed duplicate bug report identification method achieves a recall of 50 % and a precision of 55 % on Firefox, and a recall of 47 % and a precision of 35 % on Eclipse. Developers can combine the proposed crash correlation rules with the new bug localization algorithm to identify and fix correlated crash types together. Triagers can use the duplicate bug report identification method to reduce their workload by filtering duplicate bug reports automatically.
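The precision and recall figures reported for the top three and top 10 buggy file candidates can be read against a common set-based definition. The following is a minimal sketch under that assumption; the abstract does not state the exact formulas used in the study, and the function name, file names, and ranking below are hypothetical illustrations rather than data from the paper.

```python
from typing import Sequence, Set, Tuple


def precision_recall_at_k(
    ranked_files: Sequence[str], buggy_files: Set[str], k: int
) -> Tuple[float, float]:
    """Set-based precision and recall over the top-k ranked candidates.

    Assumed definitions (not taken from the paper):
      precision = true buggy files in the top k / candidates inspected
      recall    = true buggy files in the top k / all true buggy files
    Assumes ranked_files contains no duplicate entries.
    """
    top_k = list(ranked_files[:k])
    hits = sum(1 for name in top_k if name in buggy_files)
    precision = hits / len(top_k) if top_k else 0.0
    recall = hits / len(buggy_files) if buggy_files else 0.0
    return precision, recall


# Hypothetical ranking produced by some localization step.
ranked = ["a.cpp", "b.cpp", "c.cpp", "d.cpp"]
# Hypothetical ground truth: files actually changed by the bug fix.
buggy = {"a.cpp", "d.cpp"}

p_at_3, r_at_3 = precision_recall_at_k(ranked, buggy, 3)
p_at_4, r_at_4 = precision_recall_at_k(ranked, buggy, 4)
```

Under this definition, enlarging the candidate list can only keep recall the same or raise it, which matches the direction of the change reported between the top three and top 10 candidates.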


Crashes, Crash reports, Stack traces, Bug localization, Bug duplication



The authors would like to thank Tejinder Dhaliwal and Feng Zhang, of Queen’s University, for their help during data collection and for their many useful comments on this work.



Copyright information

© Springer Science+Business Media New York 2014

Authors and Affiliations

  1. School of Computing, Queen’s University, Kingston, Canada
  2. SWAT Lab, DGIGL, Polytechnique Montréal, Montréal, Canada
  3. Electrical and Computer Engineering, Queen’s University, Kingston, Canada
