TraceSim: An Alignment Method for Computing Stack Trace Similarity

Empirical Software Engineering

Abstract

Software systems can automatically submit crash reports to a repository for investigation when program failures occur. A significant portion of these crash reports are duplicates, i.e., they are caused by the same software issue. Therefore, when the volume of submitted reports is very large, automatically grouping duplicate crash reports can significantly ease and speed up the analysis of software failures. This task is known as crash report deduplication. Given the huge volume of incoming reports, improving the quality of deduplication is important. Most studies address it with information retrieval or sequence matching methods based on the similarity of the stack traces of two crash reports. Information retrieval methods disregard the position of a frame in a stack trace, whereas existing sequence matching approaches do not fully account for the global frequency of subroutines or for unmatched frames. Moreover, because data distributions differ across software projects, these methods need parameters learned with machine learning algorithms to gain flexibility. In this paper, we propose TraceSim, an approach for crash report deduplication that combines TF-IDF, optimum global alignment, and machine learning (ML) in a novel way. We also propose a new evaluation methodology for this task that is more comprehensive and robust than previously used evaluation approaches. TraceSim significantly outperforms seven baselines and state-of-the-art methods in the majority of the scenarios, and it is the only approach that achieves competitive results on all datasets for all considered metrics. In addition, we conduct an extensive ablation study that demonstrates the importance of each of TraceSim's elements to its final performance and robustness. Finally, we provide the source code for all considered methods and for the evaluation methodology, as well as the created datasets.
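
TraceSim is described as combining TF-IDF with global alignment and learned parameters, and Appendix A names a Global Weight and a Local Weight for frames. The sketch below shows one plausible way to compute such per-frame weights in Python; it is not the paper's implementation. The IDF-based sigmoid for the global weight, the 1/(i+1)^alpha position decay for the local weight, the parameters alpha, beta, and gamma (which a per-project tuning step would learn), and all helper names are assumptions made for illustration.

```python
import math
from collections import Counter


def document_frequencies(stacks):
    """Count, for each frame, how many stack traces contain it."""
    df = Counter()
    for stack in stacks:
        df.update(set(stack))
    return df


def global_weight(frame, df, n_stacks, beta=1.0, gamma=0.0):
    """Hypothetical IDF-based weight squashed through a sigmoid.

    Frames that occur in many stacks (low IDF) receive lower weights than
    rare frames. beta and gamma are illustrative tunable parameters.
    """
    idf = math.log(n_stacks / df.get(frame, 1))  # unseen frames count as seen once
    return 1.0 / (1.0 + math.exp(-beta * (idf - gamma)))


def local_weight(position, alpha=1.0):
    """Hypothetical position decay: the top frame (position 0) weighs most."""
    return 1.0 / (position + 1) ** alpha


def frame_weights(stack, df, n_stacks, alpha=1.0, beta=1.0, gamma=0.0):
    """Weight of each frame = local (position) weight * global (IDF) weight."""
    return [
        local_weight(i, alpha) * global_weight(f, df, n_stacks, beta, gamma)
        for i, f in enumerate(stack)
    ]


stacks = [["main", "parse", "read"], ["main", "render"], ["main", "parse", "alloc"]]
df = document_frequencies(stacks)
print(frame_weights(["main", "parse", "read"], df, len(stacks)))
```

With these toy stacks, the frame `main`, which appears in every stack, gets a lower global weight than the rarer frames, and deeper positions are discounted, giving weights of roughly 0.5, 0.3, and 0.25.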

Notes

  1. https://wiki.ubuntu.com/Apport

  2. https://crash-stats.mozilla.com/

  3. https://goto.google.com/crash/root

  4. https://github.com/irving-muller/TraceSim_EMSE

  5. https://lucene.apache.org/

  6. https://www.elastic.co/elasticsearch/

  7. https://github.com/irving-muller/TraceSim_EMSE

  8. https://crash-stats.mozilla.org/

  9. https://bugs.launchpad.net/

  10. https://bugs.eclipse.org/bugs/

  11. https://bz.apache.org/netbeans/

  12. https://bugzilla.gnome.org/

  13. https://metacpan.org/pod/Parse::StackTrace

  14. We conducted a preliminary investigation to find the best number of iterations.

  15. These strategies were designed for techniques that consider the frame order. Since these information retrieval techniques are based on the bag-of-words model, such strategies are not effective for them.

  16. The experiments in Sections 5.1 and 5.2 were run in a shared and heterogeneous environment. Therefore, the run times from these experiments cannot be reliably compared.

Author information

Corresponding author

Correspondence to Irving Muller Rodrigues.

Additional information

Communicated by: Foutse Khomh

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This article belongs to the Topical Collection: Machine Learning Techniques for Software Quality Evaluation (MaLTeSQuE)

We would like to gratefully acknowledge the Natural Sciences and Engineering Research Council of Canada (NSERC), Ericsson, Ciena, and EfficiOS for funding this project. Moreover, this research was enabled in part by the support provided by WestGrid (https://www.westgrid.ca/) and Compute Canada (www.computecanada.ca).

Appendix A: Additional Ablation Study Results

In this appendix, we extend the ablation study by removing combinations of Global Weight, Local Weight, the diff(⋅) Function, and normalization. Figs. 14, 15, 16, 17, 18, 19, 20, and 21 depict ΔAUC, ΔMAP, and ΔRR@1 between the original TraceSim and each configuration that has at most two components enabled, except the configurations listed at the end of this appendix.
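
The set of configurations covered by Figs. 14 to 21 can be checked mechanically. The Python sketch below enumerates every subset of the four components with at most two enabled and filters out the three configurations that are not reported; the identifiers and the `skip_reason` rules are a hypothetical paraphrase of the list of unreported configurations at the end of this appendix.

```python
from itertools import combinations

COMPONENTS = ("global_weight", "local_weight", "diff", "normalization")
WEIGHTS = {"global_weight", "local_weight"}


def configurations(max_enabled=2):
    """Yield every subset of COMPONENTS with at most `max_enabled` enabled."""
    for k in range(max_enabled + 1):
        for enabled in combinations(COMPONENTS, k):
            yield frozenset(enabled)


def skip_reason(enabled):
    """Return why a configuration is not reported, or None if it is reported."""
    if not (enabled & WEIGHTS) and "normalization" in enabled:
        # Without either weight, all frame weights are 1.
        return "weights are constant, so normalization has no effect"
    if not enabled:
        return "equivalent to plain NW alignment"
    return None


configs = list(configurations())
reported = [c for c in configs if skip_reason(c) is None]
skipped = [c for c in configs if skip_reason(c) is not None]
print(len(configs), len(reported), len(skipped))  # 11 8 3
```

Of the 11 candidate configurations, 8 remain, which matches the eight figures; note that the diff(⋅)-only configuration (Fig. 17) is still reported because normalization is disabled there.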

Fig. 14

Distributions of ΔAUC (left), ΔMAP (middle) and ΔRR@1 (right) between full TraceSim and TraceSim without the diff(⋅) Function and Normalization

Fig. 15

Distributions of ΔAUC (left), ΔMAP (middle) and ΔRR@1 (right) between full TraceSim and TraceSim without Global Weight and the diff(⋅) Function

Fig. 16

Distributions of ΔAUC (left), ΔMAP (middle) and ΔRR@1 (right) between full TraceSim and TraceSim without Global Weight and Normalization

Fig. 17

Distributions of ΔAUC (left), ΔMAP (middle) and ΔRR@1 (right) between full TraceSim and TraceSim without Global Weight, Local Weight and Normalization

Fig. 18

Distributions of ΔAUC (left), ΔMAP (middle) and ΔRR@1 (right) between full TraceSim and TraceSim without Global Weight, the diff(⋅) Function, and Normalization

Fig. 19

Distributions of ΔAUC (left), ΔMAP (middle) and ΔRR@1 (right) between full TraceSim and TraceSim without Local Weight and Normalization

Fig. 20

Distributions of ΔAUC (left), ΔMAP (middle) and ΔRR@1 (right) between full TraceSim and TraceSim without Local Weight and the diff(⋅) Function

Fig. 21

Distributions of ΔAUC (left), ΔMAP (middle) and ΔRR@1 (right) between full TraceSim and TraceSim without Local Weight, the diff(⋅) Function and Normalization
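
As a rough illustration of how such deltas could be computed, the Python sketch below implements standard versions of the three metrics. It assumes that RR@1 means recall rate at 1 (the fraction of queries whose correct group is ranked first), that each query comes with a ranked list of candidate groups, that AUC is the ROC AUC over similarity scores of duplicate and non-duplicate pairs, and that Δ is the full TraceSim value minus the ablated value. The toy rankings are hypothetical and are not data from the paper.

```python
def recall_rate_at_k(queries, k=1):
    """Fraction of queries whose top-k candidates include a correct group.

    Each query is a (ranked_candidates, relevant_set) pair.
    """
    hits = sum(1 for ranked, relevant in queries if relevant & set(ranked[:k]))
    return hits / len(queries)


def average_precision(ranked, relevant):
    """Mean of precision@i over the ranks i where a relevant item appears."""
    hits, total = 0, 0.0
    for i, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            total += hits / i
    return total / len(relevant) if relevant else 0.0


def mean_average_precision(queries):
    return sum(average_precision(r, rel) for r, rel in queries) / len(queries)


def auc(duplicate_scores, non_duplicate_scores):
    """Chance that a duplicate pair outscores a non-duplicate pair (ties count half)."""
    wins = 0.0
    for p in duplicate_scores:
        for n in non_duplicate_scores:
            wins += 1.0 if p > n else (0.5 if p == n else 0.0)
    return wins / (len(duplicate_scores) * len(non_duplicate_scores))


# Hypothetical rankings of candidate groups for two queries; "g1" is correct.
full = [(["g1", "g2", "g3"], {"g1"}), (["g2", "g1"], {"g1"})]
ablated = [(["g2", "g1", "g3"], {"g1"}), (["g2", "g1"], {"g1"})]

delta_rr1 = recall_rate_at_k(full) - recall_rate_at_k(ablated)
delta_map = mean_average_precision(full) - mean_average_precision(ablated)
print(delta_rr1, delta_map)  # 0.5 0.25
```

Repeating this comparison over many runs would yield a distribution of Δ values of the kind plotted in each figure.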

The following configurations are not reported:

  1. TraceSim without Global Weight and Local Weight. In this case, frame weights are always equal to 1. Since the normalization was designed for variable frame weights, it loses its effectiveness.

  2. TraceSim without Global Weight, Local Weight, and the diff(⋅) Function. As in the previous configuration, the normalization is ineffective because the frame weights are constant.

  3. TraceSim without Global Weight, Local Weight, normalization, and the diff(⋅) Function. This configuration is equivalent to the Needleman-Wunsch (NW) algorithm with the match, mismatch, and gap values set to 1.0, 2.0, and 1.0, respectively.
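
Configuration 3 reduces TraceSim to plain Needleman-Wunsch global alignment over frames. The Python sketch below scores two frame sequences with the values 1.0, 2.0, and 1.0, under the assumption that the match value is added while the mismatch and gap values are subtracted as penalties; the function name `nw_score` is hypothetical.

```python
def nw_score(a, b, match=1.0, mismatch=2.0, gap=1.0):
    """Needleman-Wunsch global alignment score of two frame sequences.

    Assumption: matches add `match`; mismatches and gaps subtract
    `mismatch` and `gap`, respectively.
    """
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = -gap * i  # a[:i] aligned entirely against gaps
    for j in range(1, m + 1):
        score[0][j] = -gap * j  # b[:j] aligned entirely against gaps
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            same = a[i - 1] == b[j - 1]
            diag = score[i - 1][j - 1] + (match if same else -mismatch)
            score[i][j] = max(diag, score[i - 1][j] - gap, score[i][j - 1] - gap)
    return score[n][m]


print(nw_score(["main", "parse", "read"], ["main", "read"]))  # 1.0
```

The example aligns two matches and one gap, giving 1.0. Because a mismatch costs as much as two gaps, the alignment never strictly prefers a mismatch over skipping both frames.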

About this article

Cite this article

Rodrigues, I.M., Khvorov, A., Aloise, D. et al. TraceSim: An Alignment Method for Computing Stack Trace Similarity. Empir Software Eng 27, 53 (2022). https://doi.org/10.1007/s10664-021-10070-w

  • DOI: https://doi.org/10.1007/s10664-021-10070-w
