TraceSim: An Alignment Method for Computing Stack Trace Similarity

Rodrigues, Irving Muller; Khvorov, Aleksandr; Aloise, Daniel; Vasiliev, Roman; Koznov, Dmitrij; Fernandes, Eraldo Rezende; Chernishev, George; Luciv, Dmitry; Povarov, Nikita

doi:10.1007/s10664-021-10070-w

Published: 01 March 2022

Volume 27, article number 53 (2022)

Empirical Software Engineering

Irving Muller Rodrigues ORCID: orcid.org/0000-0001-5478-4099¹,
Aleksandr Khvorov²,
Daniel Aloise¹,
Roman Vasiliev³,
Dmitrij Koznov⁴,
Eraldo Rezende Fernandes⁵,
George Chernishev⁴,
Dmitry Luciv⁴ &
Nikita Povarov³

Abstract

Software systems can automatically submit crash reports to a repository for investigation when program failures occur. A significant portion of these crash reports are duplicates, i.e., they are caused by the same software issue. Therefore, if the volume of submitted reports is very large, automatic grouping of duplicate crash reports can significantly ease and speed up the analysis of software failures. This task is known as crash report deduplication. Given a huge volume of incoming reports, increasing the quality of deduplication is an important task. The majority of studies address it via information retrieval or sequence matching methods based on the similarity of stack traces from two crash reports. While information retrieval methods disregard the position of a frame in a stack trace, the existing works based on sequence matching algorithms do not fully consider subroutine global frequency and unmatched frames. Besides, due to data distribution differences among software projects, parameters that are learned using machine learning algorithms are necessary to provide more flexibility to the methods. In this paper, we propose TraceSim, an approach for crash report deduplication which combines TF-IDF, optimum global alignment, and machine learning (ML) in a novel way. Moreover, we propose a new evaluation methodology for this task that is more comprehensive and robust than previously used evaluation approaches. TraceSim significantly outperforms seven baselines and state-of-the-art methods in the majority of the scenarios. It is the only approach that achieves competitive results on all datasets regarding all considered metrics. Moreover, we conduct an extensive ablation study that demonstrates the importance of each of TraceSim’s elements to its final performance and robustness. Finally, we provide the source code for all considered methods and the evaluation methodology as well as the created datasets.
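
The remark that information retrieval methods disregard frame position can be illustrated with a minimal sketch. It is not taken from the paper: the function names and the example frames are hypothetical, the bag-of-frames score uses raw frame counts rather than TF-IDF weights, and the positional score is only a naive stand-in for an order-aware method.

```python
from collections import Counter
from math import sqrt
from typing import Sequence


def bag_of_frames_cosine(a: Sequence[str], b: Sequence[str]) -> float:
    """Cosine similarity over frame counts; frame order is ignored."""
    counts_a, counts_b = Counter(a), Counter(b)
    dot = sum(counts_a[f] * counts_b[f] for f in counts_a)
    norm_a = sqrt(sum(v * v for v in counts_a.values()))
    norm_b = sqrt(sum(v * v for v in counts_b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def positional_match_ratio(a: Sequence[str], b: Sequence[str]) -> float:
    """Fraction of positions that hold the same frame in both traces."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return sum(x == y for x, y in zip(a, b)) / longest


# Hypothetical traces: same frames, opposite order.
trace = ["read", "parse", "main"]
shuffled = ["main", "parse", "read"]
print(bag_of_frames_cosine(trace, shuffled))    # close to 1.0: order is invisible
print(positional_match_ratio(trace, shuffled))  # only the middle frame lines up
```

The two traces are indistinguishable to the bag-of-frames score, whereas any position-aware comparison separates them.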

Notes

https://wiki.ubuntu.com/Apport
https://crash-stats.mozilla.com/
https://goto.google.com/crash/root
https://github.com/irving-muller/TraceSim_EMSE
https://lucene.apache.org/
https://www.elastic.co/elasticsearch/
https://github.com/irving-muller/TraceSim_EMSE
https://crash-stats.mozilla.org/
https://bugs.launchpad.net/
https://bugs.eclipse.org/bugs/
https://bz.apache.org/netbeans/
https://bugzilla.gnome.org/
https://metacpan.org/pod/Parse::StackTrace
We conducted a preliminary investigation to find the best number of iterations.
These strategies were designed for techniques that consider the frame order. Since these information retrieval techniques are based on the bag-of-words model, such strategies are not effective for them.
The experiments in Sections 5.1 and 5.2 were run in a shared and heterogeneous environment. Therefore, it is difficult to compare the run times based on these experiments.

Author information

Authors and Affiliations

Polytechnique Montreal, Montreal, Canada
Irving Muller Rodrigues & Daniel Aloise
JetBrains, HSE University, St Petersburg, Russia
Aleksandr Khvorov
JetBrains, St Petersburg, Russia
Roman Vasiliev & Nikita Povarov
Saint-Petersburg State University, Saint-Petersburg, Russia
Dmitrij Koznov, George Chernishev & Dmitry Luciv
FACOM – UFMS, Campo Grande, Brazil
Eraldo Rezende Fernandes

Corresponding author

Correspondence to Irving Muller Rodrigues.

Additional information

Communicated by: Foutse Khomh

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This article belongs to the Topical Collection: Machine Learning Techniques for Software Quality Evaluation (MaLTeSQuE)

We would like to gratefully acknowledge the Natural Sciences and Engineering Research Council of Canada (NSERC), Ericsson, Ciena, and EfficiOS for funding this project. Moreover, this research was enabled in part by the support provided by WestGrid (https://www.westgrid.ca/) and Compute Canada (www.computecanada.ca).

Appendix A: Additional Ablation Study Results

In this appendix, we expand the ablation study in which Global Weight, Local Weight, the diff(⋅) Function, and normalization are removed. Figs. 14 to 21 depict ΔAUC, ΔMAP, and ΔRR@1 between the original TraceSim and each possible configuration that has no more than two components enabled.

The following configurations are not reported:

1. TraceSim without Global Weight and Local Weight. In this case, frame weights are always equal to 1. Because the normalization was designed for variable frame weights, it loses its effectiveness.
2. TraceSim without Global Weight, Local Weight, and the diff(⋅) Function. As in the previous configuration, the normalization is not effective because the frame weights are constant.
3. TraceSim without Global Weight, Local Weight, normalization, and the diff(⋅) Function. This configuration is equivalent to the Needleman-Wunsch (NW) algorithm in which the match, mismatch, and gap values are set to 1.0, 2.0, and 1.0, respectively.
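
The third configuration can be sketched as a plain global alignment over frame names. This is a minimal illustration, not the authors' implementation: the function name and example frames are hypothetical, and the code assumes that the match value is a reward while the mismatch and gap values are penalties, which the text above does not state explicitly.

```python
from typing import Sequence


def nw_similarity(
    a: Sequence[str],
    b: Sequence[str],
    match: float = 1.0,
    mismatch: float = 2.0,
    gap: float = 1.0,
) -> float:
    """Score of the best global alignment of two frame sequences.

    Assumption: `match` is added to the score, while `mismatch` and
    `gap` are subtracted from it.
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0.0] * cols for _ in range(rows)]

    # Aligning a prefix with the empty sequence costs one gap per frame.
    for i in range(1, rows):
        score[i][0] = score[i - 1][0] - gap
    for j in range(1, cols):
        score[0][j] = score[0][j - 1] - gap

    for i in range(1, rows):
        for j in range(1, cols):
            pair = match if a[i - 1] == b[j - 1] else -mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + pair,  # align the two frames
                score[i - 1][j] - gap,       # frame of `a` left unmatched
                score[i][j - 1] - gap,       # frame of `b` left unmatched
            )
    return score[-1][-1]


# Hypothetical traces that differ by one missing frame.
print(nw_similarity(["main", "parse", "read"], ["main", "read"]))
```

Under these assumed signs, a mismatch costs the same as leaving both frames unmatched, so substituting one frame for another is never strictly better than two gaps.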

About this article

Cite this article

Rodrigues, I.M., Khvorov, A., Aloise, D. et al. TraceSim: An Alignment Method for Computing Stack Trace Similarity. Empir Software Eng 27, 53 (2022). https://doi.org/10.1007/s10664-021-10070-w

Accepted: 28 October 2021
Published: 01 March 2022
DOI: https://doi.org/10.1007/s10664-021-10070-w
