Traceability recovery between bug reports and test cases-a Mozilla Firefox case study

Gadelha, Guilherme; Ramalho, Franklin; Massoni, Tiago

doi:10.1007/s10515-021-00287-w

Published: 07 July 2021

Automated Software Engineering, volume 28, article number 8 (2021)

Abstract

Automatic recovery of traceability between software artifacts may promote early detection of issues and better calculation of change impact. Information Retrieval (IR) techniques have been proposed for the task, but they differ considerably in input parameters and results. Their results are difficult to assess when the techniques are applied in isolation, usually to small or medium-sized software projects. Recently, multilayered approaches to machine learning, in particular Deep Learning (DL), have achieved success in text classification through their capacity to model complex relationships among data. In this article, we apply several IR and DL techniques to investigate automatic traceability between bug reports and manual test cases, using historical data from Mozilla Firefox’s Quality Assurance (QA) team. In this case study, we assess three IR techniques (LSI, LDA, and BM25) in addition to a DL architecture, Convolutional Neural Networks (CNNs), applied through Word Embeddings. In this traceability context, we observe poor performance from three of the four studied techniques. Only the LSI technique presented acceptable results, standing out even over the state-of-the-art BM25 technique. The results suggest that the semi-automatic application of the LSI technique, with an appropriate combination of thresholds, may be feasible for real-world software projects.
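
As a rough illustration of the kind of IR-based link recovery the abstract describes, the sketch below ranks manual test cases against a bug report with BM25, one of the techniques assessed, and then filters the ranking with two cut-offs. It is not the authors' implementation (their code is in the repository listed under Code availability). The tokenizer, the BM25 parameters `k1` and `b`, the non-negative IDF variant, the example texts, and the `top_n` and `min_similarity` cut-offs are all assumptions made for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on non-alphanumeric characters (deliberately simple)."""
    return "".join(c.lower() if c.isalnum() else " " for c in text).split()


def bm25_scores(query, documents, k1=1.2, b=0.75):
    """Score every document against the query with Okapi BM25.

    k1 and b are common default values, assumed here for illustration.
    """
    doc_tokens = [tokenize(d) for d in documents]
    n_docs = len(doc_tokens)
    avg_len = sum(len(d) for d in doc_tokens) / n_docs
    # Document frequency: number of documents that contain each term.
    df = Counter(term for d in doc_tokens for term in set(d))
    query_terms = set(tokenize(query))
    scores = []
    for tokens in doc_tokens:
        tf = Counter(tokens)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            # Non-negative IDF variant (an assumption, not taken from the article).
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def recover_links(bug_report, test_cases, top_n=2, min_similarity=0.5):
    """Return indices of candidate test cases for one bug report.

    A candidate must be among the top_n ranked test cases AND have a score,
    normalized by the best score, of at least min_similarity.
    """
    scores = bm25_scores(bug_report, test_cases)
    best = max(scores)
    if best == 0:
        return []
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [i for i in ranked[:top_n] if scores[i] / best >= min_similarity]


# Hypothetical example texts, not taken from the Firefox data set.
test_cases = [
    "Open a new private browsing window and verify history is not saved",
    "Install an add-on from the add-ons manager and restart the browser",
    "Scroll a long page with the touchpad and check that scrolling is smooth",
]
bug_report = "Scrolling with the touchpad is janky on long pages"
candidate_links = recover_links(bug_report, test_cases)
```

Combining a rank cut-off with a similarity cut-off keeps the candidate list short enough for a person to review, which is one way to read the semi-automatic use the abstract suggests. The specific values would need tuning on real data.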

Data availability

All the data is available in the following repository: https://github.com/guilhermemg/trace-links-tc-br.

Notes

https://www.mozilla.org/
https://bugzilla.mozilla.org
https://wiki.mozilla.org/Platform/GFX/APZ
Given an observable variable X and a target variable Y, a generative model is a statistical model of the joint probability distribution on \(X \times Y\), P(X, Y) (Ng and Jordan 2002).
https://github.com/svn2github/word2vec
https://commoncrawl.org/
https://www.mozilla.org
https://public.etherpad-mozilla.org/
http://bugzilla.mozilla.org
Bug Fields: https://bugs.documentfoundation.org/page.cgi?id=fields.html
PyBossa Platform: https://pybossa.com/
https://support.mozilla.org https://wiki.mozilla.org/QA/ https://www.paessler.com/manuals https://addons.mozilla.org https://developer.mozilla.org
NLTK: https://www.nltk.org
SciKit: https://scikit-learn.org/stable/
Gensim: https://radimrehurek.com/gensim/
SpaCy: https://spacy.io/
https://github.com/guilhermemg/trace-links-tc-br
https://nlp.stanford.edu/projects/glove/
https://spacy.io/models/en

References

Antoniol, G., Canfora, G., Casazza, G., De Lucia, A., Merlo, E.: Recovering traceability links between code and documentation. IEEE Trans. Softw. Eng. 28(10), 970–983 (2002). https://doi.org/10.1109/TSE.2002.1041053
Berry, D.M.: Evaluation of tools for hairy requirements and software engineering tasks. In: Proceedings - 2017 IEEE 25th International Requirements Engineering Conference Workshops. REW 2017, 284–291 (2017). https://doi.org/10.1109/REW.2017.25
Bjarnason, E., Unterkalmsteiner, M., Borg, M., Engström, E.: A multi-case study of agile requirements engineering and the use of test cases as requirements. Inf. Softw. Technol. (2016). https://doi.org/10.1016/j.infsof.2016.03.008
Blei, D.M., Ng, A.Y., Jordan, M.I.: Latent Dirichlet allocation. J. Mach. Learn. Res. 3, 993–1022 (2003)
Borg, M., Runeson, P., Ardö, A.: Recovering from a decade: a systematic mapping of information retrieval approaches to software traceability. Empir. Softw. Eng. 19(6), 1565–1616 (2014). https://doi.org/10.1007/s10664-013-9255-y
Buttcher, S., Clarke, C.L.A., Cormack, G.V.: Information retrieval: implementing and evaluating search engines. MIT Press, Cambridge (2010)
Canfora, G., Cerulo, L.: Fine grained indexing of software repositories to support impact analysis. Adv. Mater. Res. (2006). https://doi.org/10.4028/www.scientific.net/AMR.785-786.1516
Capobianco, G., De Lucia, A., Oliveto, R., Panichella, A., Panichella, S.: On the role of the nouns in IR-based traceability recovery. In: IEEE International Conference on Program Comprehension pp 148–157, (2009a) https://doi.org/10.1109/ICPC.2009.5090038
Capobianco, G., De Lucia, A., Oliveto, R., Panichella, A., Panichella, S.: Traceability recovery using numerical analysis. In: Proceedings - Working Conference on Reverse Engineering, WCRE pp 195–204, (2009b) https://doi.org/10.1109/WCRE.2009.14
Davies, S., Roper, M.: What’s in a bug report? In: Proceedings of the 8th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement - ESEM ’14 pp 1–10, (2014) https://doi.org/10.1145/2652524.2652541
De Lucia, A., Fasano, F., Oliveto, R., Tortora, G.: Can information retrieval techniques effectively support traceability link recovery? In: IEEE International Conference on Program Comprehension 2006, 307–316 (2006). https://doi.org/10.1109/ICPC.2006.15
De Lucia, A., Oliveto, R., Tortora, G.: Assessing IR-based traceability recovery tools through controlled experiments. Empir. Softw. Eng. 14(1), 57–92 (2009). https://doi.org/10.1007/s10664-008-9090-8
Deerwester, S., Dumais, S.T., Furnas, G.W., Landauer, T.K., Harshman, R.: Indexing by latent semantic analysis. J. Am. Soc. Inf. Sci. 41(6), 391–407 (1990)
Dekhtyar, A., Fong, V.: RE Data Challenge: Requirements Identification with Word2Vec and TensorFlow. In: Proceedings - 2017 IEEE 25th International Requirements Engineering Conference. RE 2017, 484–489 (2017). https://doi.org/10.1109/RE.2017.26
Dekhtyar, A., Hayes, J.H., Sundaram, S., Holbrook, A., Dekhtyar, O.: Technique integration for requirements assessment. In: Proceedings - 15th IEEE International Requirements Engineering Conference. RE 2007, 141–152 (2007). https://doi.org/10.1109/RE.2007.60
Eder, S., Hauptmann, B., Junker, M., Vaas, R., Prommer, K.H.: Selecting manual regression test cases automatically using trace link recovery and change coverage. In: Proceedings of the 9th International Workshop on Automation of Software Test, Association for Computing Machinery, New York, NY, USA, AST 2014, p. 29–35 (2014)
Falessi, D., Cantone, G., Canfora, G.: A comprehensive characterization of NLP techniques for identifying equivalent requirements. In: Proceedings of the 2010 ACM-IEEE International Symposium on Empirical Software Engineering and Measurement - ESEM ’10 p 1, (2010) https://doi.org/10.1145/1852786.1852810
Falessi, D., Di Penta, M., Canfora, G., Cantone, G.: Estimating the number of remaining links in traceability recovery. Empir. Softw. Eng. 22(3), 996–1027 (2017). https://doi.org/10.1007/s10664-016-9460-6
Fazzini, M., Prammer, M., D’Amorim, M., Orso, A.: Automatically translating bug reports into test cases for mobile apps. In: Proceedings of the 27th ACM SIGSOFT International Symposium on Software Testing and Analysis - ISSTA 2018 pp 141–152, (2018) https://doi.org/10.1145/3213846.3213869
Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, Cambridge (2016)
Gotel, O.C.Z., Finkelstein, A.C.W.: An Analysis of the Requirements Traceability Problem. In: 1st International Conference on Requirements Engineering (RE 1994) pp 94–101, (1994) https://doi.org/10.1109/ICRE.1994.292398
Guo, J., Cheng, J., Cleland-Huang, J.: Semantically Enhanced Software Traceability Using Deep Learning Techniques. In: Proceedings - 2017 IEEE/ACM 39th International Conference on Software Engineering, ICSE 2017 pp 3–14, (2017) https://doi.org/10.1109/ICSE.2017.9, arXiv:1804.02438
Hayes, J.H., Dekhtyar, A., Sundaram, S.K.: Tracing and mapping : supporting software quality predictions (2005)
Hayes, J.H., Dekhtyar, A., Sundaram, S.K.: Advancing candidate link generation for requirements tracing: the study of methods. IEEE Trans. Softw. Eng. 32(1), 4–19 (2006). https://doi.org/10.1109/TSE.2006.3
Hayes, J.H., Dekhtyar, A., Sundaram, S.K., Holbrook, E.A., Vadlamudi, S., April, A.: Requirements tracing on target (RETRO): improving software maintenance through traceability recovery. Innov. Syst. Softw. Eng. 3(3), 193–202 (2007). https://doi.org/10.1007/s11334-007-0024-1
Hemmati, H., Sharifi, F.: Investigating NLP-Based Approaches for Predicting Manual Test Case Failure. In: Proceedings - 2018 IEEE 11th International Conference on Software Testing, Verification and Validation, ICST 2018 pp 309–319, (2018) https://doi.org/10.1109/ICST.2018.00038
Hoffman, M.D., Bach, F.R., Blei, D.M.: Online Learning for Latent Dirichlet Allocation. AcademiaEdu pp 1–5 (2012)
Kaushik, N., Tahvildari, L., Moore, M.: Reconstructing traceability between bugs and test cases: an experimental study. In: Proceedings - Working Conference on Reverse Engineering, WCRE pp 411–414, (2011) https://doi.org/10.1109/WCRE.2011.58
Chen, K., Zhang, W., Zhao, H., Mei, H.: An approach to constructing feature models based on requirements clustering pp 31–40, (2005) https://doi.org/10.1109/re.2005.9
Lee, D.: How to write a bug report that will make your engineers love you. (2016) Retrieved May 30, 2019 from https://testlio.com/blog/the-ideal-bug-report
Lormans, M., Van Deursen, A.: Can LSI help reconstructing requirements traceability in design and test? In: Proceedings of the European Conference on Software Maintenance and Reengineering, CSMR pp. 47–56, (2006) https://doi.org/10.1109/CSMR.2006.13
De Lucia, A., Di Penta, M., Oliveto, R., Panichella, A., Panichella, S.: Improving IR-based traceability recovery using smoothing filters. In: IEEE International Conference on Program Comprehension pp. 21–30, (2011) https://doi.org/10.1109/ICPC.2011.34
De Lucia, A., Di Penta, M., Oliveto, R., Panichella, A., Panichella, S.: Applying a smoothing filter to improve IR-based traceability recovery processes: an empirical investigation. Inf. Softw. Technol. 55, 741–754 (2013)
Manning, C.D., Raghavan, P., Schütze, H.: An introduction to information retrieval. Cambridge University Press, Cambridge (2009)
Mäntylä, M.V., Khomh, F., Adams, B., Engström, E., Petersen, K.: On rapid releases and software testing. Presented at the (2013). https://doi.org/10.1109/ICSM.2013.13
Merten, T., Krämer, D., Mager, B., Schell, P., Bürsner, S., Paech, B.: Do Information Retrieval Algorithms for Automated Traceability Perform Effectively on Issue Tracking System Data? Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) 9619, 45–62 (2016). https://doi.org/10.1007/978-3-319-30282-9_4
Mikolov, T., Chen, K., Corrado, G., Dean, J. Efficient Estimation of Word Representations in Vector Space (2013) arXiv:1301.3781
Mills, C.: Automating traceability link recovery through classification. In: Proceedings of the 2017 11th Joint Meeting on Foundations of Software Engineering - ESEC/FSE 2017, ACM Press, New York, New York, USA, pp. 1068–1070, (2017) https://doi.org/10.1145/3106237.3121280
Minelli, R., Lanza, M.: Software analytics for mobile applications–insights and lessons learned. In: 17th European Conference on Software Maintenance and Reengineering, pp. 144–153 (2013)
Oliveto, R., Gethers, M., Poshyvanyk, D., De Lucia, A.: On the equivalence of information retrieval methods for automated traceability link recovery. In: IEEE International Conference on Program Comprehension pp 68–71, (2010) https://doi.org/10.1109/ICPC.2010.20
Panichella, A., Dit, B., Oliveto, R., Di Penta, M., Poshyvanyk, D., De Lucia, A.: How to effectively use topic models for software engineering tasks? An approach based on Genetic Algorithms. In: Proceedings - International Conference on Software Engineering pp. 522–531, (2013) https://doi.org/10.1109/ICSE.2013.6606598
Passos, L., Czarnecki, K., Apel, S., Wa̧sowski, A., Kästner, C., Guo, J.: Feature-oriented software evolution p 1, (2013) https://doi.org/10.1145/2430502.2430526
Pennington, J., Socher, R., Manning, C.: Glove: Global Vectors for Word Representation. In: Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), Association for Computational Linguistics, Stroudsburg, PA, USA, pp. 1532–1543, (2014) https://doi.org/10.3115/v1/D14-1162, arXiv:1504.06654
Robertson, S., Zaragoza, H.: The Probabilistic Relevance Framework: BM25 and Beyond, vol 3. (2009) https://doi.org/10.1561/1500000019
Sabev, P., Grigorova, K.: Manual to automated testing: An effort-based approach for determining the priority of software test automation (2015)
Salton, G., Wong, A., Yang, C.S.: A vector space model for automatic indexing. Commun. ACM 18(11), 613–620 (1975). https://doi.org/10.1145/361219.361220
Sommerville, I.: Software engineering, 9th edn. Addison-Wesley, Boston (2010)
Ng, A.Y., Jordan, M.: On discriminative vs. generative classifiers: a comparison of logistic regression and naive Bayes. Adv. Neural Inf. Process. Sys. 2, 841–848 (2002)
Yadla, S., Hayes, J.H., Dekhtyar, A.: Tracing requirements to defect reports: an application of information retrieval techniques. Innov. Syst. Softw. Eng. 1(2), 116–124 (2005). https://doi.org/10.1007/s11334-005-0011-3
Zimmermann, T., Premraj, R., Bettenburg, N., Just, S., Schröter, A., Weiss, C.: What makes a good bug report? IEEE Trans. Softw. Eng. 36(5), 618–643 (2010). https://doi.org/10.1109/TSE.2010.63

Acknowledgements

We would like to thank the Brazilian agency CAPES for partially funding this research. We would also like to thank all the volunteers who kindly participated in our study. Last but not least, we thank the anonymous reviewers, whose contributions were decisive in improving preliminary versions of the article.

Funding

Brazilian federal agency CAPES of the Ministry of Education (MEC/Brazil).

Author information

Authors and Affiliations

Computing and Systems Department (DSC), Federal University of Campina Grande (UFCG), Campina Grande, Paraiba, Brazil
Guilherme Gadelha, Franklin Ramalho & Tiago Massoni

Corresponding author

Correspondence to Guilherme Gadelha.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Code availability

All the code is available in the same repository as the data and material: https://github.com/guilhermemg/trace-links-tc-br.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Gadelha, G., Ramalho, F. & Massoni, T. Traceability recovery between bug reports and test cases-a Mozilla Firefox case study. Autom Softw Eng 28, 8 (2021). https://doi.org/10.1007/s10515-021-00287-w

Received: 12 October 2020
Accepted: 24 June 2021
Published: 07 July 2021
DOI: https://doi.org/10.1007/s10515-021-00287-w
