Abstract
We report the results of a controlled experiment and a replication performed with different subjects, in which we assessed the usefulness of an Information Retrieval-based traceability recovery tool during the traceability link identification process. The main result achieved in the two experiments is that the use of a traceability recovery tool significantly reduces the time spent by the software engineer with respect to manual tracing. Replication with different subjects allowed us to investigate whether subjects’ experience and ability play any role in the traceability link identification process. In particular, we made some observations concerning the retrieval accuracy achieved by the software engineers with and without tool support and with different levels of experience and ability.
Notes
We decided to select such a threshold as it represents the median of the possible grades for any exam to be passed by a student in an Italian University (min 18/30 and max 30/30).
We decided to use such a threshold to discriminate between low and high thresholds as it represents the median of the possible thresholds used to cut the ranked list.
References
Antoniol G, Casazza G, Cimitile A (2000a) Traceability recovery by modelling programmer behaviour. In: Proceedings of 7th working conference on reverse engineering. IEEE CS, Brisbane, pp 240–247
Antoniol G, Canfora G, Casazza G, De Lucia A (2000b) Identifying the starting impact set of a maintenance request. In: Proceedings of 4th European conference on software maintenance and reengineering. IEEE CS, Zurich, pp 227–230
Antoniol G, Canfora G, Casazza G, De Lucia A, Merlo E (2002) Recovering traceability links between code and documentation. IEEE Trans Softw Eng 28(10):970–983
Baeza-Yates R, Ribeiro-Neto B (1999) Modern information retrieval. Addison-Wesley, Reading
Basili VR, Selby RW, Hutchens DH (1986) Experimentation in software engineering. IEEE Trans Softw Eng 12(7):758–773
Bruegge B, De Lucia A, Fasano F, Tortora G (2006) Supporting distributed software development with fine-grained artefact management. In: Proceedings of 2nd international conference on global software engineering. Florianopolis, 16–19 October 2006, pp 213–222
Cleland-Huang J, Settimi R, Duan C, Zou X (2005) Utilizing supporting evidence to improve dynamic requirements traceability. In: Proceedings of 13th IEEE international requirements engineering conference. IEEE CS, Paris, pp 135–144
Conover WJ (1998) Practical nonparametric statistics, 3rd edn. Wiley, New York
Cullum JK, Willoughby RA (1998) Lanczos algorithms for large symmetric eigenvalue computations, vol 1, chapter real rectangular matrices. Birkhauser, Boston
De Lucia A, Oliveto R, Sgueglia P (2006a) Incremental approach and user feedbacks: a silver bullet for traceability recovery. In: Proceedings of 22nd IEEE international conference on software maintenance. IEEE CS, Philadelphia, pp 299–309
De Lucia A, Di Penta M, Oliveto R, Zurolo F (2006b) Improving comprehensibility of source code via traceability information: a controlled experiment. In: Proceedings of 14th IEEE international conference on program comprehension. IEEE CS, Athens, pp 317–326
De Lucia A, Fasano F, Francese R, Tortora G (2004) ADAMS: an artefact-based process support system. In: Proceedings of 16th international conference on software engineering and knowledge engineering. KSI, Banff, pp 31–36
De Lucia A, Oliveto R, Tortora G (2007a) Recovering traceability links using information retrieval tools: a controlled experiment. In: Proceedings of international symposium on grand challenges in traceability. ACM, Lexington, pp 46–55
De Lucia A, Fasano F, Oliveto R, Tortora G (2007b) Recovering traceability links in software artefact management systems using information retrieval methods. ACM Trans Softw Eng Methodol 16(4):13
De Lucia A, Oliveto R, Tortora G (2008) ADAMS re-trace: traceability link recovery via latent semantic indexing. In: Proceedings of 30th IEEE/ACM international conference on software engineering. ACM, Leipzig, pp 839–842
Deerwester S, Dumais ST, Furnas GW, Landauer TK, Harshman R (1990) Indexing by latent semantic analysis. J Am Soc Inf Sci 41(6):391–407
Devore JL, Farnum N (1999) Applied statistics for engineers and scientists. Brooks/Cole, Duxbury
Di Penta M, Gradara S, Antoniol G (2002) Traceability recovery in RAD software systems. In: Proceedings of 10th international workshop in program comprehension. IEEE CS, Paris, pp 207–216
Domges R, Pohl K (1998) Adapting traceability environments to project specific needs. Commun ACM 41(12):55–62
Duan C, Cleland-Huang J (2007) Clustering support for automated tracing. In: Proceedings of 22nd IEEE/ACM international conference on automated software engineering. ACM, Atlanta, pp 244–253
Dumais ST (1991) Improving the retrieval of information from external sources. Behav Res Meth Instrum Comput 23:229–236
Dumais ST (1993) LSI meets TREC: a status report. In: Proceedings of the first text retrieval conference (TREC-1). NIST Special Publication, pp 137–152
Gotel O, Finkelstein A (1994) An analysis of the requirements traceability problem. In: Proceedings of 1st international conference on requirements engineering. IEEE CS, Colorado Springs, pp 94–101
Harman D (1992) Information retrieval: data structures and algorithms, chapter ranking algorithms. Prentice-Hall, Englewood Cliffs, pp 363–392
Hayes JH, Dekhtyar A, Osborne J (2003) Improving requirements tracing via information retrieval. In: Proceedings of 11th IEEE international requirements engineering conference. IEEE CS, Monterey, pp 138–147
Hayes JH, Dekhtyar A, Sundaram SK (2006) Advancing candidate link generation for requirements tracing: the study of methods. IEEE Trans Softw Eng 32(1):4–19
Juristo N, Moreno A (2001) Basics of software engineering experimentation. Kluwer Academic, Dordrecht
Leffingwell D (1997) Calculating your return on investment from more effective requirements management. Technical report, Rational Software Corporation
Lin J, Lin CC, Cleland-Huang J, Settimi R, Amaya J, Bedford G, Berenbach B, Khadra OB, Duan C, Zou X (2006) Poirot: a distributed tool supporting enterprise-wide automated traceability. In: Proceedings of 14th IEEE international requirements engineering conference. IEEE CS, Minneapolis, pp 356–357
Lormans M, van Deursen A (2006) Can LSI help reconstructing requirements traceability in design and test? In: Proceedings of 10th European conference on software maintenance and reengineering. IEEE CS, Bari, pp 45–54
Lormans M, Gross H, van Deursen A, van Solingen R, Stehouwer A (2006) Monitoring requirements coverage using reconstructed views: an industrial case study. In: Proceedings of 13th working conference on reverse engineering. IEEE CS, Benevento, pp 275–284
Marcus A, Maletic JI (2003) Recovering documentation-to-source-code traceability links using latent semantic indexing. In: Proceedings of 25th international conference on software engineering. IEEE CS, Portland, pp 125–135
Marcus A, Xie X, Poshyvanyk D (2005) When and how to visualize traceability links? In: Proceedings of 3rd international workshop on traceability in emerging forms of software engineering. ACM, Long Beach, pp 56–61
Oliveto R (2008) Traceability management meets information retrieval methods: strengths and limitations. PhD thesis, University of Salerno, March. www.sesa.dmi.unisa.it/thesis/oliveto.pdf
Oppenheim AN (1992) Questionnaire design, interviewing and attitude measurement. Pinter, London
Pfleeger SL, Menezes W (2000) Marketing technology to software practitioners. IEEE Softw 17(1):27–33
Pinheiro FAC, Goguen JA (1996) An object-oriented tool for tracing requirements. IEEE Softw 13(2):52–64
Porter MF (1980) An algorithm for suffix stripping. Program 14(3):130–137
Ricca F, Di Penta M, Torchiano M, Tonella P, Ceccato M (2007) The role of experience and ability in comprehension tasks supported by UML stereotypes. In: Proceedings of 29th international conference on software engineering. IEEE Computer Society, Minneapolis, pp 375–384
Settimi R, Cleland-Huang J, Ben Khadra O, Mody J, Lukasik W, De Palma C (2004) Supporting software evolution through dynamically retrieving traces to UML artifacts. In: Proceedings of 7th IEEE international workshop on principles of software evolution. IEEE CS, Kyoto, pp 49–54
Wohlin C, Runeson P, Host M, Ohlsson MC, Regnell B, Wesslen A (2000) Experimentation in software engineering—an introduction. Kluwer, Deventer
Yadla S, Huffman Hayes J, Dekhtyar A (2005) Tracing requirements to defect reports: an application of information retrieval techniques. Innov Syst Softw Eng NASA J 1(2):116–124
Zou X, Settimi R, Cleland-Huang J (2007) Term-based enhancement factors for improving automated requirement trace retrieval. In: Proceedings of international symposium on grand challenges in traceability. ACM, Lexington, pp 40–45
Acknowledgements
We would like to thank the anonymous reviewers for their detailed, constructive, and thoughtful comments that helped us to improve the presentation of the results in this paper. We are very grateful to Dr. Massimiliano Di Penta of University of Sannio, Italy, for his constructive comments that helped us to improve the presentation of the experimental results in this paper. Special thanks are also due to the students who were involved in the experiment as subjects. The work described in this paper is supported by the project METAMORPHOS (MEthods and Tools for migrAting software systeMs towards web and service Oriented aRchitectures: exPerimental evaluation, usability, and tecHnOlogy tranSfer), funded by MiUR (Ministero dell’Università e della Ricerca) under grant PRIN-2006-2006098097.
Additional information
Editors: Tim Menzies and Letha Etzkorn
Appendix
A.1 Introduction
In this appendix we report examples of all types of artefacts used in the experimentation, i.e., use cases, interaction diagrams, test cases, and code classes. It is worth noting that the language of the artefacts used in the experiments is Italian. For the sake of example, we translate the original artefacts into English.
Note that an IR-based traceability recovery process indexes all the artefacts in the repository by extracting information about the occurrences of terms (words) within them. The extraction of the terms is preceded by a text normalisation phase that (i) prunes out white spaces and most non-textual tokens from the text (e.g., operators, special symbols, some numbers) and (ii) splits source code identifiers composed of two or more words into separate words (e.g., TelephoneNumber and telephone_number are split into the words telephone and number).
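The normalisation phase described above can be sketched as follows. This is a minimal illustration and not the tool's actual implementation: the function names `split_identifier` and `normalise` are hypothetical, and the sketch drops all digits, whereas the text only states that some numbers are pruned.

```python
import re


def split_identifier(token):
    """Split a camelCase or snake_case identifier into lowercase words."""
    # Insert a space at every lower-to-upper boundary: "TelephoneNumber" -> "Telephone Number"
    spaced = re.sub(r"(?<=[a-z])(?=[A-Z])", " ", token)
    # Underscores and spaces both act as word separators
    return [w.lower() for w in re.split(r"[\s_]+", spaced) if w]


def normalise(text):
    """Return the list of words left after pruning non-textual tokens.

    Assumption: only runs of letters and underscores are kept, so operators,
    special symbols, and every digit are discarded.
    """
    tokens = re.findall(r"[^\W\d]+", text)
    words = []
    for token in tokens:
        words.extend(split_identifier(token))
    return words


if __name__ == "__main__":
    print(normalise("TelephoneNumber = telephone_number + 42;"))
```

Both identifier styles reduce to the same two words, so a code class and a use case that mention a telephone number end up sharing index terms.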
Moreover, during the indexing process we also use a stop word function and a stop word list to discard common words (e.g., articles and adverbs) that are not useful for capturing the semantics of the artefact content. The stop word function prunes out all words shorter than a fixed length threshold, while the stop word list is used to cut off all words contained in a given word list. A more complicated form of artefact pre-processing is morphological analysis, such as stemming (Porter 1980), which removes suffixes of words to extract their stems. For example, “working”, “works”, and “worker” all become the stem “work”. The effects of stemming for LSI are variable, sometimes resulting in small improvements and sometimes in small decreases in performance (Dumais 1991, 1993). For this reason, we do not perform any morphological analysis of the software artefacts.
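The two stop word filters can be sketched as below. This is a hedged illustration only: the function name `remove_stop_words`, the length threshold of 3, and the small English word list are assumptions made for the example, since the text does not give the threshold or the (Italian) list actually used.

```python
# Hypothetical stop word list; the list used in the experiments is not given in the text.
DEFAULT_STOP_LIST = frozenset({"the", "and", "with", "that", "this"})


def remove_stop_words(words, min_length=3, stop_list=DEFAULT_STOP_LIST):
    """Discard words that are too short or that appear in the stop word list.

    min_length plays the role of the stop word function's fixed threshold:
    words with fewer than min_length characters are pruned.
    """
    return [
        w for w in words
        if len(w) >= min_length and w.lower() not in stop_list
    ]


if __name__ == "__main__":
    sentence = "the user inserts a new laboratory in the system"
    print(remove_stop_words(sentence.split()))
```

The length filter catches short function words without enumerating them, while the explicit list handles longer common words that the threshold alone would keep.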
Table 11 shows the use case describing the functionality “Insert laboratory”, while Fig. 8 shows the UML collaboration diagram that describes the same functionality. Note that in this case only the description of the UML diagram is indexed (see Table 12).
Table 13 reports an example of a code class. Finally, Table 14 reports an example of a test case. As we can see, test cases were actually execution scenarios, and so were very close to use cases and sequence diagrams.
Cite this article
De Lucia, A., Oliveto, R. & Tortora, G. Assessing IR-based traceability recovery tools through controlled experiments. Empir Software Eng 14, 57–92 (2009). https://doi.org/10.1007/s10664-008-9090-8
DOI: https://doi.org/10.1007/s10664-008-9090-8