Assessing IR-based traceability recovery tools through controlled experiments

Empirical Software Engineering

Abstract

We report the results of a controlled experiment and a replication performed with different subjects, in which we assessed the usefulness of an Information Retrieval-based traceability recovery tool during the traceability link identification process. The main result of the two experiments is that the use of a traceability recovery tool significantly reduces the time spent by the software engineer compared with manual tracing. Replication with different subjects allowed us to investigate whether subjects' experience and ability play any role in the traceability link identification process. In particular, we made some observations concerning the retrieval accuracy achieved by the software engineers with and without tool support and with different levels of experience and ability.

Notes

  1. We decided to select such a threshold as it represents the median of the possible grades for any exam to be passed by a student in an Italian University (min 18/30 and max 30/30).

  2. We decided to use such a threshold to discriminate between low and high thresholds as it represents the median of the possible thresholds used to cut the ranked list.

References

  • Antoniol G, Casazza G, Cimitile A (2000a) Traceability recovery by modelling programmer behaviour. In: Proceedings of 7th working conference on reverse engineering. IEEE CS, Brisbane, pp 240–247

  • Antoniol G, Canfora G, Casazza G, De Lucia A (2000b) Identifying the starting impact set of a maintenance request. In: Proceedings of 4th European conference on software maintenance and reengineering. IEEE CS, Zurich, pp 227–230

  • Antoniol G, Canfora G, Casazza G, De Lucia A, Merlo E (2002) Recovering traceability links between code and documentation. IEEE Trans Softw Eng 28(10):970–983

  • Baeza-Yates R, Ribeiro-Neto B (1999) Modern information retrieval. Addison-Wesley, Reading

  • Basili VR, Selby RW, Hutchens DH (1986) Experimentation in software engineering. IEEE Trans Softw Eng 12(7):758–773

  • Bruegge B, De Lucia A, Fasano F, Tortora G (2006) Supporting distributed software development with fine-grained artefact management. In: Proceedings of 2nd international conference on global software engineering. Florianopolis, 16–19 October 2006, pp 213–222

  • Cleland-Huang J, Settimi R, Duan C, Zou X (2005) Utilizing supporting evidence to improve dynamic requirements traceability. In: Proceedings of 13th IEEE international requirements engineering conference. IEEE CS, Paris, pp 135–144

  • Conover WJ (1998) Practical nonparametric statistics, 3rd edn. Wiley, New York

  • Cullum JK, Willoughby RA (1998) Lanczos algorithms for large symmetric eigenvalue computations, vol 1, chapter real rectangular matrices. Birkhauser, Boston

  • De Lucia A, Oliveto R, Sgueglia P (2006a) Incremental approach and user feedbacks: a silver bullet for traceability recovery. In: Proceedings of 22nd IEEE international conference on software maintenance. IEEE CS, Philadelphia, pp 299–309

  • De Lucia A, Di Penta M, Oliveto R, Zurolo F (2006b) Improving comprehensibility of source code via traceability information: a controlled experiment. In: Proceedings of 14th IEEE international conference on program comprehension. IEEE CS, Athens, pp 317–326

  • De Lucia A, Fasano F, Francese R, Tortora G (2004) ADAMS: an artefact-based process support system. In: Proceedings of 16th international conference on software engineering and knowledge engineering. KSI, Banff, pp 31–36

  • De Lucia A, Oliveto R, Tortora G (2007a) Recovering traceability links using information retrieval tools: a controlled experiment. In: Proceedings of international symposium on grand challenges in traceability. ACM, Lexington, pp 46–55

  • De Lucia A, Fasano F, Oliveto R, Tortora G (2007b) Recovering traceability links in software artefact management systems using information retrieval methods. ACM Trans Softw Eng Methodol 16(4):13

  • De Lucia A, Oliveto R, Tortora G (2008) ADAMS re-trace: traceability link recovery via latent semantic indexing. In: Proceedings of 30th IEEE/ACM international conference on software engineering. ACM, Leipzig, pp 839–842

  • Deerwester S, Dumais ST, Furnas GW, Landauer TK, Harshman R (1990) Indexing by latent semantic analysis. J Am Soc Inf Sci 41(6):391–407

  • Devore JL, Farnum N (1999) Applied statistics for engineers and scientists. Brooks/Cole, Duxbury

  • Di Penta M, Gradara S, Antoniol G (2002) Traceability recovery in RAD software systems. In: Proceedings of 10th international workshop in program comprehension. IEEE CS, Paris, pp 207–216

  • Domges R, Pohl K (1998) Adapting traceability environments to project specific needs. Commun ACM 41(12):55–62

  • Duan C, Cleland-Huang J (2007) Clustering support for automated tracing. In: Proceedings of 22nd IEEE/ACM international conference on automated software engineering. ACM, Atlanta, pp 244–253

  • Dumais ST (1991) Improving the retrieval of information from external sources. Behav Res Meth Instrum Comput 23:229–236

  • Dumais ST (1993) LSI meets TREC: a status report. In: Proceedings of the first text retrieval conference (TREC-1). NIST Special Publication, pp 137–152

  • Gotel O, Finkelstein A (1994) An analysis of the requirements traceability problem. In: Proceedings of 1st international conference on requirements engineering. IEEE CS, Colorado Springs, pp 94–101

  • Harman D (1992) Information retrieval: data structures and algorithms, chapter ranking algorithms. Prentice-Hall, Englewood Cliffs, pp 363–392

  • Hayes JH, Dekhtyar A, Osborne J (2003) Improving requirements tracing via information retrieval. In: Proceedings of 11th IEEE international requirements engineering conference. IEEE CS, Monterey, pp 138–147

  • Hayes JH, Dekhtyar A, Sundaram SK (2006) Advancing candidate link generation for requirements tracing: the study of methods. IEEE Trans Softw Eng 32(1):4–19

  • Juristo N, Moreno A (2001) Basics of software engineering experimentation. Kluwer Academic, Dordrecht

  • Leffingwell D (1997) Calculating your return on investment from more effective requirements management. Technical report, Rational Software Corporation

  • Lin J, Lin CC, Cleland-Huang J, Settimi R, Amaya J, Bedford G, Berenbach B, Khadra OB, Duan C, Zou X (2006) Poirot: a distributed tool supporting enterprise-wide automated traceability. In: Proceedings of 14th IEEE international requirements engineering conference. IEEE CS, Minneapolis, pp 356–357

  • Lormans M, van Deursen A (2006) Can LSI help reconstructing requirements traceability in design and test? In: Proceedings of 10th European conference on software maintenance and reengineering. IEEE CS, Bari, pp 45–54

  • Lormans M, Gross H, van Deursen A, van Solingen R, Stehouwer A (2006) Monitoring requirements coverage using reconstructed views: an industrial case study. In: Proceedings of 13th working conference on reverse engineering. IEEE CS, Benevento, pp 275–284

  • Marcus A, Maletic JI (2003) Recovering documentation-to-source-code traceability links using latent semantic indexing. In: Proceedings of 25th international conference on software engineering. IEEE CS, Portland, pp 125–135

  • Marcus A, Xie X, Poshyvanyk D (2005) When and how to visualize traceability links? In: Proceedings of 3rd international workshop on traceability in emerging forms of software engineering. ACM, Long Beach, pp 56–61

  • Oliveto R (2008) Traceability management meets information retrieval methods: strengths and limitations. PhD thesis, University of Salerno, March. www.sesa.dmi.unisa.it/thesis/oliveto.pdf

  • Oppenheim AN (1992) Questionnaire design, interviewing and attitude measurement. Pinter, London

  • Pfleeger SL, Menezes W (2000) Marketing technology to software practitioners. IEEE Softw 17(1):27–33

  • Pinheiro FAC, Goguen JA (1996) An object-oriented tool for tracing requirements. IEEE Softw 13(2):52–64

  • Porter MF (1980) An algorithm for suffix stripping. Program 14(3):130–137

  • Ricca F, Di Penta M, Torchiano M, Tonella P, Ceccato M (2007) The role of experience and ability in comprehension tasks supported by UML stereotypes. In: Proceedings of 29th international conference on software engineering. IEEE Computer Society, Minneapolis, pp 375–384

  • Settimi R, Cleland-Huang J, Ben Khadra O, Mody J, Lukasik W, De Palma C (2004) Supporting software evolution through dynamically retrieving traces to UML artifacts. In: Proceedings of 7th IEEE international workshop on principles of software evolution. IEEE CS, Kyoto, pp 49–54

  • Wohlin C, Runeson P, Host M, Ohlsson MC, Regnell B, Wesslen A (2000) Experimentation in software engineering—an introduction. Kluwer, Deventer

  • Yadla S, Huffman Hayes J, Dekhtyar A (2005) Tracing requirements to defect reports: an application of information retrieval techniques. Innov Syst Softw Eng NASA J 1(2):116–124

  • Zou X, Settimi R, Cleland-Huang J (2007) Term-based enhancement factors for improving automated requirement trace retrieval. In: Proceedings of international symposium on grand challenges in traceability. ACM, Lexington, pp 40–45

Acknowledgements

We would like to thank the anonymous reviewers for their detailed, constructive, and thoughtful comments that helped us to improve the presentation of the results in this paper. We are very grateful to Dr. Massimiliano Di Penta of University of Sannio, Italy, for his constructive comments that helped us to improve the presentation of the experimental results in this paper. Special thanks are also due to the students who were involved in the experiment as subjects. The work described in this paper is supported by the project METAMORPHOS (MEthods and Tools for migrAting software systeMs towards web and service Oriented aRchitectures: exPerimental evaluation, usability, and tecHnOlogy tranSfer), funded by MiUR (Ministero dell’Università e della Ricerca) under grant PRIN-2006-2006098097.

Corresponding author

Correspondence to Rocco Oliveto.

Additional information

Editors: Tim Menzies and Letha Etzkorn

Appendix

A.1 Introduction

In this appendix we report examples of all the types of artefacts used in the experiments, i.e., use cases, interaction diagrams, test cases, and code classes. Note that the artefacts used in the experiments are written in Italian; for the purposes of this appendix we have translated the original artefacts into English.

Note that an IR-based traceability recovery process indexes all the artefacts in the repository by extracting information about the occurrences of terms (words) within them. The extraction of the terms is preceded by a text normalisation phase that (i) prunes out white spaces and most non-textual tokens from the text (e.g., operators, special symbols, and some numbers) and (ii) splits source code identifiers composed of two or more words into separate words (e.g., TelephoneNumber and telephone_number are both split into the words telephone and number).
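The normalisation phase described above can be illustrated with a minimal Python sketch. It is not the implementation used in the experiments: the function names `split_identifier` and `normalise` are hypothetical, and the sketch assumes that all digits are discarded (the text says only that some numbers are pruned) and that identifiers follow simple camel-case or underscore conventions.

```python
import re

def split_identifier(token):
    """Split a camel-case or underscore-separated identifier into lowercase words."""
    words = []
    for part in token.split("_"):
        # Insert a space at each lowercase-to-uppercase transition,
        # e.g. "TelephoneNumber" -> "Telephone Number".
        spaced = re.sub(r"(?<=[a-z])(?=[A-Z])", " ", part)
        words.extend(w.lower() for w in spaced.split())
    return words

def normalise(text):
    """Drop operators, symbols and digits, then split compound identifiers."""
    # [^\W\d]+ matches runs of letters and underscores (accented letters included).
    # Assumption: every digit is discarded, which is stricter than the text requires.
    tokens = re.findall(r"[^\W\d]+", text)
    terms = []
    for token in tokens:
        terms.extend(split_identifier(token))
    return terms

print(normalise("int telephone_number = x + 42;"))
print(normalise("TelephoneNumber"))
```

Both identifier styles from the example in the text reduce to the same two terms, so artefacts that use different naming conventions can still share index terms.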

Moreover, during the indexing process we also use a stop word function and a stop word list to discard common words (e.g., articles and adverbs) that are not useful for capturing the semantics of the artefact content. The stop word function prunes out all the words whose length is less than a fixed threshold, while the stop word list is used to cut off all the words contained in a given word list. A more complicated form of artefact pre-processing is morphological analysis, such as stemming (Porter 1980), which removes the suffixes of words to extract their stems. For example, "working", "works", and "worker" all become the stem "work". The effects of stemming on LSI are variable, sometimes resulting in small improvements and sometimes in small decreases in performance (Dumais 1991, 1993). For this reason, we do not perform any morphological analysis of the software artefacts.
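The two filters described above, the length-based stop word function and the stop word list, can be sketched as follows. This is a hedged illustration only: the function name `remove_stop_words`, the threshold of 3 characters, and the sample word list are assumptions, since the text gives neither the threshold nor the list that was actually used.

```python
def remove_stop_words(terms, stop_words, min_length=3):
    """Keep only the terms that pass both filters.

    min_length plays the role of the stop word function (terms shorter than
    the threshold are pruned); stop_words plays the role of the stop word list.
    Both values here are hypothetical.
    """
    return [
        term for term in terms
        if len(term) >= min_length and term.lower() not in stop_words
    ]

# Hypothetical stop word list and input terms, for illustration only.
sample_stop_words = {"the", "and", "into"}
sample_terms = ["the", "user", "inserts", "a", "laboratory", "into", "the", "system"]

print(remove_stop_words(sample_terms, sample_stop_words))
```

No stemming step is included, in line with the decision stated above not to perform morphological analysis.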

Table 11 shows the use case describing the functionality “Insert laboratory”, while Fig. 8 shows the UML collaboration diagram that describes the same functionality. Note that in this case only the description of the UML diagram is indexed (see Table 12).

Table 11 Example of use case
Fig. 8 Example of interaction diagram

Table 12 Description of the interaction diagram shown in Fig. 8

Table 13 reports an example of a code class. Finally, Table 14 reports an example of a test case. As can be seen, test cases were actually execution scenarios, and thus very close to use cases and sequence diagrams.

Table 13 Example of source code
Table 14 Example of test case

Cite this article

De Lucia, A., Oliveto, R. & Tortora, G. Assessing IR-based traceability recovery tools through controlled experiments. Empir Software Eng 14, 57–92 (2009). https://doi.org/10.1007/s10664-008-9090-8
