Information Retrieval Methods for Automated Traceability Recovery

  • Andrea De Lucia
  • Andrian Marcus
  • Rocco Oliveto (corresponding author)
  • Denys Poshyvanyk
Chapter

Abstract

The potential benefits of traceability are well known and documented, as is the impracticability of recovering and maintaining traceability links manually. Indeed, managing traceability information manually is an error-prone and time-consuming task. Consequently, despite the advantages that can be gained, explicit traceability is rarely established unless there is a regulatory reason for doing so. Both the research and commercial software engineering communities have invested extensive effort in improving the explicit connection of software artifacts. Promising results have been achieved using Information Retrieval (IR) techniques for traceability recovery. IR-based traceability recovery methods propose a list of candidate traceability links based on the similarity between the text contained in the software artifacts. Software artifacts have different structures, and the common element among many of them is textual data, which most often captures the informal semantics of the artifacts. For example, source code includes a large volume of textual data in the form of comments and identifiers. Consequently, IR-based approaches are well suited to the traceability recovery problem. The conjecture is that artifacts with high textual similarity are good candidates to be traced to each other, since they share several concepts. In this chapter we outline a general process for using IR-based methods for traceability link recovery and examine three of them in greater detail: probabilistic, vector space, and Latent Semantic Indexing models. Finally, we discuss common approaches to measuring the performance of IR-based traceability recovery methods and the latest advances in techniques for the analysis of candidate links.
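
The idea of proposing candidate links from textual similarity can be illustrated with a minimal sketch. It assumes a simple vector space representation with tf-idf weights and cosine similarity, which is one common choice and not necessarily the exact formulation used in the chapter. The function names (`tokenize`, `tfidf_vectors`, `cosine`, `candidate_links`) and the sample artifacts are hypothetical, and the sketch omits stop-word removal and stemming.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Split camelCase identifiers, lowercase, and keep alphabetic tokens only."""
    text = re.sub(r"([a-z])([A-Z])", r"\1 \2", text)
    return [t for t in re.split(r"[^a-z]+", text.lower()) if t]


def tfidf_vectors(docs):
    """Map each artifact name to a sparse tf-idf vector (term -> weight)."""
    counts = {name: Counter(tokenize(text)) for name, text in docs.items()}
    n = len(docs)
    # Document frequency: number of artifacts that contain each term.
    df = Counter(term for c in counts.values() for term in c)
    return {
        name: {t: tf * math.log(n / df[t]) for t, tf in c.items()}
        for name, c in counts.items()
    }


def cosine(u, v):
    """Cosine similarity between two sparse vectors; 0.0 if either is all zeros."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def candidate_links(sources, targets):
    """Return (score, source, target) triples ranked by decreasing similarity.

    Assumes artifact names are unique across sources and targets.
    """
    vecs = tfidf_vectors({**sources, **targets})
    pairs = [(cosine(vecs[s], vecs[t]), s, t) for s in sources for t in targets]
    return sorted(pairs, reverse=True)


if __name__ == "__main__":
    # Hypothetical artifacts, invented for illustration only.
    requirements = {
        "REQ-1": "The user can log in with a password.",
        "REQ-2": "The system exports a report as PDF.",
    }
    code = {
        "LoginManager": "class LoginManager: # check user password\n"
                        "def checkPassword(user, password)",
        "ReportExporter": "class ReportExporter: # export report to pdf\n"
                          "def exportPdf(report)",
    }
    for score, src, tgt in candidate_links(requirements, code):
        print(f"{src} -> {tgt}: {score:.3f}")
```

In such a setup the ranked list would be handed to a software engineer, who confirms or rejects each candidate link; pairs that share no terms receive a similarity of zero and fall to the bottom of the list.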

Keywords

Information Retrieval, Traceability Link, Software Artifact, Traceability Recovery, Candidate Link
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Notes

Acknowledgments

We would like to thank the anonymous reviewers for their detailed, constructive, and thoughtful comments that helped us to improve the presentation of the results in this chapter.

Copyright information

© Springer-Verlag London Limited 2012

Authors and Affiliations

  • Andrea De Lucia (1)
  • Andrian Marcus (2)
  • Rocco Oliveto (3), corresponding author
  • Denys Poshyvanyk (4)

  1. University of Salerno, Fisciano, Italy
  2. Wayne State University, Detroit, USA
  3. University of Molise, Pesche (IS), Italy
  4. The College of William and Mary, Williamsburg, USA
