Empirical Software Engineering, Volume 15, Issue 2, pp 119–146

Improving automated requirements trace retrieval: a study of term-based enhancement methods

  • Xuchang Zou
  • Raffaella Settimi
  • Jane Cleland-Huang


Automated requirements traceability methods that use Information Retrieval (IR) techniques to generate and maintain traceability links are often more efficient than traditional manual approaches. However, the traces they generate are imprecise, and significant human effort is needed to evaluate and filter the results. This paper investigates and compares three term-based enhancement methods designed to improve the performance of a probabilistic automated tracing tool. Empirical studies show that the enhancement methods can be effective in increasing the accuracy of the retrieved traces; however, the effectiveness of each method varies according to specific project characteristics. The analysis of these characteristics has led to the development of two new project-level metrics that can be used to predict the effectiveness of each enhancement method for a given data set. A procedure that automatically extracts critical keywords and phrases from a set of traceable artifacts is also presented as a way to enhance the automated trace retrieval algorithm. The procedure is tested on two new data sets.


Keywords: Requirements traceability, Requirements management, Information retrieval models



The work described in this paper was partially funded by NSF grants CCR-0306303 and CCF-0810924.



Copyright information

© Springer Science+Business Media, LLC 2009

Authors and Affiliations

  • Xuchang Zou (1)
  • Raffaella Settimi (1, corresponding author)
  • Jane Cleland-Huang (2)

  1. School of Computing, DePaul University, Chicago, USA
  2. System and Requirements Engineering Center, School of Computing, DePaul University, Chicago, USA
