Empirical Software Engineering, Volume 22, Issue 3, pp 1103–1142

Tackling the term-mismatch problem in automated trace retrieval

  • Jin Guo
  • Marek Gibiec
  • Jane Cleland-Huang


Software-intensive systems, especially those deployed in safety- or security-critical applications, must conform to an increasingly large and complex set of regulatory codes. For example, all healthcare-related products in the USA are governed by the Health Insurance Portability and Accountability Act (HIPAA), which requires covered entities to adopt administrative and technical safeguards to protect the privacy of personal medical information. Financial software systems in the USA must comply with the Sarbanes-Oxley Act of 2002 (SOX), which establishes wide-ranging standards for all U.S. public company boards, management, and public accounting firms. Furthermore, safety-critical systems often have to satisfy a staggeringly large number of regulatory codes. These regulations impact almost every part of the system, including its electrical, mechanical, operational, and software components.

Current practice...


Keywords: Requirements engineering; Traceability; Query augmentation; Semantic traceability



The work described in this paper was supported by US National Science Foundation grants CCF-1319680 and CCF-0447594.

Copyright information

© Springer Science+Business Media New York 2016

Authors and Affiliations

  1. University of Notre Dame, Notre Dame, USA
  2. DePaul University, Chicago, USA
