Empirical Software Engineering, Volume 20, Issue 2, pp 442–478

An empirical study on the importance of source code entities for requirements traceability

  • Nasir Ali
  • Zohreh Sharafi
  • Yann-Gaël Guéhéneuc
  • Giuliano Antoniol


Requirements Traceability (RT) links help developers during program comprehension and maintenance tasks. However, creating RT links is a laborious and resource-consuming task. Information Retrieval (IR) techniques can create traceability links automatically, but they typically have low accuracy (precision, recall, or both), so creating RT links remains a human-intensive process. We conjecture that understanding how developers verify RT links could help improve the accuracy of IR-based RT techniques. Consequently, we perform an empirical study consisting of four case studies. First, we use an eye-tracking system to capture developers’ eye movements while they verify RT links. We analyse the obtained data to identify and rank developers’ preferred types of Source Code Entities (SCEs), e.g., domain vs. implementation-level source code terms and class names vs. method names. Second, we perform another eye-tracking case study to confirm that it is the semantic content of the developers’ preferred types of SCEs, and not their locations, that attracts developers’ attention and helps them verify RT links. Third, we propose an improved term weighting scheme, Developers Preferred Term Frequency/Inverse Document Frequency (DPTF/IDF), that uses the knowledge of the developers’ preferred types of SCEs to give more importance to these SCEs in the term weighting scheme. We integrate this weighting scheme with an IR technique, Latent Semantic Indexing (LSI), to create a new technique for RT link recovery. Using three systems (iTrust, Lucene, and Pooka), we show that the proposed technique statistically improves the accuracy of the recovered RT links over a technique based on LSI and the usual Term Frequency/Inverse Document Frequency (TF/IDF) weighting scheme. Finally, we compare the newly proposed DPTF/IDF with our original Domain Or Implementation/Inverse Document Frequency (DOI/IDF) weighting scheme.
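
To make the idea of an entity-aware weighting scheme concrete, the following is a minimal Python sketch of a TF/IDF variant in which each term occurrence is scaled by a weight attached to the type of source code entity it came from. It is an illustration only and not the DPTF/IDF formula from the paper: the entity types, the numeric weights in `ENTITY_WEIGHTS`, the function names, and the toy documents are all assumptions made for this example.

```python
import math
from collections import Counter

# Hypothetical per-entity-type weights (illustrative values, not from the paper).
ENTITY_WEIGHTS = {"class_name": 2.0, "method_name": 1.5, "comment": 1.0}


def weighted_tf(doc):
    """Sum entity-type weights per term; doc is a list of (term, entity_type)."""
    tf = Counter()
    for term, entity_type in doc:
        # Unknown entity types fall back to a neutral weight of 1.0.
        tf[term] += ENTITY_WEIGHTS.get(entity_type, 1.0)
    return tf


def idf(docs):
    """Plain IDF: log(N / number of documents containing the term)."""
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update({term for term, _ in doc})
    return {term: math.log(n / count) for term, count in df.items()}


def weighted_tf_idf(docs):
    """Return one {term: weight} dictionary per document."""
    idf_scores = idf(docs)
    return [
        {term: freq * idf_scores[term] for term, freq in weighted_tf(doc).items()}
        for doc in docs
    ]


# Toy corpus: each document is a list of (term, entity_type) pairs.
docs = [
    [("patient", "class_name"), ("patient", "comment"), ("save", "method_name")],
    [("index", "class_name"), ("save", "method_name")],
]
weights = weighted_tf_idf(docs)
```

In this sketch a term that appears in a class name contributes more to its document vector than the same term in a comment, and a term shared by every document (`save`) receives a weight of zero. The resulting vectors could then be passed to an IR technique such as LSI in place of plain TF/IDF vectors.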


Source Code · Weighting Scheme · Latent Dirichlet Allocation · Vector Space Model · Latent Semantic Indexing
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.



The authors would like to thank all the participants of the case studies, as this work would not have been possible without their collaboration. This work has been partially supported by the NSERC Research Chairs on Software Cost-effective Change, Evolution and on Software Patterns and Patterns of Software, and by Fonds de recherche du Québec – Nature et technologies (FRQNT).



Copyright information

© Springer Science+Business Media New York 2014

Authors and Affiliations

  • Nasir Ali (1), corresponding author
  • Zohreh Sharafi (2)
  • Yann-Gaël Guéhéneuc (2)
  • Giuliano Antoniol (2)

  1. Department of Electrical and Computer Engineering, University of Waterloo, Kingston, Canada
  2. DGIGL, École Polytechnique de Montréal, Montreal, Canada
