Abstract
Requirements Traceability (RT) links help developers during program comprehension and maintenance tasks. However, creating RT links is a laborious and resource-consuming task. Information Retrieval (IR) techniques can create traceability links automatically, but they typically have low accuracy (precision, recall, or both), so creating RT links remains a human-intensive process. We conjecture that understanding how developers verify RT links could help improve the accuracy of IR-based techniques for creating RT links. Consequently, we perform an empirical study consisting of four case studies. First, we use an eye-tracking system to capture developers’ eye movements while they verify RT links. We analyse the obtained data to identify and rank developers’ preferred types of Source Code Entities (SCEs), e.g., domain vs. implementation-level source code terms and class names vs. method names. Second, we perform another eye-tracking case study to confirm that it is the semantic content of the developers’ preferred types of SCEs, and not their locations, that attracts developers’ attention and helps them verify RT links. Third, we propose an improved term weighting scheme, Developers Preferred Term Frequency/Inverse Document Frequency (DPTF/IDF), which uses the knowledge of the developers’ preferred types of SCEs to give these SCEs more importance in the term weighting. We integrate this weighting scheme with an IR technique, Latent Semantic Indexing (LSI), to create a new technique for RT link recovery. Using three systems (iTrust, Lucene, and Pooka), we show that the proposed technique yields a statistically significant improvement in the accuracy of the recovered RT links over a technique based on LSI and the usual Term Frequency/Inverse Document Frequency (TF/IDF) weighting scheme. Finally, we compare the newly proposed DPTF/IDF with our original Domain Or Implementation/Inverse Document Frequency (DOI/IDF) weighting scheme.
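To make the idea of an entity-aware weighting scheme concrete, the following is a minimal sketch of a TF/IDF variant in which each term occurrence is scaled by a boost that depends on the type of SCE it comes from. It is not the DPTF/IDF formula itself, which the abstract does not give. The function name `weighted_tf_idf`, the `ENTITY_BOOST` table, and its numeric values are hypothetical placeholders and do not reflect the ranking obtained in the eye-tracking studies.

```python
import math
from collections import Counter

# Hypothetical boosts per source code entity type. The values are arbitrary
# placeholders for illustration, not the weights derived in the study.
ENTITY_BOOST = {
    "method_name": 2.0,
    "comment": 1.5,
    "class_name": 1.2,
    "variable_name": 1.0,
}


def weighted_tf_idf(documents):
    """Compute boosted TF/IDF weights.

    documents: dict mapping a document id to a list of
               (term, entity_type) pairs extracted from that document.
    Returns a dict mapping document id -> {term: weight}.
    """
    n_docs = len(documents)

    # Document frequency: in how many documents each term occurs.
    df = Counter()
    for pairs in documents.values():
        df.update({term for term, _ in pairs})

    weights = {}
    for doc_id, pairs in documents.items():
        # Term frequency where each occurrence counts as its entity boost.
        boosted_tf = Counter()
        for term, entity_type in pairs:
            boosted_tf[term] += ENTITY_BOOST.get(entity_type, 1.0)
        total = sum(boosted_tf.values())

        # Normalised boosted frequency multiplied by the usual IDF.
        weights[doc_id] = {
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in boosted_tf.items()
        }
    return weights


if __name__ == "__main__":
    docs = {
        "AddressBook": [
            ("address", "class_name"), ("book", "class_name"),
            ("add", "method_name"), ("address", "method_name"),
        ],
        "AddressEntryTextArea": [
            ("address", "class_name"), ("entry", "class_name"),
            ("focus", "method_name"), ("listener", "method_name"),
        ],
    }
    for doc_id, term_weights in weighted_tf_idf(docs).items():
        print(doc_id, term_weights)
```

In a pipeline like the one described above, the resulting term-by-document weights would then be passed to LSI in place of plain TF/IDF weights. The boost table is the only part that encodes developer preferences, so it can be changed without touching the rest of the computation.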
Notes
In this paper, we call “source code entities” any domain-level term, implementation-level term, class name, method name, variable name, or comment found in a piece of code. Domain concepts are concepts pertaining to the use of the system by users. Implementation concepts relate to data structures, GUI elements, databases, and algorithms. For example, in the Pooka e-mail client, addAddress in the AddressBook.java class and addFocusListener in AddressEntryTextArea.java are domain-level and implementation-level concepts, respectively.
We consider that any object X is a source code class, i.e., c_i.
Acknowledgments
The authors would like to thank all the participants of the case studies, as this work would not have been possible without their collaboration. This work has been partially supported by the NSERC Research Chairs on Software Cost-effective Change, Evolution and on Software Patterns and Patterns of Software, and by Fonds de recherche du Québec – Nature et technologies (FRQNT).
Additional information
Communicated by: Massimiliano Di Penta and Jonathan Maletic
Cite this article
Ali, N., Sharafi, Z., Guéhéneuc, YG. et al. An empirical study on the importance of source code entities for requirements traceability. Empir Software Eng 20, 442–478 (2015). https://doi.org/10.1007/s10664-014-9315-y
DOI: https://doi.org/10.1007/s10664-014-9315-y