
Propagating frugal user feedback through closeness of code dependencies to improve IR-based traceability recovery

Empirical Software Engineering

Abstract

Traceability recovery captures trace links among different software artifacts (e.g., requirements and code) when two artifacts cover the same part of the system's functionality. These trace links provide important support for developers in software maintenance and evolution tasks. Information Retrieval (IR) is now the mainstream technique in semi-automatic approaches for recovering candidate trace links based on textual similarities among artifacts. The performance of IR-based traceability recovery is evaluated by the ranking of relevant traces in the generated lists of candidate links. Unfortunately, this performance is greatly hindered by the vocabulary mismatch problem between different software artifacts. To address this issue, a growing body of enhancement strategies based on user feedback has been proposed; these strategies adjust the calculated IR values of candidate links after the user verifies some of the links. However, the improvement brought by such strategies requires a large amount of user feedback, which can be infeasible in practice. In this paper, we propose to improve IR-based traceability recovery by propagating a small amount of user feedback through a closeness analysis of call and data dependencies in the code. Specifically, our approach first iteratively asks users to verify a small set of candidate links. The collected frugal feedback is then combined with the quantified functional similarity of each code dependency (called closeness) and the generated IR values to improve the ranking of unverified links. An empirical evaluation on nine real-world systems with three mainstream IR models shows that our approach can outperform five baseline approaches while using only a small amount of user feedback.
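To make the idea of propagating feedback more concrete, the following is a minimal sketch in Python. It is not the authors' algorithm: the abstract does not give the formulas, so the update rule, the `weight` parameter, and the names `propagate_feedback`, `ir_scores`, `closeness`, and `verified` are all assumptions chosen for illustration. The sketch assumes that each code dependency carries a closeness value in [0, 1], and that a user verdict on a link (requirement, code element) raises or lowers the IR score of unverified links from the same requirement to neighbouring code elements, in proportion to closeness.

```python
from typing import Dict, List, Tuple

Link = Tuple[str, str]  # (requirement id, code element id)


def propagate_feedback(
    ir_scores: Dict[Link, float],
    closeness: Dict[Tuple[str, str], float],
    verified: Dict[Link, bool],
    weight: float = 1.0,
) -> List[Tuple[Link, float]]:
    """Re-rank unverified candidate links (illustrative rule, an assumption).

    ir_scores: IR similarity for every candidate link.
    closeness: closeness in [0, 1] per code dependency, treated as undirected.
    verified:  user verdicts; True = correct link, False = wrong link.
    weight:    hypothetical knob for how strongly feedback is propagated.
    """
    # Build an undirected neighbour map from the dependency edges.
    neighbours: Dict[str, List[Tuple[str, float]]] = {}
    for (a, b), c in closeness.items():
        neighbours.setdefault(a, []).append((b, c))
        neighbours.setdefault(b, []).append((a, c))

    adjusted: Dict[Link, float] = {}
    for (req, code), score in ir_scores.items():
        if (req, code) in verified:
            continue  # verified links need no re-ranking
        delta = 0.0
        for other, c in neighbours.get(code, []):
            verdict = verified.get((req, other))
            if verdict is True:
                delta += weight * c   # close neighbour confirmed: boost
            elif verdict is False:
                delta -= weight * c   # close neighbour rejected: penalise
        # Clamp at zero so strong negative feedback cannot flip the sign.
        adjusted[(req, code)] = max(0.0, score * (1.0 + delta))

    # Highest adjusted score first.
    return sorted(adjusted.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    scores = {("R1", "A"): 0.9, ("R1", "B"): 0.2, ("R1", "C"): 0.3}
    deps = {("A", "B"): 0.8}           # A and B are closely related
    feedback = {("R1", "A"): True}     # the user confirmed R1 -> A
    for link, value in propagate_feedback(scores, deps, feedback):
        print(link, round(value, 3))
```

In this toy setup a single confirmed link is enough to lift a textually weak but structurally close candidate above an unrelated one, which is the intuition behind using only a small amount of feedback.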

Notes

  1. http://www.CoEST.org

  2. https://agile.csc.ncsu.edu/iTrust/wiki/doku.php

  3. Java Virtual Machine Tool Interface, https://docs.oracle.com/javase/8/docs/platform/jvmti/jvmti.html

  4. https://github.com/huiAlex/CLUSTER-Prime

  5. https://agile.csc.ncsu.edu/iTrust/wiki/doku.php

  6. https://github.com/bardsoftware/ganttproject

  7. https://github.com/apache/maven

  8. https://github.com/apache/pig

  9. https://github.com/infinispan/infinispan

  10. https://github.com/kiegroup/drools

  11. https://github.com/apache/derby

  12. http://www.seamframework.org/Seam2.html

  13. https://github.com/apache/groovy

  14. https://gradle.org/

  15. https://github.com/gousiosg/java-callgraph

  16. https://www.atlassian.com/software/jira

  17. https://github.com/

  18. https://issues.redhat.com/browse/JBSEAM-4810

  19. https://issues.apache.org/jira/browse/DERBY-5840

  20. https://issues.redhat.com/browse/ISPN-5553

  21. https://issues.apache.org/jira/browse/DERBY-2583

  22. https://issues.apache.org/jira/browse/DERBY-2569

  23. https://issues.apache.org/jira/browse/DERBY-1478

  24. https://issues.apache.org/jira/browse/PIG-1429

Acknowledgements

This work is funded by the National Natural Science Foundation of China (Grant Nos. 61690204 and 61802173), the general program of the State Key Laboratory for Novel Software Technology (Grant No. ZZKT2021B05), the Collaborative Innovation Center of Novel Software Technology and Industrialization, the German Ministry of Education and Research (BMBF, grant 01IS16003B), the DFG (grant MA 5030/3-1), the Austrian Science Fund (FWF, grant no. P31989), and the Austrian COMET K1-Centre Pro2Future of the Austrian Research Promotion Agency (FFG), with funding from the Austrian ministries BMVIT and BMDW and the Province of Upper Austria.

Author information

Corresponding author

Correspondence to Hongyu Kuang.

Ethics declarations

Conflicts of interest

The authors have no competing interests to declare that are relevant to the content of this article.

Additional information

Communicated by: Federica Sarro and Foutse Khomh

This article belongs to the Topical Collection: International Conference on Program Comprehension (ICPC)

Cite this article

Gao, H., Kuang, H., Ma, X. et al. Propagating frugal user feedback through closeness of code dependencies to improve IR-based traceability recovery. Empir Software Eng 27, 41 (2022). https://doi.org/10.1007/s10664-021-10091-5

  • DOI: https://doi.org/10.1007/s10664-021-10091-5
