Do Information Retrieval Algorithms for Automated Traceability Perform Effectively on Issue Tracking System Data?

  • Conference paper
Requirements Engineering: Foundation for Software Quality (REFSQ 2016)

Abstract

[Context and motivation] Traces between issues in issue tracking systems connect bug reports to software features, link competing implementation ideas for a software feature, or identify duplicate issues. However, the quality of these traces is usually very low. To improve the trace quality between requirements, features, and bugs, information retrieval algorithms for automated trace retrieval can be employed. Prevailing research focusses on structured and well-formed documents, such as natural language requirement descriptions. In contrast, the information in issue tracking systems is often poorly structured and contains digressing discussions or noise, such as code snippets, stack traces, and links. Since noise has a negative impact on algorithms for automated trace retrieval, this paper asks: [Question/Problem] Do information retrieval algorithms for automated traceability perform effectively on issue tracking system data? [Results] This paper presents an extensive evaluation of the performance of five information retrieval algorithms. Furthermore, it investigates different preprocessing stages (e.g. stemming or differentiating code snippets from natural language) and evaluates how to take advantage of an issue's structure (e.g. title, description, and comments) to improve the results. The results show that the algorithms perform poorly when the nature of issue tracking data is not taken into account, but that their results can be improved by project-specific preprocessing and term weighting. [Contribution] Our results show how automated trace retrieval on issue tracking system data can be improved. Our manually created gold standard and an open-source implementation based on the OpenTrace platform can be used by other researchers to further pursue this topic.
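The idea of weighting terms by the part of the issue they come from can be illustrated with a minimal sketch. It is not the authors' OpenTrace-based implementation: the abstract does not name the five algorithms, so a plain TF-IDF vector space model with cosine similarity serves here only as a familiar stand-in. The field weights, the stop-word list, and all function names are assumptions chosen for illustration, and stemming and code-snippet removal are left out.

```python
import math
import re
from collections import Counter

# Hypothetical field weights: a term in the title counts more than one in a comment.
FIELD_WEIGHTS = {"title": 3.0, "description": 2.0, "comments": 1.0}
# Tiny illustrative stop-word list, not a real IR stop-word set.
STOP_WORDS = {"a", "an", "the", "is", "in", "on", "of", "to", "and", "when", "it"}


def tokenize(text):
    """Lowercase, keep alphabetic tokens, drop stop words (no stemming)."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]


def weighted_tf(issue):
    """Term frequencies in which each occurrence is scaled by its field weight."""
    tf = Counter()
    for field, weight in FIELD_WEIGHTS.items():
        for term in tokenize(issue.get(field, "")):
            tf[term] += weight
    return tf


def tfidf_vectors(issues):
    """Build one sparse TF-IDF vector (dict) per issue, using a smoothed IDF."""
    tfs = [weighted_tf(issue) for issue in issues]
    n = len(tfs)
    df = Counter(term for tf in tfs for term in tf)
    idf = {t: math.log((1 + n) / (1 + d)) + 1.0 for t, d in df.items()}
    return [{t: w * idf[t] for t, w in tf.items()} for tf in tfs]


def cosine(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either one is empty."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_candidates(issues, query_index):
    """Rank all other issues by similarity to the issue at query_index."""
    vecs = tfidf_vectors(issues)
    scores = [
        (j, cosine(vecs[query_index], vecs[j]))
        for j in range(len(issues))
        if j != query_index
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Made-up issues for illustration only.
    example_issues = [
        {"title": "Login fails with LDAP",
         "description": "Users cannot log in when LDAP is enabled"},
        {"title": "LDAP login error",
         "description": "Authentication against LDAP server fails",
         "comments": "Stack trace attached"},
        {"title": "Add dark theme",
         "description": "Provide a dark color theme for the wiki"},
    ]
    for index, score in rank_candidates(example_issues, 0):
        print(index, round(score, 3))
```

In a setup like this, the ranked list would be cut off at a similarity threshold to obtain trace candidates, and the field weights would be tuned per project against a gold standard.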

Notes

  1. Among other criteria as defined in ISO/IEC/IEEE 29148:2011 [14].

  2. More preprocessing techniques are available. For example, [9] considers only nouns, adjectives, adverbs, and verbs for further processing.

  3. Figure 1 intentionally omits other meta-data such as authoring information, date- and time-stamps, or the issue status, since it is not relevant for the remainder of this paper.

  4. The researched projects use the ITSs Redmine and GitHub. In Redmine, the issue type needs to be specified, whereas GitHub allows tagging.

  5. Dag and Gervasi [20] surveyed automated approaches to improve the NL quality.

  6. This is discussed in depth in [19].

  7. https://lucene.apache.org.

  8. http://www2.inf.h-brs.de/~tmerte2m. In addition to the source code, the gold standards, extracted issues, and experiment results are also available for download.

  9. In addition, removing stop words and stemming are considered IR best practices, e.g. [2, 17].

  10. Removing code snippets and other noise can be achieved automatically, e.g. [18].

  11. We also allowed the annotation of the following trace types: \(I_1\) precedes, is parent of, blocks, clones \(I_2\).

References

  1. Angius, E., Witte, R.: OpenTrace: an open source workbench for automatic software traceability link recovery. In: 2012 19th Working Conference on Reverse Engineering, pp. 507–508 (2012)

  2. Baeza-Yates, R., Ribeiro-Neto, B.: Modern Information Retrieval: The Concepts and Technology Behind Search. Addison-Wesley Professional, Boston (2011)

  3. Borg, M., Runeson, P., Ardö, A.: Recovering from a decade: a systematic mapping of information retrieval approaches to software traceability. Empirical Softw. Eng. 19(6), 1565–1616 (2014)

  4. Chen, X., Hosking, J., Grundy, J.: A combination approach for enhancing automated traceability. In: Proceedings of the 33rd International Conference on Software Engineering, Waikiki, Honolulu, HI, USA, pp. 912–915. ACM (2011)

  5. Cleland-Huang, J., Settimi, R., Romanova, E., Berenbach, B., Clark, S.: Best practices for automated traceability. Computer 40(6), 27–35 (2007)

  6. Cunningham, H., Maynard, D., Bontcheva, K.: Text Processing with GATE (Version 6). University of Sheffield Department of Computer Science (2011)

  7. Furnas, G.W., Deerwester, S., Dumais, S.T., Landauer, T.K., Harshman, R.A., Streeter, L.A., Lochbaum, K.E.: Information retrieval using a singular value decomposition model of latent semantic structure. In: Proceedings of the 11th International ACM SIGIR Conference on R&D in Information Retrieval - SIGIR 1988, pp. 465–480. ACM Press, New York (1988)

  8. Gervasi, V., Zowghi, D.: Mining requirements links. In: Berry, D., Franch, X. (eds.) REFSQ 2011. LNCS, vol. 6606, pp. 196–201. Springer, Heidelberg (2011)

  9. Gervasi, V., Zowghi, D.: Supporting traceability through affinity mining. In: IEEE 22nd International Requirements Engineering Conference, pp. 143–152. IEEE (2014)

  10. Gotel, O., Cleland-Huang, J., Hayes, J.H., Zisman, A., Egyed, A., Grunbacher, P., Antoniol, G.: The quest for Ubiquity: a roadmap for software and systems traceability research. In: 20th IEEE International Requirements Engineering Conference, pp. 71–80. IEEE (2012)

  11. Guo, J., Cleland-Huang, J., Berenbach, B.: Foundations for an expert system in domain-specific traceability. In: 21st IEEE International Requirements Engineering Conference (RE), pp. 42–51, no. 978. IEEE (2013)

  12. Heck, P., Zaidman, A.: Horizontal traceability for just-in-time requirements: the case for open source feature requests. J. Softw. Evol. Process 26(12), 1280–1296 (2014)

  13. Huffman Hayes, J., Dekhtyar, A., Sundaram, S.K.: Advancing candidate link generation for requirements tracing: the study of methods. IEEE Trans. Softw. Eng. 32(1), 4–19 (2006)

  14. ISO/IEC/IEEE: Intl. STANDARD ISO/IEC/IEEE 29148:2011 (2011)

  15. Lv, Y.: Lower-bounding term frequency normalization. In: ACM Conference on Information and Knowledge Management, pp. 7–16 (2011)

  16. Lv, Y., Zhai, C.: When documents are very long, BM25 fails! In: Proceedings of the 34th International ACM SIGIR Conference on R&D in Information Retrieval - SIGIR 2011, p. 1103, no. I. ACM, New York (2011)

  17. Manning, C.D., Raghavan, P., Schütze, H.: Introduction to Information Retrieval, 1st edn. Cambridge University Press, New York (2008)

  18. Merten, T., Mager, B., Bürsner, S., Paech, B.: Classifying unstructured data into natural language text and technical information. In: Proceedings of the 11th Working Conference on Mining Software Repositories - MSR 2014, pp. 300–303. ACM, New York (2014)

  19. Merten, T., Mager, B., Hübner, P., Quirchmayr, T., Bürsner, S., Paech, B.: Requirements communication in issue tracking systems in four open-source projects. In: 6th International Workshop on Requirements Prioritization and Communication (RePriCo), pp. 114–125. CEUR Workshop Proceedings (2015)

  20. Natt och Dag, J., Gervasi, V.: Managing large repositories of natural language requirements. In: Aurum, A., Wohlin, S. (eds.) Engineering and Managing Software Requirements, pp. 219–244. Springer, Heidelberg (2005)

  21. Nguyen, A.T., Nguyen, T.T., Nguyen, H.A., Nguyen, T.N.: Multi-layered approach for recovering links between bug reports and fixes. In: Proceedings of the ACM SIGSOFT 20th International Symposium on the Foundations of Software Engineering - FSE 2012, p. 1 (2012)

  22. Niu, N., Mahmoud, A.: Enhancing candidate link generation for requirements tracing: the cluster hypothesis revisited. In: 20th IEEE International Requirements Engineering Conference, pp. 81–90. IEEE (2012)

  23. Oliveto, R., Gethers, M., Poshyvanyk, D., De Lucia, A.: On the equivalence of information retrieval methods for automated traceability link recovery. In: 2010 IEEE 18th International Conference on Program Comprehension, pp. 68–71, June 2010

  24. Paech, B., Hubner, P., Merten, T.: What are the features of this software? In: ICSEA 2014, The Ninth International Conference on Software Engineering Advances, pp. 97–106. IARIA XPS Press (2014)

  25. Robertson, S.E., Walker, S., Hancock-Beaulieu, M., Gull, A., Lau, M.: Okapi at TREC. In: Proceedings of The First Text REtrieval Conference, TREC 1992, National Institute of Standards and Technology (NIST). Special Publication, pp. 21–30 (1992)

  26. Runeson, P., Alexandersson, M., Nyholm, O.: Detection of duplicate defect reports using natural language processing. In: International Conference on SE, pp. 499–508 (2007)

  27. Salton, G., Wong, A., Yang, C.S.: A vector space model for automatic indexing. Commun. ACM 18(11), 613–620 (1975)

  28. Skerrett, I.: The Eclipse Foundation: The Eclipse Community Survey (2011)

  29. Sultanov, H., Hayes, J.H.: Application of reinforcement learning to requirements engineering: requirements tracing. In: 21st IEEE International Requirements Engineering Conference, pp. 52–61. IEEE (2013)

  30. Wang, X., Zhang, L., Xie, T., Anvik, J., Sun, J.: An approach to detecting duplicate bug reports using natural language and execution information. In: 2008 ACM/IEEE 30th International Conference on Software Engineering, pp. 461–470 (2008)

Author information

Corresponding author

Correspondence to Thorsten Merten.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Merten, T., Krämer, D., Mager, B., Schell, P., Bürsner, S., Paech, B. (2016). Do Information Retrieval Algorithms for Automated Traceability Perform Effectively on Issue Tracking System Data? In: Daneva, M., Pastor, O. (eds) Requirements Engineering: Foundation for Software Quality. REFSQ 2016. Lecture Notes in Computer Science, vol 9619. Springer, Cham. https://doi.org/10.1007/978-3-319-30282-9_4

  • DOI: https://doi.org/10.1007/978-3-319-30282-9_4

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-30281-2

  • Online ISBN: 978-3-319-30282-9

  • eBook Packages: Computer Science, Computer Science (R0)
