Abstract
[Context and motivation] Traces between issues in issue tracking systems connect bug reports to software features, link competing implementation ideas for a software feature, or identify duplicate issues. However, the trace quality is usually very low. To improve the trace quality between requirements, features, and bugs, information retrieval algorithms for automated trace retrieval can be employed. Prevailing research focusses on structured and well-formed documents, such as natural language requirement descriptions. In contrast, the information in issue tracking systems is often poorly structured and contains digressing discussions or noise, such as code snippets, stack traces, and links. Since noise has a negative impact on algorithms for automated trace retrieval, this paper asks: [Question/Problem] Do information retrieval algorithms for automated traceability perform effectively on issue tracking system data? [Results] This paper presents an extensive evaluation of the performance of five information retrieval algorithms. Furthermore, it investigates different preprocessing stages (e.g. stemming or differentiating code snippets from natural language) and evaluates how to take advantage of an issue’s structure (e.g. title, description, and comments) to improve the results. The results show that the algorithms perform poorly when the nature of issue tracking data is not taken into account, but that their performance can be improved by project-specific preprocessing and term weighting. [Contribution] Our results show how automated trace retrieval on issue tracking system data can be improved. Our manually created gold standard and an open-source implementation based on the OpenTrace platform can be used by other researchers to further pursue this topic.
Notes
1. Among other criteria as defined in ISO/IEC/IEEE 29148:2011 [14].
2. More preprocessing techniques are available. For example, [9] considers only nouns, adjectives, adverbs, and verbs for further processing.
3. Figure 1 intentionally omits other metadata such as authoring information, date and time stamps, or the issue status, since they are not relevant for the remainder of this paper.
4. The researched projects use the ITSs Redmine and GitHub. In Redmine, the issue type needs to be specified; GitHub allows tagging.
5. Natt och Dag and Gervasi [20] surveyed automated approaches to improve the NL quality.
6. This is discussed in depth in [19].
7.
8. http://www2.inf.h-brs.de/~tmerte2m – In addition to the source code, gold standards, extracted issues, and experiment results are also available for download.
9.
10. Removing code snippets and other noise can be achieved automatically, e.g. [18].
11. We also allowed the annotation of the following trace types: \(I_1\) precedes, is parent of, blocks, clones \(I_2\).
References
Angius, E., Witte, R.: OpenTrace: an open source workbench for automatic software traceability link recovery. In: 2012 19th Working Conference on Reverse Engineering, pp. 507–508 (2012)
Baeza-Yates, R., Ribeiro-Neto, B.: Modern Information Retrieval: The Concepts and Technology Behind Search. Addison-Wesley Professional, Boston (2011)
Borg, M., Runeson, P., Ardö, A.: Recovering from a decade: a systematic mapping of information retrieval approaches to software traceability. Empirical Softw. Eng. 19(6), 1565–1616 (2014)
Chen, X., Hosking, J., Grundy, J.: A combination approach for enhancing automated traceability. In: Proceedings of the 33rd International Conference on Software Engineering, Waikiki, Honolulu, HI, USA, pp. 912–915. ACM (2011)
Cleland-Huang, J., Settimi, R., Romanova, E., Berenbach, B., Clark, S.: Best practices for automated traceability. Computer 40(6), 27–35 (2007)
Cunningham, H., Maynard, D., Bontcheva, K.: Text Processing with GATE (Version 6). University of Sheffield Department of Computer Science (2011)
Furnas, G.W., Deerwester, S., Dumais, S.T., Landauer, T.K., Harshman, R.A., Streeter, L.A., Lochbaum, K.E.: Information retrieval using a singular value decomposition model of latent semantic structure. In: Proceedings of the 11th International ACM SIGIR Conference on R&D in Information Retrieval - SIGIR 1988, pp. 465–480. ACM Press, New York (1988)
Gervasi, V., Zowghi, D.: Mining requirements links. In: Berry, D., Franch, X. (eds.) REFSQ 2011. LNCS, vol. 6606, pp. 196–201. Springer, Heidelberg (2011)
Gervasi, V., Zowghi, D.: Supporting traceability through affinity mining. In: IEEE 22nd International Requirements Engineering Conference, pp. 143–152. IEEE (2014)
Gotel, O., Cleland-Huang, J., Hayes, J.H., Zisman, A., Egyed, A., Grünbacher, P., Antoniol, G.: The quest for Ubiquity: a roadmap for software and systems traceability research. In: 20th IEEE International Requirements Engineering Conference, pp. 71–80. IEEE (2012)
Guo, J., Cleland-Huang, J., Berenbach, B.: Foundations for an expert system in domain-specific traceability. In: 21st IEEE International Requirements Engineering Conference (RE), pp. 42–51, no. 978. IEEE (2013)
Heck, P., Zaidman, A.: Horizontal traceability for just-in-time requirements: the case for open source feature requests. J. Softw. Evol. Process 26(12), 1280–1296 (2014)
Huffman Hayes, J., Dekhtyar, A., Sundaram, S.K.: Advancing candidate link generation for requirements tracing: the study of methods. IEEE Trans. Softw. Eng. 32(1), 4–19 (2006)
ISO/IEC/IEEE: International Standard ISO/IEC/IEEE 29148:2011 (2011)
Lv, Y.: Lower-bounding term frequency normalization. In: ACM Conference on Information and Knowledge Management, pp. 7–16 (2011)
Lv, Y., Zhai, C.: When documents are very long, BM25 fails! In: Proceedings of the 34th International ACM SIGIR Conference on R&D in Information Retrieval - SIGIR 2011, p. 1103, no. I. ACM, New York (2011)
Manning, C.D., Raghavan, P., Schütze, H.: Introduction to Information Retrieval, 1st edn. Cambridge University Press, New York (2008)
Merten, T., Mager, B., Bürsner, S., Paech, B.: Classifying unstructured data into natural language text and technical information. In: Proceedings of the 11th Working Conference on Mining Software Repositories - MSR 2014, pp. 300–303. ACM, New York (2014)
Merten, T., Mager, B., Hübner, P., Quirchmayr, T., Bürsner, S., Paech, B.: Requirements communication in issue tracking systems in four open-source projects. In: 6th International Workshop on Requirements Prioritization and Communication (RePriCo), pp. 114–125. CEUR Workshop Proceedings (2015)
Natt och Dag, J., Gervasi, V.: Managing large repositories of natural language requirements. In: Aurum, A., Wohlin, C. (eds.) Engineering and Managing Software Requirements, pp. 219–244. Springer, Heidelberg (2005)
Nguyen, A.T., Nguyen, T.T., Nguyen, H.A., Nguyen, T.N.: Multi-layered approach for recovering links between bug reports and fixes. In: Proceedings of the ACM SIGSOFT 20th International Symposium on the Foundations of Software Engineering - FSE 2012, p. 1 (2012)
Niu, N., Mahmoud, A.: Enhancing candidate link generation for requirements tracing: the cluster hypothesis revisited. In: 20th IEEE International Requirements Engineering Conference, pp. 81–90. IEEE (2012)
Oliveto, R., Gethers, M., Poshyvanyk, D., De Lucia, A.: On the equivalence of information retrieval methods for automated traceability link recovery. In: 2010 IEEE 18th International Conference on Program Comprehension, pp. 68–71, June 2010
Paech, B., Hübner, P., Merten, T.: What are the features of this software? In: ICSEA 2014, The Ninth International Conference on Software Engineering Advances, pp. 97–106. IARIA XPS Press (2014)
Robertson, S.E., Walker, S., Hancock-Beaulieu, M., Gull, A., Lau, M.: Okapi at TREC. In: Proceedings of The First Text REtrieval Conference, TREC 1992, National Institute of Standards and Technology (NIST). Special Publication, pp. 21–30 (1992)
Runeson, P., Alexandersson, M., Nyholm, O.: Detection of duplicate defect reports using natural language processing. In: International Conference on SE, pp. 499–508 (2007)
Salton, G., Wong, A., Yang, C.S.: A vector space model for automatic indexing. Commun. ACM 18(11), 613–620 (1975)
Skerrett, I.: The Eclipse Foundation: The Eclipse Community Survey (2011)
Sultanov, H., Hayes, J.H.: Application of reinforcement learning to requirements engineering: requirements tracing. In: 21st IEEE International Requirements Engineering Conference, pp. 52–61. IEEE (2013)
Wang, X., Zhang, L., Xie, T., Anvik, J., Sun, J.: An approach to detecting duplicate bug reports using natural language and execution information. In: 2008 ACM/IEEE 30th International Conference on Software Engineering, pp. 461–470 (2008)
Copyright information
© 2016 Springer International Publishing Switzerland
About this paper
Cite this paper
Merten, T., Krämer, D., Mager, B., Schell, P., Bürsner, S., Paech, B. (2016). Do Information Retrieval Algorithms for Automated Traceability Perform Effectively on Issue Tracking System Data? In: Daneva, M., Pastor, O. (eds) Requirements Engineering: Foundation for Software Quality. REFSQ 2016. Lecture Notes in Computer Science, vol 9619. Springer, Cham. https://doi.org/10.1007/978-3-319-30282-9_4
DOI: https://doi.org/10.1007/978-3-319-30282-9_4
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-30281-2
Online ISBN: 978-3-319-30282-9
eBook Packages: Computer Science, Computer Science (R0)