Empirical Software Engineering, Volume 21, Issue 5, pp 1960–1989

Studying the needed effort for identifying duplicate issues

  • Mohamed Sami Rakha
  • Weiyi Shang
  • Ahmed E. Hassan
Article

DOI: 10.1007/s10664-015-9404-6

Cite this article as:
Rakha, M.S., Shang, W. & Hassan, A.E. Empir Software Eng (2016) 21: 1960. doi:10.1007/s10664-015-9404-6

Abstract

Many recent software engineering papers have examined duplicate issue reports. Thus far, duplicate reports have been considered a hindrance to developers and a drain on their resources. As a result, prior research in this area focuses on proposing automated approaches to accurately identify duplicate reports. However, no studies have attempted to quantify the actual effort that is spent on identifying duplicate issue reports. In this paper, we empirically examine the effort that is needed for manually identifying duplicate reports in four open source projects, i.e., Firefox, SeaMonkey, Bugzilla and Eclipse-Platform. Our results show that: (i) More than 50% of the duplicate reports are identified within half a day. Most of the duplicate reports are identified without any discussion and with the involvement of very few people; (ii) A classification model built using a set of factors that are extracted from duplicate issue reports classifies duplicates according to the effort that is needed to identify them with a precision of 0.60 to 0.77, a recall of 0.23 to 0.96, and an ROC area of 0.68 to 0.80; and (iii) Factors that capture the developer awareness of the duplicate issue’s peers (i.e., other duplicates of that issue) and the textual similarity of a new report to prior reports are the most influential factors in our models. Our findings highlight the need for effort-aware evaluation of approaches that identify duplicate issue reports, since the identification of a considerable number of duplicate reports (over 50%) appears to be a relatively trivial task for developers. To better assist developers, research on identifying duplicate issue reports should put greater emphasis on helping developers identify effort-consuming duplicate issues.

Keywords

  • Mining software repositories
  • Automated detection of duplicate issues
  • Software issue reports
  • Effort based analysis
  • Duplicate bug reports

Copyright information

© Springer Science+Business Media New York 2015

Authors and Affiliations

  • Mohamed Sami Rakha (1)
  • Weiyi Shang (1)
  • Ahmed E. Hassan (1)
  1. Software Analysis and Intelligence Lab (SAIL), School of Computing, Queen’s University, Kingston, Canada
