1 Introduction

While a software system is being developed, software engineers use version control repositories to produce and manage their code. Researchers and testers report issues, which are stored in separate repositories, known as issue-tracking systems, where many kinds of issues can be found.

Issue-tracking systems facilitate the process of solving these bugs, but they make it difficult to distinguish which reports are actually bug reports. These systems provide an interface for managing reports of maintenance activities, where issues describing bugs, feature requests, or code optimizations can be reported. During the bug triage process it is difficult to distinguish bug reports from other issues; one study reports that two out of five issues are misclassified [2]. This misclassification biases bug prediction, because non-bug reports are taken into account.

To distinguish bug reports we could have used automatic classification systems, as described in [1], but the vocabulary used in issue descriptions, as well as the reporting policy, can change from project to project. Consequently, data validation is recommended, as mentioned in [2].

Linking a bug report in an issue-tracking system to its corresponding fix commit is not a trivial task. Traditionally, the methods used in link recovery [4, 5] are based on text patterns or the mining of key phrases. Unfortunately, these methods produce many false negatives, causing bias in the data [6, 7]. Therefore, other methods such as the MLink approach have been developed to link bug reports with fixes by using features of the source files changed in the commits, in addition to the traditional textual features [3]. However, all of these methods assume that the issues are bug reports.

In this paper, we present a tool that displays, for the benefit of researchers, a collection of all the information needed to decide whether an issue is a bug report or not. By collecting exhaustive data on issues and their corresponding fix commits, and combined with the researchers' extensive knowledge of the system, the tool supports their decision making and leads them to select only bug reports, thus avoiding the bias induced by non-bug reports. To the best of our knowledge, this is the first tool that supports the identification of bugs and the classification of bug reports. The need for this contribution arises from the increasing interest that both academia and industry are showing in bug classification as a primary factor in modern software development.

2 The Tool

The tool is a web application and therefore runs in a browser. It displays the main data needed to distinguish bug reports from other issues. Researchers are responsible for classifying the issues from Launchpad as bug reports or not, and can explain their decision for each issue. Throughout the paper we refer to these issues as tickets.

2.1 Architecture

The tool integrates information from Launchpad as the issue-tracking system and Gerrit as the code review system. Fig. 1 presents the architecture of the tool. The tool was developed using JavaScript, Node.js, jQuery, and HTML5. The queries to the Gerrit and Launchpad REST APIs are executed on the server side, and the responses are displayed on the client side. The end user can view the displayed information and interact with the server through events. Both sides exchange information as JSON. Furthermore, we use a third-party application between GitHub and the browser in order to integrate some GitHub functionalities.

Fig. 1. Architecture of the tool.
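To make this interaction concrete, the following is a minimal sketch of the server side, assuming an Express-based Node.js server (Node 18+ for the built-in fetch); the route names and the Gerrit host are illustrative assumptions and may differ from the actual implementation.

```javascript
// Minimal sketch of the server-side proxy; route names and hosts are illustrative.
const express = require('express');
const app = express();

// Query the Launchpad REST API for a ticket and forward the JSON to the client.
app.get('/ticket/:id', async (req, res) => {
  const r = await fetch(`https://api.launchpad.net/1.0/bugs/${req.params.id}`);
  res.json(await r.json()); // contains, e.g., title, description, status
});

// Query the Gerrit REST API for reviews that mention the ticket.
app.get('/review/:id', async (req, res) => {
  const r = await fetch(
    `https://review.openstack.org/changes/?q=message:${req.params.id}`);
  const body = await r.text();
  res.json(JSON.parse(body.slice(4))); // strip Gerrit's ")]}'" security prefix
});

app.listen(3000);
```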

2.2 Main Features

Figure 2 shows a screenshot of the main tab of the tool. In particular, the tool displays the IDs of tickets extracted randomly from each issue-tracking repository of OpenStack, together with other related information. Based on these data, the researcher can decide whether the issue is a bug report or not. We focused on displaying the main parameters that help in the classification of reports, such as the title and description of the report, as well as the description of the fix commit. For example, for ticket ID 1531734 the tool displays the information related to the ticket in Launchpad and its corresponding review in Gerrit.

There is additional information that the tool does not display. If the researchers find it necessary, they can access the Launchpad page of the ticket and the Gerrit page of the review through the links provided by the tool. There they can find extra information, such as the comments written by code reviewers on that particular ticket. This provides a means for tracking the history of the ticket from the moment it was opened until it was closed.

The tool further allows researchers to record their opinion about the ticket after reading all the automatically displayed information. They have to classify the ticket as Bug report or Not Bug report. Because of the sometimes vague descriptions used in tickets, researchers may be unsure about the classification; for this reason we added a third option, Undecided. Furthermore, researchers have a text area in which to write keywords found in the title, in the description of the ticket, and in the commit message that support their classification. Finally, they can leave a comment explaining why they classified a report as Bug, Not Bug, or Undecided. Such information will help us build an automatic bug classification system in the future.
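As an illustration, a recorded analysis could take a shape such as the following; the field names are assumptions and not necessarily those used by the tool.

```javascript
// Illustrative shape of one recorded analysis; actual field names may differ.
const analysis = {
  ticketId: 1531734,
  classification: 'Bug report',   // or 'Not Bug report' | 'Undecided'
  keywords: ['fix', 'traceback'], // words from title, description, or commit message
  comment: 'Stack trace and fix commit clearly describe a defect.',
  analyzedAt: new Date().toISOString()
};
```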

Another feature of the tool is that it allows a blind analysis of the tickets. Since all the analysis data entered for a ticket are saved in a file in the researcher's own GitHub account, the analysis can be carried out by two or more researchers in parallel. By saving the data in GitHub, we can also measure the time each researcher needs for an analysis, identify which tickets were more difficult to analyze, and collect other metrics that help us understand the current problem of issue misclassification.
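A minimal sketch of this persistence step is given below, assuming each analysis is written as a JSON file through the GitHub contents API; the file layout and helper name are hypothetical.

```javascript
// Hypothetical helper: store one analysis as <ticketId>.json in the researcher's
// GitHub repository (named after the OpenStack project, e.g. "Nova").
async function saveAnalysis(owner, token, project, analysis) {
  const url = `https://api.github.com/repos/${owner}/${project}/contents/${analysis.ticketId}.json`;
  await fetch(url, {
    method: 'PUT',
    headers: { Authorization: `token ${token}`, 'Content-Type': 'application/json' },
    body: JSON.stringify({
      message: `Analysis of ticket ${analysis.ticketId}`,
      // Updating an existing file would additionally require its current "sha".
      content: Buffer.from(JSON.stringify(analysis, null, 2)).toString('base64')
    })
  });
}
```

Since every file saved this way is committed with a timestamp, metrics such as analysis time can later be derived from the repository history.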

Fig. 2. Screenshot of the Analyze tab.

The web page provides different functionalities depending on the tab the researcher is browsing. We explain these functionalities in the following:

  1. Tab Repository: In this tab you can choose which repository you want to analyze. Currently the tool supports the four principal repositories of OpenStack: Cinder, Nova, Neutron, and Horizon.

  2. Tab Analyze: This is the tab illustrated in Fig. 2, where all the data for a specific ticket are displayed. The user can either select a random identifier or insert one of their choice. Based on the data retrieved from Launchpad and Gerrit, the researcher can classify the ticket.

  3. Tab Statistics: This tab retrieves the data already analyzed by each researcher involved in the analysis from their GitHub account, aggregates them, and displays the distribution of the classifications in a table (a minimal sketch of this computation follows the list).

  4. Tab Modify: In case researchers think they made a mistake during the analysis, in this tab they can edit any of the data saved in their GitHub repository.
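As referenced in the Statistics item above, the distribution can be computed in a few lines; this sketch assumes the illustrative analysis entries shown earlier, not necessarily the tool's actual data format.

```javascript
// Count how many analyzed tickets fall into each class and express them as percentages.
function classificationDistribution(analyses) {
  const counts = { 'Bug report': 0, 'Not Bug report': 0, 'Undecided': 0 };
  for (const a of analyses) {
    counts[a.classification] = (counts[a.classification] || 0) + 1;
  }
  const total = analyses.length || 1; // avoid division by zero
  return Object.fromEntries(
    Object.entries(counts).map(([c, n]) => [c, `${((100 * n) / total).toFixed(1)}%`])
  );
}
```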

We present here the initial version of the tool, which is available onlineFootnote 1, together with a demonstration videoFootnote 2. It is licensed under the GPL (General Public License) and the code is available on GitHubFootnote 3. Anyone can use the tool, whether they have a GitHub account or not. However, in order to save and modify data and to see statistics of the analysed tickets automatically, the researcher should create a GitHub repository with the same name as the OpenStack project to be analysed; for example, if the OpenStack project is Nova, the GitHub repository should also be named Nova.
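This setup step could also be automated against the GitHub repositories API; the helper below is only an illustrative sketch, not part of the tool.

```javascript
// Hypothetical helper: create a repository named after the OpenStack project
// (e.g. "Nova") so the tool can store the analyses in it.
async function createProjectRepo(token, project) {
  await fetch('https://api.github.com/user/repos', {
    method: 'POST',
    headers: { Authorization: `token ${token}`, 'Content-Type': 'application/json' },
    body: JSON.stringify({ name: project }) // e.g. 'Nova'
  });
}
```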

3 Results

With the support of the initial version of the tool, we have analyzed 459 different tickets: 125 from Cinder, 125 from Nova, 125 from Horizon, and 84 from Neutron. The tickets have been analyzed by two out of the three researchers. Table 1 shows the percentage of tickets classified as bug reports by each researcher. Results are missing for some combinations of researchers because, in some projects, one researcher analyzed all the tickets while the two remaining researchers each analyzed half of them.

Table 1. Classification statistics of each researcher

The percentages of R1 and R3 are very similar, whereas R2 identified more bug reports in his analysis. Nevertheless, all three results confirm the misclassification present in bug-tracking systems. Furthermore, in line with the findings of [2], approximately two out of five issues are misclassified in the analyses of R1 and R3.

Focusing on the concordance between researchers analyzing the same ticket, 417 tickets went through a double-blind review process, i.e., each of these tickets was analyzed by two researchers. Table 2 shows the percentage of concordance between researchers in each repository after the analysis of the tickets.

Table 2. Concordance between each pair of researchers in each repository

Table 2 shows that the concordance between researchers is high, but it also demonstrates the difficulty of classifying tickets as bug reports or not, since researchers can have different opinions about a specific ticket. The concordance could be higher if they were experts in the project.
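For clarity, the concordance reported in Table 2 can be read as the share of double-analyzed tickets on which both researchers chose the same class; the sketch below is illustrative, and the pair structure is an assumption.

```javascript
// Percentage of tickets on which the two researchers agree.
// Each pair is assumed to look like { r1: 'Bug report', r2: 'Undecided' }.
function concordance(pairs) {
  const agreed = pairs.filter(p => p.r1 === p.r2).length;
  return (100 * agreed) / pairs.length;
}

// Example: concordance([{ r1: 'Bug report', r2: 'Bug report' },
//                       { r1: 'Bug report', r2: 'Undecided' }]) === 50
```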

All data from the analysis are available in the GitHub repositories of the researchersFootnote 4; the repositories have the same names as the OpenStack projects analyzed.

3.1 Future Work

Since we are conducting empirical studies based on OpenStack projects, the current tool is limited to OpenStack as a pilot project. In the future, we aim to extend the tool to extract tickets from other bug-tracking systems, such as Bugzilla or GitHub, so that the server can operate against them and analyze most OSS projects. Additionally, we aim to study misclassification in these OSS projects. We would also like to add more features to the tool. One of them would be to display information such as the lines of code changed in the files affected by the fix commit, along with the code at the moment the bug was seeded. Furthermore, we would like to implement an automatic classifier for the tickets, based on the semantics of the description of the ticket and of the fix commit. The result will indicate a confidence percentage about whether a ticket is a bug report or not; however, the researcher will always make the final decision. The automatic classification will enable researchers to focus only on problematic issues, which can easily be misclassified.

We also aim to investigate what the results would be if the data sources used by the tool to automatically extract tickets were used in isolation to classify bugs manually, or with other possible bug-classification tools. This will help us validate our results and further improve the tool.