Appendix A: CLONE-HUNTRESS tool description and use
Here we describe CLONE-HUNTRESS, our online tool for (1) identifying clones between a user-selected source project and a target list of Java-based GitHub projects, and (2) tracking changes to the clones over time. Our design goal was to provide a GitHub-integrated, comprehensive, and efficient tool that users can interact with transparently, without needing to deal with the mechanics of the clone search process. We also wanted users to be able to return to the tool over time and monitor changes to the cloned code. The tool is available at http://clone-det.ictic.sharif.edu/.
Finding clones among the many projects on GitHub is very time consuming, and computationally infeasible when constrained by a reasonable response time limit. Also, as per our findings in the main text of this paper, clones are often found in pairs of projects in the same domain. Hence, to speed up the search, CLONE-HUNTRESS allows users to search for and track clones between projects in the same domain.
We selected 39,422 Java-based GitHub projects as an initial preset list, which will grow over time through the automatic addition of users’ projects. This number results from applying the queries described in Section 4.1 to the April 2018 GHTorrent MySQL dump. In other words, we selected Java projects that had at least 2 developers, were at least 1 year old, and had more than 10 commits. We also eliminated projects that were forked.
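The selection criteria above can be summarized as a single predicate. The following is a minimal sketch only: the actual selection was done with the queries of Section 4.1 against GHTorrent, and the `Project` record, its field names, and the `as_of` reference date are hypothetical names introduced here for illustration.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Project:
    # Hypothetical record; field names are illustrative, not the GHTorrent schema.
    name: str
    language: str
    developers: int
    created: date
    commits: int
    is_fork: bool


def is_candidate(p: Project, as_of: date) -> bool:
    """Return True if the project meets the stated selection criteria."""
    age_days = (as_of - p.created).days
    return (
        p.language == "Java"
        and p.developers >= 2      # at least 2 developers
        and age_days >= 365        # at least 1 year old (approximated as 365 days)
        and p.commits > 10         # more than 10 commits
        and not p.is_fork          # forked projects are eliminated
    )
```

Under these assumptions, a project with exactly 10 commits is rejected, while one with 11 commits that satisfies the other conditions is kept.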
The front page of the tool is shown in Fig. 12.
A.1 Login, registration and settings
CLONE-HUNTRESS is GitHub-integrated. To use CLONE-HUNTRESS, a user must first authenticate through GitHub. Once the user is authenticated, CLONE-HUNTRESS automatically pulls the list of the user’s publicly available projects and adds them to their profile within the tool. Users can choose one of these projects, or add other projects manually, as described later, as the source project for clone detection.
Clicking on the user’s GitHub name, email, or avatar on the dashboard opens the Profile page, where users can change the tool’s tracking frequency settings. As shown in Fig. 13, two options govern CLONE-HUNTRESS’s behavior. The first is the update frequency of the tracked clones, which determines how often the tool checks for changes to the tracked cloned code. The second is the frequency at which clone detection is executed from scratch. This option exists because, after a sufficiently long time, many of the tracked clones may have changed via commits, and thus may no longer be similar to the original clone in the user’s project.
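To illustrate how the two frequency settings interact, the sketch below models them as two independent intervals and reports which actions are due at a given time. It is an assumption-laden illustration, not the tool’s implementation: `TrackingSettings`, `due_actions`, and the action labels are hypothetical names, and treating the two checks as independent is our simplification.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


@dataclass
class TrackingSettings:
    # Hypothetical container for the two options on the Profile page.
    update_every: timedelta     # how often tracked clones are checked for changes
    redetect_every: timedelta   # how often clone detection is rerun from scratch


def due_actions(settings: TrackingSettings,
                last_update: datetime,
                last_detection: datetime,
                now: datetime) -> List[str]:
    """List the actions whose interval has elapsed (checked independently)."""
    actions = []
    if now - last_update >= settings.update_every:
        actions.append("update")
    if now - last_detection >= settings.redetect_every:
        actions.append("redetect")
    return actions
```

With a daily update interval and a 30-day re-detection interval, for example, only the update action would be due one day after the last check if detection ran 16 days ago.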
A.2 Detecting and tracking clones
The main functionality of CLONE-HUNTRESS, i.e., tracking clones, is accessible through the “Add project” button in the top right corner of the dashboard (Fig. 14), which redirects the user to the corresponding page (Fig. 15, top), where users can select a project from their list of GitHub projects. Any other project of interest can also be selected as the source by providing its URL directly, as illustrated in Fig. 15 (top). Once a project is specified, the tool asks for the project’s application domain; once the domain is specified and “Get projects” is pressed, the tool presents a list of all projects (within its current project list) in that application domain (Fig. 15, bottom).
Users can select up to 20 target projects from the given list, to detect clones between them and the source project. This limit is imposed for two reasons: (1) hardware resource and response time constraints, and (2) the fact that tracking a large number of projects eventually leads to confusion rather than providing benefits. Users can also add any other GitHub project to the target list by specifying the project link directly, using the “Add other project” button below the list, as illustrated in Fig. 16. The target list can be reset to its original form using the “Reset project list” button at the bottom of the list.
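The limit on target projects could be enforced with a check along the following lines. This is a sketch under stated assumptions: only the limit of 20 comes from the description above, while the function name `validate_targets`, the removal of duplicates, and the exclusion of the source project from its own target list are hypothetical choices made for illustration.

```python
from typing import List

MAX_TARGETS = 20  # limit stated in the text


def validate_targets(source: str, targets: List[str]) -> List[str]:
    """Return the cleaned target list, or raise ValueError if it is unusable."""
    # Assumption: duplicates and the source project itself are dropped,
    # preserving the order in which the user selected the targets.
    unique = list(dict.fromkeys(t for t in targets if t != source))
    if not unique:
        raise ValueError("at least one target project is required")
    if len(unique) > MAX_TARGETS:
        raise ValueError(f"at most {MAX_TARGETS} target projects can be selected")
    return unique
```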
With the source and target projects chosen, clone detection is initiated by pressing the “Detect-Clones” button at the bottom of the page. It could take the tool a few minutes to show the results of clone detection. When done, CLONE-HUNTRESS will redirect the user to the result page, which will resemble Fig. 17. If any clones are found, the results will show the clone instances from the source project and those from the target projects.
Users can track any clone instances they want by selecting them and clicking the “Save and track” button, and over time see the changes that occur in these instances. The number of trackable clones is limited: users can track up to 20 clone instances, for the reasons given above. After choosing clone instances to track, users are returned to their dashboard. Every clone detection that the user has run is displayed as a row in a table on the dashboard page, as shown in Fig. 14.
A.3 Tracking reports
CLONE-HUNTRESS provides View, Edit, and Delete functions in each row of the clone detection table (see the buttons in the ACTION column in Fig. 14). The View button reports the changes made to the respective clone instances. Our tool checks at pre-specified intervals whether the clone instances have changed, and if so, the number of changes is displayed as a notification on the View button. The intervals are determined by the update frequency of tracked clones, set on the Profile page, as mentioned before. Clicking on the View button redirects users to an “Alerts and Reports” page for that clone, similar to Fig. 18. There, clones from the user’s source project are shown, and below each are the tracked clone instances and links to the actual code. Changed clone instances are marked, and users can visit the changed files. Users can also untrack any clones or clone instances from this page.
Edit directs users to a page similar to the first page of the process (Fig. 19), where users can repeat the steps of clone detection. The tool shows all the steps they have already taken, and they can change anything they want and re-run clone detection. Through the Delete button, the corresponding entry will be deleted, and so the results of clone detection for that specific project will disappear.
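One simple way to obtain the change count shown on the View button is to compare a stored fingerprint of each tracked clone instance with a fingerprint of its current code. The sketch below assumes this hash-based comparison purely for illustration; the text does not state how the tool detects changes, and `fingerprint`, `count_changes`, and the dictionary layout are hypothetical.

```python
import hashlib
from typing import Dict


def fingerprint(code: str) -> str:
    """SHA-256 digest of a code fragment, used as a change detector."""
    return hashlib.sha256(code.encode("utf-8")).hexdigest()


def count_changes(tracked: Dict[str, str], current: Dict[str, str]) -> int:
    """Count tracked instances whose code differs from the stored fingerprint.

    tracked: instance id -> fingerprint recorded at the last check
    current: instance id -> code as it exists now
    An instance missing from `current` (e.g. a deleted file) counts as changed.
    """
    changed = 0
    for instance_id, old_fp in tracked.items():
        code = current.get(instance_id)
        if code is None or fingerprint(code) != old_fp:
            changed += 1
    return changed
```

An exact hash flags any textual edit, including whitespace changes; a production tracker would more likely rely on commit history or a similarity measure.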
A.4 Future improvements
While we have tried our best to provide a polished and useful product, there are many ways in which our tool can be improved. The first and foremost is to improve its hardware resources, so that clone detection and checking for updates take less time and users are able to check for clones across more projects. The second is to provide documentation for, and access to, CLONE-HUNTRESS’s web services, so that other developers may integrate its functionality into other tools and environments such as Eclipse.