Guest editorial: mining software repositories
The Mining Software Repositories (MSR) field analyzes the rich data available in software repositories to uncover interesting and actionable information about software systems and projects. Thanks to the ready availability of software configuration management, mailing list, and bug tracking repositories from open source projects, the field has gained popularity since 2004 and continues to be one of the fastest growing areas of software engineering. Researchers in this field empirically explore a range of software engineering questions using software repository data as the primary source of information. Commonly explored areas include software evolution, models of software development processes, characterization of developers and their activities, prediction of future software qualities, use of machine learning techniques on software project data, software bug prediction, analysis of software change patterns, and analysis of code clones. This special issue presents five recent MSR papers, which are briefly discussed below.
The paper “An In-Depth Study of the Promises and Perils of Mining GitHub” by Kalliamvakou, Gousios, Blincoe, Damian, Singer, and German reports the characteristics of the repositories and users on GitHub, including how users take advantage of GitHub’s main features and how their activities are tracked on GitHub and in related datasets. The authors use these observations to point out misalignments between the real and mined data. The results indicate that while GitHub provides a rich source of data on software development, researchers who mine GitHub should take various potential perils into account.
The paper “Studying Just-In-Time Defect Prediction Using Cross-Project Models” by Kamei, Fukushima, McIntosh, Yamashita, Ubayashi, and Hassan addresses the cold start problem for Just-In-Time (JIT) defect prediction by using cross-project data. Through an empirical study of eleven open source projects, the authors find that the performance of defect prediction models can be improved by combining the data of several projects to form a larger pool of training data and by selecting projects that are similar to the testing project.
Also on the topic of defect prediction, the paper “Towards Building a Universal Defect Prediction Model with Rank Transformed Predictors” by Zhang, Mockus, Keivanloo, and Zou proposes a universal defect prediction model built from the transformed data of 1,385 open source projects from SourceForge and Google Code. This universal model permits users to predict defects within and across projects with an accuracy comparable to that of within-project prediction models.
In the paper “An Empirical Study of the Impact of Modern Code Review Practices on Software Quality”, McIntosh, Kamei, Adams, and Hassan present an empirical study of code review practices and find that code review coverage, participation, and expertise share a significant link with software quality. These findings indicate that poorly reviewed code has a negative impact on software quality in large software systems.
Finally, the paper “Prompter: Turning the IDE into a Self-confident Programming Assistant” by Ponzanelli, Bavota, Di Penta, Oliveto, and Lanza proposes a system that automatically recommends Stack Overflow discussions based on the current context in an Integrated Development Environment (IDE). The results of the evaluation with several participants show that the approach is effective in helping developers improve the correctness of their tasks, but that the recommendations are volatile.
We are grateful for the continuous support and encouragement offered by the editorial board of the Journal of Empirical Software Engineering and by the Editors-in-Chief, Lionel Briand and Thomas Zimmermann. We also thank the authors for keeping up with the review schedule and the reviewers for their detailed and constructive comments, which helped to shape the papers.