Mining software repositories as a research area arose from the availability of open source code and information about open source software development. From 2000 on, many open source projects started not only publishing versioned releases of their software but also enabling open access to their version control systems, configuration management systems, wikis, mailing lists, and issue trackers. This open access lowered the barriers for researchers of all sorts to evaluate theories of software development against projects in the wild, and to discover methods of software development in the wild, without the heavy constraint of industrial partnership. Mining software repositories started as a data mining exercise to extract as much as possible from software projects to aid development or to reason about software engineering in general. This bolstered empirical software engineering, which sought to study software development and software engineering from a fundamentally quantitative and qualitative point of view. This was different from much of the prior research, which consisted of tool building or prescriptive recommendations for processes.

This special section on Mining Software Repositories (MSR) serves to highlight and elaborate on papers from the International Conference on Mining Software Repositories 2017. A workshop on mining software repositories was founded in 2004 and grew into an international conference, which still honors its roots in extraction of software source code and metadata from various sources.

This special section showcases and elaborates on the full papers and data showcase papers of particular interest to the MSR reviewers. Many papers were nominated, and a few were chosen at the conference and invited. The special section gives our authors more space to be thorough yet concise, to explore the topics discussed at the conference in greater depth, and to do better science in general. Each paper was rigorously reviewed by three or more reviewers, and the authors worked especially hard, as every paper faced one major and one or more minor revisions.

The following papers were invited, reviewed, and accepted:

“Mock Objects For Testing Java Systems - Why and How Developers Use Them, and How They Evolve” is a study on the use and evolution of mock objects in testing. The authors survey developers about mock object use and mine projects that use mock objects, characterizing both self-reported and observed mock object use and evolution. This is one of the first studies on mock objects, and mocking is a popular and controversial testing topic among developers.

“Classifying code comments in Java software systems” studies tens of thousands of comments via manual labelling and machine learning. The authors employ a taxonomy of comment line types to characterize what exists within comments. This kind of research can aid automated systems that must parse comments for copyright, purpose, technical debt, and other concerns.

“Cross-Project Code Clones in GitHub” is an ecosystem-wide study of code reuse and of why code is copied and pasted. The authors find that some projects act as clone seeds, which enable other projects to follow suit. The authors also propose a clone onion model of the transmission of clones through layers of modularity: from within a project, to within a domain, to within the global software ecosystem.

“Empowering OCL Research: A Large-Scale Corpus of Open-Source Data” is an extension of a data showcase paper that addresses one of the greatest challenges to model-driven engineering (MDE) research: the lack of empirical data and of MDE-based projects. Without existing MDE projects to study, much MDE research becomes prescriptive rather than relying on empirical evidence. Not only is a dataset described in great detail and shared, but some initial investigations into OCL projects are also provided.

“High-level Software Requirements and Iteration Changes: A Predictive Model” presents multiple industrial case studies of software development processes and software requirements undergoing change over numerous iterations of product development. This work provides evolutionary insight into how requirements shift over time and how this affects the processes that must address this dynamism.

As guest editors, we express our deep gratitude to both reviewers and authors for making this such a high-quality special section on Mining Software Repositories. It would have been impossible without community support from our MSR 2017 program committee, who helped referee these papers to completion. At the same time, without interesting and novel work from our community, this would not have been possible.

We hope you gain much from this special section on Mining Software Repositories!

Lin Tan and Abram Hindle

Guest Editors