Pitfalls and guidelines for using time-based Git data

Empirical Software Engineering

Abstract

Many software engineering research papers rely on time-based data (e.g., commit timestamps, issue report creation/update/close dates, release dates). Like most real-world data, however, time-based data is often dirty. To date, no studies have quantified how frequently the software engineering research community uses such data, investigated the sources of dirty time-based data, or quantified how often such data is dirty. Depending on the research task and method used, including dirty data could affect the research results. This paper presents an extended survey of papers that utilize time-based data, published in the Mining Software Repositories (MSR) conference series. Of the 754 technical track and data papers published in MSR 2004–2021, at least 290 (38%) utilized time-based data. We also observed that most time-based data used in research papers comes in the form of Git commits, often from GitHub. Based on those results, we then used the Boa and Software Heritage infrastructures to help identify and quantify several sources of dirty Git timestamp data. Finally, we provide guidelines and best practices for researchers utilizing time-based data from Git repositories.

Notes

  1. All authors brainstormed potential keywords and helped create the final list.

  2. https://github.com/MetricsGrimoire/CVSAnalY

  3. Note that: “While Subversion automatically attaches properties (svn:date, svn:author, svn:log, and so on) to revisions, it does not presume thereafter the existence of those properties, and neither should you or the tools you use to interact with your repository.” https://svnbook.red-bean.com/en/1.7/svn.advanced.props.html

  4. The Kotlin dataset contains some projects which may exist in the Java dataset.

  5. The SF.net dataset contained Subversion projects, which store commit IDs as integers; these IDs are therefore not unique across projects and cannot be easily deduplicated.

  6. https://archive.softwareheritage.org/browse/origin/log/?origin_url=https://github.com/KevinHoward/Irony&timestamp=2015-07-29T09:07:18Z

  7. https://archive.softwareheritage.org/browse/origin/log/?origin_url=https://github.com/maodouzi/PY&timestamp=2015-08-07T07:29:54Z

  8. https://www.nltk.org/

  9. https://stackoverflow.com/questions/633353

  10. https://stackoverflow.com/questions/52507279

  11. https://stackoverflow.com/questions/16259105

  12. https://gerritcodereview.com

  13. https://www.mercurial-scm.org/wiki/HgGit

  14. https://opensource.google/projects/moe

  15. https://github.com/scrapy/scrapy

  16. http://boa.cs.iastate.edu/boa/?q=content/dataset-notes-october-2019

References

  • Ahasanuzzaman M, Asaduzzaman M, Roy CK, Schneider KA (2016) Mining duplicate questions in stack overflow. In: Proceedings of the 13th international conference on mining software repositories, MSR ’16. Association for Computing Machinery, New York, pp 402–412. https://doi.org/10.1145/2901739.2901770

  • Antoniol G, Rollo VF, Venturi G (2005) Linear predictive coding and cepstrum coefficients for mining time variant information from software repositories. In: Proceedings of the 2005 international workshop on mining software repositories, MSR ’05, vol 2005. Association for Computing Machinery, New York, pp 1–5. https://doi.org/10.1145/1083142.1083156

  • Baysal O, Holmes R, Godfrey MW (2012) Mining usage data and development artifacts. In: 2012 9th IEEE working conference on mining software repositories (MSR), pp 98–107. https://doi.org/10.1109/MSR.2012.6224305

  • Bird C, Rigby PC, Barr ET, Hamilton DJ, German DM, Devanbu P (2009) The promises and perils of mining Git. In: 2009 6th IEEE international working conference on mining software repositories, pp 1–10. https://doi.org/10.1109/MSR.2009.5069475

  • Cito J, Schermann G, Wittern JE, Leitner P, Zumberi S, Gall HC (2017) An empirical analysis of the Docker container ecosystem on GitHub. In: 2017 IEEE/ACM 14th international conference on mining software repositories (MSR). IEEE. https://doi.org/10.1109/msr.2017.67

  • Claes M, Mäntylä MV (2020) 20-MAD: 20 years of issues and commits of Mozilla and Apache development. In: Proceedings of the 17th international conference on mining software repositories, MSR ’20. Association for Computing Machinery, New York, pp 503–507. https://doi.org/10.1145/3379597.3387487

  • Cosentino V, Izquierdo JLC, Cabot J (2016) Findings from GitHub: methods, datasets and limitations. In: 2016 IEEE/ACM 13th working conference on mining software repositories (MSR), pp 137–141

  • Cosmo RD, Zacchiroli S (2017) Software Heritage: why and how to preserve software source code. In: iPRES 2017: 14th international conference on digital preservation. Kyoto, Japan

  • D’Ambros M, Lanza M, Robbes R (2010) An extensive comparison of bug prediction approaches. In: 2010 7th IEEE working conference on mining software repositories (MSR 2010), pp 31–41. https://doi.org/10.1109/MSR.2010.5463279

  • Demeyer S, Murgia A, Wyckmans K, Lamkanfi A (2013) Happy birthday! A trend analysis on past MSR papers. In: Proceedings of the 10th working conference on mining software repositories, MSR ’13. IEEE Press, pp 353–362

  • Durieux T, Le Goues C, Hilton M, Abreu R (2020) Empirical study of restarted and flaky builds on Travis CI. In: Proceedings of the 17th international conference on mining software repositories, MSR ’20. Association for Computing Machinery, New York, pp 254–264. https://doi.org/10.1145/3379597.3387460

  • Dyer R, Nguyen HA, Rajan H, Nguyen TN (2013) Boa: a language and infrastructure for analyzing ultra-large-scale software repositories. In: Proceedings of the international conference on software engineering, ICSE ’13, vol 2013. IEEE Press, pp 422–431. https://doi.org/10.5555/2486788.2486844

  • Dyer R, Nguyen HA, Rajan H, Nguyen TN (2021) Boa: Mining ultra-large-scale software repositories. http://boa.cs.iastate.edu/boa/. Accessed 14 Oct 2021

  • Flint SW, Chauhan J, Dyer R (2021a) Escaping the time pit: pitfalls and guidelines for using time-based Git data. In: 2021 IEEE/ACM 18th international conference on mining software repositories (MSR), pp 85–96. https://doi.org/10.1109/MSR52588.2021.00022

  • Flint SW, Chauhan J, Dyer R (2021b) Replication package for “Pitfalls and Guidelines for Using Time-Based Git Data From Java, Kotlin, and Python Projects”. https://doi.org/10.5281/zenodo.5558291

  • Gasser L, Ripoche G, Sandusky RJ (2004) Research infrastructure for empirical science of F/OSS. In: Proceedings of the 1st international workshop on mining software repositories

  • Ghezzi G, Gall HC (2013) Replicating mining studies with SOFAS. In: Proceedings of the 10th working conference on mining software repositories, MSR ’13. IEEE Press, pp 363–372

  • Goeminne M, Claes M, Mens T (2013) A historical dataset for the Gnome ecosystem. In: 2013 10th working conference on mining software repositories (MSR), pp 225–228. https://doi.org/10.1109/MSR.2013.6624032

  • Gonzalez-Barahona JM, Robles G, Izquierdo-Cortazar D (2015) The MetricsGrimoire database collection. In: Proceedings of the 12th working conference on mining software repositories, MSR ’15. IEEE Press, pp 478–481

  • Hayashi J, Higo Y, Matsumoto S, Kusumoto S (2019) Impacts of daylight saving time on software development. In: Proceedings of the 16th international conference on mining software repositories, MSR ’19. IEEE Press, pp 502–506. https://doi.org/10.1109/MSR.2019.00076

  • Hemmati H, Nadi S, Baysal O, Kononenko O, Wang W, Holmes R, Godfrey MW (2013) The MSR cookbook: mining a decade of research. In: Proceedings of the 10th working conference on mining software repositories, MSR ’13. IEEE Press, pp 343–352

  • Kagdi H, Yusuf S, Maletic JI (2006) Mining sequences of changed-files from version histories. In: Proceedings of the 2006 international workshop on mining software repositories, MSR ’06. Association for Computing Machinery, New York, pp 47–53. https://doi.org/10.1145/1137983.1137996

  • Kalliamvakou E, Gousios G, Blincoe K, Singer L, German DM, Damian D (2014) The promises and perils of mining GitHub. In: Proceedings of the 11th working conference on mining software repositories, MSR 2014. Association for Computing Machinery, New York, pp 92–101. https://doi.org/10.1145/2597073.2597074

  • Kalliamvakou E, Gousios G, Blincoe K, Singer L, German DM, Damian D (2016) An in-depth study of the promises and perils of mining GitHub. Empirical Softw Engg 21(5):2035–2071. https://doi.org/10.1007/s10664-015-9393-5

  • Karampatsis RM, Sutton C (2020) How often do single-statement bugs occur? In: Proceedings of the 17th international conference on mining software repositories. ACM. https://doi.org/10.1145/3379597.3387491

  • Kikas R, Dumas M, Pfahl D (2016) Using dynamic and contextual features to predict issue lifetime in GitHub projects. In: Proceedings of the 13th international conference on mining software repositories, MSR ’16. Association for Computing Machinery, New York, pp 291–302. https://doi.org/10.1145/2901739.2901751

  • Kotti Z, Spinellis D (2019) Standing on shoulders or feet? The usage of the MSR data papers. In: Proceedings of the 16th international conference on mining software repositories, MSR ’19. IEEE Press, pp 565–576. https://doi.org/10.1109/MSR.2019.00085

  • Liu Y, Lin J, Cleland-Huang J (2020) Traceability support for multi-lingual software projects. In: Proceedings of the 17th international conference on mining software repositories, MSR ’20. Association for Computing Machinery, New York, pp 443–454. https://doi.org/10.1145/3379597.3387440

  • Pietri A, Rousseau G, Zacchiroli S (2020) Forking without clicking: on how to identify software repository forks. In: Proceedings of the 17th international conference on mining software repositories. Association for Computing Machinery, New York, pp 277–287

  • Pimentel JaF, Murta L, Braganholo V, Freire J (2019) A large-scale study about quality and reproducibility of Jupyter notebooks. In: Proceedings of the 16th international conference on mining software repositories, MSR ’19. IEEE Press, pp 507–517. https://doi.org/10.1109/MSR.2019.00077

  • Robles G (2010) Replicating MSR: A study of the potential replicability of papers published in the Mining Software Repositories proceedings. In: 7th IEEE working conference on mining software repositories, MSR ’10, pp 171–180. https://doi.org/10.1109/MSR.2010.5463348

  • Robles G, González-Barahona JM, Cervigón C, Capiluppi A, Izquierdo-Cortázar D (2014) Estimating development effort in free/open source software projects by mining software repositories: a case study of OpenStack. In: Proceedings of the 11th working conference on mining software repositories, MSR 2014. Association for Computing Machinery, New York, pp 222–231. https://doi.org/10.1145/2597073.2597107

  • Sadowski C, Lewis C, Lin Z, Zhu X, Whitehead EJ (2011) An empirical analysis of the FixCache algorithm. In: Proceedings of the 8th working conference on mining software repositories, MSR ’11. Association for Computing Machinery, New York, pp 219–222. https://doi.org/10.1145/1985441.1985475

  • Software Heritage developers (2020) Software Heritage archive. https://archive.softwareheritage.org/. Accessed 28 Dec 2020

  • Steff M, Russo B (2012) Co-evolution of logical couplings and commits for defect estimation. In: Proceedings of the 9th IEEE working conference on mining software repositories, MSR ’12. IEEE Press, pp 213–216

  • Walker RJ, Holmes R, Hedgeland I, Kapur P, Smith A (2006) A lightweight approach to technical risk estimation via probabilistic impact analysis. In: Proceedings of the 2006 international workshop on mining software repositories, MSR ’06. Association for Computing Machinery, New York, pp 98–104. https://doi.org/10.1145/1137983.1138008

  • Wang P, Brown C, Jennings JA, Stolee KT (2020) An empirical study on regular expression bugs. In: Proceedings of the 17th international conference on mining software repositories. ACM. https://doi.org/10.1145/3379597.3387464

  • Xu Y, Zhou M (2018) A multi-level dataset of Linux kernel patchwork. In: Proceedings of the 15th international conference on mining software repositories, MSR ’18. Association for Computing Machinery, New York, pp 54–57. https://doi.org/10.1145/3196398.3196475

  • Zhu J, Wei J (2019) An empirical study of multiple names and email addresses in OSS version control repositories. In: 2019 IEEE/ACM 16th international conference on mining software repositories (MSR). IEEE. https://doi.org/10.1109/msr.2019.00068

  • Zimmermann T, Weißgerber P (2004) Preprocessing CVS data for fine-grained analysis. In: Proceedings of the 1st international workshop on mining software repositories, MSR ’04, pp 2–6

Acknowledgements

The authors would like to thank Yijia Huang, Tien N. Nguyen, and Hridesh Rajan for insightful discussions that inspired this paper. We also thank the anonymous MSR’21 and EMSE reviewers for many suggestions that substantially improved this paper.

Author information

Corresponding author

Correspondence to Samuel W. Flint.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Communicated by: Kelly Blincoe, Mei Nagappan

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This article belongs to the Topical Collection: Mining Software Repositories (MSR)

This paper is a revised and extended version of Flint et al. (2021a).

Appendices

Appendix A: List of Boa Jobs

In this section, we list public links to all of the Boa queries utilized by our study. Full details, as well as all data (including data generated from the Boa outputs), are available in our replication package (Flint et al. 2021b).

A.1 Java Queries

All Boa queries were run on the ‘2019 October/GitHub’ dataset.

  • Suspiciously ‘old’ commits: http://boa.cs.iastate.edu/boa/?q=boa/job/public/90164

  • Suspiciously ‘future’ commits: http://boa.cs.iastate.edu/boa/?q=boa/job/public/90973

  • Out-of-order commits: http://boa.cs.iastate.edu/boa/?q=boa/job/public/90169

A.2 Kotlin Queries

All Boa queries were run on the ‘2021 Aug/Kotlin’ dataset.

  • Suspiciously ‘old’ commits: http://boa.cs.iastate.edu/boa/?q=boa/job/public/95104

  • Suspiciously ‘future’ commits: http://boa.cs.iastate.edu/boa/?q=boa/job/public/95113

  • Out-of-order commits: http://boa.cs.iastate.edu/boa/?q=boa/job/public/95107

A.3 Python Queries

All Boa queries were run on the ‘2021 Aug/Python’ dataset.

  • Suspiciously ‘old’ commits: http://boa.cs.iastate.edu/boa/?q=boa/job/public/95105

  • Suspiciously ‘future’ commits: http://boa.cs.iastate.edu/boa/?q=boa/job/public/95112

  • Out-of-order commits: http://boa.cs.iastate.edu/boa/?q=boa/job/public/95106

A.4 SourceForge Queries

All Boa queries were run on the ‘2013 September/SF’ dataset.

  • Suspiciously ‘old’ commits: http://boa.cs.iastate.edu/boa/?q=boa/job/public/95171

  • Suspiciously ‘future’ commits: http://boa.cs.iastate.edu/boa/?q=boa/job/public/95173

  • Out-of-order commits: http://boa.cs.iastate.edu/boa/?q=boa/job/public/95170
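
For readers who want to try the three kinds of checks named in this appendix (suspiciously old, suspiciously future, and out-of-order commits) without Boa, the following Python sketch may help. It is an illustration only, not the queries used in the study: the `Commit` record, the `classify` function, and both cutoff values are hypothetical names and assumptions introduced here, and the linked Boa jobs remain the authoritative definitions.

```python
from datetime import datetime, timezone
from typing import NamedTuple


class Commit(NamedTuple):
    sha: str
    timestamp: int        # seconds since the Unix epoch (UTC)
    parents: tuple = ()   # hashes of the parent commits


# Hypothetical cutoffs, chosen only for illustration (not taken from the paper).
OLD_CUTOFF = int(datetime(1971, 1, 1, tzinfo=timezone.utc).timestamp())
SNAPSHOT_TIME = int(datetime(2019, 10, 1, tzinfo=timezone.utc).timestamp())


def classify(commits, old_cutoff=OLD_CUTOFF, snapshot_time=SNAPSHOT_TIME):
    """Map each suspicious commit hash to the list of reasons it was flagged."""
    by_sha = {c.sha: c for c in commits}
    flags = {}
    for c in commits:
        labels = []
        if c.timestamp < old_cutoff:
            labels.append("old")            # earlier than any plausible commit
        if c.timestamp > snapshot_time:
            labels.append("future")         # later than the dataset snapshot
        # Out of order: the commit claims to be older than one of its parents.
        if any(p in by_sha and by_sha[p].timestamp > c.timestamp
               for p in c.parents):
            labels.append("out-of-order")
        if labels:
            flags[c.sha] = labels
    return flags


# Toy history with one commit of each suspicious kind.
history = [
    Commit("a1", 0),                         # timestamp at the Unix epoch
    Commit("b2", 1_500_000_000, ("a1",)),    # plausible
    Commit("c3", 1_400_000_000, ("b2",)),    # older than its parent
    Commit("d4", 4_000_000_000, ("c3",)),    # far in the future
]
suspicious = classify(history)
```

In this sketch a commit can carry more than one label, and parents that are missing from the input are simply ignored rather than treated as errors; a real analysis would need to decide both points explicitly.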

Appendix B: List of Selected Papers

In this section, we list all papers selected for inclusion in the study (see Table 1).

B.1 2004 Selected Papers

  • Germán DM (2004) Mining CVS repositories, the softChange experience. In: MSR

  • Howison J, Crowston K (2004) The perils and pitfalls of mining SourceForge. In: MSR

  • Jensen C, Scacchi W (2004) Data mining for software process discovery in open source software development communities. In: MSR

  • Liu Y, Stroulia E, Wong K, German D (2004) Using CVS historical information to understand how students develop software. In: MSR

  • Zimmermann T, Weißgerber P (2004) Preprocessing CVS data for fine-grained analysis. In: MSR

B.2 2005 Selected Papers

  • Antoniol G, Rollo VF, Venturi G (2005) Linear predictive coding and cepstrum coefficients for mining time variant information from software repositories. In: Proceedings of the 2005 International Workshop on Mining Software Repositories, Association for Computing Machinery, New York, NY, USA, MSR ’05, p 1–5, https://doi.org/10.1145/1083142.1083156

  • Śliwerski J, Zimmermann T, Zeller A (2005) When do changes induce fixes? In: Proceedings of the 2005 International Workshop on Mining Software Repositories, Association for Computing Machinery, New York, NY, USA, MSR ’05, p 1–5, https://doi.org/10.1145/1083142.1083147

B.3 2006 Selected Papers

  • German DM, Rigby PC, Storey MA (2006) Using evolutionary annotations from change logs to enhance program comprehension. In: Proceedings of the 2006 International Workshop on Mining Software Repositories, Association for Computing Machinery, New York, NY, USA, MSR ’06, p 159–162, https://doi.org/10.1145/1137983.1138020

  • Kagdi HH, Yusuf S, Maletic JI (2006) Mining sequences of changed-files from version histories. In: MSR ’06

  • Knab P, Pinzger M, Bernstein A (2006) Predicting defect densities in source code files with decision tree learners. In: Proceedings of the 2006 International Workshop on Mining Software Repositories, Association for Computing Machinery, New York, NY, USA, MSR ’06, p 119–125, https://doi.org/10.1145/1137983.1138012

  • Parnin C, Görg C, Rugaber S (2006) Enriching revision history with interactions. In: Proceedings of the 2006 International Workshop on Mining Software Repositories, Association for Computing Machinery, New York, NY, USA, MSR ’06, p 155–158, https://doi.org/10.1145/1137983.1138019

  • Robles G, Gonzalez-Barahona JM, Michlmayr M, Amor JJ (2006) Mining large software compilations over time: Another perspective of software evolution. In: Proceedings of the 2006 International Workshop on Mining Software Repositories, Association for Computing Machinery, New York, NY, USA, MSR ’06, p 3–9, https://doi.org/10.1145/1137983.1137986

  • Walker RJ, Holmes R, Hedgeland I, Kapur P, Smith A (2006) A lightweight approach to technical risk estimation via probabilistic impact analysis. In: Proceedings of the 2006 International Workshop on Mining Software Repositories, Association for Computing Machinery, New York, NY, USA, MSR ’06, p 98–104, https://doi.org/10.1145/1137983.1138008

B.4 2007 Selected Papers

  • Anvik J, Murphy GC (2007) Determining implementation expertise from bug reports. In: Fourth International Workshop on Mining Software Repositories (MSR’07:ICSE Workshops 2007), pp 2–2, https://doi.org/10.1109/MSR.2007.7

  • Bird C, Gourley A, Devanbu P (2007) Detecting patch submission and acceptance in OSS projects. In: Fourth International Workshop on Mining Software Repositories (MSR’07:ICSE Workshops 2007), pp 26–26, https://doi.org/10.1109/MSR.2007.6

  • Canfora G, Cerulo L, Penta MD (2007) Identifying changed source code lines from version repositories. In: Proceedings of the Fourth International Workshop on Mining Software Repositories, IEEE Computer Society, USA, MSR ’07, p 14, https://doi.org/10.1109/MSR.2007.14

  • Hindle A, Godfrey MW, Holt RC (2007) Release pattern discovery via partitioning: Methodology and case study. In: Fourth International Workshop on Mining Software Repositories (MSR’07:ICSE Workshops 2007), pp 19–19, https://doi.org/10.1109/MSR.2007.28

  • Minto S, Murphy GC (2007) Recommending emergent teams. In: Proceedings of the Fourth International Workshop on Mining Software Repositories, MSR ’07, pp 5–5, https://doi.org/10.1109/MSR.2007.27

  • Mizuno O, Ikami S, Nakaichi S, Kikuno T (2007) Spam filter based approach for finding fault-prone software modules. In: Fourth International Workshop on Mining Software Repositories (MSR’07:ICSE Workshops 2007), pp 4–4, https://doi.org/10.1109/MSR.2007.29

  • Robbes R (2007) Mining a change-based software repository. In: Fourth International Workshop on Mining Software Repositories (MSR’07:ICSE Workshops 2007), pp 15–15, https://doi.org/10.1109/MSR.2007.18

  • Zimmermann T (2007) Mining workspace updates in CVS. In: Fourth International Workshop on Mining Software Repositories (MSR’07:ICSE Workshops 2007), pp 11–11, https://doi.org/10.1109/MSR.2007.22

B.5 2008 Selected Papers

  • Hata H, Mizuno O, Kikuno T (2008) An extension of fault-prone filtering using precise training and a dynamic threshold. In: Proceedings of the 2008 International Working Conference on Mining Software Repositories, Association for Computing Machinery, New York, NY, USA, MSR ’08, p 89–98, https://doi.org/10.1145/1370750.1370772

  • Holmes R, Begel A (2008) Deep Intellisense: A tool for rehydrating evaporated information. In: Proceedings of the 2008 International Working Conference on Mining Software Repositories, Association for Computing Machinery, New York, NY, USA, MSR ’08, p 23–26, https://doi.org/10.1145/1370750.1370755

  • Layman L, Nagappan N, Guckenheimer S, Beehler J, Begel A (2008) Mining software effort data: Preliminary analysis of Visual Studio Team System data. In: Proceedings of the 2008 International Working Conference on Mining Software Repositories, Association for Computing Machinery, New York, NY, USA, MSR ’08, p 43–46, https://doi.org/10.1145/1370750.1370762

  • Pattison DS, Bird CA, Devanbu PT (2008) Talk and work: A preliminary report. In: Proceedings of the 2008 International Working Conference on Mining Software Repositories, Association for Computing Machinery, New York, NY, USA, MSR ’08, p 113–116, https://doi.org/10.1145/1370750.1370776

  • Ratzinger J, Sigmund T, Gall HC (2008) On the relation of refactorings and software defect prediction. In: Proceedings of the 2008 International Working Conference on Mining Software Repositories, Association for Computing Machinery, New York, NY, USA, MSR ’08, p 35–38, https://doi.org/10.1145/1370750.1370759

  • Thomson C, Holcombe M (2008) Correctness of data mined from CVS. In: Proceedings of the 2008 International Working Conference on Mining Software Repositories, Association for Computing Machinery, New York, NY, USA, MSR ’08, p 117–120, https://doi.org/10.1145/1370750.1370777

  • Weißgerber P, Neu D, Diehl S (2008) Small patches get in! In: Proceedings of the 2008 International Working Conference on Mining Software Repositories, Association for Computing Machinery, New York, NY, USA, MSR ’08, p 67–76, https://doi.org/10.1145/1370750.1370767

B.6 2009 Selected Papers

  • Anbalagan P, Vouk M (2009) On mining data across software repositories. In: 2009 6th IEEE International Working Conference on Mining Software Repositories, pp 171–174

  • Bajracharya S, Lopes C (2009) Mining search topics from a code search engine usage log. In: 2009 6th IEEE International Working Conference on Mining Software Repositories, pp 111–120, https://doi.org/10.1109/MSR.2009.5069489

  • Bird C, Rigby PC, Barr ET, Hamilton DJ, German DM, Devanbu P (2009) The promises and perils of mining git. In: 2009 6th IEEE International Working Conference on Mining Software Repositories, pp 1–10, https://doi.org/10.1109/MSR.2009.5069475

  • Boogerd C, Moonen L (2009) Evaluating the relation between coding standard violations and faults within and across software versions. In: 2009 6th IEEE International Working Conference on Mining Software Repositories, pp 41–50, https://doi.org/10.1109/MSR.2009.5069479

  • German DM, Di Penta M, Gueheneuc YG, Antoniol G (2009) Code siblings: Technical and legal implications of copying code between applications. In: 2009 6th IEEE International Working Conference on Mining Software Repositories, pp 81–90, https://doi.org/10.1109/MSR.2009.5069483

  • Happel H, Maalej W (2009) From work to word: How do software developers describe their work? In: 2009 6th IEEE International Working Conference on Mining Software Repositories. MSR 2009, IEEE Computer Society, Los Alamitos, CA, USA, pp 121–130, https://doi.org/10.1109/MSR.2009.5069490, https://doi.ieeecomputersociety.org/10.1109/MSR.2009.5069490

  • Hattori L, Lanza M (2009) Mining the history of synchronous changes to refine code ownership. In: 2009 6th IEEE International Working Conference on Mining Software Repositories, pp 141–150, https://doi.org/10.1109/MSR.2009.5069492

  • Kuhn A (2009) Automatic labeling of software components and their evolution using log-likelihood ratio of word frequencies in source code. In: 2009 6th IEEE International Working Conference on Mining Software Repositories, pp 175–178, https://doi.org/10.1109/MSR.2009.5069499

  • Mockus A (2009) Amassing and indexing a large sample of version control systems: Towards the census of public source code history. In: 2009 6th IEEE International Working Conference on Mining Software Repositories, pp 11–20, https://doi.org/10.1109/MSR.2009.5069476

B.7 2010 Selected Papers

  • D’Ambros M, Lanza M, Robbes R (2010) An extensive comparison of bug prediction approaches. In: MSR ’10, pp 31–41, https://doi.org/10.1109/MSR.2010.5463279

  • Ibrahim WM, Bettenburg N, Shihab E, Adams B, Hassan AE (2010) Should I contribute to this discussion? In: MSR ’10, pp 181–190, https://doi.org/10.1109/MSR.2010.5463345

  • Júnior MC, Mendonça M, Farias M, Henrique P (2010) OSS developers context-specific preferred representational systems: A initial neurolinguistic text analysis of the Apache mailing list. In: MSR ’10, pp 126–129, https://doi.org/10.1109/MSR.2010.5463339

  • Maalej W, Happel H (2010) Can development work describe itself? In: MSR ’10, pp 191–200, https://doi.org/10.1109/MSR.2010.5463344

  • Nussbaum L, Zacchiroli S (2010) The ultimate Debian database: Consolidating bazaar metadata for quality assurance and data mining. In: MSR ’10, pp 52–61, https://doi.org/10.1109/MSR.2010.5463277

  • Rahman F, Bird C, Devanbu P (2010) Clones: What is that smell? In: MSR ’10, pp 72–81, https://doi.org/10.1109/MSR.2010.5463343

B.8 2011 Selected Papers

  • Bhattacharya P, Neamtiu I (2011) Bug-fix time prediction models: Can we do better? In: Proceedings of the 8th Working Conference on Mining Software Repositories, Association for Computing Machinery, New York, NY, USA, MSR ’11, p 207–210, https://doi.org/10.1145/1985441.1985472

  • Bradley AW, Murphy GC (2011) Supporting software history exploration. In: Proceedings of the 8th Working Conference on Mining Software Repositories, Association for Computing Machinery, New York, NY, USA, MSR ’11, p 193–202, https://doi.org/10.1145/1985441.1985469

  • Canfora G, Cerulo L, Cimitile M, Di Penta M (2011) Social interactions around cross-system bug fixings: The case of FreeBSD and OpenBSD. In: Proceedings of the 8th Working Conference on Mining Software Repositories, Association for Computing Machinery, New York, NY, USA, MSR ’11, p 143–152, https://doi.org/10.1145/1985441.1985463

  • Davies J, German DM, Godfrey MW, Hindle A (2011) Software bertillonage: Finding the provenance of an entity. In: Proceedings of the 8th Working Conference on Mining Software Repositories, Association for Computing Machinery, New York, NY, USA, MSR ’11, p 183–192, https://doi.org/10.1145/1985441.1985468

  • Eyolfson J, Tan L, Lam P (2011) Do time of day and developer experience affect commit bugginess? In: Proceedings of the 8th Working Conference on Mining Software Repositories, Association for Computing Machinery, New York, NY, USA, MSR ’11, p 153–162, https://doi.org/10.1145/1985441.1985464

  • Giger E, Pinzger M, Gall HC (2011) Comparing fine-grained source code changes and code churn for bug prediction. In: Proceedings of the 8th Working Conference on Mining Software Repositories, Association for Computing Machinery, New York, NY, USA, MSR ’11, p 83–92, https://doi.org/10.1145/1985441.1985456

  • Sadowski C, Lewis C, Lin Z, Zhu X, Whitehead EJ (2011) An empirical analysis of the FixCache algorithm. In: Proceedings of the 8th Working Conference on Mining Software Repositories, Association for Computing Machinery, New York, NY, USA, MSR ’11, p 219–222, https://doi.org/10.1145/1985441.1985475

  • Thomas SW, Adams B, Hassan AE, Blostein D (2011) Modeling the evolution of topics in source code histories. In: Proceedings of the 8th Working Conference on Mining Software Repositories, Association for Computing Machinery, New York, NY, USA, MSR ’11, p 173–182, https://doi.org/10.1145/1985441.1985467

  • Zeltyn S, Tarr P, Cantor M, Delmonico R, Kannegala S, Keren M, Kumar AP, Wasserkrug S (2011) Improving efficiency in software maintenance. In: Proceedings of the 8th Working Conference on Mining Software Repositories, Association for Computing Machinery, New York, NY, USA, MSR ’11, p 215–218, https://doi.org/10.1145/1985441.1985474

B.9 2012 Selected Papers

  • Artho C, Suzaki K, Di Cosmo R, Treinen R, Zacchiroli S (2012) Why do software packages conflict? In: 2012 9th IEEE Working Conference on Mining Software Repositories (MSR), pp 141–150, https://doi.org/10.1109/MSR.2012.6224274

  • Baysal O, Holmes R, Godfrey MW (2012) Mining usage data and development artifacts. In: 2012 9th IEEE Working Conference on Mining Software Repositories (MSR), pp 98–107, https://doi.org/10.1109/MSR.2012.6224305

  • Bird C, Nagappan N (2012) Who? Where? What? Examining distributed development in two large open source projects. In: 2012 9th IEEE Working Conference on Mining Software Repositories (MSR), pp 237–246, https://doi.org/10.1109/MSR.2012.6224286

  • Gousios G, Spinellis D (2012) GHTorrent: GitHub’s data from a firehose. In: 2012 9th IEEE Working Conference on Mining Software Repositories (MSR), pp 12–21, https://doi.org/10.1109/MSR.2012.6224294

  • Hindle A (2012) Green mining: A methodology of relating software change to power consumption. In: 2012 9th IEEE Working Conference on Mining Software Repositories (MSR), pp 78–87, https://doi.org/10.1109/MSR.2012.6224303

  • Khomh F, Dhaliwal T, Zou Y, Adams B (2012) Do faster releases improve software quality? An empirical case study of Mozilla Firefox. In: 2012 9th IEEE Working Conference on Mining Software Repositories (MSR), pp 179–188, https://doi.org/10.1109/MSR.2012.6224279

  • Rodríguez-Bustos C, Aponte J (2012) How Distributed Version Control Systems impact open source software projects. In: 2012 9th IEEE Working Conference on Mining Software Repositories (MSR), pp 36–39, https://doi.org/10.1109/MSR.2012.6224297

  • Souza R, Chavez C (2012) Characterizing verification of bug fixes in two open source IDEs. In: 2012 9th IEEE Working Conference on Mining Software Repositories (MSR), pp 70–73, https://doi.org/10.1109/MSR.2012.6224301

  • Steff M, Russo B (2012) Co-evolution of logical couplings and commits for defect estimation. In: 2012 9th IEEE Working Conference on Mining Software Repositories (MSR), pp 213–216, https://doi.org/10.1109/MSR.2012.6224283

1.10 B.10 2013 Selected Papers

Alali A, Bartman B, Newman CD, Maletic JI (2013) A preliminary investigation of using age and distance measures in the detection of evolutionary couplings. In: 2013 10th Working Conference on Mining Software Repositories (MSR), pp 169–172, https://doi.org/10.1109/MSR.2013.6624024
Fu Q, Lou JG, Lin Q, Ding R, Zhang D, Xie T (2013) Contextual analysis of program logs for understanding system behaviors. In: 2013 10th Working Conference on Mining Software Repositories (MSR), pp 397–400, https://doi.org/10.1109/MSR.2013.6624054
Ghezzi G, Gall HC (2013) Replicating mining studies with SOFAS. In: 2013 10th Working Conference on Mining Software Repositories (MSR), pp 363–372, https://doi.org/10.1109/MSR.2013.6624050
Goeminne M, Claes M, Mens T (2013) A historical dataset for the Gnome ecosystem. In: 2013 10th Working Conference on Mining Software Repositories (MSR), pp 225–228, https://doi.org/10.1109/MSR.2013.6624032
Gousios G (2013) The GHTorrent dataset and tool suite. In: Proceedings of the 10th Working Conference on Mining Software Repositories, IEEE Press, MSR ’13, p 233–236
Guzzi A, Bacchelli A, Lanza M, Pinzger M, van Deursen A (2013) Communication in open source software development mailing lists. In: 2013 10th Working Conference on Mining Software Repositories (MSR), pp 277–286, https://doi.org/10.1109/MSR.2013.6624039
Hamasaki K, Kula RG, Yoshida N, Cruz AEC, Fujiwara K, Iida H (2013) Who does what during a code review? datasets of OSS peer review repositories. In: Proceedings of the 10th Working Conference on Mining Software Repositories, IEEE Press, MSR ’13, p 49–52
Jiang Y, Adams B, German DM (2013) Will my patch make it? and how fast? case study on the Linux kernel. In: 2013 10th Working Conference on Mining Software Repositories (MSR), pp 101–110, https://doi.org/10.1109/MSR.2013.6624016
Lamkanfi A, Pérez J, Demeyer S (2013) The Eclipse and Mozilla defect tracking dataset: A genuine dataset for mining bug information. In: 2013 10th Working Conference on Mining Software Repositories (MSR), pp 203–206, https://doi.org/10.1109/MSR.2013.6624028
MacLean AC, Knutson CD (2013) Apache commits: Social network dataset. In: 2013 10th Working Conference on Mining Software Repositories (MSR), pp 135–138, https://doi.org/10.1109/MSR.2013.6624020
Mukherjee D, Garg M (2013) Which work-item updates need your response? In: 2013 10th Working Conference on Mining Software Repositories (MSR), pp 12–21, https://doi.org/10.1109/MSR.2013.6623998
Nadi S, Dietrich C, Tartler R, Holt RC, Lohmann D (2013) Linux variability anomalies: What causes them and how do they get fixed? In: 2013 10th Working Conference on Mining Software Repositories (MSR), pp 111–120, https://doi.org/10.1109/MSR.2013.6624017
Naguib H, Narayan N, Brügge B, Helal D (2013) Bug report assignee recommendation using activity profiles. In: Proceedings of the 10th Working Conference on Mining Software Repositories, IEEE Press, MSR ’13, p 22–30
Raemaekers S, van Deursen A, Visser J (2013) The Maven repository dataset of metrics, changes, and dependencies. In: 2013 10th Working Conference on Mining Software Repositories (MSR), pp 221–224, https://doi.org/10.1109/MSR.2013.6624031
Robbes R, Röthlisberger D (2013) Using developer interaction data to compare expertise metrics. In: 2013 10th Working Conference on Mining Software Repositories (MSR), pp 297–300, https://doi.org/10.1109/MSR.2013.6624041
Squire M (2013a) Apache-affiliated Twitter screen names: A dataset. In: 2013 10th Working Conference on Mining Software Repositories (MSR), pp 305–308, https://doi.org/10.1109/MSR.2013.6624043
Squire M (2013b) Project roles in the Apache Software Foundation: A dataset. In: 2013 10th Working Conference on Mining Software Repositories (MSR), pp 301–304, https://doi.org/10.1109/MSR.2013.6624042
Wagstrom P, Jergensen C, Sarma A (2013) A network of Rails: A graph dataset of Ruby on Rails and associated projects. In: 2013 10th Working Conference on Mining Software Repositories (MSR), pp 229–232, https://doi.org/10.1109/MSR.2013.6624033
Wang S, Khomh F, Zou Y (2013) Improving bug localization using correlations in crash reports. In: 2013 10th Working Conference on Mining Software Repositories (MSR), pp 247–256, https://doi.org/10.1109/MSR.2013.6624036

1.11 B.11 2014 Selected Papers

Baldassari B, Preux P (2014) Understanding software evolution: The Maisqual ant data set. In: Proceedings of the 11th Working Conference on Mining Software Repositories, Association for Computing Machinery, New York, NY, USA, MSR 2014, p 424–427, https://doi.org/10.1145/2597073.2597136
Bloemen R, Amrit C, Kuhlmann S, Ordóñez-Matamoros G (2014) Gentoo package dependencies over time. In: Proceedings of the 11th Working Conference on Mining Software Repositories, Association for Computing Machinery, New York, NY, USA, MSR 2014, p 404–407, https://doi.org/10.1145/2597073.2597131
Chen TH, Nagappan M, Shihab E, Hassan AE (2014) An empirical study of dormant bugs. In: Proceedings of the 11th Working Conference on Mining Software Repositories, Association for Computing Machinery, New York, NY, USA, MSR 2014, p 82–91, https://doi.org/10.1145/2597073.2597108
Erfani Joorabchi M, Mirzaaghaei M, Mesbah A (2014) Works for me! characterizing non-reproducible bug reports. In: Proceedings of the 11th Working Conference on Mining Software Repositories, Association for Computing Machinery, New York, NY, USA, MSR 2014, p 62–71, https://doi.org/10.1145/2597073.2597098
Farah G, Tejada JS, Correal D (2014) OpenHub: A scalable architecture for the analysis of software quality attributes. In: Proceedings of the 11th Working Conference on Mining Software Repositories, Association for Computing Machinery, New York, NY, USA, MSR 2014, p 420–423, https://doi.org/10.1145/2597073.2597135
Fujiwara K, Hata H, Makihara E, Fujihara Y, Nakayama N, Iida H, Matsumoto K (2014) Kataribe: A hosting service of historage repositories. In: Proceedings of the 11th Working Conference on Mining Software Repositories, Association for Computing Machinery, New York, NY, USA, MSR 2014, p 380–383, https://doi.org/10.1145/2597073.2597125
Gousios G, Zaidman A (2014) A dataset for pull-based development research. In: Proceedings of the 11th Working Conference on Mining Software Repositories, Association for Computing Machinery, New York, NY, USA, MSR 2014, p 368–371, https://doi.org/10.1145/2597073.2597122
Gousios G, Vasilescu B, Serebrenik A, Zaidman A (2014) Lean GHTorrent: GitHub data on demand. In: Proceedings of the 11th Working Conference on Mining Software Repositories, Association for Computing Machinery, New York, NY, USA, MSR 2014, p 384–387, https://doi.org/10.1145/2597073.2597126
Gupta M, Sureka A, Padmanabhuni S (2014) Process mining multiple repositories for software defect resolution from control and organizational perspective. In: Proceedings of the 11th Working Conference on Mining Software Repositories, Association for Computing Machinery, New York, NY, USA, MSR 2014, p 122–131, https://doi.org/10.1145/2597073.2597081
Hanam Q, Tan L, Holmes R, Lam P (2014) Finding patterns in static analysis alerts: Improving actionable alert ranking. In: Proceedings of the 11th Working Conference on Mining Software Repositories, Association for Computing Machinery, New York, NY, USA, MSR 2014, p 152–161, https://doi.org/10.1145/2597073.2597100
Kalliamvakou E, Gousios G, Blincoe K, Singer L, German DM, Damian D (2014) The promises and perils of mining GitHub. In: Proceedings of the 11th Working Conference on Mining Software Repositories, Association for Computing Machinery, New York, NY, USA, MSR 2014, p 92–101, https://doi.org/10.1145/2597073.2597074
Khodabandelou G, Hug C, Deneckère R, Salinesi C (2014) Unsupervised discovery of intentional process models from event logs. In: Proceedings of the 11th Working Conference on Mining Software Repositories, Association for Computing Machinery, New York, NY, USA, MSR 2014, p 282–291, https://doi.org/10.1145/2597073.2597101
Lazar A, Ritchey S, Sharif B (2014a) Generating duplicate bug datasets. In: Proceedings of the 11th Working Conference on Mining Software Repositories, Association for Computing Machinery, New York, NY, USA, MSR 2014, p 392–395, https://doi.org/10.1145/2597073.2597128
Lazar A, Ritchey S, Sharif B (2014b) Improving the accuracy of duplicate bug report detection using textual similarity measures. In: Proceedings of the 11th Working Conference on Mining Software Repositories, Association for Computing Machinery, New York, NY, USA, MSR 2014, p 308–311, https://doi.org/10.1145/2597073.2597088
Linares-Vásquez M, Bavota G, Bernal-Cárdenas C, Oliveto R, Di Penta M, Poshyvanyk D (2014) Mining energy-greedy API usage patterns in Android apps: An empirical study. In: Proceedings of the 11th Working Conference on Mining Software Repositories, Association for Computing Machinery, New York, NY, USA, MSR 2014, p 2–11, https://doi.org/10.1145/2597073.2597085
Mitropoulos D, Karakoidas V, Louridas P, Gousios G, Spinellis D (2014) The bug catalog of the Maven ecosystem. In: Proceedings of the 11th Working Conference on Mining Software Repositories, Association for Computing Machinery, New York, NY, USA, MSR 2014, p 372–375, https://doi.org/10.1145/2597073.2597123
Passos L, Czarnecki K (2014) A dataset of feature additions and feature removals from the Linux kernel. In: Proceedings of the 11th Working Conference on Mining Software Repositories, Association for Computing Machinery, New York, NY, USA, MSR 2014, p 376–379, https://doi.org/10.1145/2597073.2597124
Ponzanelli L, Bavota G, Di Penta M, Oliveto R, Lanza M (2014) Mining StackOverflow to turn the IDE into a self-confident programming prompter. In: Proceedings of the 11th Working Conference on Mining Software Repositories, Association for Computing Machinery, New York, NY, USA, MSR 2014, p 102–111, https://doi.org/10.1145/2597073.2597077
Robles G, Arjona Reina L, Serebrenik A, Vasilescu B, González-Barahona JM (2014a) FLOSS 2013: A survey dataset about free software contributors: Challenges for curating, sharing, and combining. In: Proceedings of the 11th Working Conference on Mining Software Repositories, Association for Computing Machinery, New York, NY, USA, MSR 2014, p 396–399, https://doi.org/10.1145/2597073.2597129
Robles G, González-Barahona JM, Cervigón C, Capiluppi A, Izquierdo-Cortázar D (2014b) Estimating development effort in free/open source software projects by mining software repositories: A case study of OpenStack. In: Proceedings of the 11th Working Conference on Mining Software Repositories, Association for Computing Machinery, New York, NY, USA, MSR 2014, p 222–231, https://doi.org/10.1145/2597073.2597107
Steidl D, Hummel B, Juergens E (2014) Incremental origin analysis of source code files. In: Proceedings of the 11th Working Conference on Mining Software Repositories, Association for Computing Machinery, New York, NY, USA, MSR 2014, p 42–51, https://doi.org/10.1145/2597073.2597111
Valdivia Garcia H, Shihab E (2014) Characterizing and predicting blocking bugs in open source projects. In: Proceedings of the 11th Working Conference on Mining Software Repositories, Association for Computing Machinery, New York, NY, USA, MSR 2014, p 72–81, https://doi.org/10.1145/2597073.2597099
Williams JR, Di Ruscio D, Matragkas N, Di Rocco J, Kolovos DS (2014) Models of OSS project meta-information: A dataset of three forges. In: Proceedings of the 11th Working Conference on Mining Software Repositories, Association for Computing Machinery, New York, NY, USA, MSR 2014, p 408–411, https://doi.org/10.1145/2597073.2597132
Zhang C, Hindle A (2014) A green miner’s dataset: Mining the impact of software change on energy consumption. In: Proceedings of the 11th Working Conference on Mining Software Repositories, Association for Computing Machinery, New York, NY, USA, MSR 2014, p 400–403, https://doi.org/10.1145/2597073.2597130
Zhang F, Mockus A, Keivanloo I, Zou Y (2014) Towards building a universal defect prediction model. In: Proceedings of the 11th Working Conference on Mining Software Repositories, Association for Computing Machinery, New York, NY, USA, MSR 2014, p 182–191, https://doi.org/10.1145/2597073.2597078

1.12 B.12 2015 Selected Papers

Ahmed TM, Shang W, Hassan AE (2015) An empirical study of the copy and paste behavior during development. In: MSR ’15, p 99–110
Altinger H, Siegl S, Dajsuren Y, Wotawa F (2015) A novel industry grade dataset for fault prediction based on model-driven developed automotive embedded software. In: 2015 IEEE/ACM 12th Working Conference on Mining Software Repositories, pp 494–497, https://doi.org/10.1109/MSR.2015.72
Barik T, Lubick K, Smith J, Slankas J, Murphy-Hill E (2015) Fuse: A reproducible, extendable, internet-scale corpus of spreadsheets. In: 2015 IEEE/ACM 12th Working Conference on Mining Software Repositories, pp 486–489, https://doi.org/10.1109/MSR.2015.70
Bird C, Carnahan T, Greiler M (2015) Lessons learned from building and deploying a code review analytics platform. In: MSR ’15, p 191–201
Burlet G, Hindle A (2015) An empirical study of end-user programmers in the computer music community. In: MSR ’15, p 292–302
Choetkiertikul M, Dam HK, Tran T, Ghose A (2015) Characterization and prediction of issue-related risks in software projects. In: MSR ’15, p 280–291
Claes M, Mens T, Di Cosmo R, Vouillon J (2015) A historical analysis of Debian package incompatibilities. In: MSR ’15, p 212–223
German DM, Adams B, Hassan AE (2015) A dataset of the activity of the git super-repository of Linux in 2012. In: Proceedings of the 12th Working Conference on Mining Software Repositories, IEEE Press, MSR ’15, p 470–473
Gonzalez-Barahona JM, Robles G, Izquierdo-Cortazar D (2015) The MetricsGrimoire database collection. In: 2015 IEEE/ACM 12th Working Conference on Mining Software Repositories, pp 478–481, https://doi.org/10.1109/MSR.2015.68
Habayeb M, Miranskyy A, Murtaza SS, Buchanan L, Bener A (2015) The Firefox temporal dataset. In: 2015 IEEE/ACM 12th Working Conference on Mining Software Repositories, pp 498–501, https://doi.org/10.1109/MSR.2015.73
Jiang Y, Adams B (2015) Co-evolution of infrastructure and source code - an empirical study. In: 2015 IEEE/ACM 12th Working Conference on Mining Software Repositories, pp 45–55, https://doi.org/10.1109/MSR.2015.12
Krutz DE, Mirakhorli M, Malachowsky SA, Ruiz A, Peterson J, Filipski A, Smith J (2015) A dataset of open-source Android applications. In: 2015 IEEE/ACM 12th Working Conference on Mining Software Repositories, pp 522–525, https://doi.org/10.1109/MSR.2015.79
Lin Z, Whitehead J (2015) Why power laws? an explanation from fine-grained code changes. In: Proceedings of the 12th Working Conference on Mining Software Repositories, IEEE Press, MSR ’15, p 68–75
Linares-Vásquez M, White M, Bernal-Cárdenas C, Moran K, Poshyvanyk D (2015) Mining Android app usages for generating actionable GUI-based execution scenarios. In: MSR ’15, p 111–122
Mauczka A, Brosch F, Schanes C, Grechenig T (2015) Dataset of developer-labeled commit messages. In: Proceedings of the 12th Working Conference on Mining Software Repositories, IEEE Press, MSR ’15, p 490–493
Moura I, Pinto G, Ebert F, Castor F (2015) Mining energy-aware commits. In: 2015 IEEE/ACM 12th Working Conference on Mining Software Repositories, pp 56–67, https://doi.org/10.1109/MSR.2015.13
Ohira M, Kashiwa Y, Yamatani Y, Yoshiyuki H, Maeda Y, Limsettho N, Fujino K, Hata H, Ihara A, Matsumoto K (2015) A dataset of high impact bugs: Manually-classified issue reports. In: 2015 IEEE/ACM 12th Working Conference on Mining Software Repositories, pp 518–521, https://doi.org/10.1109/MSR.2015.78
Ray B, Nagappan M, Bird C, Nagappan N, Zimmermann T (2015) The uniqueness of changes: Characteristics and applications. In: MSR ’15, p 34–44
Sawant AA, Bacchelli A (2015) A dataset for API usage. In: 2015 IEEE/ACM 12th Working Conference on Mining Software Repositories, pp 506–509, https://doi.org/10.1109/MSR.2015.75
Spinellis D (2015) A repository with 44 years of Unix evolution. In: 2015 IEEE/ACM 12th Working Conference on Mining Software Repositories, pp 462–465, https://doi.org/10.1109/MSR.2015.64
Vasilescu B, Serebrenik A, Filkov V (2015) A data set for social diversity studies of GitHub teams. In: Proceedings of the 12th Working Conference on Mining Software Repositories, IEEE Press, MSR ’15, p 514–517
Wermelinger M, Yu Y (2015) An architectural evolution dataset. In: 2015 IEEE/ACM 12th Working Conference on Mining Software Repositories, pp 502–505, https://doi.org/10.1109/MSR.2015.74
Yu Y, Wang H, Filkov V, Devanbu P, Vasilescu B (2015) Wait for it: Determinants of pull request evaluation latency on GitHub. In: MSR ’15, p 367–371
Zacchiroli S (2015) The Debsources dataset: Two decades of Debian source code metadata. In: 2015 IEEE/ACM 12th Working Conference on Mining Software Repositories, pp 466–469, https://doi.org/10.1109/MSR.2015.65
Zanjani MB, Kagdi H, Bird C (2015) Using developer-interaction trails to triage change requests. In: MSR ’15, p 88–98

1.13 B.13 2016 Selected Papers

Ahasanuzzaman M, Asaduzzaman M, Roy CK, Schneider KA (2016) Mining duplicate questions of Stack Overflow. In: 2016 IEEE/ACM 13th Working Conference on Mining Software Repositories (MSR), pp 402–412
Beyer S, Pinzger M (2016) Grouping android tag synonyms on Stack Overflow. In: Proceedings of the 13th International Conference on Mining Software Repositories, Association for Computing Machinery, New York, NY, USA, MSR ’16, p 430–440, https://doi.org/10.1145/2901739.2901750
Damevski K, Chen H, Shepherd D, Pollock L (2016) Interactive exploration of developer interaction traces using a Hidden Markov Model. In: Proceedings of the 13th International Conference on Mining Software Repositories, Association for Computing Machinery, New York, NY, USA, MSR ’16, p 126–136, https://doi.org/10.1145/2901739.2901741
Dilshener T, Wermelinger M, Yu Y (2016) Locating bugs without looking back. In: MSR ’16, p 286–290, https://doi.org/10.1145/2901739.2901775
Gómez M, Rouvoy R, Adams B, Seinturier L (2016) Mining test repositories for automatic detection of ui performance regressions in android apps. In: Proceedings of the 13th International Conference on Mining Software Repositories, Association for Computing Machinery, New York, NY, USA, MSR ’16, p 13–24, https://doi.org/10.1145/2901739.2901747
Kikas R, Dumas M, Pfahl D (2016) Using dynamic and contextual features to predict issue lifetime in GitHub projects. In: 2016 IEEE/ACM 13th Working Conference on Mining Software Repositories (MSR), pp 291–302
Moslehi P, Adams B, Rilling J (2016) On mining crowd-based speech documentation. In: MSR ’16, p 259–268, https://doi.org/10.1145/2901739.2901771
Ortu M, Murgia A, Destefanis G, Tourani P, Tonelli R, Marchesi M, Adams B (2016) The emotional side of software developers in JIRA. In: Proceedings of the 13th International Conference on Mining Software Repositories, Association for Computing Machinery, New York, NY, USA, MSR ’16, p 480–483, https://doi.org/10.1145/2901739.2903505
Rahman MT, Querel LP, Rigby PC, Adams B (2016) Feature toggles: Practitioner practices and a case study. In: 2016 IEEE/ACM 13th Working Conference on Mining Software Repositories (MSR), pp 201–211
Rozenberg D, Beschastnikh I, Kosmale F, Poser V, Becker H, Palyart M, Murphy GC (2016) Comparing repositories visually with repograms. In: Proceedings of the 13th International Conference on Mining Software Repositories, Association for Computing Machinery, New York, NY, USA, MSR ’16, p 109–120, https://doi.org/10.1145/2901739.2901768
Squire M (2016) Data sets: The circle of life in Ruby hosting, 2003–2015. In: Proceedings of the 13th International Conference on Mining Software Repositories, Association for Computing Machinery, New York, NY, USA, MSR ’16, p 452–459, https://doi.org/10.1145/2901739.2903509
Yang D, Hussain A, Lopes CV (2016a) From query to usable code: An analysis of stack overflow code snippets. In: Proceedings of the 13th International Conference on Mining Software Repositories, Association for Computing Machinery, New York, NY, USA, MSR ’16, p 391–402, https://doi.org/10.1145/2901739.2901767
Yang X, Kula RG, Yoshida N, Iida H (2016b) Mining the modern code review repositories: A dataset of people, process and product. In: Proceedings of the 13th International Conference on Mining Software Repositories, Association for Computing Machinery, New York, NY, USA, MSR ’16, p 460–463, https://doi.org/10.1145/2901739.2903504
Zagalsky A, Teshima CG, German DM, Storey MA, Poo-Caamaño G (2016) How the R community creates and curates knowledge: A comparative study of stack overflow and mailing lists. In: 2016 IEEE/ACM 13th Working Conference on Mining Software Repositories (MSR), pp 441–451
Zhu J, Zhou M, Mei H (2016) Multi-extract and multi-level dataset of Mozilla issue tracking history. In: 2016 IEEE/ACM 13th Working Conference on Mining Software Repositories (MSR), pp 472–475

1.14 B.14 2017 Selected Papers

Aivaloglou E, Hermans F, Moreno-Leon J, Robles G (2017) A dataset of Scratch programs: Scraped, shaped and scored. In: 2017 IEEE/ACM 14th International Conference on Mining Software Repositories (MSR), pp 511–514, https://doi.org/10.1109/MSR.2017.45
Bao L, Xing Z, Xia X, Lo D, Li S (2017) Who will leave the company? a large-scale industry study of developer turnover by mining monthly work report. In: Proceedings of the 14th International Conference on Mining Software Repositories, IEEE Press, MSR ’17, p 170–181, https://doi.org/10.1109/MSR.2017.58
Beller M, Gousios G, Zaidman A (2017) Oops, my tests broke the build: An explorative analysis of Travis CI with GitHub. In: 2017 IEEE/ACM 14th International Conference on Mining Software Repositories (MSR), pp 356–367, https://doi.org/10.1109/MSR.2017.62
Cito J, Schermann G, Wittern JE, Leitner P, Zumberi S, Gall HC (2017) An empirical analysis of the Docker container ecosystem on GitHub. In: 2017 IEEE/ACM 14th International Conference on Mining Software Repositories (MSR), pp 323–333, https://doi.org/10.1109/MSR.2017.67
Claes M, Mäntylä M, Kuutila M, Adams B (2017) Abnormal working hours: Effect of rapid releases and implications to work content. In: 2017 IEEE/ACM 14th International Conference on Mining Software Repositories (MSR), pp 243–247
Dehghan A, Neal A, Blincoe K, Linaker J, Damian D (2017) Predicting likelihood of requirement implementation within the planned iteration: An empirical study at IBM. In: 2017 IEEE/ACM 14th International Conference on Mining Software Repositories (MSR), pp 124–134, https://doi.org/10.1109/MSR.2017.53
Gharehyazie M, Ray B, Filkov V (2017) Some from here, some from there: Cross-project code reuse in github. In: 2017 IEEE/ACM 14th International Conference on Mining Software Repositories (MSR), pp 291–301, https://doi.org/10.1109/MSR.2017.15
Madeyski L, Kawalerowicz M (2017) Continuous defect prediction: The idea and a related dataset. In: 2017 IEEE/ACM 14th International Conference on Mining Software Repositories (MSR), pp 515–518, https://doi.org/10.1109/MSR.2017.46
Molderez T, Stevens R, De Roover C (2017) Mining change histories for unknown systematic edits. In: 2017 IEEE/ACM 14th International Conference on Mining Software Repositories (MSR), pp 248–256, https://doi.org/10.1109/MSR.2017.12
Rausch T, Hummer W, Leitner P, Schulte S (2017) An empirical analysis of build failures in the continuous integration workflows of java-based open-source software. In: 2017 IEEE/ACM 14th International Conference on Mining Software Repositories (MSR), pp 345–355, https://doi.org/10.1109/MSR.2017.54
Robles G, Ho-Quang T, Hebig R, Chaudron MRV, Fernandez MA (2017) An extensive dataset of UML models in GitHub. In: Proceedings of the 14th International Conference on Mining Software Repositories, IEEE Press, MSR ’17, p 519–522, https://doi.org/10.1109/MSR.2017.48
Sadat M, Bener AB, Miranskyy A (2017) Rediscovery datasets: Connecting duplicate reports. In: 2017 IEEE/ACM 14th International Conference on Mining Software Repositories (MSR), IEEE, pp 527–530
Silva D, Valente MT (2017) Refdiff: Detecting refactorings in version histories. In: 2017 IEEE/ACM 14th International Conference on Mining Software Repositories (MSR), pp 269–279, https://doi.org/10.1109/MSR.2017.14
Tiwari NM, Upadhyaya G, Nguyen HA, Rajan H (2017) Candoia: A platform for building and sharing mining software repositories tools as apps. In: 2017 IEEE/ACM 14th International Conference on Mining Software Repositories (MSR), pp 53–63, https://doi.org/10.1109/MSR.2017.56
Wan Z, Lo D, Xia X, Cai L (2017) Bug characteristics in blockchain systems: A large-scale empirical study. In: 2017 IEEE/ACM 14th International Conference on Mining Software Repositories (MSR), pp 413–424, https://doi.org/10.1109/MSR.2017.59
Watanabe T, Akiyama M, Kanei F, Shioji E, Takata Y, Sun B, Ishi Y, Shibahara T, Yagi T, Mori T (2017) Understanding the origins of mobile app vulnerabilities: A large-scale measurement study of free and paid apps. In: 2017 IEEE/ACM 14th International Conference on Mining Software Repositories (MSR), pp 14–24, https://doi.org/10.1109/MSR.2017.23
Xu L, Dou W, Gao C, Wang J, Wei J, Zhong H, Huang T (2017) Spreadcluster: Recovering versioned spreadsheets through similarity-based clustering. In: 2017 IEEE/ACM 14th International Conference on Mining Software Repositories (MSR), pp 158–169, https://doi.org/10.1109/MSR.2017.28
Yamashita A, Abtahizadeh SA, Khomh F, Guéhéneuc YG (2017) Software evolution and quality data from controlled, multiple, industrial case studies. In: 2017 IEEE/ACM 14th International Conference on Mining Software Repositories (MSR), pp 507–510, https://doi.org/10.1109/MSR.2017.44
Zhu C, Li Y, Rubin J, Chechik M (2017) A dataset for dynamic discovery of semantic changes in version controlled software histories. In: 2017 IEEE/ACM 14th International Conference on Mining Software Repositories (MSR), pp 523–526, https://doi.org/10.1109/MSR.2017.49

1.15 B.15 2018 Selected Papers

Accioly P, Borba P, Silva L, Cavalcanti G (2018) Analyzing conflict predictors in open-source Java projects. In: MSR ’18, p 576–586, https://doi.org/10.1145/3196398.3196437
Arima R, Higo Y, Kusumoto S (2018) A study on inappropriately partitioned commits: How much and what kinds of IP commits in Java projects? In: MSR ’18, p 336–340, https://doi.org/10.1145/3196398.3196406
Baltes S, Dumani L, Treude C, Diehl S (2018) SOTorrent: Reconstructing and analyzing the evolution of Stack Overflow posts. In: MSR ’18, p 319–330, https://doi.org/10.1145/3196398.3196430
Benkoczi R, Gaur D, Hossain S, Khan MA (2018) A design structure matrix approach for measuring co-change-modularity of software products. In: Proceedings of the 15th International Conference on Mining Software Repositories, Association for Computing Machinery, New York, NY, USA, MSR ’18, p 331–335, https://doi.org/10.1145/3196398.3196409
Bernardo JH, da Costa DA, Kulesza U (2018) Studying the impact of adopting continuous integration on the delivery time of pull requests. In: MSR ’18, p 131–141, https://doi.org/10.1145/3196398.3196421
Calciati P, Kuznetsov K, Bai X, Gorla A (2018) What did really change with the new release of the app? In: 2018 IEEE/ACM 15th International Conference on Mining Software Repositories (MSR), pp 142–152
Chatzidimitriou K, Papamichail M, Diamantopoulos T, Tsapanos M, Symeonidis A (2018) npm-miner: An infrastructure for measuring the quality of the npm registry. In: 2018 IEEE/ACM 15th International Conference on Mining Software Repositories (MSR), pp 42–45
Claes M, Mäntylä M, Kuutila M, Farooq U (2018) Towards automatically identifying paid open source developers. In: MSR ’18, p 437–441, https://doi.org/10.1145/3196398.3196447
Geiger FX, Malavolta I, Pascarella L, Palomba F, Di Nucci D, Bacchelli A (2018) A graph-based dataset of commit history of real-world Android apps. In: 2018 IEEE/ACM 15th International Conference on Mining Software Repositories (MSR), pp 30–33
Markovtsev V, Long W (2018) Public Git Archive: A big code dataset for all. In: 2018 IEEE/ACM 15th International Conference on Mining Software Repositories (MSR), pp 34–37
Martins P, Achar R, Lopes CV (2018) 50k-c: A dataset of compilable, and compiled, Java projects. In: 2018 IEEE/ACM 15th International Conference on Mining Software Repositories (MSR), pp 1–5
Nayebi M, Kuznetsov K, Chen P, Zeller A, Ruhe G (2018) Anatomy of functionality deletion: An exploratory study on mobile apps. In: Proceedings of the 15th International Conference on Mining Software Repositories, Association for Computing Machinery, New York, NY, USA, MSR ’18, p 243–253, https://doi.org/10.1145/3196398.3196410
Nayrolles M, Hamou-Lhadj A (2018) CLEVER: Combining code metrics with clone detection for just-in-time fault prevention and resolution in large industrial projects. In: MSR ’18, p 153–164, https://doi.org/10.1145/3196398.3196438
Paixao M, Krinke J, Han D, Harman M (2018) CROP: Linking code reviews to source code changes. In: 2018 IEEE/ACM 15th International Conference on Mining Software Repositories (MSR), pp 46–49
Rath M, Lo D, Mäder P (2018) Analyzing requirements and traceability information to improve bug localization. In: MSR ’18, p 442–453, https://doi.org/10.1145/3196398.3196415
Sanchez BA, Barmpis K, Neubauer P, Paige RF, Kolovos DS (2018) Restmule: Enabling resilient clients for remote APIs. In: 2018 IEEE/ACM 15th International Conference on Mining Software Repositories (MSR), pp 537–541
Schermann G, Zumberi S, Cito J (2018) Structured information on state and evolution of Dockerfiles on GitHub. In: 2018 IEEE/ACM 15th International Conference on Mining Software Repositories (MSR), pp 26–29
Spinellis D (2018) Documented Unix facilities over 48 years. In: 2018 IEEE/ACM 15th International Conference on Mining Software Repositories (MSR), pp 58–61
Wang H, Li H, Li L, Guo Y, Xu G (2018) Why are Android apps removed from Google Play? a large-scale empirical study. In: 2018 IEEE/ACM 15th International Conference on Mining Software Repositories (MSR), pp 231–242
Widder D, Vasilescu B, Hilton M, Kästner C (2018) I’m leaving you, Travis: A continuous integration breakup story. In: 2018 IEEE/ACM 15th International Conference on Mining Software Repositories (MSR), pp 165–169
Xu Y, Zhou M (2018) A multi-level dataset of Linux kernel patchwork. In: Proceedings of the 15th International Conference on Mining Software Repositories, Association for Computing Machinery, New York, NY, USA, MSR ’18, p 54–57, https://doi.org/10.1145/3196398.3196475
Yamashita A, Petrillo F, Khomh F, Guéhéneuc YG (2018) Developer interaction traces backed by IDE screen recordings from think aloud sessions. In: Proceedings of the 15th International Conference on Mining Software Repositories, Association for Computing Machinery, New York, NY, USA, MSR ’18, p 50–53, https://doi.org/10.1145/3196398.3196457
Yu Y, Li Z, Yin G, Wang T, Wang H (2018) A dataset of duplicate pull-requests in Github. In: Proceedings of the 15th International Conference on Mining Software Repositories, Association for Computing Machinery, New York, NY, USA, MSR ’18, p 22–25, https://doi.org/10.1145/3196398.3196455

1.16 B.16 2019 Selected Papers

Ahmad M, Cinnéide MO (2019) Impact of Stack Overflow code snippets on software cohesion: A preliminary study. In: MSR ’19, p 250–254, https://doi.org/10.1109/MSR.2019.00050
Chren S, Micko R, Buhnova B, Rossi B (2019) STRAIT: A tool for automated software reliability growth analysis. In: MSR ’19, p 105–110, https://doi.org/10.1109/MSR.2019.00025
Gote C, Scholtes I, Schweitzer F (2019) Git2net: Mining time-stamped co-editing networks from large git repositories. In: Proceedings of the 16th International Conference on Mining Software Repositories, IEEE Press, MSR ’19, p 433–444, https://doi.org/10.1109/MSR.2019.00070
Hayashi J, Higo Y, Matsumoto S, Kusumoto S (2019) Impacts of daylight saving time on software development. In: Proceedings of the 16th International Conference on Mining Software Repositories, IEEE Press, MSR ’19, p 502–506, https://doi.org/10.1109/MSR.2019.00076
Hoang T, Dam HK, Kamei Y, Lo D, Ubayashi N (2019) DeepJIT: An end-to-end deep learning framework for just-in-time defect prediction. In: MSR ’19, p 34–45, https://doi.org/10.1109/MSR.2019.00016
Kiehn M, Pan X, Camci F (2019) Empirical study in using version histories for change risk classification. In: MSR ’19, p 58–62, https://doi.org/10.1109/MSR.2019.00018
Ma Y, Bogart C, Amreen S, Zaretzki R, Mockus A (2019) World of Code: An infrastructure for mining the universe of open source VCS data. In: MSR ’19, p 143–154, https://doi.org/10.1109/MSR.2019.00031
Mitropoulos D, Louridas P, Salis V, Spinellis D (2019) Time present and time past: Analyzing the evolution of JavaScript code in the wild. In: MSR ’19, p 126–137, https://doi.org/10.1109/MSR.2019.00029
Mondal S, Rahman MM, Roy CK (2019) Can issues reported at Stack Overflow questions be reproduced? an exploratory study. In: Proceedings of the 16th International Conference on Mining Software Repositories, IEEE Press, MSR ’19, p 479–489, https://doi.org/10.1109/MSR.2019.00074
Pietri A, Spinellis D, Zacchiroli S (2019) The Software Heritage graph dataset: Public software development under one roof. In: MSR ’19, p 138–142, https://doi.org/10.1109/MSR.2019.00030
Pimentel JF, Murta L, Braganholo V, Freire J (2019) A large-scale study about quality and reproducibility of Jupyter notebooks. In: 2019 IEEE/ACM 16th International Conference on Mining Software Repositories (MSR), pp 507–517, https://doi.org/10.1109/MSR.2019.00077
Schipper D, Aniche M, van Deursen A (2019) Tracing back log data to its log statement: From research to practice. In: 2019 IEEE/ACM 16th International Conference on Mining Software Repositories (MSR), pp 545–549, https://doi.org/10.1109/MSR.2019.00081
Serra D, Grano G, Palomba F, Ferrucci F, Gall HC, Bacchelli A (2019) On the effectiveness of manual and automatic unit test generation: Ten years later. In: 2019 IEEE/ACM 16th International Conference on Mining Software Repositories (MSR), pp 121–125, https://doi.org/10.1109/MSR.2019.00028
van Tonder R, Trockman A, Le Goues C (2019) A panel data set of cryptocurrency development activity on GitHub. In: 2019 IEEE/ACM 16th International Conference on Mining Software Repositories (MSR), pp 186–190, https://doi.org/10.1109/MSR.2019.00037
Treude C, Wagner M (2019) Predicting good configurations for GitHub and Stack Overflow topic models. In: Proceedings of the 16th International Conference on Mining Software Repositories, IEEE Press, MSR ’19, p 84–95, https://doi.org/10.1109/MSR.2019.00022
Yang AZH, da Costa DA, Zou Y (2019) Predicting co-changes between functionality specifications and source code in behavior driven development. In: Proceedings of the 16th International Conference on Mining Software Repositories, IEEE Press, MSR ’19, p 534–544, https://doi.org/10.1109/MSR.2019.00080
Zhai H, Casalnuovo C, Devanbu P (2019) Test coverage in Python programs. In: MSR ’19, p 116–120, https://doi.org/10.1109/MSR.2019.00027
Zhu J, Wei J (2019) An empirical study of multiple names and email addresses in OSS version control repositories. In: Proceedings of the 16th International Conference on Mining Software Repositories, IEEE Press, MSR ’19, p 409–420, https://doi.org/10.1109/MSR.2019.00068

1.17 B.17 2020 Selected Papers

Abdellatif A, Costa D, Badran K, Abdalkareem R, Shihab E (2020) Challenges in chatbot development: A study of Stack Overflow posts. In: MSR ’20, p 174–185, https://doi.org/10.1145/3379597.3387472Barmpis K, Neubauer P, Co J, Kolovos D, Matragkas N, Paige RF (2020) Polyglot and distributed software repository mining with Crossflow. In: MSR ’20, p 374–384, https://doi.org/10.1145/3379597.3387481Bello-Jiménez L, Escobar-Velásquez C, Mojica-Hanke A, Cortés-Fernández S, Linares- Vásquez M (2020) Hall-of-apps: The top Android apps metadata archive. In: Proceedings of the 17th International Conference on Mining Software Repositories, Association for Computing Machinery, New York, NY, USA, MSR ’20, p 568–572, https://doi.org/10.1145/3379597.3387497, https://doi.org/10.1145/3379597.3387497Chatterjee P, Damevski K, Kraft NA, Pollock L (2020) Software-related Slack chats with disentangled conversations. In: Proceedings of the 17th International Conference on Mining Software Repositories, Association for Computing Machinery, New York, NY, USA, MSR ’20, p 588–592, https://doi.org/10.1145/3379597.3387493,Chen Y, Santosa AE, Yi AM, Sharma A, Sharma A, Lo D (2020) A machine learning approach for vulnerability curation. In: MSR ’20, p 32–42, https://doi.org/10.1145/3379597.3387461https://doi.org/10.1145/3379597.3387461Claes M, Mäntylä MV (2020) 20-mad: 20 years of issues and commits of Mozilla and Apache development. In: Proceedings of the 17th International Conference on Mining Software Repositories, Association for Computing Machinery, New York, NY, USA, MSR ’20, p 503–507, https://doi.org/10.1145/3379597.3387487, https://doi.org/10.1145/3379597.3387487https://doi.org/10.1145/3379597.3387487Corò F, Verdecchia R, Cruciani E, Miranda B, Bertolino A (2020) Jtec: A large collection of Java test classes for test code analysis and processing. 
In: Proceedings of the 17th International Conference on Mining Software Repositories, Association for Computing Machinery, New York, NY, USA, MSR ’20, p 578–582, https://doi.org/10.1145/3379597.3387484https://doi.org/10.1145/3379597.3387484, https://doi.org/10.1145/3379597.3387484Dey T, Mousavi S, Ponce E, Fry T, Vasilescu B, Filippova A, Mockus A (2020) Detecting and characterizing bots that commit code. In: MSR ’20, p 209–219, https://doi.org/10.1145/3379597.3387478https://doi.org/10.1145/3379597.3387478Diamantopoulos T, Papamichail MD, Karanikiotis T, Chatzidimitriou KC, Symeonidis AL (2020) Employing contribution and quality metrics for quantifying the software development process. In: Proceedings of the 17th International Conference on Mining Software Repositories, Association for Computing Machinery, New York, NY, USA, MSR ’20, p 558–562, https://doi.org/10.1145/3379597.3387490, https://doi.org/10.1145/3379597.3387490https://doi.org/10.1145/3379597.3387490Durieux T, Le Goues C, Hilton M, Abreu R (2020) Empirical study of restarted and flaky builds on Travis CI. In: MSR ’20, p 254–264, https://doi.org/10.1145/3379597.3387460El Zarif O, Da Costa DA, Hassan S, Zou Y (2020) On the relationship between user churn and software issues. In: MSR ’20, p 339–349, https://doi.org/10.1145/3379597.3387456Fan J, Li Y, Wang S, Nguyen TN (2020) A C/C++ code vulnerability dataset with code changes and CVE summaries. In: Proceedings of the 17th International Conference on Mining Software Repositories, Association for Computing Machinery, New York, NY, USA, MSR ’20, p 508–512, https://doi.org/10.1145/3379597.3387501, https://doi.org/10.1145/3379597.3387501https://doi.org/10.1145/3379597.3387501Golubev Y, Eliseeva M, Povarov N, Bryksin T (2020) A study of potential code borrowing and license violations in Java projects on GitHub. 
In: MSR ’20, p 54–64, https://doi.org/10.1145/3379597.3387455Gonzalez D, Zimmermann T, Nagappan N (2020) The state of the ML-universe: 10 years of artificial intelligence & machine learning software development on GitHub. In: MSR ’20, p 431–442, https://doi.org/10.1145/3379597.3387473Henkel J, Bird C, Lahiri SK, Reps T (2020) A dataset of Dockerfiles. In: Proceedings of the 17th International Conference on Mining Software Repositories, Association for Computing Machinery, New York, NY, USA, MSR ’20, p 528–532, https://doi.org/10.1145/3379597.3387498https://doi.org/10.1145/3379597.3387498, https://doi.org/10.1145/3379597.3387498Hung CS, Dyer R (2020) Boa views: Easy modularization and sharing of MSR analyses. In: MSR ’20, p 147–157, https://doi.org/10.1145/3379597.3387480Karampatsis RM, Sutton C (2020) How often do single-statement bugs occur? the ManySStuBs4J dataset. Association for Computing Machinery, New York, NY, USA, MSR ’20, p 573–577, https://doi.org/10.1145/3379597.3387491, https://doi.org/10.1145/3379597.3387491https://doi.org/10.1145/3379597.3387491Liu P, Li L, Zhao Y, Sun X, Grundy J (2020a) Androzooopen: Collecting large-scale open source Android apps for the research community. In: Proceedings of the 17th International Conference on Mining Software Repositories, Association for Computing Machinery, New York, NY, USA, MSR ’20, p 548–552, https://doi.org/10.1145/3379597.3387503https://doi.org/10.1145/3379597.3387503, https://doi.org/10.1145/3379597.3387503Liu Y, Lin J, Cleland-Huang J (2020b) Traceability support for multi-lingual software projects. In: MSR ’20, p 443–454, https://doi.org/10.1145/3379597.3387440Mockus A, Spinellis D, Kotti Z, Dusing GJ (2020) A complete set of related git repositories identified via community detection approaches based on shared commits. 
In: Proceedings of the 17th International Conference on Mining Software Repositories, Association for Computing Machinery, New York, NY, USA, MSR ’20, p 513–517, https://doi.org/10.1145/3379597.3387499, https://doi.org/10.1145/3379597.3387499Mujahid S, Abdalkareem R, Shihab E, McIntosh S (2020) Using others’ tests to identify breaking updates. In: MSR ’20, p 466–476, https://doi.org/10.1145/3379597.3387476Muse BA, Rahman MM, Nagy C, Cleve A, Khomh F, Antoniol G (2020) On the prevalence, impact, and evolution of SQL code smells in data-intensive systems. In: MSR ’20, p 327–338, https://doi.org/10.1145/3379597.3387467Nakamaru T, Matsunaga T, Yamazaki T, Akiyama S, Chiba S (2020) An empirical study of method chaining in Java. In: MSR ’20, p 93–102, https://doi.org/10.1145/3379597.3387441https://doi.org/10.1145/3379597.3387441Parra E, Ellis A, Haiduc S (2020) Gittercom: A dataset of open source developer communications in Gitter. In: Proceedings of the 17th International Conference on Mining Software Repositories, Association for Computing Machinery, New York, NY, USA, MSR ’20, p 563–567, https://doi.org/10.1145/3379597.3387494, https://doi.org/10.1145/3379597.3387494https://doi.org/10.1145/3379597.3387494Pietri A, Rousseau G, Zacchiroli S (2020) Forking without clicking: On how to identify software repository forks. In: MSR ’20, p 277–287, https://doi.org/10.1145/3379597.3387450https://doi.org/10.1145/3379597.3387450Politowski C, Petrillo F, Ullmann GC, de Andrade Werly J, Guéhéneuc YG (2020) Dataset of video game development problems. In: Proceedings of the 17th International Conference on Mining Software Repositories, Association for Computing Machinery, New York, NY, USA, MSR ’20, p 553–557, https://doi.org/10.1145/3379597.3387486, https://doi.org/10.1145/3379597.3387486Rodrigues IM, Aloise D, Fernandes ER, Dagenais M (2020) A soft alignment model for bug deduplication. 
In: MSR ’20, p 43–53, https://doi.org/10.1145/3379597.3387470Spinellis D, Kotti Z, Kravvaritis K, Theodorou G, Louridas P (2020) A dataset of enterprise-driven open source software. In: Proceedings of the 17th International Conference on Mining Software Repositories, Association for Computing Machinery, New York, NY, USA, MSR ’20, p 533–537, https://doi.org/10.1145/3379597.3387495, https://doi.org/10.1145/3379597.3387495Svitkov S, Bryskin T (2020) Visualization of methods changeability based on VCS data. In: MSRC ’20, pp 477–480Walden J (2020) The impact of a major security event on an open source project: The case of OpenSSL. In: MSR ’20, p 409–419, https://doi.org/10.1145/3379597.3387465Wang P, Brown C, Jennings JA, Stolee KT (2020) An empirical study on regular expression bugs. In: MSR ’20, p 103–113, https://doi.org/10.1145/3379597.3387464Wu Y, Zhang Y, Wang T, Wang H (2020) An empirical study of build failures in the Docker context. In: MSR ’20, p 76–80, https://doi.org/10.1145/3379597.3387483Xavier L, Ferreira F, Brito R, Valente MT (2020) Beyond the code: Mining self-admitted technical debt in issue tracker systems. In: MSR ’20, p 137–146, https://doi.org/10.1145/3379597.3387459https://doi.org/10.1145/3379597.3387459Zhang X, Rastogi A, Yu Y (2020) On the shoulders of giants: A new dataset for pull-based development research. In: Proceedings of the 17th International Conference on Mining Software Repositories, Association for Computing Machinery, New York, NY, USA, MSR ’20, p 543–547, https://doi.org/10.1145/3379597.3387489, https://doi.org/10.1145/3379597.3387489https://doi.org/10.1145/3379597.3387489

1.18 B.18 2021 Selected Papers

(2021) Self-admitted technical debt in R packages: An exploratory study. In: 2021 IEEE/ACM 18th International Conference on Mining Software Repositories (MSR), pp 179–189, https://doi.org/10.1109/MSR52588.2021.00030Al Alamin MA, Malakar S, Uddin G, Afroz S, Haider TB, Iqbal A (2021) An empirical study of developer discussions on low-code software development challenges. In: 2021 IEEE/ACM 18th International Conference on Mining Software Repositories (MSR), pp 46–57, https://doi.org/10.1109/MSR52588.2021.00018Albonico M, Malavolta I, Pinto G, Guzman E, Chinnappan K, Lago P (2021) Mining energy-related practices in robotics software. In: 2021 IEEE/ACM 18th International Conference on Mining Software Repositories (MSR), pp 483–494, https://doi.org/10.1109/MSR52588.2021.00060Alfadel M, Costa DE, Shihab E, Mkhallalati M (2021) On the use of Dependabot security pull requests. In: 2021 IEEE/ACM 18th International Conference on Mining Software Repositories (MSR), pp 254–265, https://doi.org/10.1109/MSR52588.2021.00037Alghamdi M, Hayashi S, Kobayashi T, Treude C (2021) Characterising the knowledge about primitive variables in Java code comments. In: 2021 IEEE/ACM 18th International Conference on Mining Software Repositories (MSR), pp 460–470, https://doi.org/10.1109/MSR52588.2021.00058Ciniselli M, Cooper N, Pascarella L, Poshyvanyk D, Di Penta M, Bavota G (2021) An empirical study on the usage of BERT models for code completion. In: 2021 IEEE/ACM 18th International Conference on Mining Software Repositories (MSR), pp 108–119, https://doi.org/10.1109/MSR52588.2021.00024Codabux Z, Vidoni M, Fard FH (2021) Technical debt in the peer-review documentation of R packages: a rOpenSci case study. In: 2021 IEEE/ACM 18th International Conference on Mining Software Repositories (MSR), pp 195–206, https://doi.org/10.1109/MSR52588.2021.00032Cndido J, Haesen J, Aniche M, van Deursen A (2021) An exploratory study of log placement recommendation in an enterprise system. 
In: 2021 IEEE/ACM 18th International Conference on Mining Software Repositories (MSR), pp 143–154, https://doi.org/10.1109/MSR52588.2021.00027Dabic O, Aghajani E, Bavota G (2021) Sampling projects in GitHub for MSR studies. In: 2021 IEEE/ACM 18th International Conference on Mining Software Repositories (MSR), pp 560–564, https://doi.org/10.1109/MSR52588.2021.00074Ding ZY, Le Goues C (2021) An empirical study of OSS-Fuzz bugs. In: 2021 IEEE/ACM 18th International Conference on Mining Software Repositories (MSR), pp 131–142, https://doi.org/10.1109/MSR52588.2021.00026Durieux T, Soto-Valero C, Baudry B (2021) Duets: A dataset of reproducible pairs of Java library-clients. In: 2021 IEEE/ACM 18th International Conference on Mining Software Repositories (MSR), pp 545–549, https://doi.org/10.1109/MSR52588.2021.00071Eng K, Hindle A (2021) Revisiting Dockerfiles in open source software over time. In: 2021 IEEE/ACM 18th International Conference on Mining Software Repositories (MSR), pp 449–459, https://doi.org/10.1109/MSR52588.2021.00057Eskandani N, Salvaneschi G (2021) The Wonderless dataset for serverless computing. In: 2021 IEEE/ACM 18th International Conference on Mining Software Repositories (MSR), pp 565–569, https://doi.org/10.1109/MSR52588.2021.00075Flint SW, Chauhan J, Dyer R (2021) Escaping the time pit: Pitfalls and guidelines for using time-based git data. In: 2021 IEEE/ACM 18th International Conference on Mining Software Repositories (MSR), pp 85–96, https://doi.org/10.1109/MSR52588.2021.00022Fournier Q, Aloise D, Azhari SV, Tetreault F (2021) On improving deep learning trace analysis with system call arguments. In: 2021 IEEE/ACM 18th International Conference on Mining Software Repositories (MSR), pp 120–130, https://doi.org/10.1109/MSR52588.2021.00025Gholamian S, Ward PAS (2021) On the naturalness and localness of software logs. 
In: 2021 IEEE/ACM 18th International Conference on Mining Software Repositories (MSR), pp 155–166, https://doi.org/10.1109/MSR52588.2021.00028Gote C, Zingg C (2021) gambit – an open source name disambiguation tool for version control systems. In: 2021 IEEE/ACM 18th International Conference on Mining Software Repositories (MSR), pp 80–84, https://doi.org/10.1109/MSR52588.2021.00021Haben G, Habchi S, Papadakis M, Cordy M, Le Traon Y (2021) A replication study on the usability of code vocabulary in predicting flaky tests. In: 2021 IEEE/ACM 18th International Conference on Mining Software Repositories (MSR), pp 219–229, https://doi.org/10.1109/MSR52588.2021.00034Hora A (2021a) Googling for software development: What developers search for and what they find. In: 2021 IEEE/ACM 18th International Conference on Mining Software Repositories (MSR), pp 317–328, https://doi.org/10.1109/MSR52588.2021.00044Hora A (2021b) What code is deliberately excluded from test coverage and why? In: 2021 IEEE/ACM 18th International Conference on Mining Software Repositories (MSR), pp 392–402, https://doi.org/10.1109/MSR52588.2021.00051Imam A, Dey T (2021) Tracking hackathon code creation and reuse. In: 2021 IEEE/ACM 18th International Conference on Mining Software Repositories (MSR), pp 615–617, https://doi.org/10.1109/MSR52588.2021.00085Imran MM, Ciborowska A, Damevski K (2021) Automatically selecting follow-up questions for deficient bug reports. In: 2021 IEEE/ACM 18th International Conference on Mining Software Repositories (MSR), pp 167–178, https://doi.org/10.1109/MSR52588.2021.00029Kim M, Kim Y, Lee E (2021) Denchmark: A bug benchmark of deep learning-related software. In: 2021 IEEE/ACM 18th International Conference on Mining Software Repositories (MSR), pp 540–544, https://doi.org/10.1109/MSR52588.2021.00070Kinsman T, Wessel M, Gerosa MA, Treude C (2021) How do software developers use GitHub actions to automate their workflows? 
In: 2021 IEEE/ACM 18th International Conference on Mining Software Repositories (MSR), pp 420–431, https://doi.org/10.1109/MSR52588.2021.00054Malavolta I, Chinnappan K, Swanborn S, Lewis GA, Lago P (2021) Mining the ROS ecosystem for green architectural tactics in robotics and an empirical evaluation. In: 2021 IEEE/ACM 18th International Conference on Mining Software Repositories (MSR), pp 300–311, https://doi.org/10.1109/MSR52588.2021.00042Manes SS, Baysal O (2021) Studying the change histories of Stack Overflow and GitHub snippets. In: 2021 IEEE/ACM 18th International Conference on Mining Software Repositories (MSR), pp 283–294, https://doi.org/10.1109/MSR52588.2021.00040Marcilio D, Furia CA (2021) How Java programmers test exceptional behavior. In: 2021 IEEE/ACM 18th International Conference on Mining Software Repositories (MSR), pp 207–218, https://doi.org/10.1109/MSR52588.2021.00033Mondal S, Uddin G, Roy CK (2021) Rollback edit inconsistencies in developer forum. In: 2021 IEEE/ACM 18th International Conference on Mining Software Repositories (MSR), pp 380–391, https://doi.org/10.1109/MSR52588.2021.00050Nielebock S, Blockhaus P, Krger J, Ortmeier F (2021) Androidcompass: A dataset of Android compatibility checks in code repositories. 2103.09620Opdebeeck R, Zerouali A, De Roover C (2021) Andromeda: A dataset of Ansible Galaxy roles and their evolution. In: 2021 IEEE/ACM 18th International Conference on Mining Software Repositories (MSR), pp 580–584, https://doi.org/10.1109/MSR52588.2021.00078Papoutsoglou M, Wachs J, Kapitsaki GM (2021) Mining DEV for social and technical insights about software development. In: 2021 IEEE/ACM 18th International Conference on Mining Software Repositories (MSR), pp 415–419, https://doi.org/10.1109/MSR52588.2021.00053Pei J, Wu Y, Qin Z, Cong Y, Guan J (2021) Attention-based model for predicting question relatedness on Stack Overflow. 
In: 2021 IEEE/ACM 18th International Conference on Mining Software Repositories (MSR), pp 97–107, https://doi.org/10.1109/MSR52588.2021.00023Pfeiffer RH (2021) Identifying critical projects via PageRank and truck factor. In: 2021 IEEE/ACM 18th International Conference on Mining Software Repositories (MSR), pp 41–45, https://doi.org/10.1109/MSR52588.2021.00017Pornprasit C, Tantithamthavorn CK (2021) Jitline: A simpler, better, faster, finer-grained just-in-time defect prediction. In: 2021 IEEE/ACM 18th International Conference on Mining Software Repositories (MSR), pp 369–379, https://doi.org/10.1109/MSR52588.2021.00049Quaranta L, Calefato F, Lanubile F (2021) Kgtorrent: A dataset of python Jupyter notebooks from Kaggle. In: 2021 IEEE/ACM 18th International Conference on Mining Software Repositories (MSR), pp 550–554, https://doi.org/10.1109/MSR52588.2021.00072Santos F, Wiese I, Trinkenreich B, Steinmacher I, Sarma A, Gerosa MA (2021) Can i solve it? identifying APIs required to complete OSS tasks. In: 2021 IEEE/ACM 18th International Conference on Mining Software Repositories (MSR), pp 346–257, https://doi.org/10.1109/MSR52588.2021.00047Schuler A, Kotsis G (2021) Mining API interactions to analyze software revisions for the evolution of energy consumption. In: 2021 IEEE/ACM 18th International Conference on Mining Software Repositories (MSR), pp 312–316, https://doi.org/10.1109/MSR52588.2021.00043Scoccia GL, Migliarini P, Autili M (2021) Challenges in developing desktop web apps: a study of Stack Overflow and GitHub. In: 2021 IEEE/ACM 18th International Conference on Mining Software Repositories (MSR), pp 271–282, https://doi.org/10.1109/MSR52588.2021.00039Sharma T, Kessentini M (2021) QScored: A large dataset of code smells and quality metrics. In: 2021 IEEE/ACM 18th International Conference on Mining Software Repositories (MSR), pp 590–594, https://doi.org/10.1109/MSR52588.2021.00080Sri-iesaranusorn P, Kula RG, Ishio T (2021) Does code review promote conformance? 
a study of OpenStack patches. In: 2021 IEEE/ACM 18th International Conference on Mining Software Repositories (MSR), pp 444–448, https://doi.org/10.1109/MSR52588.2021.00056Sridharan M, Mantyla M, Rantala L, Claes M (2021) Data balancing improves self-admitted technical debt detection. In: 2021 IEEE/ACM 18th International Conference on Mining Software Repositories (MSR), pp 358–368, https://doi.org/10.1109/MSR52588.2021.00048Sviridov N, Evtikhiev M, Kovalenko V (2021) Tnm: A tool for mining of socio-technical data from git repositories. In: 2021 IEEE/ACM 18th International Conference on Mining Software Repositories (MSR), pp 295–299, https://doi.org/10.1109/MSR52588.2021.00041Svyatkovskiy A, Lee S, Hadjitofi A, Riechert M, Franco JV, Allamanis M (2021) Fast and memory-efficient neural code completion. In: 2021 IEEE/ACM 18th International Conference on Mining Software Repositories (MSR), pp 329–340, https://doi.org/10.1109/MSR52588.2021.00045Tu H, Papadimitriou G, Kiran M, Wang C, Mandal A, Deelman E, Menzies T (2021) Mining workflows for anomalous data transfers. In: 2021 IEEE/ACM 18th International Conference on Mining Software Repositories (MSR), pp 1–12, https://doi.org/10.1109/MSR52588.2021.00013Ucha A, Barbosa C, Coutinho D, Oizumi W, Assuno WKG, Vergilio SR, Pereira JA, Oliveira A, Garcia A (2021) Predicting design impactful changes in modern code review: A large-scale empirical study. In: 2021 IEEE/ACM 18th International Conference on Mining Software Repositories (MSR), pp 471–482, https://doi.org/10.1109/MSR52588.2021.00059Vagavolu D, Agrahari V, Chimalakonda S, Venigalla ASM (2021) GE526: A dataset of open-source game engines. In: 2021 IEEE/ACM 18th International Conference on Mining Software Repositories (MSR), pp 605–609, https://doi.org/10.1109/MSR52588.2021.00083Wendland T, Sun J, Mahmud J, Mansur SMH, Huang S, Moran K, Rubin J, Fazzini M (2021) Andror2: A dataset of manually-reproduced bug reports for Android apps. 
2021 IEEE/ACM 18th International Conference on Mining Software Repositories (MSR) https://doi.org/10.1109/msr52588.2021.00082,Yin L, Zhang Z, Xuan Q, Filkov V (2021) Apache Software Foundation Incubator Project sustainability dataset. In: 2021 IEEE/ACM 18th International Conference on Mining Software Repositories (MSR), pp 595–599, https://doi.org/10.1109/MSR52588.2021.00081Yitagesu S, Zhang X, Feng Z, Li X, Xing Z (2021) Automatic part-of-speech tagging for security vulnerability descriptions. In: 2021 IEEE/ACM 18th International Conference on Mining Software Repositories (MSR), pp 29–40, https://doi.org/10.1109/MSR52588.2021.00016Young JG, Casari A, McLaughlin K, Trujillo MZ, Hbert-Dufresne L, Bagrow JP (2021) Which contributions count? analysis of attribution in open source. In: 2021 IEEE/ACM 18th International Conference on Mining Software Repositories (MSR), pp 242–253, https://doi.org/10.1109/MSR52588.2021.00036Zerouali A, Velzquez-Rodrguez C, De Roover C (2021) Identifying versions of libraries used in Stack Overflow code snippets. In: 2021 IEEE/ACM 18th International Conference on Mining Software Repositories (MSR), pp 341–345, https://doi.org/10.1109/MSR52588.2021.00046

About this article

Cite this article

Flint, S.W., Chauhan, J. & Dyer, R. Pitfalls and guidelines for using time-based Git data. Empir Software Eng 27, 194 (2022). https://doi.org/10.1007/s10664-022-10200-y

  • DOI: https://doi.org/10.1007/s10664-022-10200-y
