Is It Dangerous to Use Version Control Histories to Study Source Code Evolution?

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNPSE, volume 7313)

Abstract

Researchers use file-based Version Control Systems (VCSs) as the primary source of code evolution data. Because VCSs are widely used by developers, researchers get easy access to the historical data of many projects. Although convenient, research based on VCS data is incomplete and imprecise. Moreover, answering questions that correlate code changes with other activities (e.g., test runs, refactoring) is impossible from VCS data alone.

Our tool, CodingTracker, non-intrusively records fine-grained and diverse data during code development. CodingTracker collected data from 24 developers: 1,652 hours of development, 23,002 committed files, and 314,085 testcase runs.

These data allow us to answer: How much code evolution data is not stored in VCS? How much do developers intersperse refactorings and edits in the same commit? How frequently do developers fix failing tests by changing the test itself? How many changes are committed to VCS without being tested? What is the temporal and spatial locality of changes?

Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Negara, S., Vakilian, M., Chen, N., Johnson, R.E., Dig, D. (2012). Is It Dangerous to Use Version Control Histories to Study Source Code Evolution? In: Noble, J. (ed.) ECOOP 2012 – Object-Oriented Programming. ECOOP 2012. Lecture Notes in Computer Science, vol 7313. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-31057-7_5

  • DOI: https://doi.org/10.1007/978-3-642-31057-7_5

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-31056-0

  • Online ISBN: 978-3-642-31057-7

  • eBook Packages: Computer Science, Computer Science (R0)
