Is It Dangerous to Use Version Control Histories to Study Source Code Evolution?

  • Stas Negara
  • Mohsen Vakilian
  • Nicholas Chen
  • Ralph E. Johnson
  • Danny Dig
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7313)


Researchers use file-based Version Control Systems (VCSs) as the primary source of code evolution data. VCSs are widely used by developers; thus, researchers get easy access to the historical data of many projects. Although VCS data is convenient, research based on it is incomplete and imprecise. Moreover, answering questions that correlate code changes with other activities (e.g., test runs, refactoring) is impossible.
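One way a snapshot-based history can lose information is that edits overwritten before a commit never reach the VCS. The Python sketch below illustrates this idea under stated assumptions; it is not CodingTracker's implementation. It models each fine-grained edit as a hypothetical `TextEdit(offset, length, text)` (replace `length` characters at `offset` with `text`), replays the edits over the last committed text while tagging every character with the index of the edit that inserted it, and reports the inserting edits none of whose characters survive into the next snapshot. The names `TextEdit`, `replay`, and `shadowed_edits` are illustrative only.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class TextEdit:
    """Hypothetical fine-grained edit: replace `length` chars at `offset` with `text`."""
    offset: int
    length: int
    text: str


def replay(initial: str, edits: List[TextEdit]) -> Tuple[str, List[Optional[int]]]:
    """Apply edits in order, tagging each character with the index of the edit
    that inserted it. Characters from the initial (committed) text are tagged None."""
    chars: List[str] = list(initial)
    origin: List[Optional[int]] = [None] * len(initial)
    for i, e in enumerate(edits):
        end = e.offset + e.length
        if e.offset < 0 or end > len(chars):
            raise ValueError(f"edit {i} is out of range")
        # Replace the affected span in both the text and its origin tags.
        chars[e.offset:end] = list(e.text)
        origin[e.offset:end] = [i] * len(e.text)
    return "".join(chars), origin


def shadowed_edits(initial: str, edits: List[TextEdit]) -> List[int]:
    """Indices of inserting edits whose text was fully overwritten before the
    snapshot, i.e. edits that a diff between two commits cannot reveal."""
    _, origin = replay(initial, edits)
    surviving = {o for o in origin if o is not None}
    # Pure deletions (empty text) insert nothing, so they are not reported.
    return [i for i, e in enumerate(edits) if e.text and i not in surviving]
```

For example, `shadowed_edits("int x = 1;", [TextEdit(8, 1, "42"), TextEdit(8, 2, "7")])` returns `[0]`: the intermediate value `42` appears in neither snapshot, so a diff between the two commits shows only `1` changing to `7`.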

Our tool, CodingTracker, non-intrusively records fine-grained and diverse data during code development. CodingTracker collected data from 24 developers: 1,652 hours of development, 23,002 committed files, and 314,085 test case runs.

This allows us to answer: How much code evolution data is not stored in VCS? How much do developers intersperse refactorings and edits in the same commit? How frequently do developers fix failing tests by changing the test itself? How many changes are committed to VCS without being tested? What is the temporal and spatial locality of changes?
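Counting changes committed without being tested requires commits and test runs on one timeline, which file-based VCS data alone does not provide. A minimal Python sketch of such a count follows. It assumes a hypothetical, chronologically ordered stream of `Event` values for a single developer and treats every test run as covering all edits made before it; it ignores test outcomes and which code a test exercises, so it is an illustration rather than the paper's methodology. `Event` and `count_untested_commits` are illustrative names.

```python
from enum import Enum
from typing import Iterable


class Event(Enum):
    """Hypothetical kinds of recorded IDE events."""
    EDIT = "edit"
    TEST_RUN = "test_run"
    COMMIT = "commit"


def count_untested_commits(events: Iterable[Event]) -> int:
    """Count commits that include at least one edit made after the latest test run.

    Assumes one developer's events in chronological order and treats any
    test run as covering every edit made before it.
    """
    untested = 0
    edited_since_test = False  # uncommitted edits exist that no test run has seen
    for event in events:
        if event is Event.EDIT:
            edited_since_test = True
        elif event is Event.TEST_RUN:
            edited_since_test = False
        elif event is Event.COMMIT:
            if edited_since_test:
                untested += 1
            edited_since_test = False  # those edits are now stored in the VCS
    return untested
```

Resetting the flag on each commit means a commit with no edits since the previous commit is never counted, since it introduces nothing untested.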







Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Stas Negara (1)
  • Mohsen Vakilian (1)
  • Nicholas Chen (1)
  • Ralph E. Johnson (1)
  • Danny Dig (1)
  1. Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, USA
