Using process mining for Git log analysis of projects in a software development course

Abstract

Understanding educational processes, such as student learning behavior within a specific course, is key to continuous course improvement. In online learning systems, students’ learning can be tracked and examined using data collected by the systems themselves. In traditional classroom courses, however, it is non-trivial to decide how to extract the desired student behavior from the limited data available. Software development courses are a domain where student behavior analysis would be especially useful, because continuous teaching improvement is necessary in this fast-progressing field. In this paper, we propose using process mining for improvement-motivated process analysis of a software development course (web development in particular). To this end, we analyze Git logs of students’ projects to understand their development processes. We chose process mining because it can help us find a descriptive model of this process. The main contribution of this paper is a detailed methodology for using process mining to analyze students’ project development, considering various commit characteristics that are crucial to understanding student coding-behavior patterns. The process mining analysis proved very useful, indicating multiple directions for course improvement, which we include in this work as a secondary contribution. The third contribution is a summary and discussion of the advantages of process mining and the current gaps in process mining research for this task. The data we used are publicly available to other researchers.

Data Availability

The data used for the analysis are publicly available at https://github.com/lasaris/Git-logs-for-Process-Mining.

Notes

  1. https://trac.edgewall.org/

  2. https://subversion.apache.org/

  3. https://git-scm.com/docs/git-log

  4. https://github.com/lasaris/Git-logs-for-Process-Mining

  5. By position, we mean the order of the commit in the list of project commits ordered by their timestamp from the oldest to the newest ones.

  6. We usually consider the median time between commits as a more suitable measure than the mean time due to unevenly distributed work during the semester.
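
The two notions defined in notes 5 and 6 can be illustrated with a minimal Python sketch. It is not taken from the paper: the function names, the 1-based numbering of positions, and the sample timestamps are all assumptions made for illustration.

```python
from statistics import mean, median


def commit_positions(timestamps):
    """Return each commit's position in the project, oldest commit first.

    `timestamps` maps a commit id to a Unix timestamp in seconds.
    Positions start at 1, which is an assumption of this sketch.
    """
    ordered = sorted(timestamps, key=timestamps.get)
    return {commit_id: pos for pos, commit_id in enumerate(ordered, start=1)}


def gaps_between_commits(timestamps):
    """Return the time gaps (in seconds) between consecutive commits."""
    times = sorted(timestamps.values())
    return [later - earlier for earlier, later in zip(times, times[1:])]


# Hypothetical project: two short working sessions separated by a long pause.
commits = {
    "c3": 1_090_000,
    "c1": 1_000_000,
    "c2": 1_000_600,
    "c4": 1_091_200,
}

positions = commit_positions(commits)
gaps = gaps_between_commits(commits)
print(positions)
print("median gap:", median(gaps), "mean gap:", mean(gaps))
```

In this made-up project the gaps are 600, 89400, and 1200 seconds. The single long pause pulls the mean up to 30400 seconds, while the median stays at 1200 seconds, which is closer to the typical gap within a working session.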


Funding

This research was supported by ERDF “CyberSecurity, CyberCrime and Critical Information Infrastructures Center of Excellence” (No. CZ.02.1.01/0.0/0.0/16_019/0000822).

Author information

Contributions

Martin Macak: design of the work, data collection, analysis, interpretation of data, and draft of the work. Daniela Kruzelova: data collection, analysis, interpretation of data, and draft of the work. Stanislav Chren: interpretation of data, and draft of the work. Barbora Buhnova: shaping of the idea, interpretations, and substantive revision of the text. All authors read and approved the manuscript.

Corresponding author

Correspondence to Martin Macak.

Appendices

Appendix A: Pre-processing of git logs

Listing 1 Basic data collecting and activity categorization

Listing 2 Merge commit classification

Listing 3 The creation of commit representing New branch

Listing 4 Test commit classification
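
The listings themselves are not reproduced on this page. As a rough illustration of the kind of pre-processing their captions describe, the following Python sketch parses `git log` output and assigns each commit an activity label. It is not the authors' code. The `|`-separated log format, the `Commit` class, the rule that a commit with more than one parent is a merge, and the rule that a commit whose subject mentions "test" is a test commit are all assumptions chosen to keep the example short.

```python
from dataclasses import dataclass

# Assumed input: output of
#   git log --pretty=format:"%H|%P|%an|%at|%s"
# (%H hash, %P parent hashes, %an author name, %at author Unix time, %s subject).
# The sketch assumes author names contain no "|" character.
SAMPLE_LOG = """
d4|b2 c3|Alice|3000|Merge branch 'login'
c3|a1|Alice|2500|Implement login form
b2|a1|Bob|2000|Add unit tests for login
a1||Alice|1000|Initial commit
"""


@dataclass
class Commit:
    sha: str
    parents: list
    author: str
    timestamp: int
    subject: str


def parse_log(text):
    """Parse the assumed log format into commits, oldest first."""
    commits = []
    for line in text.splitlines():
        if not line.strip():
            continue
        # maxsplit=4 keeps any "|" inside the subject intact.
        sha, parents, author, ts, subject = line.split("|", 4)
        commits.append(Commit(sha, parents.split(), author, int(ts), subject))
    return sorted(commits, key=lambda c: c.timestamp)


def classify(commit):
    """Assign a hypothetical activity label to a commit."""
    if len(commit.parents) > 1:
        return "Merge"
    if "test" in commit.subject.lower():
        return "Test"
    return "Other"


def to_event_log(project, commits):
    """Build (case id, activity, timestamp) rows, one per commit."""
    return [(project, classify(c), c.timestamp) for c in commits]


events = to_event_log("project-1", parse_log(SAMPLE_LOG))
for row in events:
    print(row)
```

Each row uses the project as the case identifier and the commit label as the activity, which is the general shape of event log that process mining tools expect. A real classifier would likely inspect the changed files rather than the subject line alone.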

Appendix B: Overview of analysis results

Table 2 Overview of commit analysis results

About this article

Cite this article

Macak, M., Kruzelova, D., Chren, S. et al. Using process mining for Git log analysis of projects in a software development course. Educ Inf Technol 26, 5939–5969 (2021). https://doi.org/10.1007/s10639-021-10564-6

Keywords

  • Learning analytics
  • Mining software repositories
  • Software development
  • Process mining
  • Educational data mining
  • Git