Introduction

With campus management systems (CMS) providing comprehensive support of student lifecycles, institutions in higher education maintain a rich data source that is yet to be fully explored. A particular field of interest is the use of study path data to support student cohort monitoring and study planning. Study path analysis describes the analysis of student cohort data with a focus on study paths. It can provide meaningful indicators of students’ progress and success in study program completion as well as valuable insights into the commonalities and differences between successful and struggling students. In terms of quality assurance and the support of study planning, study path data analysis supports evidence-based decision-making and advisory.

To analyze study path data, different approaches need to be considered. While universities may already conduct basic data analysis and statistics (e.g., for annual reports on enrollments, graduations, and dropouts), more comprehensive analyses of student cohort data may require different methods. Here, approaches of artificial intelligence (AI) and data science can be applied, similarly to analytics in other domains (e.g., business).

The main question addressed in this article is how AI can be utilized for quality assurance and to support study planning and student cohort monitoring. In response to this question, the article presents approaches and advancements of the project AIStudyBuddy, an interdisciplinary research project between three German universities, combining their expertise in computer science, didactics, economics, and ethics. The project brings together the two paradigms of rule-based AI and process mining to support study planning and student cohort monitoring.

The project focuses on two target groups: (1) students and (2) study program designers. While students are defined as all individuals enrolled in a study program, the group of study program designers consists of several user groups with related but separate tasks and responsibilities. This includes university personnel actively involved in the coordination and design of study programs (e.g., program coordinators or academic advisors). Other areas of responsibility include the planning and administration of study programs, as well as activities related to the (re-)accreditation and quality assurance of study programs. As these roles often carry university-specific designations, with tasks distributed differently among various people, they are summarized in the following under the term ‘study program designer’.

The article is structured as follows: After a summary of the current state of research and technology, the conception of a reference architecture and data model for analyzing study path data is presented. Subsequently, the development of an interactive study planning tool—using rule-based AI—and the development of an application for study program monitoring and cohort analysis—using process mining—are described. After presenting current limitations, the article is concluded and an outlook on future research and development around AI-supported study planning and cohort tracking is given.

Related work

Utilizing curriculum and student cohort data to conduct study path analysis falls in the domain of curriculum analytics, a sub-domain that emerged from the field of learning analytics. Curriculum analytics utilizes methods of AI and data science to support and guide evidence-based curriculum development [6]. It can help in identifying students’ needs, monitoring student cohorts, and reducing dropout rates [4].

In the past, different projects have focused on curriculum analytics and the analysis of student lifecycle data (e.g., [5, 12]). Related work on course recommendations often utilizes enrollment histories and module handbook descriptions as input to collaborative filtering (e.g., [10]). Early works utilizing educational process mining show how events in student lifecycle data can be used to identify high-failure rate courses leading to late dropouts [11]. Similarly, process mining may also provide answers to students’ questions regarding successful pathways to their degrees (Schulte et al., 2017).

Meanwhile, only limited effort toward the comprehensive analysis of study paths in combination with the support of study planning (e.g., [3]) has been made. Researchers only just recently started exploring interactive study planning tools (e.g., [1, 7]) and supportive applications for academic advisory services, guiding students towards successful study paths [2].

Overall, related work in the field of curriculum analytics shows the potential of using methods of AI with student lifecycle data to guide education. AI can be utilized for evidence-based curriculum development and may also be used to support study planning for students. Whether it is used to generate course recommendations for students or to gain insights into successful study paths, AI yields an opportunity for combining study path analyses and study planning.

Concept of a reference architecture and data model for study path analytics

The foundation for any analyses within the project scope is created by the data extracted from various source systems at a university (e.g., CMS or student information systems). It needs to be provided in a way that can be easily understood and processed efficiently and correctly. Additionally, the usage of different systems at different universities requires the definition of a uniform data reference model (DRM) that models all the necessary data of the study lifecycle as well as the curriculum.

Utilizing the ETL process [13], each partner must query their own CMS to Extract the needed data, Transform it into the DRM locally, and finally push it to a Data Warehouse (DWH), where it is stored (Loaded) and later provided for analysis. The DRM serves as a standardized interface for all analytics engines that process defined data subsets to produce results and enable the desired insights into study paths. This also eases the inclusion of further universities in the future. The concept for the whole architecture is depicted in Fig. 1.
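The ETL flow described above can be sketched as follows. This is a minimal illustration, not the project's actual implementation: the DRM field names, the CMS export format, and the pass/fail mapping are all assumptions made for the example.

```python
from dataclasses import dataclass

# Hypothetical DRM record for a single exam event; the field names
# are illustrative, not the project's actual data reference model.
@dataclass
class DrmExamEvent:
    student_id: str
    module_id: str
    semester: str
    result: str  # "passed" / "failed"

def extract(cms_rows):
    """Extract: read raw rows as exported from a local CMS."""
    return list(cms_rows)

def transform(rows, institution="UNI-A"):
    """Transform: map CMS-specific fields onto the shared DRM,
    prefixing local IDs with an institution code to avoid clashes
    between universities."""
    return [
        DrmExamEvent(
            student_id=f"{institution}:{r['matnr']}",
            module_id=f"{institution}:{r['module']}",
            semester=r["sem"],
            # assumption: German-style grades, <= 4.0 counts as a pass
            result="passed" if r["grade"] <= 4.0 else "failed",
        )
        for r in rows
    ]

def load(events, warehouse):
    """Load: append the standardized events to the (here in-memory) DWH."""
    warehouse.extend(events)

warehouse = []
raw = [{"matnr": "123", "module": "CS101", "sem": "WS22", "grade": 2.3}]
load(transform(extract(raw)), warehouse)
```

Each university runs Extract and Transform locally against its own CMS, so only DRM-conformant records ever reach the central warehouse.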

Fig. 1
figure 1

Conceptual overview of the architecture

In contrast to the possible automatic data extraction from the CMS, examination regulations and module handbooks, as the data source for curriculum data, might not be present in a machine-readable form. Therefore, an editor to transform this data manually will be provided.
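The editor mentioned above would produce machine-readable curriculum entries from the textual regulations. As a sketch, and assuming a hypothetical set of required DRM fields, a validation step for a manually edited module entry could look like this:

```python
# Assumed set of fields the DRM expects for a module entry;
# the names are illustrative, not the project's actual schema.
REQUIRED = {"module_id", "title", "credits", "offered_in"}

def validate_entry(entry):
    """Check that a manually edited module-handbook entry carries
    all required fields before it is loaded into the warehouse."""
    missing = REQUIRED - entry.keys()
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    return entry

entry = validate_entry({
    "module_id": "CS101",
    "title": "Programming",
    "credits": 8,
    "offered_in": ["winter"],
})
```

Validating at editing time keeps malformed curriculum data out of the downstream analyses.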

To create the opportunity to not only analyze data for each university individually but also across institutions, a central DWH is set up, along with a centralized infrastructure to conduct the analyses. For data storage, this requires post-processing local IDs (e.g., for study programs or modules) to avoid conflicts between different universities. As data retrieval should be fast so as not to interrupt a user’s flow of thought [9], as many analyses as possible should be run proactively and their results stored in a result store, instead of being recomputed every time results are requested by the web applications or other analyses.
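The proactive computation described above can be sketched as a simple result store: analyses run ahead of time, and requests only ever read cached results. The class and key layout are illustrative assumptions, not the project's design.

```python
class ResultStore:
    """Minimal sketch of a proactive result store: analyses are run
    ahead of time and their results cached, so the web applications
    never wait for a long-running computation."""
    def __init__(self):
        self._cache = {}

    def precompute(self, key, analysis, *args):
        # Run the (potentially expensive) analysis now, store the result.
        self._cache[key] = analysis(*args)

    def get(self, key):
        # Fast path: requests only read precomputed results.
        return self._cache.get(key)

def dropout_rate(cohort):
    """Stand-in for an expensive analysis over a student cohort."""
    return sum(1 for s in cohort if s["dropped_out"]) / len(cohort)

store = ResultStore()
cohort = [{"dropped_out": True}, {"dropped_out": False}, {"dropped_out": False}]
store.precompute(("dropout", "CS-BSc", "WS22"), dropout_rate, cohort)
```

A scheduler would refresh such entries whenever new event data arrives, keeping interactive response times low.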

Data privacy issues are managed by a rights engine and applied at the required interfaces. Students can manage their consent regarding which data should be collected and used for which purpose or analysis. Further, they can control who can see their data: while students will usually want to keep it to themselves, they may want to share it with the student advisor when attending an advisory session. Finally, depending on their role, suitable analytic results will be provided to the users of the web applications.
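A rights engine as described above could, in its simplest form, check each access against the student's declared consent. The consent model below (purposes and permitted viewers per student) is an illustrative assumption:

```python
# Illustrative consent records: each student whitelists analysis
# purposes and viewer roles; the rights engine filters access
# accordingly. Data and role names are hypothetical.
consents = {
    "UNI-A:123": {"purposes": {"planning"}, "viewers": {"self"}},
    "UNI-A:456": {"purposes": {"planning", "cohort_analysis"},
                  "viewers": {"self", "advisor"}},
}

def may_access(student_id, purpose, viewer):
    """Return True only if the student consented to both the purpose
    of the analysis and the role of the requesting viewer."""
    c = consents.get(student_id)
    return bool(c) and purpose in c["purposes"] and viewer in c["viewers"]
```

Enforcing such checks at the data interfaces ensures that, for example, an advisor only sees a student's path after the student has opted in.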

Development of an interactive study planning tool

User-centered application development

The development of the student-facing web application StudyBuddy follows an iterative user-centered design approach containing four steps: (1) understanding, (2) defining, (3) designing, and (4) evaluating. Over multiple iterations, the prototype is being developed and tested with its target group. The objective is to integrate diverse user needs and requirements in the design and implementation to increase acceptability among students with varying challenges and constraints in study planning (e.g., part-time students or students who have switched study programs).

To gain an understanding of the target group, its behavior, interests, needs, and requirements, a mixed method approach was chosen, inviting students to partake in focus groups at the three participating universities and sending out an online survey to all students. Furthermore, the project integrated a first study planning tool prototype that had been developed for computer science students at RWTH Aachen University in a previous project. The tool was initially based on available study planning resources, and design considerations were drawn from the limitations of fixed, recommended plans in examination regulations and related work [7, 8]. In combination with the focus group and survey results, requirements and user stories were formulated to guide the design of the StudyBuddy. As a core functionality of StudyBuddy, we aim to integrate rule-based AI to provide individual feedback to students’ planning actions and conformance checking of resulting study plans.

In its current prototype version (see Fig. 2), the user interface relies on a table-like visualization of mandatory modules and placeholders for elective modules. All modules can be moved between semesters, which allows students to easily enter the current study path they have taken so far and also plan future semesters. While the columns of the interface indicate semesters, rows indicate different areas of a study program. All mandatory areas of the study program are listed. Elective areas can be added if students actively select them. Students can mark passed or failed courses in past semesters, indicating their study progress. By using placeholders for electives, students can plan abstractly at first and can make it more explicit later, i.e., filling in the actual module when they have decided which one to take. Students can select their start semester and adjust all modules based on their current study progress as well as their plans for upcoming semesters. Additional columns for future semesters can be added with one mouse click, allowing students to plan beyond the expected standard duration of a study program.

Fig. 2
figure 2

Sample screenshot of the current version of StudyBuddy

While the first user evaluation using the interactive prototype has not yet been completed, early results show that the interface is generally handled well, and participants voiced encouraging feature requests during the sessions [8]. While planning feedback in its current form is limited to fixed feedback on module cycles and hard requirements, ongoing developments for the next prototype include rule-based AI feedback.

Using rule-based AI to check study plans

When considering the study program with its examination regulations, containing rules (e.g., credit limits on certain elective areas) and recommendations (e.g., prior completion of modules to ensure sufficient prior knowledge), study planning can be understood as a structured domain and may benefit from rule-based AI approaches. In the scope of the project, rule-based AI is applied to students’ planning actions and resulting study plans when using StudyBuddy. The objective is to make rules and recommendations available to students during the planning process and reduce the potential for human error (i.e., when students need to manually check rules and regulations to check the conformance of their study plan). Through rule-based AI feedback, students receive immediate support and can make informed study planning decisions.

To provide rules and recommendations for study programs through rule-based AI, a machine-readable representation of the study program model is required. While module handbook entries might already be available through the CMS, examination regulations themselves are usually text-based documents following legal notations. As such, they must be manually converted into formal, machine-readable notations to serve as input to the rule-based AI component. Using a symbolic approach, rules and recommendations are transformed into a custom notation suitable for evaluation with a formal logic calculus. Alternatively, symbolic AI approaches could be used from the very beginning to model and create examination regulations, thereby alleviating the issue of machine-readability, as proposed in [15]. To allow for the assessment of both past and planned events, the event calculus notation is extended with eventualities and concepts of actions used in planning (a sub-discipline of AI). Lastly, combining all AI feedback with human-readable explanations makes the feedback explainable and allows students to change and adapt their plans accordingly.
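To illustrate the kind of rule checking described above, the following is a minimal sketch, not the project's event-calculus formalization: rules are plain functions over a plan (a mapping from semester to module list) that return human-readable explanations for violations. All module names, credit values, and limits are assumptions made for the example.

```python
# Illustrative study program model (not a real examination regulation).
CREDITS = {"CS101": 8, "MATH1": 9, "ELEC1": 6, "ELEC2": 6}
PREREQS = {"ELEC2": {"CS101"}}      # recommendation: take CS101 first
ELECTIVES = {"ELEC1", "ELEC2"}
ELECTIVE_CREDIT_LIMIT = 10          # rule: credit cap on electives

def check_prerequisites(plan):
    """Flag modules planned before their recommended prerequisites."""
    issues, taken = [], set()
    for sem in sorted(plan):
        for mod in plan[sem]:
            missing = PREREQS.get(mod, set()) - taken
            if missing:
                issues.append(
                    f"{mod} in semester {sem}: recommended prior "
                    f"modules not yet completed: {sorted(missing)}")
        taken |= set(plan[sem])
    return issues

def check_elective_credits(plan):
    """Flag plans exceeding the elective credit limit."""
    total = sum(CREDITS[m] for mods in plan.values()
                for m in mods if m in ELECTIVES)
    if total > ELECTIVE_CREDIT_LIMIT:
        return [f"elective credits {total} exceed limit "
                f"{ELECTIVE_CREDIT_LIMIT}"]
    return []

plan = {1: ["MATH1", "ELEC2"], 2: ["CS101", "ELEC1"]}
feedback = check_prerequisites(plan) + check_elective_credits(plan)
```

Because every violation carries an explanation string, the same structure supports the explainable feedback mentioned above; the project's actual approach evaluates rules via a formal logic calculus rather than ad-hoc functions.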

While plans can be checked using rule-based AI, we also aim to support planning behavior through recommendations. Combining rule-based AI with data-driven approaches, past study paths can be analyzed to provide recommendations on how to plan. Beyond general recommendations of successful study paths, including a student’s current study progress allows recommendations to be tailored, modules to be proposed where choices must be made, and automated planning features to be supported by the rule-based AI. Established methods of recommender systems will also be considered, but through process mining, the focus will be on generating recommendations in the form of rules that can be combined with the rule-based feedback in StudyBuddy. In process mining, process models are computed from past study paths and, depending on the data filters applied, different models for different cohorts of students (e.g., based on overall grade) can be derived. From these models, association rules can be extracted and used as recommendations in the rule-based AI component of StudyBuddy. In this way, process mining complements rule-based AI in supporting study planning: mining successful study paths yields recommendations for individualized planning support.
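The rule-extraction idea can be sketched as follows: from the study paths of successful students, compute the confidence of a hypothetical "take A before B" rule, which could then feed the rule-based feedback. All paths and module names are illustrative, and this is a simplification of full association rule mining.

```python
# Ordered module sequences of (hypothetical) successful students.
successful_paths = [
    ["CS101", "MATH1", "ELEC2"],
    ["CS101", "ELEC2", "ELEC1"],
    ["MATH1", "CS101", "ELEC2"],
    ["CS101", "MATH1", "ELEC1"],
]

def confidence_a_before_b(paths, a, b):
    """Among paths containing b, the fraction in which a occurs
    before b; a high value suggests recommending a before b."""
    with_b = [p for p in paths if b in p]
    if not with_b:
        return 0.0
    before = sum(1 for p in with_b
                 if a in p and p.index(a) < p.index(b))
    return before / len(with_b)

conf = confidence_a_before_b(successful_paths, "CS101", "ELEC2")
```

A rule passing a confidence threshold could be phrased as a soft recommendation ("students who passed ELEC2 usually completed CS101 first") alongside the hard regulation checks.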

Development of a dashboard for study program monitoring and cohort analysis

By exploring study path data from different student cohorts, various data analyses can be performed to monitor student cohorts and systematically identify successful study paths and those that should be favored, as well as less successful or problematic paths. Using methods of process mining, study path data can be transformed into process models and used for process analyses, conformance checking, and process optimization. The analysis results can then be prepared in the form of dashboards with interactive visualizations, allowing users to gain data-driven insights into different student cohorts and study paths. The focus here is on the target group of study program designers, which, as described above, includes personnel involved in the coordination and design of study programs, as well as academic advisory services and quality assurance. With access to study path analysis results, study program designers can move toward evidence-based curriculum development and make improvements and adaptations to study programs.

To support study program designers in their respective tasks, we design and develop BuddyAnalytics, a dashboard-based web application for study program monitoring and cohort analysis.

User-centered application development

Similarly to StudyBuddy, the development of BuddyAnalytics also follows an iterative user-centered design approach of understanding, defining, designing, and evaluating. In each iteration, the application prototype is explicitly evaluated with the target group to tailor it to their needs and requirements.

In the first step, study program designers were invited to an interactive workshop to gain a solid understanding of the user group’s needs and requirements, as well as their tasks and responsibilities. As a result, we identified the four task groups of study program planning, coordination and administration, advisory services, and quality assurance and (re-)accreditation. We then identified the available workflows and tools already used by study program designers, in order to collect requirements and desired features for BuddyAnalytics. While the identified tasks helped us to understand the user groups’ responsibilities, collecting information on existing workflows and tools provided valuable insights into issues that are not yet covered or can be improved. The quality of available data and the format and accessibility of information were often criticized. While some tools offer basic analyses for common reporting tasks, the study program designers expressed a need for specialized analyses, going beyond simple statistics.

Next, requirements and user stories were formulated to guide the design and implementation of the dashboard. Using wireframes as low-fidelity prototypes, the design of the first version of BuddyAnalytics was outlined and then implemented as a web application. The core feature of the application is customizable dashboards with widgets for different analysis results. The dashboards are available for different study programs as well as different modules, presenting a set of analyses using different visualizations. Visualizations can be filtered (for customization) and exported (for re-use outside of BuddyAnalytics). Figure 3 depicts a dashboard for a computer science study program, with multiple widgets showing analysis results.

Fig. 3
figure 3

Sample screenshot of the current version of BuddyAnalytics

First user evaluation results with pilot testers and a small group of study program designers show that dashboards with multiple widgets, showing various visualizations, can be overwhelming and difficult to overview. Thus, besides dashboards, a tree-based reporting view was developed. By categorizing the different analyses for study programs and modules, study program designers can navigate to specific analyses and customize the visualization to their needs. Furthermore, we aim to develop customizable dashboards to support study program designers in creating their own task-specific dashboards, for example, a dashboard containing all indicators required when creating reaccreditation reports. Future evaluations will provide further insights into the suitability of features and design decisions for the target group.

Using process mining to analyze student cohorts

Going beyond the statistical analysis of study path data, process mining approaches can be used to investigate student behaviors by interpreting them as processes in which students register, attend, and pass or fail courses and exams throughout multiple semesters. Process mining systematically utilizes event data to improve operational processes [14]. It utilizes methods of data science and machine learning, and thus, we categorize it toward the data-driven AI paradigm. In the scope of the project, we refer to educational process mining using event logs that are retrieved or computed from the event data of CMS. By discovering process models representing different variants of students’ paths in a study program, we can gain insights into cohort behavior and compare students’ paths to recommended study plans (also perceivable as process models). Furthermore, different types of process mining tasks can be conducted. Here, we can distinguish between backward-looking tasks of conformance checking, performance analysis, and comparative process mining, as well as forward-looking tasks of predictive process mining and action-oriented process mining, turning diagnostics into action [14]. Thus, educational process mining can not only provide insights into students’ behavior within a study program, but it can also help to guide curriculum development.
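A first step of the process discovery mentioned above is computing the directly-follows relation from per-student traces, the basis of many discovery algorithms. The sketch below uses module-level traces with illustrative data; real event logs from a CMS would carry timestamps, registration events, and more.

```python
from collections import Counter

# Per-student traces: the order in which modules were completed.
# Student IDs and module names are illustrative.
traces = {
    "s1": ["CS101", "MATH1", "ELEC1"],
    "s2": ["CS101", "ELEC1", "MATH1"],
    "s3": ["CS101", "MATH1", "ELEC1"],
}

def directly_follows(traces):
    """Count how often activity b directly follows activity a
    across all traces (the directly-follows graph, DFG)."""
    dfg = Counter()
    for trace in traces.values():
        for a, b in zip(trace, trace[1:]):
            dfg[(a, b)] += 1
    return dfg

dfg = directly_follows(traces)
```

Edge frequencies in the resulting graph already reveal the dominant course orders; discovery algorithms then turn such relations into full process models.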

In the following, we outline three exemplary analyses based on educational process mining:

  • Firstly, study program designers are interested in course orders and students’ conformity towards recommended study plans. As such, process discovery is used to generate a process model of past study paths. While, hypothetically, the majority of students may follow a recommended study plan, and the process model may confirm the most frequent variant, process models also allow for review of other common variants of study paths and can help study program designers to understand which course orders can be changed and still complete the study program successfully.

  • Secondly, study program designers are interested in identifying more and less successful study path variants. By filtering the input data accordingly, process discovery may lead to different process models depending on indicators such as a short study duration and a good grade point average (GPA). To this end, comparative process mining approaches can be used to gain insights into differences and commonalities between well- and poorly performing students with a perspective on study paths.

  • Lastly, when analyzing student behavior, different levels of granularity can be considered. Using the event data from a CMS allows us to analyze registration and deregistration behaviors in combination with successfully passing modules. Process mining can be applied to analyze how students deal with difficult modules, and when they decide to defer exams instead of taking them. This may give study program designers insights into the behavior of struggling students and could be used for advisory purposes.
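The second analysis above can be illustrated with a minimal comparative sketch: split students by GPA and compare the most frequent study path variant per cohort. All data and the GPA threshold are illustrative assumptions.

```python
from collections import Counter

# Hypothetical students with final GPA and completed study path.
students = [
    {"gpa": 1.7, "path": ("CS101", "MATH1", "ELEC1")},
    {"gpa": 1.9, "path": ("CS101", "MATH1", "ELEC1")},
    {"gpa": 3.3, "path": ("MATH1", "CS101", "ELEC1")},
    {"gpa": 3.6, "path": ("MATH1", "CS101", "ELEC1")},
]

def top_variant(cohort):
    """Most frequent study path variant in a cohort."""
    return Counter(s["path"] for s in cohort).most_common(1)[0][0]

# Illustrative split: German-style GPA, lower is better.
strong = [s for s in students if s["gpa"] <= 2.5]
weak = [s for s in students if s["gpa"] > 2.5]
variants_differ = top_variant(strong) != top_variant(weak)
```

In practice, full process models rather than single variants would be compared, but even this comparison hints at course orders that distinguish the cohorts.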

Overall, educational process mining using event data from a CMS allows in-depth data analysis while perceiving students’ actions as processes. Similarly to study planning, study paths can be viewed as processes of students moving through a study program over time. A rather detailed analysis of course orders, conformance, and overall performance allows quality assurance and evidence-based curriculum development. By including process mining results in BuddyAnalytics, they become available to the target group of study program designers without requiring them to be data scientists themselves.

Limitations

While exploring the utilization of rule-based AI and process mining to support study planning and cohort monitoring, some limitations must be respected. Although the project involves the perspective of three German universities, it was not possible to include all study programs of each university. The data reference model considers the models of different study programs and the needs for the different data processing techniques. Still, other study programs or universities may bring up new models that might require adjustments. Thus, the resulting data model is only generalizable to a certain extent, and future iterations may widen its applicability for different study programs in more institutions.

It should be noted that considering only a limited number of study programs, to keep the project manageable, implies that trained models used for different reports or to generate recommendations for study planning must be carefully reviewed before applying them to analyze further study programs (e.g., due to overfitting). Further, data quality has to be examined. The heterogeneity of a university’s data might require interpretation when transforming it to the data reference model, which might impact analysis outcomes. Finally, the number of enrolled students in a study program should be considered: cohorts that are too small might not produce enough event data for suitable analysis.

When recommending detected indicators for success to students and study program designers, the presentation must be considered carefully. Especially with recommendations for study planning, it is important to respect different optimization objectives in a balanced manner. For example, students should not be advised against taking the elective courses they are interested in simply because the average grade of these courses tends to be lower than that of others.

A challenge in the project and beyond is the acquisition of machine-readable study program models. While module handbook data might be extractable from an institution’s CMS, the examination regulations for a study program mostly consist of legal formulations that need to be carefully transformed. So far, this was done for the study programs considered and is required for future study programs in order to enable AI-based support of study planning. With the advance of large language models, future work may explore how models can be extracted (semi-)automatically.

Overall, limitations of this work present opportunities for further research as well as developments to enable universities to employ AI in support of study planning and cohort monitoring. This should include close integration of the different target groups to ensure suitability and benefits for their respective tasks and responsibilities.

Conclusion and outlook

Applying AI and data science to analyze study paths yields the potential to provide meaningful insights to students and study program designers that can support study planning and student cohort monitoring. However, a standardized infrastructure to collect and process the necessary data foundation as well as report the analysis results in suitable ways, adapted to different requirements or desires for insights, is still missing. Therefore, the project AIStudyBuddy aims to define the necessary interfaces, including a data reference model, and provide an implementation of the conceptualized components. While the user-facing web applications are developed iteratively with target users in the loop, the data warehouse and analytics components are developed according to best practices, considering performance, fast result retrieval, and security.

Previous work and the first results from the project indicate valuable benefits for students and study program designers, which should be explored further. The automatic processing of large amounts of data allows study program designers to spend more time reflecting on analysis results and advising students. Students can benefit from potential lessons learned from previous study paths and create individual paths more adjusted to and suitable for their situation, which might lead to increases in study success and a reduced number of dropouts.

Besides the technical solutions presented in this article, organizational challenges at each individual university need to be considered. Data protection, in particular, has to be handled carefully. While the ethical guidance within the project includes a data impact assessment that is also usable by others, the specific guidelines and requirements at any institution should be assessed individually.

While the project will only provide a first prototypical implementation of AI-supported study path analysis, the approaches and advancements of the project might also apply to other types of monitoring and different education-related data, for example, correlating student lifecycle data with activity data from learning management systems to investigate course activities and their influence on students’ study paths.