1 Introduction

Self-regulation is the process by which students take control of their learning process to enhance their academic performance and learning during a course (Vilkova 2022). Prior studies have shown that a lack of self-regulated learning (SRL) skills can be an important factor that leads to failure and students dropping out of courses (Kizilcec et al. 2017). It has also been shown that effective feedback and action recommendations are essential for SRL, and are significantly correlated with students’ learning and performance (Algayres and Triantafyllou 2020). However, teachers find it challenging to understand which course activities and resources should be delivered in the form of action recommendations to help students to improve their learning and performance (O’Lynn 2021; Fariani et al. 2022).

To address these issues, several studies (Corrin and De Barba 2014; Kizilcec et al. 2017; Sáiz Manzanares et al. 2017; Garcia et al. 2018; Matcha et al. 2019; Afzaal et al. 2021; Di Mitri et al. 2017) have been conducted in the recent past with a focus on providing SRL support to improve students’ learning and performance. Researchers have developed learning analytics (LA)-based feedback and reflection dashboards for students to support them in the various phases of SRL, such as forethought, performance monitoring, and reflection (Garcia et al. 2018). In regard to the forethought phase, some studies have used questionnaires and have applied regression approaches to students’ responses to examine the significance of goal setting, task planning and motivation in improving students’ self-regulation (Corrin and De Barba 2014; Kizilcec et al. 2017; Sáiz Manzanares et al. 2017). For performance monitoring, learning strategies and interactive visualisations have been developed to support students with timely feedback and guidance that can help them to analyse their performance (Matcha et al. 2019; Afzaal et al. 2021). In the reflection phase, students have been assisted in adjusting their learning behaviour according to reflections provided by LA-based dashboards (Corrin and De Barba 2014; Di Mitri et al. 2017). For instance, in one study (Yousef and Khatiry 2021), students were supported through performance predictions and visualisations of their learning progress, which enabled them to adjust their learning behaviour and make better decisions.

However, these approaches have often used only self-reported SRL strategies (Alonso-Mencía et al. 2021), rather than actual strategies obtained from students’ interactions (event-based strategies, such as SRL sequence patterns), and have not evaluated the ways in which SRL strategies can have an impact on students’ performance (Kizilcec et al. 2016; Wong et al. 2019; Moreno-Marcos et al. 2020). Moreover, such dashboards have used performance prediction output as feedback, and although this can be helpful to some extent, it does not provide any meaningful insights or actionable information on the reasons behind the prediction (Baneres et al. 2019; Akçapınar et al. 2019; Mubarak et al. 2022; Cano and Leonard 2019); that is, students do not receive actionable feedback on their learning. Although some researchers have moved beyond this and have used factors as feedback that can affect students’ performance over time (Afzaal et al. 2021; Cavanagh et al. 2020; Nouri et al. 2019), these approaches do not provide students with actual explanations of these predictions and factors, which could help students to regulate their learning behaviour in a data-driven manner.

Thus, the development of dashboards that can provide automatic and intelligent feedback in the form of data-driven recommendations to support students’ self-regulation is an issue that has been overlooked (Ramaswami et al. 2022). Recently, explainable artificial intelligence (AI) approaches (e.g., counterfactual explanations (CFs) (Verma et al. 2020)) have been proposed to explain these predictions and generate insights from predictive models that focus on the relevant actions that would be required for a particular student to achieve higher grades in ongoing courses (Adadi and Berrada 2018). Such insightful actions could be used as action recommendations for students, allowing them to regulate their behaviour in a data-driven manner. In this context, we pose the following research questions:

  • How can we employ an explainable AI approach to compute data-driven feedback and generate actionable recommendations to support students’ self-regulation?

  • What are the performance effects of a dashboard that provides self-regulation support to students based on the above explainable AI approach?

The aim of this paper is to answer the above questions. We propose an explainable AI-based approach that predicts students’ performance and computes informative feedback and actionable recommendations to support self-regulation by students. Moreover, we develop and evaluate a dashboard based on the proposed approach.

An evaluation was conducted in a real educational setting to assess the effects of our dashboard on students’ academic performance and to examine the benefits and limitations of this dashboard in terms of SRL. The results of this evaluation revealed that the proposed approach provided intelligent recommendations to students that were both relevant and useful. An investigation of the effects of the dashboard indicated that students who used it achieved higher performance than those who did not. Moreover, the dashboard helped students to set their learning goals, plan and monitor their actions, and reflect on their current learning progress in the course.

The main contributions of this paper are threefold. Firstly, we propose an approach that uses explainable AI to provide SRL support to students during a course. Secondly, we develop a CF-based algorithm that gathers data on each student’s interactions with the currently running course and produces a set of recommendations that can support students in achieving higher grades. Thirdly, the proposed approach is implemented in the form of a dashboard, which is evaluated in a university course to examine its effects on students’ academic performance.

2 Self-regulated Learning

SRL has been defined as “the processes whereby students activate and sustain cognitions, affects, and behaviours that are systematically oriented toward the attainment of personal goals” (Zimmerman and Schunk 2011). According to the model of Zimmerman (1990, 2002), SRL consists of three main phases: (i) the forethought phase, which refers to thinking before action, and includes a set of tasks such as goal setting, strategic planning, motivation, and self-efficacy; (ii) the performance phase, which involves putting plans into action, monitoring the progress of study, and time management; and (iii) the reflection phase, which involves evaluating the learning actions and performing self-reflection on the learning process.

In recent years, several studies have been carried out on each of these phases to improve SRL processes. The following sections present a review of recent studies of these phases.

2.1 Forethought Phase

The forethought phase is highly significant in SRL, as it initiates the process of strategic planning for future tasks. As mentioned above, the forethought phase consists of a set of tasks with a focus on goal setting and strategic planning, to help students attain their learning goals. The authors of Kizilcec et al. (2017) applied logistic regression to responses from 4,831 students about goal setting, strategic planning, self-evaluation and help seeking. They found that goal setting and strategic planning were significant positive predictors of goal attainment, while help seeking was a significant negative predictor of goal attainment. Similarly, the study in Siadaty et al. (2016) examined the effects of technological scaffolding interventions on the micro-level processes of SRL (referring to activities such as goal setting in the planning phase), and found that a specific intervention (e.g., learning paths and learning activities) was a determinant for the micro-level processes of goal setting and making personal plans within the macro-level engagement process.

Motivation and self-efficacy are the core tasks in the forethought phase, and are based on an individual’s desires and beliefs in terms of achieving a certain goal. The majority of studies of the forethought phase have used a questionnaire at the beginning of a semester or course to measure student motivation. For example, the authors of Corrin and De Barba (2014) used the Motivated Strategies for Learning Questionnaire (MSLQ), whereas in Sáiz Manzanares et al. (2017), observable variables such as engagement were employed along with a questionnaire; it was found that the latter combination was valuable, as the self-reported motivation and other SRL variables provided insight into the students’ perspectives (i.e., learner experiences). The study in Cicchinelli et al. (2018) explored the correlation between self-efficacy and self-regulation strategy using three datasets (questionnaire, activity logs, and performance), and found that students who believe in their own efficacy also use cognitive strategies appropriately and are good at self-regulation.

2.2 Performance Phase

In regard to the performance phase, a number of studies (Davis et al. 2016; Matcha et al. 2019; Molenaar et al. 2021; Ott et al. 2015) have developed LA feedback dashboards for students based on their interactions via a learning management system (LMS), in order to provide help in terms of monitoring their performance, activities and SRL behaviour. For example, in Davis et al. (2016), a learning tracker widget was developed that supported students with timely and goal-oriented feedback, and its use was found to improve students’ performance. The authors of Molenaar et al. (2021) found that students needed teacher regulation and additional practice to acquire skills, but were mostly able to regulate their learning to a reasonable level. However, the researchers in Ott et al. (2015) reported that although students valued the provision of information and guidance, this did not change their behaviours. Their findings also suggested that a high level of engagement does not necessarily trigger action. Many studies (Afzaal et al. 2021; Matcha et al. 2019; Montgomery et al. 2019; You 2016) have therefore applied further analysis to students’ engagement with LMSs to identify the potential indicators or factors that can help students to achieve higher grades. For example, in Afzaal et al. (2021), an explainable machine learning (ML) approach was implemented within an online course to identify the factors (e.g., active participation and performing exercises) that could play a significant role in improving students’ academic performance.

Other recent studies (Cicchinelli et al. 2018; Kizilcec et al. 2017; Lau et al. 2017; Manso-Vázquez et al. 2018; Matcha et al. 2019) have identified different kinds of learning strategies, and have explained the characteristics of these learning strategies through an analysis of students’ LMS data. For example, the work in Matcha et al. (2019) applied clustering, sequence mining, and process mining approaches to online preparation activities and detected three groups of strategies, which were denoted as strategic-moderate engagement, highly selective-low engagement, and intensive-high engagement. The authors found that higher performers used many different learning strategies, whereas low performers relied on a single strategy. However, these approaches could not determine whether students were easily able to modify their learning strategies based on the feedback provided. Exploring this research gap could contribute to an understanding of effective approaches for providing feedback to students, and the resulting guidelines could be put into action (Matcha et al. 2019).

Time management is another important task in the performance monitoring phase that can help students to manage their work. For example, in Tabuenca et al. (2015), a mobile LA approach was proposed that logged students’ study time via a mobile app and then presented an insightful visualisation to help students not only to manage their time but also to improve their time management skills. Similarly, the authors of Papamitsiou et al. (2018) applied a qualitative comparative analysis based on fuzzy sets to students’ response times to the questions in a progress assessment test, and found that students with good time management skills spent very little time on answering the questions and gained higher grades on their courses.

2.3 Reflection Phase

Several studies (Di Mitri et al. 2017; Gašević et al. 2014; Manso-Vázquez et al. 2018; Rohloff and Meinel 2018; Scheffel et al. 2013; Silva et al. 2018) have focused on the reflection phase, in which students evaluate their learning process. For example, the work in Gašević et al. (2014) examined the use of video annotation software to analyse graded and non-graded reflection in relation to the linguistic and psychological processes linked to self-reflection annotation in videos. The results showed that in the graded reflection scenario, students used more linguistic and psychological processes in their reflections using the video annotation tool. Moreover, their results showed that students made most of their reflections at the beginning of the videos. Similarly, in Corrin and De Barba (2014), an LA-based reflection dashboard was developed that used data on students’ academic performance and engagement to help students to self-reflect. The authors found that this improved students’ learning reflection in terms of motivation, planning, goal-setting, design, performance, and engagement. The authors of Scheffel et al. (2013) and Di Mitri et al. (2017) used written feedback to provide prompts to learners to reflect on their learning experiences. The scheme in Yousef and Khatiry (2021) supported students’ awareness, reflection, and learning processes using cognitive and behavioural LA dashboards; the researchers found that cognitive dashboards had the advantage of promoting awareness and self-reflection, and had an impact on the learning process.

3 Method

In this study, we present an explainable AI-based approach and a dashboard that aims to compute data-driven feedback and generate actionable recommendations to support students’ self-regulation. The proposed approach (as depicted in Fig. 1) processes data on students’ educational activities to build predictive models using ML algorithms. Predictive models are then employed along with explainable AI methods to generate feedback and recommendations for each student on an individual basis. The outcomes of the proposed approach are presented in the form of a dashboard, which was designed and developed based on the cyclic phase model of SRL suggested by Zimmerman and Campillo (2003). This dashboard was evaluated in a real educational setting (a university course) to examine its effects on students’ academic performance. In the following sections, the phases of the proposed approach and the design, development, and evaluation of the dashboard are discussed in detail.

Fig. 1 An explainable AI-based approach to support student self-regulation

3.1 Data Collection and Pre-processing

The data in this study relate to a programming course that was taught over 10 consecutive years using a Moodle-based LMS at (anonymous) University. Since the course requirements changed over the years, we used only the last year (2020) of student data to conduct this study, as this course was fairly similar to the one in the current year (2021). In 2020, 585 students enrolled on the course, and several inclusion criteria were applied: the students were required to pass the course and to attempt quizzes and assignments, to participate actively in the course activities, and to watch video lectures via the LMS. When these inclusion criteria were applied, 139 students did not qualify for further processing and were excluded from the list. Data on the remaining 446 students were extracted from the LMS, including information on quizzes, assignments, exercises, video logs, and reading logs.

After the data had been collected, pre-processing was performed to maintain the students’ anonymity and to remove redundant entries. Firstly, to maintain the students’ anonymity, we did not use their names directly but replaced them with unique identifiers (IDs) that were generated randomly. Secondly, to remove redundant data, incomplete and duplicate quizzes, exercises, and assignment attempts were removed from the dataset. Finally, each student was labelled based on their performance in the final exam. The criteria were set according to the grading system shown in Table 1, where 300 was the maximum score and 140 was the minimum needed to pass the final exam.
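The first two pre-processing steps described above can be sketched as follows. This is a minimal illustration rather than the pipeline actually used in the study: the record fields (`name`, `activity`, `attempt`, `completed`) are hypothetical, and treating two records with the same student, activity, and attempt number as duplicates is an assumption. The labelling step is omitted because it depends on the thresholds in Table 1.

```python
import uuid


def pseudonymise(records):
    """Replace each student's name with a randomly generated ID.

    Returns the cleaned records and the name-to-ID mapping.
    Field names are hypothetical and only serve the illustration.
    """
    name_to_id = {}
    cleaned = []
    for rec in records:
        name = rec["name"]
        if name not in name_to_id:
            name_to_id[name] = uuid.uuid4().hex  # random, unique identifier
        new_rec = {k: v for k, v in rec.items() if k != "name"}
        new_rec["student_id"] = name_to_id[name]
        cleaned.append(new_rec)
    return cleaned, name_to_id


def drop_incomplete_and_duplicates(records):
    """Keep only completed attempts, and only the first copy of each attempt."""
    seen = set()
    kept = []
    for rec in records:
        if not rec["completed"]:
            continue  # incomplete attempt
        key = (rec["student_id"], rec["activity"], rec["attempt"])
        if key in seen:
            continue  # duplicate entry (assumed definition of a duplicate)
        seen.add(key)
        kept.append(rec)
    return kept
```

In practice the name-to-ID mapping would be stored separately from the dataset, or discarded, so that the analysed records cannot be traced back to individual students.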

Table 1 Label assignment based on final exam score

3.2 Feature Generation

After pre-processing, features were generated that could be grouped into six categories: (i) quiz attributes, which included each quiz score (one for a correct answer and zero for an incorrect answer), and the total quiz score; (ii) assignment attributes, which included the grade achieved; (iii) exercise attributes, which referred to information about the scores obtained in the exercises that the students performed throughout the course; (iv) activity completion attributes, which included views of course material (e.g., articles, PowerPoint slides, demo code examples, and external resources); for example, a score of one was given if course material had already been viewed, and zero otherwise; (v) lecture activity attributes, which referred to information about the scores obtained in the exercises performed by students during the lectures; and (vi) video lecture attributes, which included the coverage of lecture videos, for example whether a student had watched an entire lecture or only specific parts of it.

3.3 Predictive Model Building and Evaluation

When the features had been generated, the next phase was to build predictive models. In this study, we considered six well-known ML algorithms (logistic regression (LR), k-nearest neighbours (KNN), support vector machine (SVM), random forest (RF), artificial neural network (ANN) and BayesNet) to predict the students’ academic performance in the final examination. The aim was to comprehensively evaluate and compare the performance of these models to determine which one consistently performed best. At the model building stage, data resampling was performed using the synthetic minority oversampling technique (SMOTE) (Chawla et al. 2002), which generates synthetic minority class instances to mitigate the overfitting that arises from class imbalance in the data. To evaluate the built models, a nested cross-validation procedure (Wainer and Cawley 2021) with 10 folds was applied. To assess how well the built models generalised, their performance was evaluated using the metrics of accuracy, precision, recall and the F-measure.
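To illustrate the resampling step, the following sketch shows the core idea of SMOTE as described by Chawla et al. (2002): each synthetic instance is placed at a random position on the line segment between a minority class instance and one of its nearest minority class neighbours. This is a simplified, standard-library-only illustration and not the implementation used in the study; the function name, the default of five neighbours, and the fixed seed are assumptions.

```python
import math
import random


def smote_sketch(minority, n_new, k=5, seed=0):
    """Create n_new synthetic minority instances by interpolation.

    minority: list of feature vectors (at least two) from the minority class.
    k: number of nearest minority neighbours to choose from (assumed default).
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Nearest minority neighbours of the chosen instance, excluding itself.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic
```

When resampling is combined with cross-validation, it is usually applied to the training folds only, so that the synthetic instances do not leak into the data used for evaluation.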

3.4 Generation of Recommendations Using Explainable AI

Having built the predictive models and evaluated their performance, the next phase in our approach involved developing a counterfactual explanation (CF)-based algorithm (Algorithm 1) that used the best-performing predictive model to generate personalised actionable recommendations for each student on an individual basis. CF involves the creation of insightful causal explanations about the output (prediction) of a predictive model, with a focus on the actions required for a particular student to achieve better academic performance in a course.

In the proposed algorithm, we adopted the diverse counterfactual explanations (DICE) technique (Mothilal et al. 2020), which produces multiple CFs that are different from each other. For example, a predictive model may take the features of a student’s activity (as mentioned in Sect. 3.2) and predict that the student will be a middle achiever (predicted performance) on a course. We would then like to know what minimal changes in these student features would lead to a prediction of a high achiever (desired performance). In this case, DICE generates several CFs that are associated with different minimal changes in the given student features but which all lead to the same prediction of high achievement (desired performance).

The workflow of the algorithm is as follows: it takes as input the features of a student’s activity, a predictive model, the desired performance, and DICE, as shown in line 1. The output is a suggested set of activities (recommendations) needed to reach the desired performance, as shown in line 2. The initialisation step of the algorithm in line 3 specifies the user-defined explanations threshold (i.e., the number of different CFs produced), which in this case is set to three. DICE supports the generation of a set of diverse CFs that are different from each other using tunable parameters for the diversity and proximity weights. In general, it is considered that a CF that is closer to an individual student’s profile will be more feasible (Mothilal et al. 2020). However, diversity is also important, in terms of allowing us to choose from among multiple possible CFs. Hence, using tunable DICE parameters for proximity and diversity, we generated three CFs. After testing several different proximity and diversity weights, we found that a proximity weight of 2.0 and a diversity weight of 1.5 were suitable for generating diverse CFs on the collected dataset.
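For reference, the trade-off between proximity and diversity can be made explicit through the objective that DICE optimises, written here following the formulation in Mothilal et al. (2020). In this notation, x is the student's feature vector, f is the predictive model, y is the desired performance, and c_1, ..., c_k are the k CFs (k = 3 in this case). Reading the proximity and diversity weights reported above as λ1 and λ2 is an assumption made for illustration, as the exact correspondence depends on the implementation used.

```latex
C(x) = \operatorname*{arg\,min}_{c_1,\dots,c_k}\;
  \frac{1}{k}\sum_{i=1}^{k} \mathrm{yloss}\bigl(f(c_i),\, y\bigr)
  \;+\; \frac{\lambda_1}{k}\sum_{i=1}^{k} \mathrm{dist}(c_i,\, x)
  \;-\; \lambda_2\, \mathrm{dpp\_diversity}(c_1,\dots,c_k)
```

The first term pushes each CF towards the desired prediction, the second keeps the CFs close to the student's current profile, and the third rewards CFs that differ from one another.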

To produce the CFs, all inputs (predictive model, student features, desired performance and threshold) are passed to the DICE function, as shown in line 4, and three CFs are created; although these are different from each other, their target performance is similar to the desired performance, as shown in Table 2. Each CF represents the smallest change in the student feature values that would alter the prediction to give a predefined output (desired performance). For example, as shown in Table 2, if the U5.1 activity score is improved from four to eight, this will enable the student to achieve the desired performance. However, of the three suggested CFs, it is necessary to choose a suitable option that is easy to achieve and has a high probability of yielding the desired performance.

Algorithm 1

To find the most suitable CF for a student, we calculate the probability value of each CF with respect to the desired performance, and the one with the highest probability value is chosen, as shown in lines 5 and 6. For example, in Table 2, we see that CF2 has a higher probability value (p = 0.92) than CF1 and CF3 (p = 0.72 and p = 0.83, respectively). Consequently, based on a comparison of probability values, CF2 is chosen as a suitable CF for the student. If all of the probability values for the CFs are the same, then the correlation between the values of the actual student features and the three CFs is calculated and the closest option selected, as shown in line 7. When the CF has been selected, the student knows which set of recommendations should be worked on, but it is not clear which recommendations are more important than the others, meaning that the student may work on all the recommendations except the one that is most important in terms of achieving the desired performance.
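The selection step described in this paragraph can be sketched as follows. The code is a minimal illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the correlation is taken to be Pearson's correlation, and the tie-break is applied whenever the highest probability is shared by several CFs (which covers the case where all probabilities are equal).

```python
import math


def pearson(a, b):
    """Pearson correlation of two equal-length, non-constant numeric sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def select_cf(actual, cfs, probs):
    """Return the index of the most suitable CF.

    actual: the student's current feature values.
    cfs:    list of candidate CFs, each a list of feature values.
    probs:  probability of the desired performance for each CF.
    """
    best = max(probs)
    tied = [i for i, p in enumerate(probs) if math.isclose(p, best)]
    if len(tied) == 1:
        return tied[0]  # a single CF has the highest probability
    # Tie: prefer the CF most strongly correlated with the actual features.
    return max(tied, key=lambda i: pearson(actual, cfs[i]))
```

With the probabilities from the example above (0.72, 0.92, and 0.83), the function returns the index of the second CF, in line with the choice of CF2.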

Table 2 Counterfactual explanations produced

Hence, to generate more actionable recommendations, the impact of each feature is calculated and the features are then sorted accordingly. To reach this goal, features with values equal to or lower than their actual values are removed, as the student does not need to work on these features. The remaining feature values are then changed one by one as recommended in CF2 (Table 2), to calculate the impact of each feature (by recalling the first step of the algorithm) on the student’s performance, as shown in lines 8 and 9. Following this, the features are sorted based on their impact scores, as shown in Table 3. As a result, more actionable recommendations are generated, thus enabling the student to focus on recommendations that are highly important first, and then to address them in order.
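The ranking procedure can be sketched as below. This is an illustration only: the function name is hypothetical, `prob_of_goal` stands in for the predictive model's probability of the desired performance, and defining the impact of a feature as the resulting change in that probability is an assumption, since the impact score is not specified in more detail here.

```python
def rank_recommendations(actual, cf, prob_of_goal):
    """Sort the changes suggested by a CF by their individual impact.

    actual:       dict mapping feature name -> the student's current value.
    cf:           dict mapping feature name -> value suggested by the chosen CF.
    prob_of_goal: callable taking a feature dict and returning the probability
                  of the desired performance (stand-in for the predictive model).
    Returns a list of (feature, target value, impact), highest impact first.
    """
    baseline = prob_of_goal(actual)
    impacts = []
    for feature, target in cf.items():
        if target <= actual[feature]:
            continue  # equal or lower than the actual value: nothing to work on
        changed = dict(actual)
        changed[feature] = target  # change this one feature only
        impacts.append((feature, target, prob_of_goal(changed) - baseline))
    impacts.sort(key=lambda item: item[2], reverse=True)
    return impacts
```

Changing one feature at a time keeps each impact score attributable to a single recommendation, at the cost of ignoring interactions between features.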

Table 3 Counterfactual explanations produced

3.5 Designing of Dashboard to Support Student Self-regulation

The output of the algorithm consists of a list of features and their values; however, it is still difficult for a student to understand and interpret this information. Hence, in order to provide information that can act as effective feedback, we designed and developed a dashboard, as shown in Fig. 2, to help students understand the information and to make it easy to follow the recommendations. The dashboard was designed based on the cyclic phase model of SRL suggested by Zimmerman and Campillo (2003). This model is organised into three phases: forethought, performance and reflection. In particular, the dashboard focuses on setting course goals, planning and performing actions/recommendations on learning activities, monitoring, and reflecting on performance and learning behaviour.

In the forethought phase, the dashboard helps students to set their course goal (desired performance) and to arrange their learning activities (recommendations) based on this goal. For example, suppose a student wants to become a high achiever on the course and sets a goal accordingly: recommendations will then be generated based on this goal, using Algorithm 1. The recommendations in Fig. 2 not only inform students about the essential tasks but also provide a priority for each task, along with a required score for each activity that would need to be achieved to reach the course goal. These additional parameters were added to increase students’ motivation towards the recommendations and allow them to plan their activities based on the priority level.

In the performance phase, the dashboard groups the list of recommendations into different categories, such as assignments, lecture tasks, lecture videos, and exercises, and provides an interface allowing the student to perform actions on them. For instance, when a student has fulfilled the requirements associated with a recommendation, he/she can click a checkbox to complete the recommendation and receive new recommendations. Furthermore, students can monitor their current progress on the course, as the system informs them about the areas in which they are falling behind and how much progress is required. For example, the student in Fig. 2 has completed 100 percent of the requirements related to the video lectures, but has not completed the other tasks needed to reach the desired goal (high achiever). As a result, the progress information in the form of progress circles along with a percentage could provide effective monitoring of current performance.

In the reflection phase, the dashboard shows a performance prediction for the student’s final exam as a low, middle, or high achiever based on their actions (completed recommendations). This predicted performance can help students to reflect on their learning behaviour and motivate them to perform more of the actions recommended by the dashboard. Moreover, as the course progresses, the dashboard also provides an evaluation of the completed recommendations. For example, the dashboard highlights (as shown with a yellow arrow in Fig. 2) those recommendations that were marked as complete by the students but did not meet the requirements (e.g., a minimum score on a quiz). As a result, students can take action on these recommendations again to complete the recommendation requirements.

3.6 Evaluation of the Dashboard

In this section, we describe the process of evaluating the dashboard, which consisted of three steps: first, limitations and gaps in the dashboard were identified based on a previously conducted study (Afzaal et al. 2021). Next, the dashboard was redesigned and developed based on the identified gaps and limitations, such as setting course goals, displaying the required score for each quiz or exercise, sorting the recommendations based on their importance, and issuing timely notifications when a student is at risk of failure. Finally, an improved version of the dashboard (Fig. 2) was evaluated in a real educational setting to measure its effects on students’ academic performance. We drew up four hypotheses for this evaluation, as follows:

  • Hypothesis 1 (H1): Academic performance differs between students who use the dashboard and those who do not.

  • Hypothesis 2 (H2): Students’ academic performance is correlated with dashboard usage.

  • Hypothesis 3 (H3): Students’ dashboard usage varies between different academic performance groups.

  • Hypothesis 4 (H4): The dashboard, based on data-driven feedback and action recommendations, can support students in self-regulation.

3.6.1 Participants

We recruited 356 students from an undergraduate programming course offered in the fall of 2021 at Stockholm University, Sweden. The course was taught in an online learning format via the Moodle LMS. The dashboard was available to students between the third and tenth weeks of a 10-week course, where the start date was determined by the teachers. Students were following diverse programs within the university, such as business administration, marketing communication, software engineering, interaction design, digital media, and enterprise systems. Half of the students (N=178) were randomly selected to use the dashboard throughout the course, whereas the remainder (N=178) were not allowed to interact with the dashboard during the course.

Fig. 2 Dashboard to support student self-regulation

3.6.2 Procedure

At the beginning of the course, a detailed demonstration of the dashboard was presented to the students, and each feature was explained; for instance, they were shown how to follow a recommendation. A usage guide was also developed and given to the students. The dashboard access link was shown at the top of the course page in Moodle. Students were informed that their dashboard usage would not be reported to their course teachers, and consent to collect their usage data was received. The dashboards were turned on after the demonstration, and were turned off at the end of the course. Each student’s interactions with the dashboard were logged. At the end of the course, participants were asked to fill out a detailed open-ended survey, in which the questions focused on describing how the dashboard helped them in their learning and the challenges they faced while using the dashboard.

3.6.3 Data Collection

Once students had finished the course and responded to the survey, data collection was performed to evaluate the effects of the dashboard on the students’ performance. Data collection was performed in three steps to gather all sources of information that could be beneficial for evaluation: (i) performance data including each student’s score in the final exam of the course, where 300 was the maximum score and 140 was the minimum score to pass the final exam; (ii) dashboard usage data, including how much time a student spent on the dashboard on a daily basis, how many times a student logged into the dashboard to visualise feedback and recommendations, and how many recommendations a student completed throughout the course; and (iii) survey response data, including the answers from each student to the questions about the benefits and challenges of the dashboard. The survey was sent to the 178 participants who used the dashboard, and 103 responded.

3.6.4 Data Analysis

When the data had been collected, they were analysed in three steps. In the first step, the performance data, the dashboard usage data, and the survey data were screened to ensure the quality of the collected data and to avoid problems such as outliers, duplicates, and missing values. For example, for the survey responses, we adopted the response screening method (DeSimone et al. 2015) to determine which participants had paid attention and to exclude low-quality responses (e.g., the same answer to each question). For the dashboard usage data, we considered only the completed/followed recommendations where the student fulfilled the criteria provided via the dashboard (e.g., a required score). In the second step, a number of statistical tests were performed to test the proposed hypotheses. A t-test, a series of one-way ANOVAs, and Tukey post hoc tests were conducted to compare the effects of dashboard usage on performance (Hypothesis 1) and the differences between and within groups showing different levels of performance (Hypothesis 3). A Pearson’s correlation test was performed to explore the associations between dashboard usage and the students’ performance (Hypothesis 2).
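As a minimal illustration of two of the statistics involved in the second step, the sketch below computes an independent two-sample t statistic and Pearson's correlation coefficient using only the standard library. The variant of the t-test is not specified here, so the pooled-variance (Student's) form is an assumption, and the function names are hypothetical; in practice, a statistics package would also provide the p-values, the ANOVA, and the Tukey post hoc tests.

```python
import math
from statistics import mean, variance


def pooled_t_statistic(group_a, group_b):
    """Two-sample t statistic with pooled variance (assumed variant)."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    standard_error = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean(group_a) - mean(group_b)) / standard_error


def pearson_r(x, y):
    """Pearson correlation between two equal-length, non-constant sequences."""
    mean_x, mean_y = mean(x), mean(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(ss_x * ss_y)
```

For Hypothesis 1, the two groups would be the final exam scores of the students who used the dashboard and of those who did not; for Hypothesis 2, the two sequences would be a usage measure (e.g., the number of completed recommendations) and the final exam score of each student.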

In the third step, a thematic analysis was conducted on the survey data, and themes were identified from students’ responses to measure how the dashboard guided students in terms of SRL (Hypothesis 4). In this analysis, seven main themes emerged, namely goal setting, planning, motivation, plans into actions, monitoring, time management, and self-evaluation, which correspond to the three main phases of SRL. Following this, a sentiment analysis was conducted to classify the students’ responses in regard to each theme into two categories: (i) positive responses, which expressed a positive opinion with no concerns, and (ii) negative responses, which expressed negative observations or concerns.

4 Results

In this section, we present the results of the experiments conducted to evaluate the effectiveness of our proposed approach and the developed dashboard. Our approach was evaluated in terms of its ability to predict student academic performance, whereas the dashboard was evaluated based on its ability to support students in SRL.

4.1 Results of Student Academic Performance Prediction

The predictive performance of our approach in terms of students’ scores on the final exam, based on their interaction with the course material and assessment activities, is presented in Table 4. The results show that RF and ANN outperformed the other ML algorithms on all evaluation measures, with scores of 0.88 for the metrics of accuracy, precision, recall and F-measure. LR and KNN were the second-best algorithms, with an accuracy of 0.86 for LR and 0.85 for KNN (i.e., scores of about 0.02 below those of RF and ANN). The remaining ML algorithms delivered the lowest performance, with an accuracy of 0.66 and values of 0.66 for the precision, recall, and F-measure. From Table 4, it is evident that RF and ANN are the best algorithms for analysing the student data and should be adopted in the subsequent phases of the proposed approach.

Table 4 Final exam predictions using different ML algorithms
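For readers less familiar with the four evaluation measures reported in Table 4, the following sketch shows how they are computed from predicted and true labels. It assumes a binary labelling (1 = positive class) purely for simplicity; the averaging scheme used in the study is not stated here, and the labels below are invented.

```python
from typing import Dict, Sequence


def classification_metrics(
    y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1
) -> Dict[str, float]:
    """Compute accuracy, precision, recall and F-measure for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F-measure is the harmonic mean of precision and recall.
    f_measure = (
        2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    )
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
    }


# Invented labels for eight hypothetical students.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```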

To achieve better prediction performance, the hyperparameters of the ML algorithms were tuned. For the ANN, we tested several different versions and obtained the best results using two hidden layers, where each layer contained 30 neurons/cells. We then used the GridSearchCV algorithm (an exhaustive search over specified parameter values for an estimator) to test which combination of learning rate (0.1, 0.01, 1e-3, 1e-4, 1e-5) and batch size (10, 20, 40, 60) was best suited to this problem. We found that the best learning rate was 0.01 with a batch size of 32. For RF, we again used GridSearchCV to determine which combination of hyperparameters produced the best results, and found that a combination of 106 trees, a maximum number of features (considered for splitting a node) of four, and a maximum of eight depth levels yielded the best results.
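The idea of an exhaustive grid search can be sketched in a few lines of plain Python. This is not the GridSearchCV implementation used in the study: the scoring function below is an arbitrary placeholder, whereas in practice each combination would be scored by cross-validating the trained model. Only the candidate learning rates and batch sizes are taken from the text; the names `grid_search` and `toy_score` are hypothetical.

```python
from itertools import product
from math import log10
from typing import Callable, Dict, Sequence, Tuple

Params = Dict[str, float]


def grid_search(
    grid: Dict[str, Sequence[float]], score: Callable[[Params], float]
) -> Tuple[Params, float]:
    """Score every combination in the grid and return the best one."""
    names = list(grid)
    best_params: Params = {}
    best_score = float("-inf")
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        current = score(params)
        if current > best_score:
            best_params, best_score = params, current
    return best_params, best_score


# Candidate values as listed in the text (5 x 4 = 20 combinations).
param_grid = {
    "learning_rate": [0.1, 0.01, 1e-3, 1e-4, 1e-5],
    "batch_size": [10, 20, 40, 60],
}


def toy_score(params: Params) -> float:
    # Placeholder only: an arbitrary function with its peak at
    # learning_rate=1e-3 and batch_size=40. It does NOT reflect the
    # study's model or its reported optimum.
    return -abs(log10(params["learning_rate"]) + 3) - abs(params["batch_size"] - 40) / 100


best_params, best_score = grid_search(param_grid, toy_score)
print(best_params)
```

Because every combination is evaluated, the cost grows multiplicatively with each added hyperparameter, which is why the candidate lists are usually kept short.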

4.2 Results of Dashboard Evaluation

In this section, we present the results of an evaluation of the dashboard in order to examine its effects on the students’ performance and to determine its utility and limitations in terms of SRL.

4.2.1 Hypothesis 1 (H1): Effects of the Dashboard on Student Performance

An independent samples t-test was performed to test the hypothesis that the students’ academic performance would differ between those who used the dashboard and those who did not. In this test, students were divided into two groups: Group 1 consisted of the students who used the dashboard during the course, while Group 2 contained the students who did not. Table 5 presents statistics for both groups. In this analysis, the final exam scores of students in each group were used to perform an independent samples t-test.

Table 5 Statistics on the groups of users

The results obtained for the t- and p-values (Table 6) showed a statistically significant difference (t = 2.302, p = 0.022) between these two groups. A comparative analysis of the means for both groups was conducted to determine whether this difference positively or negatively affected the students’ performance. The results indicated that the mean for Group 1 (m = 201.63) was higher than that for Group 2 (m = 186.93), as shown in Table 5. The higher mean for Group 1 shows that students who used the dashboard during the course achieved better final exam scores than those who did not.

Table 6 Results of an independent samples t-test
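As an illustration of the statistic behind an independent samples t-test, the sketch below computes the pooled-variance (Student's) t value and its degrees of freedom. Whether equal variances were assumed in the study is not stated, so the pooled form is an assumption, and the exam scores are invented. Obtaining the p-value additionally requires the t distribution, for example via `scipy.stats.ttest_ind`.

```python
from math import sqrt
from statistics import mean, variance
from typing import Sequence, Tuple


def pooled_t_statistic(a: Sequence[float], b: Sequence[float]) -> Tuple[float, int]:
    """Return the pooled-variance t statistic and degrees of freedom for two groups."""
    n1, n2 = len(a), len(b)
    df = n1 + n2 - 2
    # Pooled variance: sample variances weighted by their degrees of freedom.
    pooled_var = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / df
    standard_error = sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean(a) - mean(b)) / standard_error, df


# Invented final exam scores for two small hypothetical groups.
dashboard_users = [210, 195, 230, 188, 205]
non_users = [180, 175, 200, 190, 170]

t_value, dof = pooled_t_statistic(dashboard_users, non_users)
print(round(t_value, 3), dof)
```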

4.2.2 Hypothesis 2 (H2): Correlations Between Dashboard Usage and Performance

After evaluating the effects of the dashboard on the students’ performance, we used a Pearson correlation to test the hypothesis that the students’ academic performance was correlated with dashboard usage. Since the dashboard not only provided action recommendations but also allowed for monitoring and reflection on learning, we divided the dashboard usage into three usage variables: (i) recommendations followed (RF), (ii) time spent (TS), and (iii) access logins (AL). RF represented how many of the recommendations provided via the dashboard were completed by the student, while TS indicated how much time a student spent on the dashboard per day in order to analyse the feedback provided and to view and complete the recommended tasks. AL shows how many times a student logged into the dashboard to visualise feedback and recommendations. Table 7 presents the correlation results between the usage variables and student performance.

Table 7 Correlations between dashboard usage variables and performance

It can be seen from Table 7 that there were significant correlations between all three usage variables and student performance. RF was more strongly correlated with performance (r = 0.82) than TS and AL (r = 0.75 and r = 0.69, respectively), and the significance level (p-value) of all three usage variables was less than 0.001. Based on this correlation analysis, the conclusion can be drawn that students who used the dashboard more frequently and followed the recommendations were able to achieve better scores in the final exam of the course.
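The Pearson coefficient r reported above can be computed as in the following sketch, which uses invented values for the number of recommendations followed and the final exam score; the data and the function name are illustrative assumptions only.

```python
from math import sqrt
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Return the Pearson correlation coefficient between two equal-length samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of cross-products of deviations, and sums of squared deviations.
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cross / sqrt(ss_x * ss_y)


# Invented data for five hypothetical students.
recommendations_followed = [2, 5, 7, 10, 12]
exam_scores = [150, 170, 200, 220, 260]

r = pearson_r(recommendations_followed, exam_scores)
print(round(r, 2))
```

A coefficient of this kind measures linear association only and does not by itself establish that following recommendations caused the higher scores.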

4.2.3 Hypothesis 3 (H3): Differences Between Performance Groups (Low, Middle, and High Achievers)

We also explored the differences between the performance groups in terms of RF, TS, and AL using a one-way ANOVA test and a Tukey post hoc test to test the hypothesis that the students’ dashboard usage varied between different performance groups. In this analysis, performance groups were created based on course grades, as shown in Table 1. We then compared the different performance groups, but excluded students who failed the course (achieved a score of below 140). The ANOVA results (Table 8) showed significant differences between the three performance groups in terms of RF, TS and AL at the p < 0.001 level. When measured against student performance, significant differences were revealed between the performance groups for RF (F = 84.37, p < 0.001), TS (F = 94.08, p < 0.001), and AL (F = 47.89, p < 0.001). In the following, we present a detailed post hoc test analysis of the differences between performance groups in terms of all usage variables (RF, TS, AL).

Table 8 ANOVA test results

The post hoc test revealed that the high and middle achiever groups followed more recommendations than the low achiever group, and the difference was significant at the p < 0.001 level. However, there was no significant difference between the high and middle achiever groups (p > 0.05), which followed almost the same number of recommendations. The results for the time spent on the dashboard differed slightly from those for RF: the high achiever group spent more time on the dashboard than the middle and low achiever groups, and this difference was significant at the p ≤ 0.001 level. The middle and low achiever groups showed no significant differences in terms of TS. For AL, the results showed that, similarly to TS, the high achiever group was significantly different from the remaining two groups at p < 0.001. Based on these results, it is evident that dashboard usage positively affected the high and middle achiever groups, since they followed more recommendations and spent more time on the dashboard. As students used the dashboard more often, it improved their academic performance in the final exam.
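The F values reported for the one-way ANOVA in this subsection are ratios of between-group to within-group variability. The sketch below computes this ratio for three invented groups; the group values are illustrative assumptions, and the Tukey post hoc comparison, which requires the studentised range distribution, is not covered.

```python
from statistics import mean
from typing import Sequence, Tuple


def one_way_anova_f(groups: Sequence[Sequence[float]]) -> Tuple[float, int, int]:
    """Return the F statistic and (between, within) degrees of freedom."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total

    # Variability of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variability of the observations around their own group mean.
    ss_within = sum((value - mean(g)) ** 2 for g in groups for value in g)

    df_between = k - 1
    df_within = n_total - k
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return f_value, df_between, df_within


# Invented counts of followed recommendations for three hypothetical groups.
low = [2, 3, 4]
middle = [6, 7, 8]
high = [7, 8, 9]

f_value, df_between, df_within = one_way_anova_f([low, middle, high])
print(f_value, df_between, df_within)
```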

4.2.4 Hypothesis 4 (H4): Dashboard Support for Students’ Self-Regulation

This section presents the results of a survey carried out to examine how the dashboard supported students in self-regulating their learning throughout the course. The survey was sent to the 178 participants who used the tool, of which 103 responded. Table 9 shows the percentages of participants’ responses to the different themes, based on the coding scheme described in Sect. 3.6.4. Overall, the results of the survey analysis seemed to support Hypothesis 4. In the following sections, we present details of how the dashboard supported students in each phase of SRL.

Table 9 Survey responses

Forethought

Goal setting Students who used the dashboard were able to specify their course goals and make changes in their goal settings over time. As shown in Table 9, 70 participants responded in regard to goal setting, of which 92% (N = 64) provided positive responses. They were satisfied with the goal-setting feature and endorsed the importance of setting goals. Several respondents summarised the experience with statements such as: “Goal setting helped me to know what to study to reach the goal.” Another respondent echoed this positive view, saying, “I found it useful that I could fill in a goal grade and get tasks to do based on the grade.” On the negative side, 8% (N = 6) of respondents raised questions about the accuracy of the goal-setting feature; for example, a few participants mentioned that they set a goal but could not reach it, even though they followed most of the recommendations provided.

Planning When the goals had been set, the dashboard planned the essential activities and helped the students to achieve the goals. Overall, 98 participants responded about planning, and 93% (N = 91) were satisfied with the planning provided in the form of recommendations. Several respondents summarised the experience as follows: “Planning the activities gave a good overview of what needed to be done. It made it easier to plan studying hours.” A number of respondents agreed with this positive view, saying: “Better organisation of the course, telling us what tasks to do when and guiding us to reach our goals.” One participant said: “It was straightforward to see how far I had come and where I was headed.” On the negative side, 7% (N = 7) of respondents identified some weaknesses of the provided planning and recommendations. One participant responded, “The tool is not a calendar, and this course went so fast! Perhaps offer some date/time functionality to plan your studies better, such as when to complete specific assignments.”

Motivation To achieve their learning goals, students needed different sources of extrinsic motivation, and the survey therefore asked students whether the dashboard motivated them during the course. Overall, 81 participants responded about motivation, and 81% (N = 66) felt that the dashboard had motivated them, for three main reasons: (i) it provided an up-to-date summary of their study progress; (ii) they were able to self-regulate their learning; and (iii) their grades on the assignments were improving. Several respondents made statements such as: “When you see your progress toward the goal, you feel motivated.” On the negative side, 19% (N = 15) of respondents did not feel motivated when using the tool. One participant responded, “I felt more anxiety looking at it than motivation (and I studied a lot).” Another participant responded, “It feels like the dashboard could be better at motivating the student and saying more encouraging stuff, instead of just ‘failed’. Maybe the wording could be different.”

Performance

Putting plans into action Once the goals had been set and the dashboard had planned the essential activities, students started performing these tasks. As shown in Table 9, 85 participants responded about actions, and 87% (N = 77) responded positively. Several respondents summarised their experiences as follows: “It was easy to see which assignments etc. were left and which completed on time.” Other respondents echoed this positive view, saying, “The dashboard provided a nice overview with a priority listing that was very convenient and showed which assignments should be prioritised to reach your grade, and which parts were lacking.” One participant said: “An overview of what I needed to work on made it easier for me to study.” On the negative side, 13% (N = 8) of respondents identified some challenges in terms of performing actions. A few participants responded, “There were not enough hints on what we were supposed to look for and what to change.” One participant responded, “It did not show the assignments as completed even though they were.”

Monitoring As they were following the recommendations, the students monitored their learning via the dashboard to modify their learning behaviour. Table 9 shows that 88 participants responded about monitoring, and an overwhelming 98% (N = 86) responded positively. Several respondents summarised the experience as follows: “It showed me what needed to be done by showing the high, low or middle priority of the tasks that needed to be done to pass.” Another respondent agreed with this positive view, saying: “The progress bar is a great way to see how far you have progressed.” On the negative side, only 2% (N = 2) of respondents raised concerns, saying that “The dashboard never adjusted itself to what I had accomplished.”

Time management In addition to planning and monitoring, it was important for students to manage their time well by avoiding “procrastination” or “dragging out their tasks”. As shown in Table 9, although 87 participants responded about time management, only 43% (N = 37) had positive opinions, while 57% (N = 50) responded negatively. Several respondents summarised the experience as follows: “Maybe having deadlines, with clearer ‘warnings’, or suggesting some sort of schedule planning would be better.” Another respondent echoed this negative view, saying, “It was impossible for me to spend more time studying, so there wasn’t really any help on managing time.”

Reflection

Self-evaluation To enable students to evaluate their learning, the dashboard presented feedback to students on their learning and compared it with their learning goals. Overall, 101 participants responded about self-evaluation, of whom 95% (N = 96) were satisfied with the course and the task-based evaluations. Several respondents summarised the self-evaluation as follows: “The dashboard shows what grade you will get at the current stage you are, and it tells you what you need to work on more.” Another respondent stated: “You get tips on which parts you have performed worse in and need to improve on.” On the negative side, 5% (N = 5) of respondents raised questions about the provided feedback. One participant mentioned, “I thought the bar would give me a calculated grade based on my activity. But I didn’t get that impression.”

5 Discussion

5.1 Findings and Implications

We started with the first research question: how can we employ an explainable AI approach to compute data-driven feedback and generate actionable recommendations to support students’ self-regulation? As described in Sects. 3.4 and 3.5, we proposed and implemented an explainable AI approach along with a CF-based algorithm that generates recommendations based on each student’s course goal (desired course score or performance), as defined by the students themselves, to support them in SRL. For example, for a student who wants to become a high achiever in the course and who defines this as a course goal via the dashboard, the algorithm takes the student interaction data and course goal (high achiever) as input and generates recommendations that can help the student reach this goal. The results of the study show that the feedback and recommendations generated by the proposed approach were adequate, relevant, and useful for students in terms of self-regulating their studies during the course.

These findings lead us to the second research question, in which we aimed to examine the effects of the dashboard on students’ academic performance through providing self-regulation support. Four hypotheses were proposed for the second research question in Sect. 3.6. The analysis of the dashboard evaluation results in Sect. 4.2 supported our hypotheses and gave rise to four key findings. Firstly, we discovered that dashboard usage significantly affected the students’ performance, and that students who used the dashboard during the course obtained higher scores in the final exam than those who did not (Hypothesis 1). Secondly, the action recommendations generated by the proposed algorithm positively impacted students’ performance, as a positive correlation was found between following/completing these recommendations and the students’ performance (Hypothesis 2). Thirdly, our results showed that students who spent more time on the dashboard had a tendency to achieve better scores in the final exams of the course than those who spent less time on the dashboard (Hypothesis 3). Finally, the dashboard helped students to self-regulate their learning throughout the course, for example by setting desired course goals, planning learning activities, monitoring and reflecting on their performance, and following action recommendations (Hypothesis 4). However, the dashboard was not effective in supporting time management, due to a lack of appropriate scheduling for academic activities. Based on these findings, the answer to the second question is that the dashboard positively affected students’ academic performance and helped them to plan, monitor, and reflect on their learning according to their desired learning goals.

Our findings indicate that this study has direct implications for students, while teachers and institutes are indirect recipients. With regard to students, we present evidence that the proposed approach and dashboard can support students in self-regulating their studies by prioritising their learning activities, monitoring their course progress, and visualising real-time exam predictions. The provision of SRL support results in a change in the students’ learning behaviour, allowing them to complete the course on time. Another implication of this study is that intelligent action recommendations and timely feedback can motivate students to perform more actions (complete/follow recommendations), which leads to improved grades/scores for the assessments and final exams. As students start regulating themselves and obtaining better grades/scores in the course assessments, this indirectly supports teachers, as they are responsible (especially in massive courses) for continually intervening and guiding students through course activities, for example by emphasising the importance of a task, the time needed to complete it, and the required score or grade. Since the developed dashboard handles these responsibilities and gives information to students on a regular basis, this makes the teacher’s job easier. In addition, the development of dashboards such as these has implications for broader efforts by institutes to reduce the dropout from courses, especially for distance or online learning courses. In this context, the participants’ responses show that the dashboard provided early feedback on their progress and performance, making it easier for them to plan and monitor their studies effectively, which may eventually help to reduce course dropouts.

5.2 Limitations and Future Work

The proposed approach and dashboard were evaluated in only one programming course, which limits their generalisability. In future work, we plan to evaluate the dashboard as part of a diverse range of educational courses such as data science, social science, and management science. Furthermore, the sample size of this study was not sufficiently large for reliable conclusions, and in the future, we plan to increase the sample size along with widening the range of courses. In terms of data collection, we note that student data on reading of the study material were not properly tracked. Unlike for video lectures, for which logs were available, it was not possible to determine the time spent on reading study material, which may cause inaccurate recommendations to be generated. The recommendation generation algorithm suggests activities that can help students to reach the desired performance, but does not offer a timeline for completing a particular activity. It therefore lacks a time management or scheduling functionality, which would help students to complete their activities on time.

In future work, it would be interesting to determine which types of recommendations (e.g., exercises, reading material, videos) are more actionable and can help students to reach their desired goals. In the same vein, it would also be interesting to conduct a detailed analysis of tunable DICE parameters, such as proximity and diversity weights. Furthermore, we plan to provide support to students in their daily learning behaviour and tasks, such as daily activity and behaviours, conceptual difficulties, and a comparison of their progress with others in the class, which would help students to visualise their daily improvements. Similarly, when we deploy the dashboard over the long term at the institution level, we intend to include previous dashboard usage (e.g., recommendations followed, concept-based knowledge) in the set of student features to generate more insightful feedback and recommendations for students. Another direction for future work would be to create a dashboard for teachers that could be useful in pedagogical practice.

6 Conclusion

In this paper, we propose an explainable AI-based approach together with a CF-based algorithm that uses students’ LMS data and generates recommendations at the student level. This approach opens the way for intelligent learning systems that automatically provide students with effective, data-driven recommendations. A dashboard was developed based on the proposed approach, and was evaluated in a real educational setting to determine its effects on students’ performance. The results showed that dashboard usage positively affected student performance, and that the generated recommendations were significantly correlated with student performance. The results of a survey revealed that the dashboard supported the students in self-regulating their studies by helping them with goal setting, planning, monitoring, and self-reflecting on their learning. In the future, we will perform experiments on a diverse range of courses to ensure that the proposed approach is generalisable and to determine which type of recommendations (e.g., exercises, reading material, videos) are more actionable.