1 Introduction

Educational institutions generate large amounts of data with the adoption of student information systems (SIS), learning management systems (LMS), and other technologies. These massive data sets provide an untapped potential to assist and improve decision-making and operations. A challenge is to process such vast data sets into valuable, actionable, and useful insights, and present this information to various stakeholders in meaningful ways (Schwendimann et al., 2017). The educational data mining (EDM) (Romero & Ventura, 2020) and learning analytics (LA) (Clow, 2013) fields consequently emerged to make use of such data sets generated from the educational domain to improve outcomes.

An area that has gained attention in research is to predict students’ academic performance (Abu Saa et al., 2019; Alalawi et al., 2023; Christy & Rama, 2016; Kumar et al., 2017; Roy & Garg, 2017; Shahiri & Husain, 2015). Data mining (DM) and machine learning (ML) techniques have shown success in developing predictive models based on large data sets (Alalawi et al., 2023; Albreiki et al., 2021). The ability to predict students’ performance and identify at-risk students early on can be crucial, as effective interventions can be targeted to improve educational outcomes (Kovacic, 2010).

There are many studies that focus on predicting student performance, but comparatively few take action based on these predictions. Learning analytics interventions (LAIs) have emerged as an approach that aims to address this gap (Larrabee Sønderlund et al., 2019; Rienties et al., 2017; Wong & Li, 2020). In LAI studies, students’ risk levels are predicted early during the academic term, typically using EDM predictive models, and disseminated to stakeholders (instructors, administrators, and students) as actionable insights. Educators then lead interventions for at-risk students to improve outcomes. The specific nature of the interventions is left to the discretion of the educator, instructor and/or administrator, who are familiar with the learning context and student cohort. LA tools such as learning analytics dashboards (LADs) and personalised communication tools are used to facilitate interventions. Finally, LAI studies evaluate the impact of these interventions.

A number of LAI studies have led to improved outcomes, such as improved pass rates, retention, and grades (Arnold & Pistilli, 2012; Baneres et al., 2019; Borrella et al., 2022; Burgos et al., 2018; Figueroa-Cañas & Sancho-Vinuesa, 2021; Jayaprakash et al., 2014; Milliron et al., 2014; and others). Although LAI studies have shown significant positive impacts, LAIs are not widely used in the education domain. Our analysis has found that the lack of access to LAI infrastructure to facilitate interventions is one of the main obstacles that prevents educators from piloting LAIs in their courses. Developing LAI infrastructure in an institution requires significant effort and investment to access various data sets from its information systems and to develop institution-specific student predictive models and related LA tools.

This paper presents an LAI framework, termed Student Performance Prediction and Action (SPPA), which allows educators to pilot LAIs in their courses without the need for large-scale institutional-level investments, thereby addressing the main obstacle that prevents the uptake of LAIs. Educators generate course-specific predictive models using historical continuous assessment data for the course, which is typically accessible to educators. SPPA leverages ML algorithms to develop predictive models that can accurately identify at-risk students early in the course, enabling timely and personalised interventions. LADs are used to review student progress and risk levels. Furthermore, SPPA incorporates sound pedagogical principles in course design and interventions to assist in identifying gaps in students’ knowledge and guide effective interventions. The conceptual design of SPPA was first presented in Alalawi, Athauda, and Chiong (2021a).

This paper presents the evaluation of SPPA by academics who used it to provide LAIs in a large real-world course setting. Academics led effective LAIs using SPPA, resulting in improved outcomes (improved pass rates and final grades). This study makes a number of contributions: (i) the evaluation demonstrates evidence of SPPA’s ability to facilitate effective LAIs; (ii) the academics who participated in the study provided positive feedback, which further strengthens the case for SPPA’s approach to facilitating LAIs and its uptake by academics; (iii) the study demonstrates SPPA’s pioneering self-service model for LAIs, whereby academics can pilot LAIs in their courses without large-scale institutional investment, addressing one of the main obstacles to the uptake of LAIs. We believe that SPPA shows promise as a potential catalyst for mainstream adoption of LAIs.

The rest of this paper is organised as follows: Sect. 2 presents related work. Section 3 describes the design choices of SPPA to enable academic-led LAIs in their courses. Section 4 presents the research questions, study design and context of the pilot study. Section 5 presents the results from the study addressing the research questions. A summary of our findings, implications and limitations of the study are presented in Sect. 6. Finally, the paper is concluded with a view of future research directions in Sect. 7.

2 Related work

In the literature, there are many studies that focus on predicting student performance, specifically on methods and techniques such as ML models and EDM techniques, data sets and features, prediction aims (e.g. identifying students at risk of failing or dropping out, or predicting time to graduation) and others (Abu Saa et al., 2019; Alalawi et al., 2023; Christy & Rama, 2016; Kumar et al., 2017; Roy & Garg, 2017; Shahiri & Husain, 2015). Alalawi et al. (2023) reviewed 162 studies that used ML techniques to predict student performance between 2010 and 2022. Comparatively, there are few LAI studies that take action based on these prediction results (Foster & Francis, 2020; Larrabee Sønderlund et al., 2019; Wong & Li, 2020).

A pioneering study in LAI is the Course Signals (CS) project at Purdue University (Arnold & Pistilli, 2012). In CS, predictive models detect at-risk students using four data sources: current course grades, LMS engagement data, past academic history such as high school grade point averages, and demographic data. Based on the outcomes of these predictive models, the CS system categorises students into three groups: “red” signals indicate that students are likely to fail, “yellow” signals suggest students might fail, and “green” signals mean that students are likely to pass their course. Academics create an intervention strategy based on the students’ signals, which may consist of posting the signals on the students’ LMS home pages, sending them customised emails or texts, referring them to academic advisers or academic resource centres, or scheduling face-to-face meetings. The CS system was used at Purdue with the 2007, 2008 and 2009 cohorts to assess its effects on student performance and retention (Arnold & Pistilli, 2012). Across several different cohorts at Purdue, the approach increased both success and retention rates.

The Open Academic Analytics Initiative (OAAI) project by Jayaprakash et al. (2014) took a similar approach to CS. It investigated the portability of predictive models across multiple institutions and evaluated two different intervention strategies for at-risk students. The intervention groups showed statistically significant improvements in final grades.

Milliron et al. (2014) described three case studies where Civitas Learning’s Illume platform was used to predict and provide interventions for at-risk students in three different institutions. Data from the SIS and LMS were used to develop institution-specific predictive models. A number of tools and apps were used to provide action analytics to facilitate interventions. The case studies were deployed over multiple teaching periods with increasing student numbers and courses in the three institutions. Over time with multiple iterations of deployments and interventions, statistically significant improvements in retention were achieved. A number of insights into LAIs were presented, including the need for institution-specific predictive models and the need to fine-tune outreach over multiple iterations.

A group of academics at a fully online university—Universitat Oberta de Catalunya (UOC)—have been working to create an early warning system (EWS) to identify students who are at risk of failing. Similar to CS, the EWS employs a variety of classifiers that use data from the UOC data mart, with dashboards presenting prediction results and related information to facilitate interventions. The EWS and interventions were evaluated in a variety of courses at UOC (Baneres et al., 2019; Baneres, Rodríguez et al., 2020; Guerrero-Roldán, Rodríguez-González, Baneres, Elasri-Ejjaberi, & Cortadas, 2021; Figueroa-Cañas & Sancho-Vinuesa, 2021; Rodríguez et al., 2022; Baneres et al., 2023). Results are encouraging, showing higher pass rates, lower dropout rates and positive student and teacher acceptance.

Cogliano et al. (2022) examined the impact of a learning analytics-driven prediction model combined with a digital self-regulated learning (SRL) skills training program on undergraduate biology students. The research focused on identifying students at risk of poor performance and providing timely interventions to improve their academic outcomes. The prediction model was created using logistic regression with students’ prior knowledge scores and LMS log data from the first two weeks of the course. This model flagged students likely to earn a grade of C or worse. These flagged students were then randomised into two groups: a treatment group that received SRL training and a control group that did not. The SRL training was a brief, digital intervention designed to improve learning strategies. The study found that students who received the training performed significantly better on exams and the final course grade than those in the control group, effectively closing the achievement gap between at-risk and high-performing students.

Kew and Tasir (2022) explored an LA intervention aimed at predicting and improving student outcomes in e-learning. Using a pre-test–post-test design with 50 undergraduate students, the study integrated the Felder-Silverman learning style model and Keller’s ARCS motivational model to create personalised learning objects (LOs). The approach involved identifying at-risk students through ML techniques and analysing e-learning log files to assess motivation, engagement, retention, and academic performance. Personalised LOs tailored to individual learning styles and motivational needs were introduced every two weeks to enhance students’ motivation, cognitive engagement, cognitive retention, and academic performance. The LA intervention significantly improved students’ learning performance with a large effect size (Cohen’s d = 5.669).

Utamachant et al. (2023) proposed i-Ntervene, a platform integrating an LMS, automatic code graders, and learning analytics for instructional interventions in programming courses. The platform identifies at-risk students by analysing engagement and understanding levels using data from class attendance, in-class engagement, assignment practice, and self-study, and aims to address high failure rates in programming courses by providing actionable insights for timely interventions. Students with high activity gaps or low understanding levels are flagged as at-risk. Instructors receive lists of at-risk students along with visualisations of their activity trends and understanding levels. Based on these data, instructors decide on appropriate interventions, such as motivational emails or tutorial sessions. The platform tracks the effectiveness of these interventions by comparing pre- and post-intervention data. In a Java course with 253 students, instructors performed 12 interventions and found that extrinsic motivation emails had more impact in promoting learning behaviour than other types of messages, and that providing tutorial sessions was not an effective approach to improving students’ understanding of complex algorithmic topics.

While many LAI studies have focused on the use of predictive models to detect at-risk students and provide interventions (e.g. Burgos et al., 2018; Choi et al., 2018; Lu et al., 2018; Espinoza & Genna, 2021; Wang et al., 2022; and others), other researchers have focused on interventions to improve educational outcomes using LA and other strategies. These efforts do not classify students’ risk levels or target at-risk students; rather, they focus on developing and evaluating intervention strategies to improve outcomes, especially students’ SRL, using LA and other techniques.

Cavus Ezin and Yilmaz (2023) used a mobile application-based learning environment to deliver learning analytics feedback to students. Data such as logins, quiz scores, and content views were collected and visualised, with personalised feedback sent to the experimental group. This intervention aimed to enhance academic achievements and SRL skills. The results showed significant improvements in these areas for the experimental group but no significant difference in motivation.

Yang and Ogata (2023) explored the impact of a personalised learning analytics intervention using the BookRoll ebook browsing system and a recommendation system to enhance student learning in blended learning. BookRoll tracks student interactions, while the recommendation system provides personalised feedback based on historical learning data and then suggests actions such as increased reading time and the use of highlights and memos. Association rule mining (ARM) identifies patterns between engagement and achievement to inform recommendations. A quasi-experiment showed that students receiving personalised interventions significantly outperformed the control group in learning achievement (mean post-test scores: 87.36 vs. 76.68) and behavioural engagement.

Ustun et al. (2023) examined the impact of LA-based interventions on students’ academic performance and SRL skills. Over a 10-week period, students received weekly personalised visual feedback and written recommendations based on their LMS log data, which included login records, video views, and test scores. These interventions aimed to enhance student engagement and performance. The study demonstrated significant improvements in both academic achievement and SRL skills within the experimental group.

Borrella et al. (2022) evaluated four types of interventions (A, B, C, D) in massive open online courses (MOOCs), which have a high prevalence of dropouts, to reduce dropout rates. They found that interventions A and B, which provided encouragement emails before assessments and exam preparation materials, did not result in statistically significant improvements in dropout rates. However, interventions C and D, which identified assessments and topics that students perceived to be difficult and revised them using a didactic scaffolding approach, resulted in a statistically significant reduction in dropout rates.

In summary, LAI studies have shown significant potential to improve educational outcomes. Yet, although the pioneering CS study was published in 2012, well over a decade ago, and subsequent LAI studies have provided growing evidence of positive impacts, we are yet to see general uptake and wide-scale adoption of LAIs in the education domain. Analysing the reasons for this slow adoption identified a number of challenges and obstacles that restrain educators from piloting LAIs in their courses.

Firstly, LAIs need to identify students for interventions (especially in large classes), typically using predictive models, which require access to a diverse set of data such as student assessment, demographic, and engagement data collected across many systems, including the SIS, LMS and others. Next, the development of predictive models requires input from EDM specialists to develop institution-specific predictive models (Milliron et al., 2014). LAIs also require LA tools such as LADs to disseminate student risk levels to stakeholders and decision-makers, as well as tools for personalised communication. It is evident that without significant prior buy-in and investment by the institution in LAI infrastructure, educators are unable to pilot LAIs in their courses. Furthermore, as evaluated in Borrella et al. (2022), not all interventions are successful. The interventions themselves are at the discretion of the academics, and their effectiveness may vary with academics’ ability to provide effective interventions, the learning context and other factors. Indeed, providing effective interventions is considered the greatest challenge in LA (Milliron et al., 2014; Rienties et al., 2017; Wong & Li, 2020). As a result, although LAIs have the potential for significant impact, the risk that interventions will not succeed and the investment needed to develop institution-specific predictive models have held institutions back from investing in LAI infrastructure, and thus academics from piloting LAIs.

This paper presents the evaluation of the SPPA framework, which allows academics to pilot LAIs in their courses. SPPA follows a similar approach to existing LAI studies in that students’ risk levels in a course are predicted early on, providing actionable insights to academics for interventions. SPPA is distinct from previous approaches in that academics can seamlessly access and use its features to provide LAIs without institutional-level investment in LAI infrastructure, addressing a critical obstacle to the wide-scale uptake of LAIs. In SPPA, academics use historical continuous assessment data to develop course-specific predictive models and use the resulting insights to identify at-risk students for interventions. This approach avoids the need to integrate multiple data sets from different sources to develop predictive models. SPPA pioneers a new self-service model for LAIs by academics. Furthermore, SPPA incorporates sound pedagogical principles throughout the course’s life cycle with the aim of facilitating effective interventions (including in subsequent course iterations). This study evaluates SPPA through academics piloting LAIs in a large undergraduate course at the University of Newcastle, Australia.

3 SPPA’s design decisions

This section presents the design decisions in developing SPPA that allow academics to pilot LAIs for their own courses. The conceptual design of SPPA is presented in Alalawi et al. (2021a).

3.1 Ease of access, usability and extendibility

One of the critical design considerations in SPPA was to provide seamless access to educators. Academics are time-poor, and ease of use is critical for success. SPPA is hosted on the web and is accessible via a web browser without the need to install or configure any software. SPPA is also available as an open-source project, with access to the source code for extensibility and for hosting on an institution’s own IT infrastructure. In designing SPPA, educators were consulted to ensure that it integrates seamlessly with existing teaching workflows and that the user interfaces are intuitive and easy to use. User manuals are also available to educators as a quick reference.

3.2 Predictive models

A critical component of LAI infrastructure is the development of predictive models using various data sources (student demographics, student performance, engagement and others) (Arnold & Pistilli, 2012; Baneres et al., 2019; Milliron et al., 2014). Access to such data sets requires institution-wide efforts across various IT systems and is typically beyond the reach of individual educators. SPPA avoids the need for different data sets by developing course-specific predictive models from the historical continuous assessment data of the course, which is typically accessible to educators. These data sets are used to develop ML predictive models that can predict students’ risk levels after each continuous assessment. The predictive models provide educators with crucial early insights into students who are at risk of underperforming, failing, or potentially dropping out, allowing for timely and targeted interventions. Our previous work has shown that historical data on continuous assessments in a course can be used to create effective, high-performing predictive models (Alalawi, Chiong, & Athauda, 2021b). This approach allows us to develop ML predictive models without institutional access to a variety of data sources.

Next, SPPA avoids the need for educators to be experts in EDM or ML. SPPA allows educators to upload historical assessment data for a course (in a pre-defined format); it then develops predictive models using well-known ML algorithms and, by default, selects the highest-performing model for prediction after each continuous assessment. Our previous work (Alalawi et al., 2023) identified the ten most popular ML algorithms and techniques applied to predicting student performance in the literature. SPPA implements five of these algorithms to develop predictive models of student performance: Logistic Regression (LR), Support Vector Machine (SVM), Decision Tree (DT), k-Nearest Neighbours (KNN) and Naïve Bayes (NB).

LR is an ML method for binary classification that predicts the likelihood of an outcome using the logistic function. It is known for its simplicity, interpretability, and efficacy, and works best under the assumption of linearity. However, it may underperform on complex tasks. LR also offers probability estimates and supports regularisation to avoid overfitting (Hosmer Jr et al., 2013; James et al., 2013).

The SVM is a supervised learning model utilised for classification and regression tasks. It operates by locating the hyperplane that best separates the data into different classes and is particularly effective in high-dimensional spaces. However, the SVM model can be less effective on large data sets and it is sensitive to the choice of kernel (Bishop & Nasrabadi, 2006; Cortes & Vapnik, 1995).

DTs are interpretable models that use a series of sequential tests for classification. They are easy to comprehend and use, but they are susceptible to overfitting. When used in ensembles, DTs require more storage and computation, making them less comprehensible (Kotsiantis, 2013).

The KNN model is a non-parametric classification and regression method that predicts outcomes based on the majority class of its k nearest neighbours in the feature space. It is simple to implement and versatile. However, its performance can degrade with high-dimensional data and unbalanced class distributions (Alpaydin, 2020; Hastie et al., 2009).

NB is a simple yet highly effective probabilistic classifier that operates under the assumption of feature independence. This simplicity makes it particularly efficient and easy to implement, especially when dealing with large data sets. However, its performance can be impacted in complex data scenarios due to inaccurate probability estimates arising from its independence assumption (Rish, 2001).

These algorithms were selected for their computational efficiency and effectiveness. Their use in SPPA is particularly advantageous for web-based applications, where it is crucial to balance model complexity with the web server’s computation and training constraints.

We evaluate the ML models using various metrics, with a focus on accuracy, precision, recall, and F-measure (Bishop & Nasrabadi, 2006). These metrics are particularly valuable in the context of identifying at-risk students due to their direct relevance to practical outcomes. The parameters of these evaluation metrics (true positive (TP), true negative (TN), false positive (FP), and false negative (FN)) are calculated based on the confusion matrix. They are defined as follows:

$$Accuracy=\frac{TP+TN}{TP+TN+FP+FN}$$
$$Precision=\frac{TP}{TP+FP}$$
$$Recall=\frac{TP}{TP+FN}$$
$$F\text{-}measure=\frac{2\times Precision\times Recall}{Precision+Recall}$$

Accuracy provides an overall measure of correctness in predictions: the ability to predict both at-risk and not-at-risk students correctly relative to all predictions.

Precision measures the proportion of correctly identified at-risk students among all predicted positive cases. It ensures that interventions are targeted efficiently, minimising the risk of false alarms and unnecessary interventions (Sokolova & Lapalme, 2009).

Similarly, recall measures the proportion of actual at-risk students correctly identified by the model. In the context of student intervention, maximising recall helps to identify all students in need, reducing the chance that a student who needs support is overlooked (Sokolova & Lapalme, 2009).

Furthermore, the F-measure, the harmonic mean of precision and recall, offers a balanced assessment of the classifier’s performance. This metric is particularly useful in situations where precision and recall are equally important, as it provides a single score that considers both aspects simultaneously (Sokolova & Lapalme, 2009).
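As a concrete illustration of these definitions, the following minimal sketch computes the four metrics directly from confusion-matrix counts; the labels and counts are toy values, not data from this study.

```python
# Minimal sketch: the four metrics computed from confusion-matrix counts,
# mirroring the definitions above. Labels are illustrative only (1 = at-risk).
from sklearn.metrics import confusion_matrix

y_true = [1, 1, 0, 1, 0, 0, 1, 1]   # actual outcomes (toy values)
y_pred = [1, 0, 0, 1, 0, 1, 1, 1]   # model predictions (toy values)

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f_measure = 2 * precision * recall / (precision + recall)

print(f"Accuracy={accuracy:.3f} Precision={precision:.3f} "
      f"Recall={recall:.3f} F-measure={f_measure:.3f}")
```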

In SPPA, the historical continuous assessment data sets for a course are split with 70% allocated for training and 30% for testing. The training data are used to develop the predictive models, while the testing data are used to evaluate them. In developing the predictive models, 5-fold cross-validation is employed to determine the optimal hyperparameters for each ML algorithm: the training data are divided into five subsets, the models are trained on four subsets and validated on the fifth, and the process is repeated five times, each time using a different subset for validation. With the optimal hyperparameters, the predictive models are constructed. Figure 1 presents the pseudocode for training, cross-validating, and evaluating the ML models.

Fig. 1 Pseudocode for training, cross-validating, and evaluating ML models
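For illustration, the sketch below mirrors this workflow using scikit-learn (a hedged reconstruction, not SPPA’s actual implementation); the file name, column names and hyperparameter grids are assumptions for the example.

```python
# Sketch of the Fig. 1 workflow: 70/30 split, 5-fold cross-validation for
# hyperparameter tuning, and evaluation of each candidate model on the
# held-out test set. File/column names and grids are illustrative assumptions.
import pandas as pd
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score, f1_score, recall_score, precision_score

data = pd.read_csv("historical_assessments.csv")        # hypothetical upload format
X = data[["midterm", "project1", "project2"]]            # continuous assessment marks
y = (data["final_mark"] >= 50).astype(int)               # 1 = likely to pass, 0 = likely to fail

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30, random_state=42)

candidates = {
    "LR":  (LogisticRegression(max_iter=1000), {"C": [0.1, 1, 10]}),
    "SVM": (SVC(), {"C": [0.1, 1, 10], "kernel": ["linear", "rbf"]}),
    "DT":  (DecisionTreeClassifier(), {"max_depth": [3, 5, None]}),
    "KNN": (KNeighborsClassifier(), {"n_neighbors": [3, 5, 7]}),
    "NB":  (GaussianNB(), {}),
}

results = {}
for name, (model, grid) in candidates.items():
    search = GridSearchCV(model, grid, cv=5)             # 5-fold cross-validation
    search.fit(X_train, y_train)
    y_pred = search.best_estimator_.predict(X_test)
    results[name] = (accuracy_score(y_test, y_pred), f1_score(y_test, y_pred),
                     recall_score(y_test, y_pred), precision_score(y_test, y_pred))

# Default selection: highest accuracy, then F-measure, recall and precision.
best = max(results, key=lambda name: results[name])
print("Selected model:", best, results[best])
```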

In this study, a final overall score of less than 50% is considered a failure grade. Binary predictive models, which predict whether a student is “likely to pass” (i.e. grade ≥ 50%) or “likely to fail” (grade < 50%), are constructed. Multi-class predictive models, which predict whether the student is “likely to fail” (grade < 45%), “borderline” (45% ≤ grade ≤ 55%) or “likely to pass” (grade > 55%), are also created after each continuous assessment in the course. The best-performing binary and multi-class models are selected to predict students’ risk levels. A student who is classified as “likely to fail” by a binary predictive model, or as “likely to fail” or “borderline” by a multi-class predictive model, is considered “at risk”. The predictive models are created during the course design phase, before the course is delivered.
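For illustration only, the label-derivation and at-risk rules described above could be expressed as follows (a sketch; the function names are hypothetical and SPPA’s internal representation may differ):

```python
# Illustrative encoding of the labelling scheme and the at-risk rule described
# above; thresholds follow the text, the function names are hypothetical.
def binary_label(final_mark: float) -> str:
    # Training label for the binary models: fail below 50%, pass otherwise.
    return "likely to fail" if final_mark < 50 else "likely to pass"

def multiclass_label(final_mark: float) -> str:
    # Training label for the multi-class models:
    # fail < 45%, borderline 45-55% (inclusive), pass > 55%.
    if final_mark < 45:
        return "likely to fail"
    if final_mark <= 55:
        return "borderline"
    return "likely to pass"

def is_at_risk(binary_pred: str, multiclass_pred: str) -> bool:
    # A student is flagged as at-risk if the binary model predicts "likely to
    # fail" or the multi-class model predicts "likely to fail" or "borderline".
    return (binary_pred == "likely to fail"
            or multiclass_pred in {"likely to fail", "borderline"})
```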

3.3 Tools to facilitate interventions

During the delivery of a course, educators can upload the assessment scores of current students to predict their risk levels. LADs are used to review student risk levels and provide personalised interventions. Educators can download the assessment data and risk levels as an Excel/CSV file to determine groups for intervention, and use familiar tools such as filtering and Mail Merge to select groups of students and send personalised intervention emails.

3.4 Guiding effective course design, interventions, and evaluations

Similar to other LAI studies, the specific interventions themselves are at the discretion of the educators in SPPA, who are familiar with the learning context and student cohort. However, SPPA aims to facilitate and guide educators to provide effective interventions using sound pedagogical principles and approaches.

SPPA uses Biggs’s Constructive Alignment (CA) principles (Biggs, 1999) to create a CA mapping model for the course during the course design phase. In CA, a course’s learning outcomes (CLOs) are mapped to assessment tasks (ATs) and teaching & learning activities (TLAs). This approach ensures that educators reflect and design their course using CA principles.

The CA mapping model is used during the intervention and course evaluation phases. During the course delivery phase, the CA mapping model can be used to identify gaps in students’ knowledge and skills, and personalised study/revision plans can be generated based on those gaps. For instance, if a student performs poorly in an assessment or one of its sub-tasks, the CA mapping model can identify the TLAs mapped to that assessment (or sub-task), i.e. identify the knowledge gaps. The mapped TLAs can then be used to provide a personalised study plan to the student. Such information can be crucial in providing effective feedback (Hattie & Timperley, 2007). SPPA generates personalised feedback reports for students following Hattie and Timperley’s (2007) model for effective feedback. These reports are available to educators to use during interventions.
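To make this concrete, the following minimal sketch shows how a CA mapping could turn a poorly performed assessment task into a personalised revision plan; the mapping structure, task names and threshold are illustrative assumptions, not SPPA’s internal representation.

```python
# Illustrative CA mapping: each assessment task (AT) is mapped to the teaching
# and learning activities (TLAs) that cover it. Names and structure are
# hypothetical examples, not SPPA's internal representation.
ca_mapping = {
    "Mid-term exam": ["Lecture recordings 1-5", "Weekly quizzes 1-5"],
    "Project 1":     ["Lab sessions 1-4", "Pre-lab activities 1-4"],
    "Project 2":     ["Lab sessions 5-9", "Pre-lab activities 5-9"],
}

def revision_plan(marks: dict, threshold: float = 50.0) -> list:
    """Return the TLAs mapped to assessment tasks the student performed poorly in."""
    plan = []
    for task, mark in marks.items():
        if mark < threshold:                  # knowledge gap detected for this AT
            plan.extend(ca_mapping.get(task, []))
    return plan

# Example: a student who struggled in the mid-term is pointed back to the
# lectures and quizzes mapped to that assessment.
print(revision_plan({"Mid-term exam": 40.0, "Project 1": 65.0}))
```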

The CA mapping model also assists during course evaluation by identifying ATs with poor performance for review and potential revision in future offerings. In general, poor performance in a particular AT indicates that students are struggling with that AT or find the TLAs mapped to it challenging. These insights can help academics review and revise ATs and TLAs for course improvements in future offerings.

4 Evaluation design: Aims, research questions, study design and context

To evaluate SPPA, we first recruited academics to pilot LAIs in their courses using SPPA. The academics used SPPA in the course design phase to create the CA mapping model and develop course-specific predictive models, and during course delivery to provide interventions to at-risk students. We evaluated the effectiveness of the interventions and gathered the academics’ views. This study aims to (i) determine SPPA’s ability to identify at-risk students, (ii) assess the effectiveness of interventions by academics using SPPA, and (iii) gauge the views and uptake of SPPA by academics. The following research questions are posed in this study:

  • RQ1: How precise are SPPA’s predictive models in identifying students at risk of failure?

  • RQ2: To what extent did the interventions by academics using SPPA impact the performance of students at risk of failure, if any?

  • RQ3: To what extent did SPPA impact the performance of the overall student cohort, if any?

  • RQ4: What are the views and uptake of academics who used SPPA?

If the predictive models are not able to identify at-risk students accurately, then the academics will not be able to effectively target students who need assistance. RQ1 enables us to verify whether the use of continuous assessment data to develop predictive models in SPPA is effective in accurately identifying at-risk students.

The academics provide interventions to the identified at-risk students using SPPA. RQ2 evaluates the effectiveness of these interventions. Using SPPA can have an impact beyond the intervened students: it shapes the course design through CA principles and keeps academics aware of the progress of at-risk cohorts. RQ3 therefore evaluates the impact SPPA had on the entire cohort, including students who did not receive interventions.

Academics are key to the widespread adoption and uptake of LAIs, and their views are important to the uptake of SPPA-led LAIs. RQ4 analyses academics’ views on the use of SPPA to provide LAIs in their courses.

4.1 Study context

Two academics (A1 and A2) were recruited as research participants. The academics were introduced to SPPA and its workflows and given access credentials. They decided to pilot LAIs in a large first-year undergraduate course (Course A) in Computing and IT. A1 was the lecturer and course coordinator for Course A, while A2 was the tutor for the labs. Details of Course A and how it was used in the study are described below:

  • Course A: Course A introduces web technologies and the fundamental concepts of Internet architecture, and how they support the massive growth and varied uses of the medium. The course is designed to give students a sound understanding of the potential as well as the limitations of web technologies. Course A ran for 15 weeks, and students were required to attend 12 × 2-hour lectures and 11 × 2-hour laboratory practical sessions during the semester. As part of the course activities, students were requested to prepare for every practical session by completing pre-laboratory activities online in the LMS. Students were also expected to reinforce their learning from the lectures by completing weekly online quizzes in the LMS. Overall, Course A included four summative assessments whose marks contributed to the final course grade: a mid-term exam (worth 15% of the final grade and due in week 6), project 1: a web-based assignment 1 (worth 20% of the final grade and due in week 8), project 2: a web-based assignment 2 (worth 25% of the final grade and due in week 11), and a final exam (worth 40% of the final grade and undertaken during the exam period).

  • CA Mapping Model: The academics outlined their design for Course A mapping the CLOs, TLAs, and ATs following CA (Biggs, 1999). A template was provided to the academics to help them follow CA principles. Course A had four CLOs mapped to ten TLAs, and the four ATs.

  • Predictive Models: The academics uploaded 497 students’ continuous assessment records from historical cohorts (2017–2020) to develop the student performance predictive models. Given that the analysis primarily focused on assessment data, which offered a limited set of features, no feature engineering was applied. Table 1 provides the pass/fail distribution of the data set by year (2017–2020), showing the counts and percentages of students who passed and failed the course each year. For instance, in 2017, out of 29 students, 7 failed (24.1%) and 22 passed (75.9%). Table 2 presents the grade distribution of the data set by year (2017–2020), with the counts and percentages of students achieving each grade (F, P, C, D, HD). For example, in 2017, out of 29 students, 7 received an F (24.1%), 7 a P (24.1%), 5 a C (17.2%), 6 a D (20.7%), and 4 an HD (13.8%). Similar distributions, with within-year percentages and totals, are provided for 2018 to 2020.

  • Instructor-led interventions: The academics used predictive models to identify at-risk students after the mid-term exam in week 6 and after project 2 in week 11. Once at-risk students were identified, the academics intervened by sending them personalised emails.

Table 1 Pass/Fail distribution of data set by Year (2017–2020)
Table 2 Grades distribution of data set by Year (2017–2020)

4.2 Study design

This section discusses the study design for RQs 1–4:

  • Study Design for RQ1: To answer RQ1, the four performance metrics discussed in Sect. 3.2 were used: accuracy, F-measure, recall, and precision. By default, SPPA selects the predictive model with the highest accuracy first, followed by F-measure, recall and precision. The five ML algorithms (i.e. LR, SVM, DT, KNN, and NB) were used to develop both binary and multi-class classification models after each continuous assessment in the course, and these models were evaluated using the performance metrics. See further details in Sect. 5.1.

  • Study Design for RQ2: RQ2 aims to determine the extent of any impact due to SPPA-led interventions by academics. A quasi-experimental approach was used to determine the impact by comparing the outcomes of a control and an experimental group. The control group consisted of students from a previous cohort who were predicted to be at-risk using the predictive models but were not subject to SPPA intervention. Their results were compared to those of the students in the experimental cohort who received the SPPA-led interventions. Specifically, we compared the pass rates, fail rates, withdrawal rates and average grades of the two groups. We used a chi-square (χ²) contingency test (Rao & Scott, 1984) to determine whether there was a significant difference between the two groups’ pass, fail, and withdrawal rates, and compared the mean grades of the two groups using a t-test (a sketch of these tests is given after this list).

  • Study Design for RQ3: The adoption of SPPA also shaped the design of the course via CA mapping and informed the academics of student performance through the academic dashboard. As a result, we propose RQ3 to explore whether there was any observable impact on the entire cohort in comparison to historical cohorts. To identify any impact, we identified a control group from a past cohort of students who were not exposed to SPPA and had similar characteristics to the experimental group. Next, we compared the pass rates, fail rates, withdrawal rates and average grades between the two groups. To obtain a similar control group to the experimental group, Propensity Score Matching (PSM) (Austin, 2011; Rosenbaum & Rubin, 1983) was used. PSM is used in situations when randomised controlled trials are not possible, as in the current context. In particular, PSM attempts to remove bias by identifying control and treatment groups from the study cohort in such a way that students had a similar probability of being assigned to the control and treatment group based on a set of baseline characteristic variables (Lim et al., 2021; Mojarad et al., 2018).

  • Study Design for RQ4: We interviewed the academic participants to obtain their perspectives on using SPPA to pilot LAIs. The interviews were transcribed and classified into themes and concepts (Lune & Berg, 2017). Thematic analysis was used to interpret and analyse the qualitative interview data due to its flexibility to be applied across a variety of epistemological and theoretical approaches and its capacity to detect emerging themes (Braun & Clarke, 2006). Relevant ethics approval was obtained from the university’s Human Ethics Research Committee prior to conducting the study.
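As referenced in the RQ2 study design above, the sketch below illustrates the overall chi-square contingency test, the Bonferroni-corrected post hoc tests, and the independent t-test using SciPy; the counts and grades are hypothetical placeholders, not the study’s data, and a plain chi-square test is used here as a stand-in for the cited Rao & Scott approach.

```python
# Sketch of the statistical comparisons used for RQ2/RQ3. All numbers below
# are hypothetical placeholders, not the study's data.
import numpy as np
from scipy.stats import chi2_contingency, ttest_ind

# Rows: experimental, control; columns: pass, fail, withdraw (hypothetical counts).
table = np.array([[30, 20, 10],
                  [12, 35, 12]])

chi2, p, dof, _ = chi2_contingency(table)
print(f"Overall test: chi2={chi2:.3f}, dof={dof}, p={p:.5f}")

# Post hoc 2x2 tests (each outcome vs. the rest), tested against a
# Bonferroni-adjusted alpha of 0.05 / 6 = 0.00833.
alpha = 0.05 / 6
for j, outcome in enumerate(["pass", "fail", "withdraw"]):
    sub = np.column_stack([table[:, j], table.sum(axis=1) - table[:, j]])
    chi2_j, p_j, _, _ = chi2_contingency(sub)
    print(f"{outcome}: chi2={chi2_j:.3f}, p={p_j:.4f}, significant={p_j < alpha}")

# Independent-samples t-test on final grades (Welch's version).
grades_exp = [55, 48, 62, 41, 70, 38, 52]    # hypothetical final marks
grades_ctl = [40, 35, 51, 30, 45, 28, 37]
t, p_t = ttest_ind(grades_exp, grades_ctl, equal_var=False)
print(f"t={t:.3f}, p={p_t:.3f}")
```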

5 Results

This section outlines the results of our evaluation.

5.1 RQ1: Performance of prediction models

SPPA leverages ML algorithms to predict student performance, specifically identifying students at risk of failure and facilitating timely interventions. The predictive models in SPPA are generated after each continuous assessment. Historical data (i.e. students’ assessment marks) from the course’s cohorts between 2017 and 2020 (497 student records) were used to create the models. As discussed previously, two classes of predictive models were created for each assessment: (i) binary predictive models (pass, fail) and (ii) multi-class classification predictive models (pass, borderline, fail). The uploaded data were split into 70% for training and 30% for testing. SPPA uses 5-fold cross-validation on the training set to determine the settings of the models’ hyperparameters. The ML predictive models’ performance was then evaluated.

Evaluation metrics for each predictive model are presented in Table 3 for the binary models and Table 4 for the multi-class classification models. Note that the predictive models were created at three different points during the semester as continuous assessment results became available: first using the first assessment (mid-term exam), then using the mid-term exam and project 1 (web-based assignment 1), and finally using all three continuous assessments (mid-term exam, project 1 and project 2), and they are reported accordingly. The final exam was not considered in the predictive models because it is the last assessment, conducted at the conclusion of the academic term, and was therefore too late for intervention. In Tables 3 and 4, the highest value for each metric among all algorithms is bolded.

Table 3 Performance metrics for binary classification
Table 4 Performance metrics for multi-class classification

The results of the binary models (see Table 3) indicate that LR has the best predictive performance after the first assessment in terms of accuracy, F-measure, and recall (0.869, 0.929 and 0.986, respectively). The KNN model had the highest precision at 0.904. For the second assessment (project 1: web-based assignment 1), NB is the best model with accuracy of 0.9, F-measure of 0.942 and precision of 0.931. LR had the best recall at 0.97. The KNN model was selected as the best performing predictive model after the third assessment (project 2: web-based assignment 2) with accuracy at 0.93, F-measure at 0.959 and recall at 0.962. NB had the best precision at 0.963.

In contrast, the results of the multi-class classification models (see Table 4) indicate that the LR multi-class classification model has the best predictive performance for the three continuous assessments.

SPPA, by default, selects the best predictive model based on accuracy first, followed by F-measure, recall and precision. Academics can override the default and choose another predictive model when performing predictions. In this study, given the high performance of the predictive models, the academics opted to use the default selected models. We can observe that, as additional data became available (i.e. the second and third assessments), the predictive models’ performance also increased. In this study, the academics considered students “at-risk”, and provided them with interventions, if they were predicted as “likely to fail” by the binary models or as “likely to fail” or “borderline” by the multi-class classification models.

5.2 RQ2: Impact on at-risk students

To determine the impact on at-risk students, control and experimental groups were selected. The experimental group consisted of students who were identified as at risk of failure and were subject to the academics’ interventions in the 2021 cohort of the course. For the control group, students in Course A from a previous cohort (2019) were selected. The 2020 cohort was not considered for the control group due to the impact of the Covid pandemic and the many changes that occurred (such as moving to online mode). The best-performing predictive models (see the RQ1 results in Sect. 5.1) were used to identify students at risk of failure, with the students in the control group not exposed to SPPA or instructor interventions. Students’ risk levels were predicted twice during the semester (in week 6 after the mid-term exam and in week 11 after project 2: assignment 2) to identify at-risk students for the control and experimental groups.

In the first prediction, after the mid-term exam in week 6, 37 and 16 students were predicted to be at risk of failure in the experimental and control groups, respectively. In the second prediction, after project 2: assignment 2 in week 11, 38 and 52 students were predicted to be at risk of failure in the experimental and control groups, respectively. Across these two sets of predictions, there were 57 unique at-risk students in the experimental group and 55 in the control group. Table 5 provides the outcomes for these at-risk students in each group, while Table 6 presents the results of the chi-squared tests comparing the results in Table 5.

Table 5 Treatment and control groups for at-risk students
Table 6 Students’ performance for the experimental and control groups

An overall chi-square (χ²) contingency test was used to compare the performance rates (i.e. pass, failure, and withdrawal rates) across the two groups. There is strong evidence of an association between the grade outcomes (pass, fail, and withdrawal) and the treatment with a test statistic of χ² =24.258, and a corresponding p-value of 0.000005. Chi-Square post hoc tests between students’ pass, failure, and withdrawal rates of the treatment and control groups were carried out and tested against a Bonferroni-adjusted alpha level of 0.00833 (0.05/6) (Beasley & Schumacker, 1995; Garcia-Perez & Nunez-Anton, 2003). As seen in Table 6, pass and failure rates’ results were statistically different between the control and experimental groups, while the withdrawal rate’s results were not statistically different.

Specifically, the experimental group had a significantly higher pass rate of 45.5%, compared to just 10.9% in the control group, yielding a p-value less than 0.0001. Correspondingly, the experimental group had a significantly lower failure rate of 28% compared to 72.7% in the control group, yielding a p-value less than 0.00001. There was no significant difference between the withdrawal rates of the two cohorts, with a p-value of 0.2005.

The significant differences in passing and failure rates between the two groups are reflected in the overall grade distributions. The experimental group had a mean grade of 41.1557 with a standard deviation of 19.38583, compared to a mean of 32.6362 and a standard deviation of 16.20588 for the control group. An independent t-test indicated the population mean difference was statistically significant (t (80.247) = 2.225, p = 0.029).

Collectively, these results showed a significant positive impact in the experimental group compared to the control group, with a higher pass rate, a lower failure rate, and a higher mean grade.

5.3 RQ3: Impact on overall cohort

To answer RQ3, we evaluated the impact of SPPA on Course A’s 2021 cohort. The control group was selected using PSM (Austin, 2011; Rosenbaum & Rubin, 1983) from the 2019 Course A cohort (n = 236) with similar characteristics to the experimental group, which was the 2021 Course A cohort subject to the SPPA framework (n = 248). Matching was performed using four baseline student characteristics: gender, program entry score, age, and citizenship (whether the student is domestic or international), similar to Lim et al. (2021). The program entry score (UAC rank between 0 and 99) is the percentile score used for Australian university admissions, representing how well Year 12 (secondary school) students performed in their examinations compared to their peers. A zero was entered for the program entry score of students who were admitted through other pathways.

Each student’s propensity score was calculated using LR, with the binary outcome being whether the student was in the experimental group and the independent attributes being the four baseline characteristics stated above. The PSM produced two equal-sized treatment and control groups (n = 248 in each group). Propensity score matching was carried out as a one-to-one match using the nearest neighbour method, which locates the closest match based on the distance between propensity scores; that is, for each experimental subject, the control match chosen is the individual with the closest propensity score. Table 7 provides an overview comparison of the students’ demographic data (the baseline characteristics) of the two groups in Course A.
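For illustration, a rough sketch of this matching procedure is given below (an assumed reconstruction, not the study’s code); the file name and column names are hypothetical, and for simplicity the sketch matches with replacement, whereas the study’s one-to-one matching produced equal-sized groups.

```python
# Rough sketch of propensity score matching: logistic regression to estimate
# propensity scores, then nearest-neighbour matching. File/column names are
# hypothetical; this simplified version matches with replacement.
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import NearestNeighbors

students = pd.read_csv("cohorts_2019_2021.csv")             # hypothetical combined file
covariates = ["gender", "entry_score", "age", "domestic"]   # baseline characteristics
X = pd.get_dummies(students[covariates], drop_first=True)
treated = students["cohort"] == 2021                        # experimental group indicator

# Step 1: estimate propensity scores with logistic regression.
students["propensity"] = LogisticRegression(max_iter=1000).fit(X, treated).predict_proba(X)[:, 1]

# Step 2: for each treated student, find the control student with the closest
# propensity score (nearest neighbour).
controls = students[~treated]
nn = NearestNeighbors(n_neighbors=1).fit(controls[["propensity"]])
_, idx = nn.kneighbors(students.loc[treated, ["propensity"]])
matched_controls = controls.iloc[idx.ravel()]

print(len(matched_controls), "matched control students")
```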

Table 7 An overview of the students’ composition in Course A for the control and experimental groups after propensity matching

Although PSM was used to identify similar cohorts of students for the control and experimental groups, some variables were beyond the control of the study in this real-world setting. The course syllabus and assessment tasks remained at the same level of complexity for the control (2019) and experimental (2021) cohorts. However, the course coordinator who taught Course A in 2019 had retired, and a new course coordinator and tutor delivered the course in 2021. Table 8 summarises the performance results.

Table 8 Students’ performance for the experimental and control groups

The chi-square (χ²) contingency test is used to compare the performance rates (i.e. pass, failure, and withdrawal rates) across the two groups. There is strong evidence of an association between the grade outcomes (pass, fail, and withdrawal) and the treatment: χ² (df = 2, N = 496) = 15.376, p = 0.000458. Chi-Square post hoc tests between students’ pass, failure, and withdrawal rates of the treatment and control groups were carried out and tested against a Bonferroni-adjusted alpha level of 0.00833 (0.05/6).

The experimental group had a significantly higher pass rate (61.7% in the experimental group vs. 46.4% in the control group), with a test statistic of χ² = 11.7 and a corresponding p-value of 0.0006.

Moreover, the failure rates of the experimental group were significantly lower (26.2% in the experimental group vs. 42.7% in the control group) leading to a test statistic of χ² =14.98 and a corresponding p-value of 0.0001.

There was no significant difference in the withdrawal rates of the two cohorts, with a test statistic of χ² = 0.18 and a corresponding p-value of 0.6745.

Despite the experimental group having a significantly higher pass rate and a significantly lower failure rate, the overall mean scores of the two cohorts were not significantly different. The mean scores were 56.2661 and 54.4751 for the experimental and control groups, respectively, with standard deviations of 27.95112 and 18.94695. A Student’s t-test yielded a test statistic of 0.785 and a p-value of 0.433, showing that the mean difference is not statistically significant.

Overall, there was a positive impact in the experimental group compared to the control group, although not by as large a margin as for the at-risk cohort. This is expected, given that in RQ2 all students in the experimental group received targeted interventions, which had a significant impact.

5.4 RQ4: Academics’ views and uptake

To address RQ4, at the end of the academic term, semi-structured interviews were carried out with the two academics who used SPPA in Course A to gain a deeper understanding of academic perspectives on, and acceptance of, the proposed system. These interviews helped to identify the academics’ views on SPPA with regard to usefulness, features, impact and possible improvements. A thematic analysis (Braun & Clarke, 2006) was performed, and a number of themes and patterns emerged.

  • Course Design using CA Mapping Process: The academics felt that course design with the CA mapping process helped them reflect on the course’s design and how it met the learning outcomes.

  • Student Performance Prediction: The academics found the prediction feature useful for detecting students who needed extra attention and/or personalised intervention.

  • Intervention through personalised feedback: The academics viewed the ability to provide personalised feedback to at-risk students as very useful.

  • Ease of use and usability: The academics perceived the proposed system to be intuitive and generally easy to use.

  • Impact: The academics perceived that SPPA had a positive impact on students’ performance, as the interventions triggered action in many cases. An interesting comment by academic A2 (see below) suggests that the impact was higher than the results show.

I think more than 50% of the students who fail the course, they actually did not attend any of the tests or assignments at all. That means those students will not have passed anyway, despite our intervention, because they were going to drop out anyway. So, if we look at the results closely, the impact of the intervention was actually higher than what the results show.

  • Suggestions for improvements: There were some suggestions for improvement. The academics felt that the CA mapping model feature could be improved. Academic A2 felt that entering the items to map in the CA mapping model was time-consuming, and that a better approach might be to upload a pre-filled template document rather than entering them through the user interface. The academics also felt that a feature to display students’ feedback via student-facing LADs could be useful.

Overall, it was evident that the academics had a positive outlook on SPPA and its uptake. They found it easy to use, felt it made them reflect on the course design, found the prediction and intervention features helpful, believed the framework had a positive impact on student outcomes, and indicated positive sentiments towards using SPPA in the future.

6 Discussion

The study’s evaluation was able to address RQs 1–4. The results for RQ1 demonstrated that the predictive models for both binary and multi-class classification performed at a high level even after the first continuous assessment in the course. As expected, as more continuous assessment data became available (i.e. the second and third assessments), the prediction models’ performance increased. This further demonstrates that using continuous assessment data to develop student performance predictive models is effective, adding further evidence to our previous work (Alalawi, Chiong, & Athauda, 2021b). SPPA compares five predictive models created using ML algorithms and by default selects the best-performing model (for binary and multi-class classification). This approach provides academics with the best-performing model by default, without the need for EDM expertise.

Educators used SPPA to provide LAIs to a cohort of students in a large undergraduate computing course, which resulted in significantly improved outcomes for students identified as at risk of failure: compared with at-risk students in a prior cohort that received no intervention, the pass rate was 34.6 percentage points higher, the failure rate was 44.7 percentage points lower, and the mean mark was 8.5 points higher. There was also evidence of improvement across the entire cohort, with a significantly higher pass rate (by 15.3 percentage points) and a lower failure rate (by 16.5 percentage points) in the experimental group compared with the control group. In addition to these success indicators, the academics who participated in the study identified strengths in SPPA’s support for course design, its ease of use and its usefulness. Overall, the academics had a positive outlook on using SPPA in their courses.

Our study’s findings closely align with those of previous research in the field of LAIs. Comparing our results to the CS project at Purdue University (Arnold & Pistilli, 2012), Jayaprakash et al.’s (2014) work and Wang et al.’s (2022) work, it is evident that predictive models were pivotal in identifying at-risk students, leading to personalised interventions and enhanced student success. Notably, in comparison to previous studies, this study simplifies predictive model creation without the need for institution-wide investment in LAI infrastructure.

A major obstacle to LAI adoption has been educators’ lack of access to LAI infrastructure for piloting LAIs in their courses. Previous studies required accessing data from various IT systems (such as the SIS, LMS and others) institution-wide to develop predictive models. SPPA’s approach of providing LAI infrastructure to educators through a web application, and of using data sets accessible to educators to develop predictive models, allows a self-service model whereby educators can pilot LAIs in their courses. SPPA’s code base is also available as an open-source project for further research, development and customisation, with the ability to host SPPA on an institution’s own IT infrastructure. This approach avoids the need for large-scale institutional investment in LAI infrastructure prior to piloting LAIs and has the potential to encourage the uptake of LAIs by educators and institutions.

In dropout reduction efforts, Baneres et al. (2023) achieved a substantial reduction through personalised email interventions, while Borrella et al. (2022) found interventions A and B to be inconclusive. This highlights the variability of intervention results and underscores both the need for context-specific approaches and the challenge of providing effective interventions. The current status quo in LAIs is to let educators decide on interventions, providing them with information on students’ predicted risk levels. SPPA extends this approach by providing additional relevant information (e.g. students’ knowledge gaps; personalised study/revision plans) and by guiding course design and interventions using well-established pedagogical principles and learning theories (e.g. the use of CA; Hattie & Timperley’s model for effective feedback). The positive results from this study provide further evidence in support of SPPA’s approach of integrating pedagogical principles and learning theories with technology innovations to facilitate effective interventions.

An intervention may also require analysing what went well (or not) in a previous course iteration and then developing interventions for future iterations based on that analysis and experience (e.g. interventions C and D in Borrella et al. (2022)). Although multiple course iterations were not evaluated in this study, SPPA considers the whole life cycle of a course iteratively (course design, course delivery and course evaluation phases), envisaging a continuous improvement model in which the course is assessed during its evaluation phase and improvements and interventions are carried into the next iteration.

Overall, this study has contributed to the LAI domain by evaluating a novel LAI framework (i.e. SPPA) whereby educators can pilot LAIs without the need for significant institutional-level investment in LAI infrastructure, addressing the main impediment to LAI uptake. The study evaluated the use of SPPA to provide LAIs in a large first-year course, resulting in improved educational outcomes, and the academics who used SPPA had a positive outlook on its approach and indicated they would take it up. SPPA also addresses the challenge of providing effective interventions by giving educators access to relevant information (such as students’ knowledge gaps) and by guiding course design, interventions and continuous improvement over multiple iterations of a course, incorporating learning theories and pedagogical principles.

This study has a number of implications for the LAI field:

  • This study demonstrated the feasibility of a self-service model in which educators pilot LAIs in their own courses without the need for large-scale effort or investment in LAI infrastructure. This addresses one of the main impediments for educators and institutions in piloting LAIs, with the potential to increase LAI uptake.

  • Providing effective interventions is a challenge in LA (Milliron et al., 2014; Rienties et al., 2017; Wong & Li, 2020). SPPA’s approach of incorporating pedagogical principles into course design and interventions, together with technology artefacts that guide interventions, gives educators further information with which to intervene effectively.

  • Courses are typically offered multiple times across academic terms, and experience from one iteration can be used to improve outcomes in the next. LAI studies in the literature focus on interventions based on students’ predicted risk levels within a single course iteration. SPPA outlines a process whereby interventions can span multiple iterations: each iteration is evaluated before the next, and the resulting improvements and interventions are incorporated into the subsequent offering.

A number of limitations of this study need to be kept in mind when generalising these findings.

  • Although SPPA provides a general approach to creating ML models from historical continuous assessment data, with student risk predicted after each continuous assessment in the course, it assumes that the course is designed with a number of continuous assessments during the academic term. While designing courses with continuous assessments to establish student learning milestones during the term is pedagogically sound, this may not be applicable to all course contexts.

  • This study’s evaluation of SPPA was also based on a small sample: two academics were recruited and LAIs were piloted in a single large course. Future work will need to evaluate SPPA in different learning contexts and with many academics. Such evaluations can provide further evidence of the effectiveness of SPPA’s approach, and the accumulated experience can lead to further improvement and fine-tuning of the framework.

  • Furthermore, a future longitudinal study spanning multiple iterations of courses is needed to evaluate SPPA’s approach to facilitating a continuous improvement model in LAIs.

  • SPPA’s current approach to generating predictive models assumes that the structure of continuous assessments (i.e. the number of assessments, etc.) is similar to that of the past course offerings whose historical data are used to train the models (a minimal illustration of this assumption follows this list). Future research needs to consider predictive models with evolving features (Hou et al., 2022) so that models can keep up with changing course assessment structures.
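
The minimal Python sketch below illustrates this structural assumption: a model trained on a historical offering is only reused when the current offering's continuous assessment structure matches. The data structures and names are hypothetical and are not part of SPPA.

```python
# Hypothetical illustration of the structural assumption noted above: a model
# trained on historical data with a given continuous assessment structure is
# only reused when the current offering matches that structure. Names and
# weights are illustrative assumptions, not part of SPPA.
from dataclasses import dataclass

@dataclass(frozen=True)
class AssessmentStructure:
    names: tuple    # e.g. ("quiz_1", "assignment_1", "quiz_2")
    weights: tuple  # percentage of the final mark, e.g. (10, 20, 10)

def model_is_applicable(historical: AssessmentStructure,
                        current: AssessmentStructure) -> bool:
    """A trained model transfers only if the assessment structure is unchanged."""
    return historical == current

historical = AssessmentStructure(("quiz_1", "assignment_1"), (10, 20))
current = AssessmentStructure(("quiz_1", "assignment_1", "quiz_2"), (10, 20, 10))

if not model_is_applicable(historical, current):
    # The structure has evolved (an extra assessment), so the historical model
    # cannot be applied directly; this is the gap that evolving-feature models
    # (Hou et al., 2022) would need to address.
    print("Assessment structure changed: retrain or use an evolving-feature model.")
```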

7 Conclusion

Student performance prediction is a research area that has gained much attention. Although the literature focuses on developing predictive models, comparatively few studies explore the actions that can be taken based on prediction results. LAI studies aim to address this gap: predictive models of student performance are created from relevant data sets, and the resulting insights inform stakeholders (educational institutions, academics and students) so that appropriate actions can be taken. Typically, academics, armed with such insights, take pre-emptive actions, such as interventions focused on at-risk students, to improve outcomes. Many LAI studies in the literature demonstrate positive outcomes such as improved pass rates, retention and final grades.

Although LAI studies have shown positive outcomes, uptake of LAIs has been slow. Our analysis identified the need for significant institutional investment in LAI infrastructure prior to conducting LAI studies as the main impediment to their widespread adoption. Another challenge is providing effective interventions, which carry no guarantee of success.

This paper presented the evaluation of a novel LAI framework, termed SPPA. SPPA allows academics to pilot LAIs in their courses without significant investment in LAI infrastructure, thereby addressing the main impediment to LAI uptake and enabling a new self-serve LAI paradigm for educators. Academics access the LAI infrastructure through a web browser and create course-specific predictive models from historical continuous assessment data sets, integrating LAIs seamlessly into existing teaching workflows. SPPA also incorporates pedagogical approaches and learning theories into course design, delivery and evaluation to support effective interventions and improve course outcomes. SPPA was evaluated by academics providing LAIs in a large first-year course. SPPA-led LAIs demonstrated significantly improved pass rates and final grades and reduced failure rates for at-risk students, as well as positive impacts on the overall cohort. The academics had a positive outlook on SPPA, its features, impact, usefulness and uptake.

There are a number of areas for future work, including: (i) integrating SPPA with the LMS, so that academics and students can access SPPA and its features from within the LMS itself, with the added advantage that SPPA could draw on student engagement data from the LMS to improve its predictive models; (ii) research on creating predictive models that can cope with evolving course assessment structures; (iii) deploying SPPA in different courses and learning contexts, as well as longitudinal studies spanning multiple course iterations, to provide further evidence on the effectiveness of its approach; and (iv) enhancing SPPA to provide students with analytical information on their learning progress, with the aim of improving their self-regulated learning (SRL) skills.