1 Introduction

Learning analytics research focuses on the use of learners’ data to develop personalized applications that support a wide range of stakeholders (e.g., learners, instructors, advisers, administrators). Learners’ data are analyzed to derive key performance indicators that capture learner behavior patterns, which are then visualized (Baker, 2016). The move towards visual displays has led to the development of learning analytics dashboards (LADs), which can prompt user reflection through the insights they depict and potentially inform stakeholders about the interventions needed to optimize current learning environments. Schwendimann et al. (2017, p. 37) defined LADs as “displays that aggregate different indicators about learner(s), learning process(es) and/or learning context(s) into one or multiple visualizations”, which can also incorporate text-based content (Podgorelec & Kuhar, 2021). Although the layouts of individual LADs differ, to date they have primarily leveraged two kinds of analytics, namely descriptive and predictive (Afzaal et al., 2021). Descriptive analytics looks backwards, capturing trends in the digital footprints that learners leave behind while interacting within virtual environments. Such footprints can result from accessing learning resources, undertaking collaborative activities or engaging in other communication interactions (Delen & Demirkan, 2013). Descriptive analytics also gives learners snapshots of their current learning status, again derived by tracking their online activity, with the goal of empowering them with self-knowledge about their learning behavior and progression at different points in time. In contrast, predictive analytics is forward-looking. It provides forecasts about individual learners, such as their probable performance in upcoming assignments and their final grades on completing a course.
Integrating predictive with descriptive analytics yields richer insights, ultimately adding more value to LADs. However, predictive analytics is only now beginning to emerge within LADs (Susnjak et al., 2022).

LADs serve multiple stakeholders. For one, they provide instructors and administrators with a scalable way to monitor student engagement in real time. Emergent patterns in student activity can reveal useful, actionable insights, especially when instructors and students are physically separated, as in online environments. Students acquire greater self-knowledge through LADs by gaining more visibility into their online learning behaviors (Verbert et al., 2013a, b), which supports more informed study-related decisions. Most importantly, personalized learner-centric metrics on LADs can enable a form of self-reflection that results in positive behavioral adjustments. Meanwhile, instructor-facing LADs can identify at-risk students, especially through predictive analytics, enabling instructors to initiate interventions that increase the probability of successful course outcomes (Greller & Drachsler, 2012; Yoo et al., 2015).

Online learning environments are increasingly using a broader range of dashboards to track student activities (Verbert et al., 2013a, b). These dashboards differ widely in the type of data they analyze, the visual information they display, the audience they target, their overarching purpose (or theme) and the strategies used to evaluate their effectiveness. To date, few systematic literature reviews have summarized these aspects of LADs. Moreover, despite the emergence of predictive analytics, no systematic literature review has examined LADs in terms of what is being predicted, what input data are used, which algorithms are employed and how prediction accuracy is evaluated. This review article seeks to fill this gap.

We first present an overview of published literature reviews to outline the groundwork already established in the LAD domain. The overview identifies the current state of the art and the gaps in LAD research, leading to three research questions that guide our study. We then present our own systematic literature review (SLR), in which we carry out an in-depth investigation of recent LAD case studies (published between 2011 and 2021) to answer these questions and provide recommendations for future LAD designs.

2 Overview of Systematic Literature Reviews on LADs

This section provides an overview of key findings on LADs from four systematic literature reviews published between 2017 and 2021. These reviews were selected because they followed a structured article search process to identify ongoing trends in student-facing LADs and to assess the current state of the field.

Schwendimann et al. (2017) examined LADs by categorizing their learning contexts, data sources, visualizations and analysis types. Their review comprised 55 journal papers published between 2010 and 2015. They found that the predominant focus of the dashboards was monitoring student progress, that the dashboards were both student-facing and instructor-facing, and that most relied on a single data source (i.e., log files from a learning management system (LMS)). Given that most studies were exploratory or proof-of-concept in nature, they concluded that very little analysis had been performed on the effectiveness of LADs in improving learning outcomes.

Bodily and Verbert (2017) conducted an SLR on LAD studies published between 2005 and 2016. A total of 93 papers were reviewed from the perspective of LAD functionality, data sources, design features, and the perceived and actual effects the dashboards elicit. The findings were inconclusive with respect to the effectiveness of LADs, both overall and for specific designs, and the authors recommended further research.

Matcha et al. (2020) evaluated the impact of LADs on learning and teaching. From a review of 29 papers published between 2010 and 2017, the authors noted that existing LADs are insufficiently grounded in learning theory and do not offer insights into what constitutes effective learning approaches. The review suggests that evaluation of LADs should be done in learning contexts across numerous iterations for a stronger methodological foundation.

A recent systematic review by Valle et al. (2021) examined articles published between 2012 and 2019 to determine whether evaluations of learner-facing LADs actually measured their efficacy. Similar to other reviews, it noted a lack of alignment between the intended outcomes of LADs and their evaluation measures. The review added that current LADs are rarely designed to support students’ self-regulated learning and again recommended the development of appropriate measures for evaluating LAD efficacy.

All four reviews examined who the target users were, what types of data were used and which evaluation methods were applied. Bodily and Verbert (2017) extended this contribution by devising an evaluation criterion for LADs that considers how effectively they impact learner behaviors and overall achievement. In their subsequent review, they also took into account participant sample sizes and whether participants came from a single course or from multiple courses, whereas Schwendimann et al. (2017) took a different perspective and categorized studies by the technologies used to present LADs to users.

Existing reviews have made substantial contributions to LAD research; however, with the emergence of predictive analytics and its growing importance in enhancing LAD capabilities (Bodily & Verbert, 2017; Susnjak et al., 2022), a gap has opened in the literature concerning this recent technological development. This is especially pertinent because predictive modelling is now recognized as an effective approach for identifying at-risk students, with the potential to improve both learner outcomes and retention rates (Umer et al., 2021). Hence, our systematic review differs from earlier ones in examining LADs that incorporate predictive analytics. Accordingly, our research questions address the previously unexamined role of predictive modelling within LADs, while also replicating and extending some findings from prior reviews.

3 Research Questions

RQ1: What are the key themes underlying LAD usage in existing literature? Further, who are the target users and what visualization techniques are employed?

RQ2: What types of data and which machine learning algorithms are commonly used for predictive analytics? What is being predicted and how is prediction accuracy being assessed?

RQ3: What evaluation methods have been used to study the effectiveness of LADs?

4 Methodology

Our systematic literature review followed the guidelines proposed by Kitchenham and Charters (2007). The fields of learning analytics (LA) and educational data mining (EDM) have grown significantly since 2010 (Park & Jo, 2019); hence, this study examined literature published in journals and conference proceedings between 2011 and 2021. This review focuses exclusively on LA systems that collect student data, apply predictive modelling and deliver the prediction results to students, instructors or academic advisers as visualizations or text-based feedback. We included studies that communicated results either via dashboard displays or through reports sent as email attachments. Conferences provide a publishing outlet for emerging fields; accordingly, conferences with an explicit focus on LA and EDM (e.g., the International Conference on EDM and the International Conference on LA and Knowledge) that met the inclusion criteria were also considered.

Following a methodology similar to that of earlier studies (Matcha et al., 2020; Schwendimann et al., 2017), we searched five main academic databases, namely ACM Digital Library, ScienceDirect, IEEE Xplore, SpringerLink and Wiley. As recommended by Kitchenham and Charters (2007), Google Scholar and Scopus were also searched to detect relevant articles that might otherwise be overlooked because they are not indexed in the main literature databases. Our search query comprised keywords pertinent to LAD literature (i.e., ‘learning analytics dashboard’ AND (‘students feedback system’ OR ‘early warning system’ OR ‘predictive analytics’ OR ‘visualization tool’)). This yielded 4350 articles across the various data sources. Removing duplicates reduced this number to 2840. We then applied the exclusion criteria, removing papers that were not written in English, were shorter than three pages or were dissertations. Papers that used MOOC datasets were also excluded. We assessed the title and abstract of each remaining article and retained studies related to LADs and early warning systems (EWS) with embedded predictive analytics. EWS studies were included because some used visual tools to deliver information. Finally, 11 journal articles and 4 conference papers (indexed in Scopus/the ACM Digital Library) were deemed admissible for answering the research questions. Figure 1 describes the overall methodology.
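The deduplication and exclusion steps described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the `Record` fields, the title-based duplicate key and the sample records are all hypothetical and do not reproduce the review’s actual data or tooling. Title and abstract screening remained a manual judgment and is not modelled.

```python
from dataclasses import dataclass


@dataclass
class Record:
    # Hypothetical bibliographic record; field names are illustrative only.
    title: str
    language: str
    pages: int
    doc_type: str      # e.g. "journal", "conference", "dissertation"
    uses_mooc: bool


def normalize_title(title: str) -> str:
    # Lower-case and collapse whitespace so that the same paper retrieved
    # from two databases is recognized as a duplicate.
    return " ".join(title.lower().split())


def deduplicate(records: list[Record]) -> list[Record]:
    seen: set[str] = set()
    unique = []
    for r in records:
        key = normalize_title(r.title)
        if key not in seen:
            seen.add(key)
            unique.append(r)
    return unique


def passes_exclusion_criteria(r: Record) -> bool:
    # Exclusion criteria from the text: non-English papers, papers shorter
    # than three pages, dissertations and studies using MOOC datasets.
    return (
        r.language == "en"
        and r.pages >= 3
        and r.doc_type != "dissertation"
        and not r.uses_mooc
    )


def screen(records: list[Record]) -> list[Record]:
    return [r for r in deduplicate(records) if passes_exclusion_criteria(r)]


records = [
    Record("A Predictive Dashboard", "en", 12, "journal", False),
    Record("a predictive  dashboard", "en", 12, "journal", False),  # duplicate
    Record("Short Poster", "en", 2, "conference", False),
    Record("MOOC Dropout Alerts", "en", 10, "conference", True),
    Record("Thesis on LADs", "en", 150, "dissertation", False),
]
print([r.title for r in screen(records)])  # ['A Predictive Dashboard']
```

Matching duplicates on a normalized title is a simplifying assumption; where available, a DOI is usually a more reliable key.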

Fig. 1 Methodology used for inclusion and exclusion

5 Review of Dashboards Meeting the Inclusion Criteria

This section summarizes the 15 LADs with respect to the three research questions. Table 1 presents an overview of the reviewed papers, highlighting the key characteristics of current LADs. Overall, we observed an increasing trend in the number of LAD studies conducted over the search period. This can be seen in Fig. 2, which presents frequencies as a three-year rolling average to smooth the data and reduce noise.
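A three-year rolling average of the kind used in Fig. 2 can be computed as below. This sketch assumes a trailing window (each year averaged with the two preceding years); the text does not state whether a trailing or a centered window was used, and the yearly counts are hypothetical.

```python
def rolling_average(counts: list[float], window: int = 3) -> list[float]:
    """Trailing moving average: each value averages the current year
    and the (window - 1) preceding years."""
    if window < 1:
        raise ValueError("window must be at least 1")
    return [
        sum(counts[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(counts))
    ]


# Hypothetical yearly study counts (not the review's actual data).
yearly_counts = [0, 3, 0, 3, 6]
print(rolling_average(yearly_counts))  # [1.0, 2.0, 3.0]
```

The first `window - 1` years have no complete window and therefore produce no smoothed value, so a smoothed series is always shorter than the raw one.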

Table 1 Reviewed papers overview
Fig. 2 Trend in number of LAD studies as a three-year rolling average

Arnold and Pistilli (2012) from Purdue University created Course Signals, a student- and instructor-facing dashboard that tracks student progress. The data comprised student course performance, student engagement in terms of interactions with the LMS, academic history and demographics. A traffic-light visualization conveying three risk levels, high (red), medium (yellow) and low (green), was developed and made available to the target users as early as the second week of the semester. However, the tool did not provide direct insight into the model’s reasoning, that is, why a particular student was considered to be at risk of failure, making it difficult to recommend a specific remediation.
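A three-level traffic-light encoding amounts to thresholding a risk score. The sketch below assumes that the underlying model outputs a probability of failure and uses hypothetical cut-offs of 0.3 and 0.7; how Course Signals actually derived its levels is not described here. It also makes the limitation noted above visible: the returned color says how risky a student is, but nothing about why.

```python
def traffic_light(risk_probability: float,
                  medium_threshold: float = 0.3,
                  high_threshold: float = 0.7) -> str:
    """Map a predicted probability of failure to a three-level color code.

    The thresholds are hypothetical placeholders, not Course Signals' rules.
    """
    if not 0.0 <= risk_probability <= 1.0:
        raise ValueError("risk_probability must lie in [0, 1]")
    if risk_probability >= high_threshold:
        return "red"     # high risk
    if risk_probability >= medium_threshold:
        return "yellow"  # medium risk
    return "green"       # low risk


for p in (0.1, 0.45, 0.9):
    print(p, traffic_light(p))  # green, yellow, red respectively
```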

Essa and Ayad (2012) developed a student success system (S3) to alert instructors to student risk levels and to provide feedback to the corresponding students. Similar to Course Signals, S3 employed color as the primary visual encoding channel, in the form of a traffic-light visualization, to express risk probabilities. The authors used an ensemble modelling strategy that combined the predictions of several base models. S3 had no student-facing functionality, and no evaluation of the tool was conducted.
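The text states only that S3 combined the predictions of several base models. One common way to do so, sketched here as an assumption rather than as S3’s actual method, is soft voting: averaging the base models’ predicted probabilities, optionally with weights. The three base models below are hypothetical stand-ins; in practice they would be trained classifiers (scikit-learn, for example, provides `VotingClassifier` with `voting="soft"` for this purpose).

```python
from typing import Callable, Optional, Sequence

# A base model maps a feature vector to a predicted probability of failure.
Model = Callable[[Sequence[float]], float]


def soft_vote(models: Sequence[Model], features: Sequence[float],
              weights: Optional[Sequence[float]] = None) -> float:
    """Combine base-model probabilities by a (weighted) average."""
    if weights is None:
        weights = [1.0] * len(models)
    if len(weights) != len(models):
        raise ValueError("one weight per model is required")
    weighted_sum = sum(w * model(features) for w, model in zip(weights, models))
    return weighted_sum / sum(weights)


# Hypothetical base models over two features scaled to [0, 1]:
# x[0] = share of assignments submitted, x[1] = relative LMS activity.
def low_submissions(x: Sequence[float]) -> float:
    return 1.0 - x[0]


def low_activity(x: Sequence[float]) -> float:
    return 1.0 - x[1]


def cohort_prior(x: Sequence[float]) -> float:
    return 0.25  # constant base rate; ignores the features


student = [0.5, 0.25]
print(soft_vote([low_submissions, low_activity, cohort_prior], student))  # 0.5
```

Averaging tends to cancel out the idiosyncratic errors of individual models, which is one reason ensembles often generalize better than their members.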

Agnihotri and Ott (2014) from the New York Institute of Technology designed a dashboard to identify at-risk students needing support, with the aim of increasing the retention of first-year students. The tool used survey, financial and pre-enrolment data. The students whom the model predicted to be at risk did indeed align with those who did not return in the subsequent year.

Hu et al. (2014) developed an EWS for online undergraduate courses to identify at-risk students, applying algorithms such as C4.5, regression trees, logistic regression (LR) and boosting. Learner data comprising login behavior, interaction with online course materials, assignment submission status and forum discussion activity were collected during three different time intervals in a semester. Instructors used the tool to adjust their teaching methods for poorly performing students, and the study concluded that the EWS improved learners’ performance and reduced attrition rates.

Jayaprakash et al. (2014) designed the Open Academic Analytics Initiative (OAAI) with an EWS aimed at identifying at-risk students. The authors claimed that the system achieved an accuracy of 84%. The system offered two intervention strategies: one sent at-risk students a message with guidance, while the other provided online academic support. The study reported that students who received alert messages experienced a positive impact on their final outcomes, although no clear gains in course grades were observed for the group receiving additional online academic support.

Corrigan et al. (2015) from Dublin City University developed PredictED to forecast students’ final exam grades using student behavior data from Moodle (an LMS). A Support Vector Machine (SVM) was used to classify students as passing or failing in each week of the semester until the exams. Students who opted in received weekly emails based on the prediction results and showed an improvement in final grades of nearly 3% on average. Like Course Signals, this tool did not explain the model’s reasoning, making it difficult to recommend a specific remediation for at-risk students.

In another study, Kuzilek et al. (2015) from the Open University developed a dashboard to predict at-risk students at early stages of their study. The dashboard had two views, a course view and a student view. The course view provided an aggregated view of student activities and overall assessment results, while the student view presented a table with individual student results and predictions regarding the submission of the upcoming assessment. Weekly emails were sent to course coordinators. The analysis used demographic data and interaction data from the virtual learning environment (VLE). However, the tool did not provide any direct insight to students.

The goal of LISSA (Learning dashboard for Insights and Support during Study Advice) was to help academic advisers assist students in planning a more attainable study programme and avoid wasting time on unsuitable choices (Charleer et al., 2018). Student grades and historical data were used to create a prediction bar depicting the expected duration of each student’s progress through the bachelor programme. A custom color-based visualization communicated the predicted outcome categories of success, mediocre pass and fail in green, orange and red, respectively. The tool was used only to support the advising session and did not target students directly.

He et al. (2018) developed a prototype LAD aiming to provide students with academic performance guidance for STEM courses. The authors used Random Forest (RF) to predict students’ results. A gauge plot presented the predicted results to student advisers, who relied on the tool to initiate interventions. The dashboard did not depict students’ learning progression, and it was neither deployed in real courses nor evaluated.

Wang et al. (2018) designed an EWS with the goal of reducing student dropout and graduation delays. The system drew on library, dormitory, grade, attendance and engagement data, with the library and dormitory data enabling closer monitoring of study habits. The Naïve Bayes (NB) algorithm attained a classification accuracy of 86%, with grades and library data among the key variables. The tool was neither deployed in production nor evaluated.

To motivate students and personalize their learning, de Quincey et al. (2019) developed a student-facing LAD that showed students’ weekly progress along with their predicted end-of-semester scores. As in Hellings and Haelermans (2020), a weekly reminder email was sent to students during the first few weeks, and personalized recommendations were also displayed on the dashboard. A mixed-methods evaluation of the LAD yielded favorable feedback, but only regarding the dashboard’s usability.

Bañeres et al. (2020) presented two EWSs, one instructor-facing and the other student-facing, to identify at-risk learners and provide them with personalized feedback. A different model was built for each course based on students’ grades using NB, Decision Tree (DT), K-Nearest Neighbour (kNN) and SVM algorithms. The authors found NB to be the best for their dataset; however, changes in course assessments led to inconsistencies in the predictions.

Gutiérrez et al. (2020) developed LADA (Learning Analytics Dashboard for Advisers) to support academic advisers in setting up students’ semester plans. The analysis used student grades, course credits and the lists of courses that students had taken or were enrolled in. The dashboard provided predictions of students’ probability of success together with an indication of the confidence of each prediction. An evaluation of the dashboard by advisers found that non-experts perceived LADA as more useful than experts did.

Hellings and Haelermans (2020) designed a dashboard to provide students with study progress updates, together with their predicted probability of success in a course and their predicted grade. A weekly email containing the dashboard link was sent to participants to encourage frequent use of the dashboard. Linear models were used to predict the grade, while AdaBoost predicted course outcomes. Dashboard usage was associated with a positive impact on students’ online engagement, but no impact on final exam grades or course completion was detected.

Plak et al. (2021) evaluated an EWS with student counsellors to estimate the effect of counselling on first-year student dropout rates and academic performance. To predict student dropout, the authors used SVM, RF, and LR models. LR outperformed all the other algorithms in this context. The findings indicated that EWS‐assisted counselling did not reduce dropout or increase the credits obtained by the end of the academic year. It was hypothesized that this was due to non-targeted and non-actionable feedback/recommendations being provided to at-risk students.

6 Results

This section summarizes the key areas of interest that emerged from the 15 selected studies, in the following order: key themes representing the overarching purpose of the LADs, target users, data types used, prediction methods, visualization types, evaluation techniques and, finally, study limitations.

6.1 Key Themes

The 15 dashboards fall into two overarching themes. Eleven dashboards supported study planning. Their primary aim was to help academic advisers or instructors assist students during courses and plan their subsequent course selections more effectively. Study planning advice was provided either in an advising session with students or through an awareness message with guidance sent during an ongoing course. While the majority of studies focused on providing study advice or sending awareness messages to at-risk students, one study, by Corrigan et al. (2015), delivered advice messages to both at-risk and not-at-risk students.

The second theme centered on monitoring learning progress from student learning activities. Four studies aimed at improving learning outcomes by increasing student online engagement.

6.2 Target Users

Our study found that the LADs target learners, instructors and academic advisers, with learners being the most common target group (as shown in Fig. 3). In 11 cases, the LADs targeted a single user group. In the remaining four, two targeted both instructors and learners while the other two targeted instructors and academic advisers.

Fig. 3 Target users of the dashboards. Some dashboards target multiple audiences

6.3 Types of Data Used for Prediction

Data for the LADs were collected from a variety of sources, including LMS logs, demographic data, pre-academic data, library interactions and assessment grades, with studies using various combinations of these inputs.

LMSs record learners’ online digital footprints as interaction logs, covering forum activity (messages viewed and posted), session durations, quizzes taken and resources accessed, among others. Most studies used these logs as absolute counts for making predictions, without normalizing each student’s values relative to those of their cohort. Pre-academic data comprised students’ academic background information, such as high school GPAs, past academic history, aptitude test results and enrolment details, while demographic data covered age, ethnicity and gender. In addition, most researchers used assessment grades (quiz scores, intermediate assignment grades, course grades) that became available as the semester progressed, as well as grades from prior semesters.
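The text notes that most studies fed raw interaction counts to their models. A minimal sketch of one alternative, assuming z-score standardization within a cohort (the reviewed studies do not prescribe a particular normalization), expresses each student’s count relative to the cohort mean and spread. The student IDs and counts are hypothetical.

```python
from statistics import mean, pstdev


def cohort_z_scores(counts: dict[str, float]) -> dict[str, float]:
    """Express each student's raw activity count relative to the cohort:
    z = (count - cohort mean) / cohort standard deviation."""
    values = list(counts.values())
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation of the cohort
    if sigma == 0:
        # Everyone has the same count, so nobody stands out.
        return {student: 0.0 for student in counts}
    return {student: (c - mu) / sigma for student, c in counts.items()}


# Hypothetical weekly forum-post counts for a small cohort.
posts = {"s1": 2, "s2": 4, "s3": 4, "s4": 4, "s5": 5, "s6": 5, "s7": 7, "s8": 9}
z = cohort_z_scores(posts)
print(z["s1"], z["s8"])  # -1.5 2.0
```

A z-score of 2.0 means a student’s count lies two standard deviations above the cohort mean, which remains comparable across courses whose typical activity levels differ; a raw count does not.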

6.4 Prediction Methods

Our review reveals a wide range of machine learning algorithms being utilized for predictive analytics (Table 1). These algorithms differ in various aspects, such as in their inherent biases and assumptions, their ability to generalize beyond training data and the amount of data required for training the models (Osmanbegovic & Suljic, 2012). Sweeping claims that one algorithm is superior to another across all datasets in this domain are unsupported.

Algorithms include DTs (Quinlan, 1986), which use partition rules to divide the data into groups based on a single variable at a time. The process continues until all the variables have been utilized (Ferreira et al., 2001). Other tree-based classifiers commonly used for prediction are CART (Breiman et al., 1984) and RF (Breiman, 2001). CART has robust mechanisms for pruning large DTs to reduce overfitting, which would otherwise compromise generalization. RF, in turn, combines multiple DTs into a single model, an approach known as an ensemble method. Several studies leveraged ensemble-based algorithms, as they tend to produce classifiers that generalize more robustly.
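To make partitioning on a single variable concrete, the sketch below finds the best threshold for one numeric variable using Gini impurity, the criterion used by CART (C4.5 instead uses the gain ratio, an information-based measure). A full tree would apply this search recursively to each resulting group. The login counts and outcomes are hypothetical.

```python
def gini(labels: list[int]) -> float:
    """Gini impurity of a set of binary labels (1 = at risk, 0 = not)."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 1.0 - p**2 - (1.0 - p) ** 2


def best_split(values: list[float], labels: list[int]) -> tuple[float, float]:
    """Find the threshold t on one variable that minimizes the weighted
    Gini impurity of the two groups (values <= t versus values > t)."""
    n = len(values)
    best_t, best_impurity = float("nan"), float("inf")
    # Splitting at the maximum value would leave the right group empty.
    for t in sorted(set(values))[:-1]:
        left = [y for x, y in zip(values, labels) if x <= t]
        right = [y for x, y in zip(values, labels) if x > t]
        impurity = (len(left) * gini(left) + len(right) * gini(right)) / n
        if impurity < best_impurity:
            best_t, best_impurity = t, impurity
    return best_t, best_impurity


# Hypothetical data: weekly LMS logins and whether each student later failed.
logins = [1, 2, 3, 8, 9, 10]
failed = [1, 1, 1, 0, 0, 0]
print(best_split(logins, failed))  # (3, 0.0)
```

An impurity of 0.0 means the split at three logins separates the two outcome groups perfectly in this toy data; real data rarely split so cleanly, which is why trees grow several levels deep and then need pruning.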

Logistic Regression (LR) (Cole, 1991) and SVM (Cortes et al., 1995) are examples of function-based algorithms, which encode the knowledge they extract in the form of mathematical functions. LR was used in more studies than SVM, possibly because it has fewer tunable parameters to specify. Two papers used Neural Networks (NNs) to predict student risk levels.
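As an illustration of knowledge encoded as a mathematical function, a fitted LR model reduces to a weighted sum of features passed through the logistic function. The coefficients below are hypothetical, chosen only so that the arithmetic is easy to follow; in practice they would be estimated from training data.

```python
import math


def logistic_risk(features: list[float], weights: list[float], bias: float) -> float:
    """Logistic regression: p(at risk) = 1 / (1 + exp(-(bias + w . x)))."""
    if len(features) != len(weights):
        raise ValueError("features and weights must have the same length")
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical coefficients: more logins and higher quiz scores lower the risk.
weights = [-0.5, -2.0]   # [logins per week, average quiz score in [0, 1]]
bias = 2.5
print(logistic_risk([3.0, 0.5], weights, bias))  # 0.5
```

Here 2.5 - 0.5 × 3 - 2.0 × 0.5 = 0, and the logistic function maps 0 to a probability of exactly 0.5.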

LADA was the only study to use clustering, an unsupervised learning technique. Clustering automatically partitions individuals into distinct groups based on shared characteristics. Once a cluster model has been generated, new data points can be assigned to a cluster based on some distance criterion (Ochoa, 2016).
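The assignment step described above can be sketched as a nearest-centroid rule. This assumes Euclidean distance and centroids produced by an algorithm such as k-means; the source does not specify LADA’s clustering algorithm or distance criterion, and the cluster names and feature values are hypothetical. Features are assumed to be scaled to comparable ranges, since otherwise the feature with the largest range would dominate the distance.

```python
import math


def assign_cluster(point: list[float], centroids: dict[str, list[float]]) -> str:
    """Assign a new data point to the cluster whose centroid is nearest
    in Euclidean distance."""
    return min(centroids, key=lambda name: math.dist(point, centroids[name]))


# Hypothetical centroids over [normalized grade, share of credits passed].
centroids = {
    "on_track": [0.85, 0.9],
    "borderline": [0.6, 0.7],
    "at_risk": [0.35, 0.45],
}
print(assign_cluster([0.8, 0.85], centroids))  # on_track
```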

Ten studies utilized predictive analytics for forecasting the final grades with the aim of identifying at-risk students. One study relied on predictive analytics to determine whether students would submit their upcoming assignments. Another attempted to predict whether students would return to complete their studies in the following semester, while one study generated an estimation regarding the expected duration until qualification completion for individual students. One study sought to assist students in selecting course options by predicting their performance outcomes across different options.

However, although all the reviewed LADs utilized predictive analytics, none provided model interpretability to explain how the model works overall, nor did any directly communicate insights into model reasoning (such as why a particular student was predicted to have a given outcome), making it difficult to recommend a specific remediation or even to trust the system.

6.5 Evaluating the Prediction Accuracy

Once predictive models have been created, it is crucial to evaluate how well they generalize to new (test) data. A variety of measures can be used for this task, and it is good practice to assess models with a suite of evaluation metrics. Common metrics in the literature are accuracy, F-measure, mean absolute error (MAE), precision, recall, false positive (FP) and false negative (FN) rates, and receiver operating characteristic (ROC) curves. Accuracy is simply the proportion of correct classifications. Evaluating a model on accuracy alone can be misleading if a dataset is imbalanced across labels (outcomes). The majority of studies combined accuracy with other metrics to determine the efficacy of their models; however, two studies (Essa & Ayad, 2012; Wang et al., 2018) used accuracy alone. Given that the cost of misclassification differs between students who are at risk and those who are not, it is also important to consider the model’s behavior on each of the two groups separately, using measures such as recall, precision and FP/FN rates. Some studies followed this practice.
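The pitfall of relying on accuracy alone can be shown with a small sketch on hypothetical data: in a cohort where 10% of students are at risk, a trivial model that labels everyone “not at risk” scores 90% accuracy while missing every at-risk student. The helper names are illustrative; scikit-learn’s `sklearn.metrics` module offers equivalent functions.

```python
def confusion_counts(y_true: list[int], y_pred: list[int]) -> dict[str, int]:
    """Count true/false positives/negatives, with 1 = at risk (positive class)."""
    pairs = list(zip(y_true, y_pred))
    return {
        "tp": sum(1 for t, p in pairs if t == 1 and p == 1),
        "fp": sum(1 for t, p in pairs if t == 0 and p == 1),
        "fn": sum(1 for t, p in pairs if t == 1 and p == 0),
        "tn": sum(1 for t, p in pairs if t == 0 and p == 0),
    }


def safe_ratio(numerator: int, denominator: int) -> float:
    # Undefined ratios (e.g. precision when nobody is flagged) are reported as 0.0 here.
    return numerator / denominator if denominator else 0.0


def summarize(y_true: list[int], y_pred: list[int]) -> dict[str, float]:
    c = confusion_counts(y_true, y_pred)
    return {
        "accuracy": safe_ratio(c["tp"] + c["tn"], len(y_true)),
        "precision": safe_ratio(c["tp"], c["tp"] + c["fp"]),
        "recall": safe_ratio(c["tp"], c["tp"] + c["fn"]),
        "fp_rate": safe_ratio(c["fp"], c["fp"] + c["tn"]),
        "fn_rate": safe_ratio(c["fn"], c["fn"] + c["tp"]),
    }


# Hypothetical cohort: 90 students not at risk (0), 10 at risk (1).
y_true = [0] * 90 + [1] * 10
always_not_at_risk = [0] * 100
print(summarize(y_true, always_not_at_risk))
# accuracy is 0.9, but recall is 0.0 and the FN rate is 1.0
```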

Few studies used the F-measure, the harmonic mean of precision and recall, which is arguably a more informative metric than accuracy for imbalanced datasets. Plak et al. (2021) used MAE, computed as the difference between the model’s predicted probability of non-completion and the observed rate of non-completion. Meanwhile, three studies used the more advanced ROC curve, which represents a classifier’s performance across a range of decision thresholds, trading off recall against the FP rate (Fawcett, 2006). A summary of these evaluation measures is shown in Table 1.
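The remaining measures can be sketched in a few lines of Python on hypothetical data. `f_measure` computes the F1 score; `mean_absolute_error` uses the standard per-student definition, which may differ from the exact computation in Plak et al. (2021); and `roc_points` sweeps the decision threshold over the predicted scores to trace the ROC curve, whose area is then estimated with the trapezoidal rule.

```python
def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (F1)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def mean_absolute_error(y_true: list[int], y_prob: list[float]) -> float:
    """Mean absolute difference between predicted probabilities and outcomes."""
    return sum(abs(t - p) for t, p in zip(y_true, y_prob)) / len(y_true)


def roc_points(y_true: list[int], scores: list[float]) -> list[tuple[float, float]]:
    """(FP rate, recall) pairs obtained by lowering the decision threshold
    from the highest score downwards. Assumes both classes are present."""
    pos = sum(y_true)
    neg = len(y_true) - pos
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for t, s in zip(y_true, scores) if s >= threshold and t == 1)
        fp = sum(1 for t, s in zip(y_true, scores) if s >= threshold and t == 0)
        points.append((fp / neg, tp / pos))
    return points


def auc(points: list[tuple[float, float]]) -> float:
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Hypothetical outcomes (1 = did not complete) and predicted risk scores.
y_true = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.1]
pts = roc_points(y_true, scores)
print(round(auc(pts), 3))  # 0.889
```

An AUC of 1.0 would mean every at-risk student is ranked above every other student, while 0.5 corresponds to random ranking; here 8 of the 9 at-risk/not-at-risk pairs are ordered correctly.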

6.6 Visualization Methods

Graphical data representation is a key consideration in dashboards, since the main purpose of data visualization is effective communication. When designed correctly, data visualizations can provide support to students, teachers or educational administrators (Buenaño-Fernández et al., 2019). Figure 4 depicts the frequency of usage of various visualization charts. The most widely used visualizations are bar charts (Sahin & Ifenthaler, 2021). Bar charts encode quantitative data effectively by relying on the ability of the human visual cortex to accurately distinguish relative differences in lengths of graphical cues (Mackinlay, 1986). Other visualization techniques which are accurate at communicating quantitative information are line (capturing trends), scatter (conveying relative positions) and box-and-whisker plots (showing distributions), though their usage is not as widespread.

Fig. 4 Visualization types (We coalesced some chart types that rely on identical visual encoding channels into more general categories. Donut and Gauge charts have been placed into the Pie Chart category. Slider, Win-loss, Histograms and Stacked Bar Graphs have been grouped into a Bar Graph category. Non-standard, custom color-based visualizations with an overarching emphasis on color as a visual encoding channel have been grouped into Color-based Custom Charts. This category includes visualizations such as traffic signals and risk quadrants.)

Color-based custom charts, which rely solely on color as the visual encoding channel to communicate information, are also used extensively. Examples include traffic signals and risk quadrants, which aim to communicate nominal data; in all instances, they were used to convey risk categories. Tables containing text-based information have also been widely used, as they provide an effective and powerful way to present data meaningfully (Midway, 2020).

Various forms of pie charts, including donut and gauge plots, were also used frequently. However, pie charts are problematic when the aim is to convey quantitative information precisely, because they rely on readers reliably judging slopes and differences in angles. Angles and slopes are poor visual encoding channels, so LADs that rely on them are likely to meet their goals only sub-optimally. Two studies used gauge plots to present risk levels to students; while this is not an optimal visualization technique, these charts were arguably adequate given the simplicity of the information being communicated. We also found one use of sociograms for presenting online communication and networking data, and one use of a radar chart, the latter being problematic to interpret because it relies on readers accurately quantifying differences between the areas of irregular shapes. Three studies did not report details of their visualizations and are not included in Fig. 4.

6.7 Dashboard Evaluations

Our findings reveal that the rigor of the approaches used to evaluate dashboard effectiveness varied among the reviewed papers (see Table 1). Most of the studies neither specified evaluation criteria nor conducted any evaluation. Of the remaining studies, the majority used mixed methods, combining qualitative and quantitative approaches to evaluate the usefulness of the dashboards. One study took a qualitative approach, evaluating general aspects such as LAD usability by inquiring into users’ perceptions of their interaction with the tool.

Four studies investigated the effects of LADs on students’ final outcomes. Of these, Arnold and Pistilli (2012) and Jayaprakash et al. (2014) found that providing risk-level predictions to students helped them achieve better grades in their courses. Bañeres et al. (2020) claimed that usage of the LAD had some positive impact on students’ final outcomes, but they could not determine whether this was attributable to the use of the tool or to the interventions. Hellings and Haelermans (2020) reported that LAD usage had positive effects on students’ online engagement, although no similar effect was found on students’ final exam performance.

7 Discussion

Our review shows that, to date, only a limited number of dashboards have incorporated predictive analytics, although the trend is increasing (Fig. 2). The few that have implemented and evaluated this technology have mostly reported positive effects. Some of these positive effects can undoubtedly be ascribed to the interventions initiated with at-risk students in response to the predictive models’ early warnings. However, some effects can likely also be attributed to LAD-embedded predictive analytics and the self-reflection they trigger in at-risk students, leading to positive behavioral adjustments. While it is not yet possible to disentangle the contributions of these factors to positive outcomes, the use of predictive models for real-time identification of at-risk students remains a forward-thinking strategy that has so far shown promise as a tool for enhancing student success and retention rates.

Many of the reviewed LADs were used to track study plans for at-risk learners but did not provide any direct insight into the causes behind the models’ risk predictions, making it difficult to offer learners a tailored set of remedial actions. In the case of Course Signals, Tanes et al. (2011) performed a content analysis of the feedback messages that instructors sent to students after receiving alerts. The authors noted a lack of instructive or process feedback in these messages, meaning that students did not know what specific behavioral adjustments were needed. Therefore, alongside accurate predictions, it appears important to consider how prediction mechanics and model outputs are presented as advice to at-risk students.

Predictive analytics, when performed well, presents new opportunities for the timing of intervention strategies. Machine learning can detect subtle, complex, multi-dimensional patterns early in a semester that humans cannot, enabling earlier initiation of potentially more effective interventions. However, determining the optimal time to intervene with identified at-risk students is challenging, since predictive models become more accurate as the semester progresses and each student’s digital footprint grows; intervening earlier therefore means acting on less certain predictions. Most of the studies used black-box machine learning algorithms. Unfortunately, researchers in the LAD field have not yet drawn on emerging machine learning techniques that render black-box models interpretable and can consequently offer students and instructors deeper insights into the key drivers of a given student’s negative predicted outcome.
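
A minimal sketch of why early predictions rest on thinner evidence, assuming a hypothetical event log with `student_id`, `week` and `kind` fields (not a schema taken from any reviewed LAD): features for a prediction made in a given week should be computed only from events logged up to that week, so a week-2 snapshot simply contains less of each student’s footprint than a week-6 one.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """One logged learner interaction (hypothetical schema)."""
    student_id: str
    week: int   # teaching week in which the event occurred, starting at 1
    kind: str   # e.g. "login", "resource_view", "forum_post"


def snapshot_features(events, as_of_week):
    """Count each event kind per student, using only events up to `as_of_week`.

    Ignoring later events mirrors what a model could actually observe
    if it were run at that point in the semester.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for event in events:
        if event.week <= as_of_week:
            counts[event.student_id][event.kind] += 1
    # Convert the nested defaultdicts to plain dicts for display.
    return {sid: dict(kinds) for sid, kinds in counts.items()}


log = [
    Event("s1", 1, "login"),
    Event("s1", 2, "resource_view"),
    Event("s1", 5, "forum_post"),
    Event("s2", 4, "login"),
]

print(snapshot_features(log, as_of_week=2))  # {'s1': {'login': 1, 'resource_view': 1}}
print(snapshot_features(log, as_of_week=6))
```

Students with no logged events yet are absent from the snapshot; a real pipeline would need to decide how to represent them, since inactivity is itself a risk signal.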

Jayaprakash et al. (2014) caution that predictive models do not influence course completion and retention rates unless they are combined with effective intervention strategies aimed at supporting at-risk students. This could again be due to the lack of interpretability of the predictive models and the absence of explanations to learners of how exactly the models arrived at their conclusions (Mathrani et al., 2021). Model interpretability and explainability can be achieved with methods that have recently become popular, such as SHapley Additive exPlanations (SHAP) (Lundberg et al., 2021), Local Interpretable Model-agnostic Explanations (LIME) (Ribeiro et al., 2016), and anchors (Ribeiro et al., 2018).
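
For intuition about what such explanations return, the sketch below computes SHAP values for the special case of a linear (logistic) risk model. Assuming independent features, each feature’s SHAP value on the log-odds scale reduces to its weight times the feature’s deviation from the background mean. The features, weights and cohort are hypothetical, and this is not the `shap` library’s API; for the black-box models common in LADs, an established SHAP, LIME or anchors implementation would be used instead.

```python
import math

# Hypothetical logistic risk model: log-odds of a student being at risk.
WEIGHTS = {
    "logins_per_week": -0.40,
    "assignments_submitted": -0.80,
    "days_since_last_login": 0.15,
}
BIAS = 2.0


def log_odds(x):
    return BIAS + sum(w * x[f] for f, w in WEIGHTS.items())


def risk(x):
    return 1.0 / (1.0 + math.exp(-log_odds(x)))


def linear_shap(x, background):
    """Exact SHAP values (log-odds scale) for a linear model.

    Assumes features are independent, in which case each value is
    weight * (feature value - background mean).
    """
    means = {f: sum(row[f] for row in background) / len(background) for f in WEIGHTS}
    return {f: w * (x[f] - means[f]) for f, w in WEIGHTS.items()}


# Hypothetical background cohort and the student being explained.
cohort = [
    {"logins_per_week": 5, "assignments_submitted": 3, "days_since_last_login": 2},
    {"logins_per_week": 3, "assignments_submitted": 2, "days_since_last_login": 4},
    {"logins_per_week": 4, "assignments_submitted": 4, "days_since_last_login": 0},
]
student = {"logins_per_week": 1, "assignments_submitted": 1, "days_since_last_login": 10}

print(f"predicted risk: {risk(student):.2f}")
contributions = linear_shap(student, cohort)
# List the drivers from largest to smallest absolute contribution.
for feature, value in sorted(contributions.items(), key=lambda kv: -abs(kv[1])):
    print(f"{feature:>24}: {value:+.2f}")
```

The contributions sum to the difference between the student’s log-odds and the cohort’s average log-odds; this additivity is what allows a dashboard to present them as the “drivers” of an individual prediction.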

With recent advances, machine learning techniques can not only explain models and their predictions but also generate automated, data-driven counterfactuals. These offer prescriptive capabilities, clearly articulating to learners which behavioral adjustments would hypothetically result in a positive predicted outcome (Susnjak et al., 2022). At present, existing LADs display only the predicted outcomes, which reveals a significant gap and points to rich opportunities for future research.
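
A counterfactual can be illustrated with a deliberately simple brute-force search, assuming a hypothetical logistic risk model and two actionable features: try small increases to each feature, keep the combinations that bring predicted risk below a threshold, and return the cheapest under a scaled L1 distance. Dedicated counterfactual methods add constraints such as plausibility and diversity; the names, weights, step sizes and scales here are illustrative only.

```python
import itertools
import math

# Hypothetical logistic risk model over two actionable features.
WEIGHTS = {"logins_per_week": -0.40, "assignments_submitted": -0.80}
BIAS = 2.1
THRESHOLD = 0.5  # predicted risk below this is treated as "not at risk"


def risk(x):
    z = BIAS + sum(w * x[f] for f, w in WEIGHTS.items())
    return 1.0 / (1.0 + math.exp(-z))


def counterfactual(x, steps, scale, max_steps=5):
    """Cheapest increase in actionable features that brings risk below THRESHOLD.

    Tries every combination of 0..max_steps increments per feature and
    scores each by a scaled L1 distance. Returns (changes, new_risk) or None.
    """
    features = list(steps)
    best = None
    for counts in itertools.product(range(max_steps + 1), repeat=len(features)):
        changes = {f: n * steps[f] for f, n in zip(features, counts)}
        candidate = {f: x[f] + changes.get(f, 0) for f in x}
        new_risk = risk(candidate)
        if new_risk < THRESHOLD:
            cost = sum(abs(d) / scale[f] for f, d in changes.items())
            if best is None or cost < best[0]:
                best = (cost, changes, new_risk)
    return None if best is None else (best[1], best[2])


student = {"logins_per_week": 1, "assignments_submitted": 1}
steps = {"logins_per_week": 1, "assignments_submitted": 1}
scale = {"logins_per_week": 3.0, "assignments_submitted": 1.0}  # hypothetical typical spread

print(f"current risk: {risk(student):.2f}")
print(counterfactual(student, steps, scale))
```

For this hypothetical student, the cheapest change found is three additional logins per week. A LAD could phrase such a result as advice, although whether the suggested behavior would actually cause better outcomes is a question the model itself cannot answer.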

Findings from the reviewed papers revealed considerable diversity in the use of predictive analytics. Unsurprisingly, the majority of the reviewed articles integrated predictive analytics to forecast final course performance. Others took a broader view of being at risk and formulated the problem using proxies such as predicted assessment submission, the likelihood of returning to complete current studies, and the prediction of overall qualification completion.

LADs frequently aim to communicate quantitative information, which requires precision and, as noted earlier, an appropriate match between data types and visual encoding channels (Munzner, 2008). Unsuitable visualization design choices add to users’ cognitive load and defeat their original intent; the same occurs when color is used indiscriminately in dashboards (Bera, 2016). Visualization theory has firmly established that visual encoding channels (e.g., position, length, slope/angle, area, depth, hue/tint, shape, curvature, volume) differ in their effectiveness for communicating data, whether continuous, ordinal, or categorical (Munzner, 2014). We observed in our survey that a number of studies used graphing components such as donut graphs, gauge plots, and pie and radar charts. These rely on encoding channels such as slope/angle and area, which are suboptimal for communicating information, be it quantitative or qualitative. This raises questions about the degree to which visualization theory is being drawn upon in the design of some recent LADs. The issue of inappropriate visualizations within LADs specifically has already been raised: Park and Jo (2019) note that unclear visualizations can restrict students’ sensemaking of LADs, while Schwendimann et al. (2017) posit that the granularity of the information displayed on a dashboard should be matched with appropriate visualization techniques so that users are not confused or overwhelmed by the amount of information presented to them. Our study shows that 40% of the dashboards use three or more different visualization techniques, with one dashboard using seven, which likely leads to cognitive overload rather than clarity. Midway (2020) also notes that effective visualizations foster the intended understanding and interpretation of the data in the audience, whereas ineffective ones can hinder it.
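
As a minimal illustration of a length-based encoding, the sketch below renders values as a sorted horizontal bar chart in plain text, the kind of display that could replace a pie or donut chart of the same data. The activity categories and minutes are hypothetical, and a production LAD would use a charting library rather than text.

```python
def text_bar_chart(values, width=30):
    """Render non-negative values (at least one positive) as horizontal bars.

    All bars share a common baseline and their length is proportional to
    each value, so comparisons rely on length and position rather than
    angle or area.
    """
    largest = max(values.values())
    label_width = max(len(label) for label in values)
    lines = []
    # Sorting by value makes the ranking readable at a glance.
    for label, value in sorted(values.items(), key=lambda kv: kv[1], reverse=True):
        bar = "#" * round(width * value / largest)
        lines.append(f"{label:<{label_width}} | {bar} {value}")
    return "\n".join(lines)


# Hypothetical minutes a student spent on each activity type this week.
time_on_task = {"Videos": 42, "Quizzes": 35, "Readings": 38, "Forum": 12}
print(text_bar_chart(time_on_task))
```

Sorting and a shared baseline let the reader compare categories by bar length alone, which is the stronger channel in the ranking discussed above.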

Different users also have different informational needs, and the way they perceive messages influences their actions. Hence, the user perspective should be kept at the forefront of dashboard design. Perceptions of learning vary among users (Verbert et al., 2013a, 2013b); therefore, LAD designers must understand how individuals interact with visuals and what impact visuals have on their learning processes (Bodily & Verbert, 2017; Corrin & de Barba, 2014).

The evaluation of the impact of LADs on learner outcomes is an area that requires much attention. Studies that conducted evaluations focused mainly on the functionality and usability of LADs, and while they highlighted potential effects on learning, they did not demonstrate quantifiable effects of LADs as a pedagogical treatment. Though usability testing has merit, future studies should also assess whether LAD usage has positive effects on students’ final outcomes. This can be determined through an experimental study comparing a control group and a treatment group over a sufficient period of time to establish whether their outcomes differ. We also found that most of the LADs were working prototypes; hence, students were unable to use them in real course settings and provide reliable feedback on their effectiveness.
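
The control-versus-treatment comparison could be analysed in several ways; one assumption-light option, sketched below, is a two-sided permutation test on the difference in mean final grades. The grades are hypothetical, and a real evaluation would also need random assignment, an adequate sample size, and care to separate the effect of the LAD from any accompanying interventions.

```python
import random
from statistics import mean


def permutation_test(control, treatment, n_resamples=10_000, seed=0):
    """Two-sided permutation test for a difference in mean outcomes.

    Repeatedly shuffles the pooled outcomes into two groups of the original
    sizes and counts how often the shuffled difference is at least as large
    (in absolute value) as the observed one.
    """
    rng = random.Random(seed)
    observed = mean(treatment) - mean(control)
    pooled = list(control) + list(treatment)
    n_treat = len(treatment)
    extreme = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)
        diff = mean(pooled[:n_treat]) - mean(pooled[n_treat:])
        if abs(diff) >= abs(observed):
            extreme += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    p_value = (extreme + 1) / (n_resamples + 1)
    return observed, p_value


# Hypothetical final grades (%) for students without and with LAD access.
control = [62, 58, 71, 65, 60, 68, 55, 63]
treatment = [70, 66, 75, 72, 64, 78, 69, 71]

diff, p = permutation_test(control, treatment)
print(f"mean difference: {diff:.2f} points, p ≈ {p:.3f}")
```

A permutation test avoids assuming normally distributed grades, which suits the small cohorts typical of LAD pilots; with larger samples a t-test or a regression that adjusts for prior attainment would be common alternatives.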

8 Recommendations for LAD Design

The key points that have emerged from our review of the 15 selected LADs are the need to fully leverage machine learning for predictive and prescriptive analytics, to design dashboard displays with information visualization theory in mind, to present information appropriately to end-users, to adopt a learner’s perspective, and to implement an evaluation strategy (see Fig. 5). Presenting the outputs of prediction models on their own within LADs is unlikely to result in significantly improved learning outcomes. The information accompanying students’ predicted outcomes needs to include model explanations and, ideally, prescriptive analytics that can support a tailored intervention strategy for at-risk students. Next, visualization choices need to be grounded in established theory and best practice in order to maximize effectiveness and minimize cognitive load. Further, being mindful of how learners perceive information is key to the effectiveness of the message being conveyed; therefore, strategies to engage users (via personalized recommendations and other targeted motivational messages) should be at the forefront. Finally, institutions should have an evaluation strategy to ensure that LADs meet the goals for which they were designed. We propose that such evaluations be conducted with end-users (i.e., learners, instructors and academic advisers) to inform ongoing improvements to LAD designs. Figure 5 frames our recommendations for LAD design and deployment, and for the effective use of LADs as technological tools with pedagogical value.

Fig. 5 Recommendations for Learning Analytics Dashboard Design

9 Conclusion

This article has presented an extensive literature review of LADs that track student data and report insights to various stakeholders, focusing on LADs that leverage predictive analytics. Our review indicates significant limitations in the current positioning of LADs in real-world educational settings. Our analysis finds that evidence of LADs’ effectiveness in improving student learning outcomes is inconclusive, and that current LAD evaluations are mostly limited to functionality and usability.

The reviewed LADs predominantly aim to facilitate study planning and the monitoring of student learning progress, and they target learners, academic advisers and instructors. Given the importance of effective visual presentation, our study finds that there is room for improvement in the correct use of visualization techniques to enhance precision and reduce cognitive load.

We also find that predictive analytics are only now beginning to emerge as a tool within LADs. Our study concludes that their use lacks maturity and lags in leveraging state-of-the-art machine learning technologies that enable model interpretability and explainability. We maintain that predictions of students’ outcomes should be accompanied by outputs that provide high-level information about the mechanics of the models, as well as explanations of how a specific prediction was derived for a given student. Ideally, we should now also be starting to see the emergence of prescriptive analytics within LADs that use counterfactuals to provide at-risk learners with actionable insights.

Our survey has enabled us to identify strengths and gaps in current LAD studies, and to formulate a framework comprising a set of recommendations for future LAD designs. We believe that adhering to the proposed LAD framework can aid educational providers in implementing more effective LADs, which facilitate improved learning outcomes.