
1 Introduction

The use of online learning systems provides a rich set of data that makes it possible to extract information about student learning behaviours. This information provides an opportunity for understanding and improving education, which has motivated many universities to invest in learning analytics dashboards (LADs) [6, 28, 41, 48]. These dashboards generally provide visualisations of student data, collected from a variety of educational systems, to assist educators in making decisions [41]. However, the growing popularity and continual improvement of online learning systems over the years have resulted in a significant increase in the volume, velocity and variety of data. Consequently, making sense of data in LADs has become more challenging than in earlier years [43].

In some domains, a common approach to navigating large, complex, multi-dimensional data sets is to use drill-downs [39]. In an educational setting, a drill-down operation allows users to explore the behaviour of sub-cohorts of students by progressively adding filters. Instructors can use manual drill-downs to investigate curiosity-driven questions related to student attributes; for example, a drill-down filter can show how international or female students have performed compared to other students. However, instructors may also be interested in finding which drill-down filters lead to insightful results, for example drill-downs that identify a sub-cohort of students whose behaviour or performance differs significantly from the rest of the class. Given the large number of potential drill-downs, manually finding those that provide insights is a challenging task [1, 42].

In this paper, we report on extending LADs with functionality that recommends insightful drill-downs. Our approach takes a process mining lens to examine students’ learning processes, considering three aspects of their learning behaviour: the learning activities performed, the frequency of each activity and the order in which the activities are performed. Analysing the learning process, rather than the aggregated engagement metrics commonly used in LADs [41], is increasingly recognised as essential to understanding and optimising learning [33, 46]. In our approach, an insightful drill-down is defined as a set of filtering rules that identify a sub-cohort of students whose learning processes differ most from those of the rest of the students. Our key contribution is the design and development of an algorithm, which we refer to as Learning Process Automated Insightful Drill-Down (LP-AID). LP-AID employs a process mining method called Earth Movers’ Stochastic Conformance Checking (EMSC) [29] to compute the distance between the learning processes of different cohorts and thereby recommend insightful drill-downs.

We present a practical application of LP-AID in an existing LAD called Course Insights that provides users with a manual drill-down functionality. Specifically, we apply LP-AID to data from a course with 875 students, with high demographic and educational diversity, to demonstrate the drill-down recommendations and to explore the possible insights that can be derived from them. Our initial findings, and instructor feedback on our approach, suggest that LP-AID can be integrated into LADs to provide automated and insightful drill-down recommendations.

2 Related Work

Learning Analytics Dashboards (LADs). Several recent systematic literature reviews have been published on LADs [6, 41]. Schwendimann et al. [41] provide a comprehensive picture of the common data sources used by LADs, which include clickstream logs (e.g., [12, 14, 25, 34]), data related to learning artefacts (e.g., [11, 16, 20, 24, 45]), survey data (e.g., [4, 35, 40]), institutional databases (e.g., [9, 19, 23]), physical user activities (e.g., [16, 31, 44]) and data captured from external educational technologies (e.g., [10, 26, 27, 36]). To make sense of these data, LADs provide a variety of visualisation options. Schwendimann et al. [41] outline the different types of visualisations commonly used in LADs, which include bar charts, line graphs, tables, pie charts, and network graphs. While these visualisations simplify the process of making sense of large data sets, they necessarily abstract away many of the details of learning processes, which are essential to understanding and optimising learning [17]. We aim to address this challenge by employing process mining approaches to guide drill-down operations and the identification of insightful data.

Smart Drill-Down Approaches. The concept of a drill-down operation was initially introduced in the context of OLAP data cubes, where it enabled analysts to explore a large search space to identify exceptions and highlight interesting subsets of data [39]. In recent years, drill-downs have also been employed in analytical dashboards. While they let users explore large data sets, they also present users with too many drill-down choices and create the potential for incorrect reasoning due to incomplete exploration [1]. Several attempts have been made to address these challenges. Many of the proposed methods for discovering insightful drill-downs focus on detecting anomalies in small portions of the data (e.g. [1, 37, 38]), while others focus on identifying interesting differences in larger data subsets (e.g. [21]). In this paper, we take a similar approach to [42] by letting LAD users request drill-down recommendations at the level of granularity they are interested in, thus reducing drill-down choices without affecting user autonomy. While [42] recommends drill-downs based on the difference between cohorts’ attribute values, this paper bases the recommendations on the difference between cohorts’ learning processes.

Educational Process Mining. Process mining aims to derive information from historical organisational behaviour, recorded in event logs [2]. Educational process mining uses data from educational contexts to discover, analyse, and visualise educational and learning processes, for instance to analyse whether students’ behaviour corresponds to a learning model, to detect bottlenecks in the educational process, to identify patterns in processes [7], to study administrative processes [18] and to study student learning through their interactions with online learning environments [3, 8, 49]. Prior work [7] indicates that current educational process mining solutions have not adequately provided support for allowing users to identify and investigate cohorts of interest.

3 Automated Insightful Drill-Down Recommendation

Next, we introduce our method for recommending insightful drill-down criteria in LADs. We first introduce the relevant concepts and formally define the problem, then present our approach and illustrate it with an example.

3.1 Notation and Problem Statement

Assume that a LAD has access to an event log L that captures a collection of traces \(T=\{t_1, \ldots, t_N\}\), where trace \(t_i\) represents student \(s_i\) and \(S=\{s_1, \ldots, s_N\}\) is the set of students. A trace \(t_i\) has a unique identifier (e.g. a student ID), values for a set of features \(F=\{f_1, \ldots, f_M\}\), where \(f_{im}=v\) denotes that value v is assigned to feature \(f_m\) for student \(s_i\), and a sequence of events \(E_i=\langle e_{i1}, \ldots, e_{iL_i}\rangle \) representing the learning path taken by student \(s_i\), where the trace length \(L_i\) can vary between students. Each event \(e_{il}\) (\(1 \le l \le L_i\)) has a timestamp and a label representing the learning activity.
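To make the notation concrete, the sketch below shows one possible in-memory representation of the event log, assuming raw input rows of the form (student ID, activity label, timestamp) and a separate mapping from student IDs to feature values. The names `Event`, `Trace` and `build_log` are hypothetical and chosen for illustration only; each trace's events are sorted by timestamp so that the sequence \(E_i\) reflects the learning path.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class Event:
    activity: str        # label of the learning activity, e.g. "Quiz A"
    timestamp: datetime  # when the student triggered the activity


@dataclass
class Trace:
    student_id: str                               # unique identifier of the trace
    features: dict = field(default_factory=dict)  # feature name -> value (f_im = v)
    events: list = field(default_factory=list)    # E_i, ordered by timestamp


def build_log(rows, features):
    """Group hypothetical (student_id, activity, timestamp) rows into one trace per student.

    `features` maps a student ID to that student's {feature: value} dict.
    """
    log = {}
    for student_id, activity, ts in rows:
        if student_id not in log:
            log[student_id] = Trace(student_id, dict(features.get(student_id, {})))
        log[student_id].events.append(Event(activity, ts))
    for trace in log.values():
        # The order of events defines the learning path taken by the student.
        trace.events.sort(key=lambda e: e.timestamp)
    return list(log.values())
```

Sorting per trace rather than globally keeps the grouping step a single pass over the rows.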

A rule r expresses a condition on a feature (e.g., ‘program’ \(=\) ‘Computer Science’). For a feature with numerical values in an event log L, the rule can specify a range instead of a single value (e.g., ‘age’ \(>\) 25). A drill-down criterion \(\sigma \) is defined as the conjunction of a set of rules (e.g., ‘program’ \(=\) ‘Computer Science’ \(\wedge \) ‘age’ \(>\) 25). A drill-down criterion \(\sigma \) is said to cover a student \(s_n\) if all rules in \(\sigma \) are satisfied by the corresponding feature values of \(s_n\). Consequently, applying \(\sigma \) to L selects a set of students \(S' \subseteq S\) such that \(\sigma \) covers each \(s_n \in S'\). We define the coverage of a drill-down criterion as \(C_\sigma = \frac{|S'|}{|S|}\), i.e., the fraction of all students S that belong to the resulting sub-cohort \(S'\). Using this notation, our problem can be formalised as follows:
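The definitions of rules, criteria, coverage of a student and \(C_\sigma\) translate almost directly into code. The following is a minimal sketch, assuming each student is represented by a dictionary of feature values; the `Rule` class, the operator table and the treatment of missing features are illustrative assumptions, not part of LP-AID.

```python
import operator
from dataclasses import dataclass

# Comparison operators a rule may use (an illustrative assumption).
OPS = {"=": operator.eq, ">": operator.gt, ">=": operator.ge,
       "<": operator.lt, "<=": operator.le}


@dataclass(frozen=True)
class Rule:
    feature: str   # e.g. "program" or "age"
    op: str        # one of the keys of OPS
    value: object  # e.g. "Computer Science" or 25

    def holds(self, features: dict) -> bool:
        # A rule on a feature the student lacks is treated as unsatisfied (assumption).
        return self.feature in features and OPS[self.op](features[self.feature], self.value)


def covers(criterion, features: dict) -> bool:
    """A criterion (a conjunction of rules) covers a student if every rule holds."""
    return all(rule.holds(features) for rule in criterion)


def coverage(criterion, students) -> float:
    """C_sigma = |S'| / |S|, where S' is the set of students covered by the criterion."""
    covered = [s for s in students if covers(criterion, s)]
    return len(covered) / len(students)
```

For example, `coverage([Rule("program", "=", "Computer Science"), Rule("age", ">", 25)], students)` returns the fraction of students in the Computer Science program who are older than 25. An empty criterion covers every student, matching the root of a drill-down.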

Formal Problem Statement: Given an event log L, a set of features \(F' \subseteq F\), a constant \(0 \le \alpha \le 1\) and a constant k, find a set of drill-down criteria \(\varSigma = \{ \sigma _1, \ldots, \sigma _k\}\) over the features in \(F'\) such that each criterion \(\sigma _j\) (\(1 \le j \le k\)): (1) has a coverage larger than \(\alpha \) (i.e., \(C_{\sigma _j} > \alpha \)), and (2) selects a sub-cohort of students \(S'\) whose learning paths, recorded in the sub-log \(L'\), deviate most from those of the remaining students in terms of the activities performed, the relative frequency of each distinct learning path, and the order in which the activities were triggered (i.e., it maximises the distance between the sub-log \(L'\) and the log of the remaining students \(L\setminus L'\)).
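The two conditions can be summarised compactly as follows, where \(L'_{\sigma}\) denotes the sub-log selected by \(\sigma\) and d the EMSC-based distance between two logs (both symbols are introduced here for convenience only):

```latex
L'_{\sigma} = \{\, t_i \in L \mid \sigma \text{ covers } s_i \,\}, \qquad
\operatorname{score}(\sigma) = d\bigl(L'_{\sigma},\, L \setminus L'_{\sigma}\bigr),
\qquad
\varSigma \subseteq \{\, \sigma \text{ over } F' \;\mid\; C_{\sigma} > \alpha \,\},\ \ |\varSigma| = k .
```

Condition (1) restricts the candidates to the feasible set on the right, and condition (2) asks for the members of \(\varSigma\) to have the largest scores among those feasible criteria.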

3.2 Proposed Approach

We present our approach by first giving a high-level overview of the underlying algorithm and then describing the automatic drill-down process using an example. Our algorithm takes the students’ event log as input and returns a set of drill-down criteria, each annotated with its learning process distance and its coverage of the student population. The algorithm examines all possible drill-down actions to find those that result in sub-cohorts with the most divergent learning processes. Algorithm 1 provides the high-level pseudo-code of our proposed approach. It takes four parameters as input: the event log L, the features \(F'\), the minimum coverage \(\alpha \) and the number of drill-down criteria to be recommended k. The output of the algorithm is the set \(\varSigma \) of the top k scored drill-down criteria. The algorithm consists of three main blocks, described in the remainder of this section.

Create Drill-Down Tree. The BuildTree function takes two parameters as input, the event log L and the list of selected features \(F'\), and returns a drill-down tree. The function obtains all values of each feature in \(F'\) that occur in L and generates a tree-like collection of nodes T, where each node represents a splitting rule r on one feature. Each path from the root therefore consists of a set of feature-value pairs, i.e., a drill-down criterion.
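A minimal sketch of a BuildTree-style enumeration is shown below. It assumes equality rules only and uses a fixed feature order in which each child adds a rule on a feature that comes after its parent's feature, so every conjunction of feature-value pairs appears on exactly one path. The `Node` class and `build_tree` function are hypothetical names, not the paper's implementation.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    rules: tuple                                  # (feature, value) pairs from the root to this node
    children: list = field(default_factory=list)
    distance: Optional[float] = None              # filled in later by scoring
    coverage: Optional[float] = None


def build_tree(students, selected_features):
    """Enumerate all drill-down paths over the feature values present in the data.

    `students` is a list of {feature: value} dicts (an assumed representation).
    """
    # Only values that actually occur in the log become splitting rules.
    values = {f: sorted({s[f] for s in students if f in s}, key=str)
              for f in selected_features}

    def expand(node, start):
        for pos in range(start, len(selected_features)):
            feature = selected_features[pos]
            for value in values[feature]:
                child = Node(node.rules + ((feature, value),))
                node.children.append(child)
                expand(child, pos + 1)  # later features only, so no duplicate conjunctions
        return node

    return expand(Node(rules=()), 0)
```

Because every subset of features combined with every observed value yields a node, the tree grows combinatorially with the number of features, which is why pruning during traversal matters.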

Score Nodes and Prune the Tree. The tree embodies all possible drill-down paths, not all of which will result in a cohort of the required minimum size (i.e. \(\alpha \)). PruneAndScore traverses the tree recursively to examine all possible drill-down actions. ObtainSubLog takes each node, which is a feature/value pair, together with its parent’s event log parentL as input, and filters parentL to obtain a sub-log cohortL containing only the data of the sub-cohort. The sub-cohort’s coverage of the student population is then checked to be at least \(\alpha \) and at most \(1-\alpha \). If this condition is met, the main event log L is filtered to obtain the event log of the remaining students, remainderL. Otherwise, the node is pruned (if its coverage is below \(\alpha \)) or excluded from scoring (if its coverage is above \(1-\alpha \), in which case the remaining students would form a cohort smaller than \(\alpha \)). Pruning is safe because adding further rules to a criterion can only shrink the sub-cohort, so no descendant of a pruned node can reach the minimum coverage. For each drill-down path, computeDistance takes the pair of sub-logs (that of the sub-cohort and that of the remaining students) as input and computes the distance between them using Earth Movers’ Stochastic Conformance Checking [29].
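The traversal and pruning logic can be sketched as follows, assuming the log is a list of (feature dict, trace) pairs and that the distance function is passed in as a parameter (for instance an EMSC-style distance). Here the tree is expanded lazily during the traversal, which is an illustrative simplification; `prune_and_score` is a hypothetical name and this is not the paper's implementation.

```python
def prune_and_score(log, selected_features, alpha, distance):
    """Depth-first traversal that prunes small cohorts and scores the rest.

    `log`: list of (features_dict, trace) pairs, one per student.
    `distance`: any function comparing two lists of traces.
    Returns (rules, coverage, distance) for every scored node.
    """
    n = len(log)
    values = {f: sorted({feats[f] for feats, _ in log if f in feats}, key=str)
              for f in selected_features}
    scored = []

    def visit(rules, parent_idx, start):
        for pos in range(start, len(selected_features)):
            feature = selected_features[pos]
            for value in values[feature]:
                # ObtainSubLog: filter the parent's sub-cohort, not the full log.
                cohort = [i for i in parent_idx if log[i][0].get(feature) == value]
                cov = len(cohort) / n
                if cov < alpha:
                    continue  # prune: adding rules can only shrink the cohort further
                path = rules + ((feature, value),)
                if cov <= 1 - alpha:
                    members = set(cohort)
                    remainder = [log[i][1] for i in range(n) if i not in members]
                    d = distance([log[i][1] for i in cohort], remainder)
                    scored.append((path, cov, d))
                # Nodes above 1 - alpha are not scored but are still expanded,
                # since their descendants may fall inside [alpha, 1 - alpha].
                visit(path, cohort, pos + 1)

    visit((), list(range(n)), 0)
    return scored
```

Filtering the parent's sub-cohort instead of the full log mirrors ObtainSubLog's use of parentL and means each level only scans the students that survived the level above.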

Sort and Return the Top K Drill-Down Criteria. topDistances takes the scored drill-down tree T and k as input and returns k recommendations. To pick the k nodes, this function uses a solution set ranking function that maximizes diversity, similar to the approach of [47]. Alternatively, we could simply pick the k highest-scoring nodes; however, diversifying the recommendations allows us to provide a wider range of insightful drill-downs. Our algorithm converts the chosen nodes into a set of drill-down criteria \(\varSigma \), each annotated with its distance score, and returns them to users as recommendations.
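One simple way to trade score against diversity is a greedy selection that penalises candidates sharing rules with criteria already chosen. The sketch below uses Jaccard similarity between rule sets and a hypothetical `penalty` weight; it is an illustrative heuristic under those assumptions, not necessarily the ranking function of [47].

```python
def jaccard(a, b):
    """Jaccard similarity of two rule sets."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def diverse_top_k(scored, k, penalty=0.5):
    """Greedy diversity-aware top-k selection over (rules, distance) pairs.

    At each step, pick the candidate whose distance minus `penalty` times its
    highest Jaccard similarity to an already chosen rule set is largest.
    """
    remaining, chosen = list(scored), []
    while remaining and len(chosen) < k:
        def adjusted(item):
            rules, dist = item
            # Highest overlap with any criterion already chosen (0 if none chosen yet).
            overlap = max((jaccard(rules, c[0]) for c in chosen), default=0.0)
            return dist - penalty * overlap
        best = max(remaining, key=adjusted)
        chosen.append(best)
        remaining.remove(best)
    return chosen
```

With `penalty=0.0` the function reduces to picking the k highest-scoring criteria, the alternative mentioned above.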

Algorithm 1. High-level pseudo-code of LP-AID.

3.3 Example Illustration

In this section, we illustrate our approach using an event log of six students, with k = 1 and \(\alpha \) = 0.2. We explain how our algorithm finds the most insightful drill-down criterion (namely the criterion that identifies the sub-cohort with the highest distance) for the event log given in Fig. 1a,b, with students \(\{S1, \ldots, S6\}\) and \(F'\) = {Residential Status, Assessment}. Our example course has the learning activities {Lecture 1, Lecture 2, Quiz A, Lecture 3, Lecture 4, Quiz B, Lecture final}, which were made available to students weekly in that order. The trace of learning events triggered by each student is shown in Fig. 1a; each event is represented by an activity label and a timestamp.

Fig. 1. Illustrative example of LP-AID applied on a sample learning event log.

Our algorithm first extracts all values of the features in \(F'\) that are present in the event log and generates the drill-down tree T. Next, the tree is traversed depth-first; based on each node’s filtering criterion, the event log is divided into the sub-cohort’s sub-log and the remaining students’ sub-log. Nodes covering less than \(\alpha = 0.2\) of the student population are pruned. For instance, the node [Assessment = ‘Mid Grade’] is pruned, as only one student (i.e. a coverage of 1/6 ≈ 0.17) satisfies this criterion. As a result, five actionable drill-down paths remain (shown in Fig. 1c): P1: [Residential Status = ‘Domestic’], P2: [Residential Status = ‘International’], P3: [Assessment = ‘Low Grade’], P4: [Assessment = ‘High Grade’] and P5: [Assessment = ‘Low Grade’ and Residential Status = ‘International’]. Our algorithm computes the distance between the sub-logs for each drill-down path and annotates each node with the distance d and the coverage (as shown in Fig. 1c). The drill-down path P5, which has the highest distance (57%), is the resulting recommendation. Figure 1d shows the LP-AID interface in Course Insights, displaying the input and the resulting recommendation, including the drill-down criteria, coverage and distance.
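The distances in this example come from EMSC [29]. The sketch below illustrates the underlying idea under simplifying assumptions: each sub-log is reduced to its stochastic language (the relative frequency of each distinct activity sequence), the ground cost between two sequences is taken to be their normalised Levenshtein distance, and the Earth Mover's distance is solved as a small min-cost flow in pure Python. It is not the reference EMSC implementation, all function names are hypothetical, and timestamps are used only through event order.

```python
from collections import Counter


def levenshtein(a, b):
    """Edit distance between two activity sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (x != y)))
        prev = cur
    return prev[-1]


def norm_lev(a, b):
    longest = max(len(a), len(b))
    return levenshtein(a, b) / longest if longest else 0.0


def stochastic_language(traces):
    """Relative frequency of each distinct activity sequence (trace variant)."""
    counts = Counter(tuple(t) for t in traces)
    total = sum(counts.values())
    return {variant: c / total for variant, c in counts.items()}


def earth_movers_distance(p, q, ground):
    """Minimum cost of moving probability mass p onto q (successive shortest paths)."""
    left, right = list(p), list(q)
    m, n = len(left), len(right)
    src, sink = 0, m + n + 1
    graph = [[] for _ in range(m + n + 2)]  # edge: [to, capacity, cost, reverse index]

    def add_edge(u, v, cap, cost):
        graph[u].append([v, cap, cost, len(graph[v])])
        graph[v].append([u, 0.0, -cost, len(graph[u]) - 1])

    for i, a in enumerate(left):
        add_edge(src, 1 + i, p[a], 0.0)
        for j, b in enumerate(right):
            add_edge(1 + i, 1 + m + j, float("inf"), ground(a, b))
    for j, b in enumerate(right):
        add_edge(1 + m + j, sink, q[b], 0.0)

    total_cost, eps = 0.0, 1e-12
    while True:
        # Bellman-Ford in the residual graph (reverse edges carry negative costs).
        dist = [float("inf")] * len(graph)
        parent = [None] * len(graph)
        dist[src] = 0.0
        for _ in range(len(graph) - 1):
            updated = False
            for u, edges in enumerate(graph):
                if dist[u] == float("inf"):
                    continue
                for k, (v, cap, cost, _) in enumerate(edges):
                    if cap > eps and dist[u] + cost < dist[v] - eps:
                        dist[v], parent[v], updated = dist[u] + cost, (u, k), True
            if not updated:
                break
        if dist[sink] == float("inf"):
            return total_cost
        # Push the bottleneck amount of mass along the cheapest path.
        push, v = float("inf"), sink
        while v != src:
            u, k = parent[v]
            push, v = min(push, graph[u][k][1]), u
        v = sink
        while v != src:
            u, k = parent[v]
            edge = graph[u][k]
            edge[1] -= push
            graph[v][edge[3]][1] += push
            v = u
        total_cost += push * dist[sink]


def process_distance(cohort_traces, remainder_traces):
    """EMSC-style distance between two sub-logs (0 = identical behaviour)."""
    return earth_movers_distance(stochastic_language(cohort_traces),
                                 stochastic_language(remainder_traces), norm_lev)
```

Because the ground cost is a normalised edit distance, two cohorts whose traces differ only in the position of one activity are scored as close, whereas a 0/1 ground cost would treat any differing sequence as entirely different.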

To understand the difference between the learning behaviour of the sub-cohort and that of the remaining students, we used Disco [15] to visualise the underlying learning process of each group. Disco generates a process map in which boxes represent activities, numbers in the boxes represent the frequency of each activity, arrows represent the order in which activities were performed (i.e. the control flow), numbers on the arrows represent how often one connected activity was directly followed by the other, and the thickness of the arrows represents relative frequencies. For demonstration purposes, we highlighted in red the activities that were performed in a different order. To compare the two learning processes, we look at differences in the activities, their frequencies and their order. For instance, Fig. 1e shows that Lecture 3 was skipped by one of the two students in the cohort, while Fig. 1f shows that the remaining students performed this activity. From a control-flow perspective, Quiz A and Quiz B were performed as the last activities by the cohort, while the remaining students performed these quizzes during the semester.
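The counts shown in such a process map can be approximated with a few lines of code. The sketch below computes activity frequencies and directly-follows arc frequencies for a group of traces, and lists arcs that appear in only one of two groups. It assumes traces are plain lists of activity labels and does not reproduce Disco's internal algorithm or its path and activity filters; the function names are hypothetical.

```python
from collections import Counter


def process_map(traces):
    """Activity frequencies and directly-follows arc frequencies for a set of traces."""
    activities, arcs = Counter(), Counter()
    for trace in traces:
        activities.update(trace)
        arcs.update(zip(trace, trace[1:]))  # (a, b): b directly followed a
    return activities, arcs


def compare_maps(cohort, rest):
    """Arcs that occur in only one of the two groups' process maps."""
    _, cohort_arcs = process_map(cohort)
    _, rest_arcs = process_map(rest)
    return set(cohort_arcs) - set(rest_arcs), set(rest_arcs) - set(cohort_arcs)
```

For instance, an arc such as ("Lecture 2", "Lecture 4") appearing only in the cohort's map would reflect the skipped Lecture 3 discussed above.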

4 Practical Application

This section presents an application of our approach using an existing LAD called Course Insights, which is equipped with manual drill-down functionality (Footnote 1). We first provide background on Course Insights and its main components. We then use data from a course that was integrated with Course Insights to: (1) explore the recommended drill-downs generated by LP-AID; (2) visualise the process deviation for an example drill-down; and (3) report on the comments and feedback provided by the course coordinator upon reviewing our recommendations.

Course Insights. Course Insights (CI) is a LAD that provides filterable and comparative visualisations of students’ aggregated daily activities. CI aims to provide actionable insights for instructors by linking data from several sources, including a Student Information System, Blackboard [5], edX Edge [32], and embedded learning tools such as Echo360 [13] and Kaltura [22], to create a multi-dimensional educational data set. CI is embedded in the learning management system of The University of Queensland and is available to all instructors. It is equipped with filtering functionality that enables instructors to drill down into the data to explore the behaviour of sub-cohorts of students. Figure 2a illustrates the filter interface, which allows users to select attributes from demographic, assessment, engagement and enrolment features. When a filter is applied, statistical data and a graph representing the filtered versus unfiltered distribution of the target feature are presented (as shown in Fig. 2b).

Fig. 2. The Course Insights learning analytics dashboard.

Drill-Down Recommendations in Action. We applied our technique to an introductory calculus and linear algebra course offered in 2019 to 875 undergraduate students from 16 programs. After data cleaning, we were left with a data set of 739 students. The input event log includes three types of learning activities: (1) Accessing course materials: accessing course materials, organised by chapter. (2) Submission of formative quiz: submitting chapter-based practice quizzes, which were formative assessments and thus optional. (3) Review summative assessment solutions: accessing chapter-based workbook solutions, released weekly. Workbooks were summative assessments assigned weekly, and students were required to submit their answer sheets on paper each week.

As the features \(F'\), we selected the attributes Brand New, Final Exam, Gender, Program, and Residential Status. A total of 2447 drill-down actions were possible for this data set. Table 1 presents the recommendations generated for this course using small (\(\alpha \) = 0.05), medium (\(\alpha \) = 0.1) and large (\(\alpha \) = 0.3) minimum coverage, respectively.

Visualising Sub-cohort Learning Process Deviations. To investigate what insights can be derived from the recommended drill-downs, we applied process discovery to the identified sub-cohort and to the remaining students. Here, we demonstrate the insights derived from recommended drill-down (1) in Table 1. This drill-down selects the sub-cohort with Brand New = ‘Yes’, Residential Status = ‘International’, Final Exam = ‘High’ and Gender = ‘Male’. According to LP-AID, this sub-cohort’s learning process is 72% different from that of the remaining students. To investigate the difference between the two learning processes, we visualised the underlying process of the sub-cohort (Fig. 3a) and of the remaining students (Fig. 3b). Each box in the map is an activity labelled by the action type and the relevant chapter (e.g., Formative Quiz|chapter1), and we use colour coding to distinguish the three types of learning activities more clearly. In the sub-cohort’s process, the arrows between the three different types of activities indicate switching between types of learning tasks. Such switching may indicate that all three types of tasks were performed every week, before the next chapter’s activities were made available. In contrast, the process of the remaining students shows that each activity type related to chapters 9 to 18 (highlighted in Fig. 3b) is mainly performed sequentially, which suggests that these students performed them at the end of the semester, when all tasks were available.
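The switching behaviour described above could also be quantified per student. The sketch below computes, for one trace, the share of consecutive event pairs whose activity type changes, assuming labels follow the "Type|chapterN" pattern shown in the process maps. This measure is a hypothetical complement to the visual comparison, not something LP-AID computes.

```python
def activity_type(label):
    # Labels follow the "Type|chapterN" pattern, e.g. "Formative Quiz|chapter1".
    return label.split("|", 1)[0]


def switch_rate(trace):
    """Share of consecutive event pairs in a trace that change activity type."""
    pairs = list(zip(trace, trace[1:]))
    if not pairs:
        return 0.0
    switches = sum(activity_type(a) != activity_type(b) for a, b in pairs)
    return switches / len(pairs)
```

Under this measure, a student who works through all activity types chapter by chapter would score close to 1, whereas a student who performs each type in a long end-of-semester block would score close to 0.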

Table 1. Resulting recommendations generated by our approach.
Fig. 3. Learning behaviours of filtered students (by the recommended drill-down) vs. the rest of the students.

To further investigate our initial findings, we used Disco’s events graph to compare the distribution of events over the semester. Figures 3c and 3d show that the sub-cohort was more active during the semester than the remaining students. Furthermore, the average number of events per student was 36 in the sub-cohort and 25 for the remaining students, a statistically significant difference (p = 0.0006). In summary, the identified sub-cohort had a higher rate of activity throughout the semester than the remaining students. One of the common features of this cohort was high performance in the final exam, which may be associated with their learning process. Other differences visible when comparing the two process maps are that the Formative Quiz of chapter 8 was not performed by any student in the sub-cohort, that Solution Review of chapters 2, 7, 8 and 9 were the most frequent activities of the sub-cohort, and that Solution Review of chapters 1, 2, 6, 7, 8 and 9 were the most frequent activities of the remaining students.
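The text does not state which test produced p = 0.0006. As one assumption-light option for comparing mean events per student between two groups, a two-sided permutation test can be sketched in pure Python; the function name and defaults are hypothetical, and the per-student counts passed to it would come from the two sub-logs.

```python
import random
from statistics import mean


def permutation_test(a, b, n_resamples=10_000, seed=0):
    """Two-sided permutation test for a difference in means between groups a and b."""
    rng = random.Random(seed)
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    extreme = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)  # randomly reassign students to the two groups
        diff = abs(mean(pooled[:len(a)]) - mean(pooled[len(a):]))
        if diff >= observed:
            extreme += 1
    # The +1 correction keeps the estimated p-value away from exactly zero.
    return (extreme + 1) / (n_resamples + 1)
```

A permutation test makes no normality assumption about event counts, which are typically skewed; a fixed seed keeps the estimate reproducible.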

Feedback From the Instructor. We presented the reported drill-down recommendations and process visualisations to the course instructor to capture their feedback on the findings. Their feedback can be summarised as follows: (1) While the instructor had access to Course Insights throughout the semester, they rarely used it and generally found it overwhelming; they considered the large number of potential drill-down options within the platform to be the main reason. (2) Findings about behaviours associated with successful outcomes can be used for positive deviance [30] purposes: the instructor indicated that they would like to share Fig. 3 with their students as a recommended pattern of successful learning, as evidence that consistent engagement with learning activities throughout the semester is related to better outcomes. (3) The ability to receive drill-down recommendations within a cohort defined by a rule (e.g., ‘midterm’ < 50) would be useful: the instructor indicated that they would like to understand deviations among low-performing and at-risk students to help them pass the course.

5 Conclusion and Future Work

The OLAP drill-down operation is commonly used in data-driven dashboards to let users zoom in and explore data in more detail. In LADs, this operation can help educators identify sub-cohorts of students who deviate from class norms and who may require special attention. In this paper, we presented an automated method called LP-AID for finding and recommending a set of insightful drill-down actions to guide data exploration. To support understanding of student learning approaches, we take a process mining lens to examine and compare student learning behaviour in terms of the learning activities performed, the relative frequency of each distinct learning path and the order in which the activities were performed. LP-AID examines all drill-down paths and uses Earth Movers’ Stochastic Conformance Checking to score the ‘insightfulness’ of each path by measuring the distance between the learning behaviours of the sub-cohort and the remaining students. Furthermore, we use a solution set ranking function that maximizes diversity to rank and select the drill-down paths for instructors to consider.

We illustrated how LP-AID can be used as part of a LAD to guide the discovery of insightful drill-downs. The learning processes of students selected by the recommended drill-downs were visualised and compared, highlighting how the learning process of the identified sub-cohort deviates from that of the remaining students. Feedback from the course instructor suggests that manual drill-downs without guidance can be overwhelming, and that insights gained from the recommendations can be shared with students to encourage change (i.e. an application of positive deviance). Future work aims to embed LP-AID in Course Insights and to partner with course instructors through co-creation to investigate (1) the practical implications of our approach, refining it accordingly; (2) the most effective way to present the drill-down recommendations to instructors; and (3) the most appropriate visualisation method(s) for presenting the learning process deviations of sub-cohorts to instructors.