Investigating variation in learning processes in a FutureLearn MOOC

Studies on engagement and learning design in Massive Open Online Courses (MOOCs) have laid the groundwork for understanding how people learn in this relatively new type of informal learning environment. To advance our understanding of how people learn in MOOCs, we investigate the intersection between learning design and the temporal process of engagement in the course. This study investigates the detailed processes of engagement using educational process mining in a FutureLearn science course (N = 2086 learners) and applying an established taxonomy of learning design to classify learning activities. The analyses were performed on three groups of learners categorised based upon their clicking behaviour. The process-mining results show at least one dominant pathway in each of the three groups, though multiple popular additional pathways were identified within each group. All three groups remained interested and engaged in the various learning and assessment activities. The findings from this study suggest that in the analysis of voluminous MOOC data there is value in first clustering learners and then investigating detailed progressions within each cluster that take the order and type of learning activities into account. The approach is promising because it provides insight into variation in behavioural sequences based on learners’ intentions for earning a course certificate. These insights can inform the targeting of analytics-based interventions to support learners and inform MOOC designers about adapting learning activities to different groups of learners based on their goals.


Introduction
Open-access learning environments such as Massive Open Online Courses (MOOCs) attract people with a wide range of interests and learning objectives, which is reflected in the degree and nature of engagement with the learning content (Milligan and Littlejohn 2017; Kizilcec and Schneider 2015). However, participation levels and assessment outcomes alone do not constitute robust evidence of learning or academic success writ large (Henderikx et al. 2017; Joksimović et al. 2017). While early research on MOOCs focused on understanding completion rates and final course grades, more recent work has examined how learners move through the course content as a way of understanding the learning process itself.
Regardless of whether a learner completes a MOOC, academic success or failure may be partly hidden in their journey through the learning activities in the course (Rizvi et al. 2018, 2019). Given the processual nature of learning, we can investigate learning by measuring detailed interactions with learning activities, such as videos, assessments, and interpersonal exchanges, and analysing learners' progression through these activities (Davis et al. 2016; Maldonado-Mahauad et al. 2018). Unlike face-to-face or blended learning environments, online courses are instrumented such that learner interactions are recorded in voluminous system logs, offering an unprecedented granularity for studying learning at scale. Educational research on log-based behavioural modelling in Intelligent Tutoring Systems (ITS) and Learning Management Systems (LMS) has found that log-based analyses can provide deep insights into how learners engage and interact with different learning activities (Bogarín et al. 2018; Sonnenberg and Bannert 2015). Yet despite increasing efforts to advance learning science research with log-based analyses in formal and blended learning environments, more research is needed to advance our understanding of learning processes in online learning environments (Bogarín et al. 2018; Juhaňák et al. 2017).
To advance an understanding of learner behaviour in MOOCs, studies have used clustering techniques to identify learner subpopulations based upon their overall resource-engagement behaviour (Li and Baker 2018; Ferguson and Clow 2015; Kizilcec et al. 2013), and more recently sequence-mining techniques to identify common engagement sequences that may reflect learning processes (Davis et al. 2018; Guo and Reinecke 2014). In order to understand learning processes in MOOCs, findings from these studies suggest that it helps to first group learners based on their general behavioural profile to reduce variance due to different enrolment intentions, and then to examine fine-grained interaction processes with the learning activities.
While these sequence-mining techniques have provided important insights into how different groups of learners engage in MOOCs, some researchers have argued that these approaches need to be embedded in strong learning science principles (Mangaroska and Giannakos 2018; Winne 2017). Indeed, the design of the online learning environment is known to influence learners' progression in different types of learning activities (Nguyen et al. 2018; Rienties and Toetenel 2016). Success in online learning has been found to be closely linked to learning design, which is defined as the process of designing pedagogically informed learning activities to support learners while remaining aligned with the curriculum (Conole 2012). Yet research on the pedagogical learning design of MOOCs is at an early stage (Davis et al. 2016; Sergis et al. 2017). We adopt learning design as a lens for investigating learners' interaction processes with the goal of finding empirical support for actionable recommendations to course designers and policymakers who have control over the learning design.
The present research reports on our implementation and evaluation of this approach, combining sequence-mining techniques with learning design approaches to better understand how and why groups of learners engage in a science MOOC over time. In particular, our current implementation extends prior work that has identified three primary clusters of engagement in courses offered on the FutureLearn platform (Rizvi et al. 2018). The clustering was based on the degree to which learners marked activities in the course as completed: "Markers" are learners who marked all their activities as completed; "Partial-Markers" are those who marked only a few activities; and "Non-Markers" marked none of their activities as completed. For each of these groups, we investigate detailed processes of engagement with the learning activities according to an established taxonomy of course activities in the learning design. The findings of this study can inform approaches to adapting course content, and learning activities in particular, to different groups of learners based on their learning goals.

Literature review
The intrinsic features of MOOCs make them accessible to diverse populations of learners. This allows for a spectrum of learning approaches and contexts, including a variety of languages, cultural settings, pedagogical strategies, and technologies (Jansen and Schuwer 2015; Morgado et al. 2014). In comparison to other online learning environments, MOOC learning environments are not only "open" but often require learners to be highly self-directed and self-regulated (Maldonado-Mahauad et al. 2018). For MOOC design and development, a variety of content types has been recommended, moving away from predominantly video-based courses (Jansen and Schuwer 2015). The essential features of MOOCs provide learners with a mediated experience, i.e., fewer constraints of time, distance, prerequisites or technological barriers (Sparke 2017; Kizilcec et al. 2017). This "structured-informality" makes MOOCs unique, and different from formal residential learning and even from traditional distance or online learning, and opens doors to large-scale adoption. Our current study is an attempt to understand how the learning design of MOOCs might impact the way learners engage and progress in the course.


Learning design
In his seminal work, Mayer (2005) wrote that learning comprises the active processes of filtering, selecting, organising, and integrating new information. At present, MOOC developers such as FutureLearn, Coursera, and edX seem to optimise the design of MOOCs to increase study success (i.e., completion rates), and to lessen the so-called cognitive load for learners by adjusting topic difficulty, information or task presentation, and the robustness of acquired knowledge. By making the acquisition of textual, visual or auditory information natural and easy for learners, MOOC providers aim not only to attract but also to retain more learners (Sergis et al. 2017; Rai and Chunrao 2016; Margaryan et al. 2015). Additionally, learners commonly distribute their time across different learning activities to get the maximum (subjective) benefit within a limited time frame (Maldonado-Mahauad et al. 2018; Wigfield and Eccles 2000). Therefore, the structural constructs (i.e., learning activities) of MOOCs need to be aligned with their respective learning objectives. Thus, the temporal dynamics of designed learning activities are of special interest to researchers and MOOC developers.
Learning design (LD) can be defined as the process of designing pedagogically informed learning activities to support learners while remaining aligned with the curriculum. In a MOOC, LD can provide a consistent way to map individual learning activities. This study is theoretically grounded in the conceptual framework for Learning Design recommended by the OU Learning Design Initiative (OULDI) project (Cross et al. 2012). This conceptual framework provides a foundation for the MOOC designs on the FutureLearn platform (Sharples 2015), which is the primary source of MOOC data in this research.
The formal taxonomy for OULDI, shown in Table 1, was developed by Conole (2012). LD has been described as a reusable, adaptable description or template which aims to "make the structures of intended teaching and learning - the pedagogy - more visible and explicit thereby promoting understanding and reflection" (Cross and Conole 2009). Reusability, adaptability, and abstraction of the overall course structure are among the strengths of OULDI. The proposed taxonomy provides a way to abstract different learning activities in a meaningful way. It suggests that all learning tasks can be categorised as one of seven activity types.
In formal online learning contexts, the impact of LDs on learners' behaviour, satisfaction, and learning outcomes has been widely acknowledged (Rienties and Toetenel 2016). Likewise, Nguyen et al. (2017) found preliminary support for the impact of LD on learners' online engagement, whereby "LD could explain up to 60% of the variance of the time spent on VLE platform". However, most of the research on LD and learning has focused on measures of learning that are not processual (Mangaroska and Giannakos 2018). For example, Rienties and Toetenel (2016) analysed the impact of LDs on learning outcomes and overall engagement, but without considering the processual nature of learning. In other words, the OULDI framework has been empirically tested in large-scale studies (Nguyen 2017), but not in informal learning settings, and FutureLearn MOOCs in particular. In the current study, we employ OULDI to investigate the cognitive and pedagogical features of a FutureLearn MOOC in relation to learners' engagement and learning progression.

MOOC event logs and learning processes
Learning in MOOC environments produces large volumes of data, irrespective of how a MOOC has been designed. These data are produced from multiple sources, in a variety of formats, and with different levels of granularity (Romero and Ventura 2013). Within MOOCs, "trace data" or "clickstream data" are typically captured at a very fine-grained level. These participation log data can presumably be considered a set of silent, passive observations. The volume of data increases immensely as we move from general course-related details to learner-related information. The data size increases even more if we go deeper into each learner's progress, from their learning sessions to individual learning activities accessed within those sessions (Fig. 1). Stored log data have no inherent meaning per se, as clicking data do not necessarily indicate behavioural engagement, let alone cognitive processing or learning (Winne 2017). Indeed, Selwyn (2015) argued that the focus on these clicking data could lead to "dataveillance", and perhaps more importantly to a reductionist, data-based representation of diverse learners. Nonetheless, a substantial body of literature is emerging that suggests these clicking data streams, if used sensitively and sensibly, could provide important insights into how some groups of learners are engaging in MOOCs, while others might not be. Still, to date, only a small fraction of those data has been explored in extensive, systematic MOOC research (Bogarín et al. 2018; Winne 2017; Joksimović et al. 2017). In other words, there is still a paucity of systematic research exploring what aspects of these data are relevant and helpful in understanding learning processes (Winne 2017; Sparke 2017).
Learning can be assessed in a variety of ways, ranging from learning outcomes such as grades and certifications (Wang et al. 2015; Wen and Rosé 2014) to conceptualising learning as a process (Bogarín et al. 2018; Maldonado-Mahauad et al. 2018). Treating learning as a process, several studies have recently explored log data to understand learners' progress, or processual learning, in different MOOC activities (Davis et al. 2016; Guo and Reinecke 2014; Kizilcec et al. 2013). For instance, to understand learners' progression in Coursera MOOCs, Kizilcec et al. (2013) used engagement patterns to categorise learners into four categories: completing (completed the majority of the assessments), auditing (watched most of the videos but completed assessments infrequently), disengaging (completed assessments at the start of the MOOC, then gradually disengaged), and sampling (watched videos for only one or two assessment periods). Ferguson and Clow (2015) replicated this method in the context of FutureLearn MOOCs, where learners can explicitly mark activities as 'complete'. They suggested that marking a few or all activities as 'completed' signified a certain level of activity engagement or learning commitment. Such clicking behaviour also indicated a strategic way of obtaining a certificate.
Similarly, in a large-scale study of four edX MOOCs, Guo and Reinecke (2014, p. 6) found that participants exhibited a pattern of 'non-linear navigation through the course materials'. In particular, so-called "certificate-earners" tended to apply non-linear navigation strategies, whereby "certificate earners repeated visiting prior sequences three times as often, presumably to review older content." (Guo and Reinecke 2014, p. 6). Hence, this research suggested distinct navigational strategies, and that clicking (or not clicking) activities as "completed" represented two distinct psychological dispositions: one in which a learner might be inclined to attain a certificate, and another in which a learner showed no intention of getting a certificate yet continued to learn.
Along the same lines, several authors (Davis et al. 2016; Guo and Reinecke 2014; Wen and Rosé 2014) have inspected MOOC learning sequences (or learning processes) in connection with assessment results, inclination towards certification, and learning strategies or habits. For example, Wen and Rosé (2014) examined transitions between two activities and linked the findings with behavioural patterns. A similar approach, using two-step transitions to map navigational strategies, was used by Guo and Reinecke (2014). Both studies found that learners generally progressed linearly, but certificate earners were more inclined to follow unstructured paths. More recently, a slightly different method was used by Davis et al. (2016), who studied MOOC learners' motivations, such as binge (video) watching or 'quiz checking' (i.e., checking the quiz answers without attempting the quiz first). To capture the complexities of such motivations, the authors used eight-step-long subsets of overall learning sequences. Their findings suggested that learners' progression through activities and the frequency with which they accessed various learning activities should be seen in the context of their inclination towards certification.
Given that our study is situated in the FutureLearn environment, it is noteworthy that FutureLearn's policy on the "certificate of participation" allows for non-linear navigation through the activities. In most courses, a learner must mark at least 50% of the course steps as complete and attempt every test question to get a certificate of participation. An initial analysis (Rizvi et al. 2018) of the log data used in the current study pointed towards three distinct clicking patterns, potentially representing three unique dispositions: Markers (i.e., those who marked all their activities as completed); Partial-Markers (i.e., those who marked a few of the activities they accessed); and Non-Markers (i.e., those who never marked any of their activities as completed). This grouping of learners is unique, as are the MOOC designs offered via the FutureLearn platform. Nonetheless, this categorisation is informed by similar categorisations in the previous MOOC literature (Kizilcec et al. 2013; Ferguson and Clow 2015).
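The participation rule stated above (at least 50% of steps marked and every test question attempted) can be expressed as a simple predicate. The sketch below is illustrative only: the function name, its parameters, and the default threshold argument are assumptions introduced here, and the rule applies to "most courses" rather than necessarily to every FutureLearn course.

```python
def eligible_for_participation_certificate(
    steps_marked: int,
    total_steps: int,
    test_questions_attempted: int,
    total_test_questions: int,
    min_marked_share: float = 0.5,  # assumed default: the 50% threshold quoted in the text
) -> bool:
    """Return True if both conditions of the quoted participation rule hold."""
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    # Condition 1: share of course steps marked as complete reaches the threshold.
    marked_enough = steps_marked / total_steps >= min_marked_share
    # Condition 2: every test question has been attempted.
    attempted_all = test_questions_attempted >= total_test_questions
    return marked_enough and attempted_all


# Hypothetical learner in a 68-step course with 10 test questions.
print(eligible_for_participation_certificate(34, 68, 10, 10))
```

Because the rule depends only on counts, a learner can satisfy it while visiting steps in any order, which is why the policy is compatible with non-linear navigation.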
Apart from understanding the similar or dissimilar learning processes or sequences in MOOCs, another important aspect worth exploring is the relative frequency of access for each activity type. One way to recognise learners' interests in different learning activities is to analyse the relative frequency of access, which also signifies typical learners' experiences within the respective activities (Liu et al. 2016). In particular, it represents the general experience when estimated for an entire cohort. Therefore, this study builds upon the existing literature (Rizvi et al. 2018; Davis et al. 2017; Liu et al. 2016) and aims to explore the linkage (if any) between activity types in a MOOC LD, learners' interests (i.e., expressed through relative frequency of access), and processual learning (i.e., learners' progress in time). In the current study, we investigated and compared the most dominant progressions and activity access frequencies within the aforementioned three groups of learners.

Research questions
Drawing upon the previous research of understanding learner engagement and progressions through structured learning activities, this study implements and evaluates a two-step approach to understanding learning processes in the context of one FutureLearn science MOOC. We aim to compare three groups of learners that have been identified in prior research (Rizvi et al. 2018), Markers, Partial Markers, and Non-Markers, whose general behaviour signals distinct inclinations towards certification. The goal of this study is to uncover similarities and differences in the learning paths of these three groups with respect to the learning design of the course. We therefore pose the following research questions:

RQ1
How and to what extent does engagement with different elements of the learning design differ between these three groups of learners?

RQ2
How and to what extent do temporal learning paths (i.e., sequences of learning activities) differ between these three groups of learners?

RQ3
How and to what extent can subgroups of learners be identified within each of these three groups, based on the similarity of sequence of learning activities?

Context and data
FutureLearn is the largest MOOC provider in Europe and the fourth largest in the world in terms of number of enrolled learners (Shah 2016). Compared to other large MOOC providers, FutureLearn follows a social-constructivist pedagogical style by promoting 'learning through conversations' (Ferguson and Clow 2015). The course structure comprises a variety of activities: articles, discussion, peer review, quizzes, tests, videos, audio recordings and exercises. Using the theoretical framework for LD discussed in the "Learning design" section, the majority of FutureLearn courses have a balance of assimilative, communication, adaptive, and assessment activities. The MOOC structure comprised two types of assimilative activities (Video, Article), two types of assessment activities (Test, Quiz) and one communication activity (Discussion). All step categories were available to learners for free, except Test. The assessment activity Test was only available to 'upgraded' learners, i.e., learners who had upgraded a MOOC by paying a fee, potentially to obtain unlimited access and a certificate. Unlike the Quiz activity, which allowed unlimited attempts, Tests had a maximum of three attempts. Learners' Test scores were then reported on the progress page and certificate transcript.
Data for this study were collected in a science MOOC developed by the Open University, which was offered in 2017 on the FutureLearn platform. The course enrolled a total of 2086 learners and contained 68 learning activities, offered over a span of 4 weeks. Based on how many activities learners had marked as complete in the course, and in line with Rizvi et al. (2018), we grouped the study sample into 449 Markers, 832 Partial-Markers, and 805 Non-Markers. For the purpose of our analysis, we extracted the following information from the log files: anonymised learner ID, week number, learning activity type, learning activity, and timestamps. After the data were collected, we employed the OULDI framework to map the specific activities to general learning design features. Prior to commencing the study, ethical clearance was sought from the Human Research Ethics Committee (HREC) at the Open University (OU).
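The grouping rule described above can be sketched in a few lines. This is a minimal illustration rather than the authors' implementation: the function name, the input layout (one set of accessed steps and one set of marked steps per learner), and the sample step identifiers are all assumptions made for the example.

```python
from collections import Counter
from typing import Dict, Set, Tuple


def classify_learner(accessed: Set[str], marked: Set[str]) -> str:
    """Assign a learner to a group from the steps they accessed and marked."""
    marked_accessed = accessed & marked
    if not marked_accessed:
        return "Non-Marker"       # marked none of their activities
    if marked_accessed == accessed:
        return "Marker"           # marked all of their activities
    return "Partial-Marker"       # marked only some of their activities


# Hypothetical learners: (accessed steps, steps marked as complete).
learners: Dict[str, Tuple[Set[str], Set[str]]] = {
    "learner_a": ({"1.1", "1.2", "1.3"}, {"1.1", "1.2", "1.3"}),
    "learner_b": ({"1.1", "1.2", "1.3"}, {"1.1"}),
    "learner_c": ({"1.1"}, set()),
}

group_sizes = Counter(classify_learner(a, m) for a, m in learners.values())
print(group_sizes)
```

Applied to the full log, a rule of this kind would yield the three group sizes reported in the text, which sum to the enrolled cohort (449 + 832 + 805 = 2086).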

Data analysis
In order to understand learners' progression, as highlighted in the "MOOC event logs and learning processes" section, researchers have been using several methods to analyse massive clickstream data extracted from MOOCs. Educational Data Mining (EDM) methods usually treat these MOOC learning environments as a black box (Slater et al. 2017; Baker and Inventado 2014; Papamitsiou and Economides 2014). Traditional EDM works with sophisticated, hidden patterns that are typically input/output-centric rather than process-centric (Bogarín et al. 2018; Slater et al. 2017). Therefore, obtaining a potentially better understanding of learners' temporal (time-based) behavioural patterns necessitates constructing learners' navigational patterns (or navigational events) throughout the learning activities.
In this context, several advanced methods are increasingly being used by other researchers. These advanced methods include, but are not limited to, Natural Language Processing (NLP), Sequential Pattern Mining, or associated stochastic/probabilistic predictive methods, such as Hidden Markov Models, and/or illustrative methods, such as Graph Mining or Social Network Analysis (SNA) (Geigle and Zhai 2017; Rizvi and Ghani 2016; Robinson et al. 2016; Wen and Rosé 2014). Sequential Pattern Mining and related methods are suitable for finding partial, subsequent sets of learning events. Similarly, these methods, along with SNA, provide illustrative results of learning engagement, and are particularly suited to finding local processes, short sequences, and subgraphs of interest. Nonetheless, such methods may not be appropriate for understanding end-to-end transitions or other temporal dynamics of learning trajectories within a MOOC. Another main disadvantage of using such methods is a lack of comprehensive understanding of the end-to-end learning paths followed by large subgroups of learners (Bogarín et al. 2018; Bannert et al. 2014). Therefore, to develop learners' temporal navigational patterns, this study used methods typically associated with Educational Process Mining (EPM).
Process Mining is a set of emerging techniques aimed at extracting process-related knowledge from event logs. EPM is an application of Process Mining techniques in the educational domain (Bogarín et al. 2018). Apart from drawing end-to-end learning processes, EPM methods also assist in comparing executed processes with normative/intended models (referred to as conformance checking). In Process Mining, the term Variant refers to a simplified view of an end-to-end sequence of activities followed by a significant number of cases. Figure 2 clarifies the concept of this term. Our current study focuses on the estimation and comparison of activity access frequency and the temporal learning pathways of dominant subgroups of learners in all three groups. Each of the three groups demonstrated a relatively unique learning process, and all learners from a respective subgroup tended to follow a particular learning pathway in the MOOC. For the construction of process maps, Discovery software was used, whereby we used an extended and improved version of the Fuzzy Miner algorithm (Günther and Van Der Aalst 2007), which creates elaborate yet uncomplicated process maps and can easily identify infrequent subgroups. To improve the statistical soundness of our arguments and to see whether the subgroups from these three groups were actually different, we used the chi-square test.
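To make the notion of a variant concrete, the sketch below groups learners by their complete, time-ordered sequence of activities and counts how many learners share each sequence. It is a plain grouping of identical traces, not the Fuzzy Miner algorithm and not the software used in the study; the event layout (learner ID, activity, timestamp) follows the fields listed in the data description, while the function names and sample events are assumptions.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple

Event = Tuple[str, str, int]  # (learner_id, activity, timestamp); layout assumed


def build_traces(events: Iterable[Event]) -> Dict[str, Tuple[str, ...]]:
    """Order each learner's events by timestamp to obtain one trace per learner."""
    by_learner: Dict[str, List[Tuple[int, str]]] = defaultdict(list)
    for learner_id, activity, timestamp in events:
        by_learner[learner_id].append((timestamp, activity))
    return {
        learner_id: tuple(activity for _, activity in sorted(rows))
        for learner_id, rows in by_learner.items()
    }


def count_variants(traces: Dict[str, Tuple[str, ...]]) -> Counter:
    """A variant is one distinct end-to-end sequence; count learners per variant."""
    return Counter(traces.values())


# Hypothetical event log with three learners.
events = [
    ("a", "1.1 Article", 10), ("a", "1.2 Video", 20),
    ("b", "1.2 Video", 25), ("b", "1.1 Article", 15),
    ("c", "1.1 Article", 12),
]
variants = count_variants(build_traces(events))
for sequence, n_learners in variants.most_common():
    print(n_learners, " -> ".join(sequence))
```

Learners "a" and "b" fall into the same variant because ordering is taken from the timestamps, not from the order in which rows appear in the log.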

Results
In the exploratory phase of our analysis, we found three distinct clicking patterns that led us to the categorisation of learners used in this study: Markers, Partial-Markers and Non-Markers. The categorisation appeared to be unique within the relevant FutureLearn context, although it is partly based upon similar categorisations used in the previous MOOC engagement literature (Davis et al. 2016; Ferguson and Clow 2015; Guo and Reinecke 2014). As can be seen in Fig. 3, the group of Markers remained far more active throughout the MOOC than Partial-Markers and Non-Markers in terms of hourly activity. This was particularly noticeable during the first half of the course; afterwards, overall activity levels diminished with time for all learners.

In week 1, Markers mostly accessed articles (accessed 3876 times), followed by discussions (1135), videos (804) and quizzes (365). However, they typically spent the most time watching a video (median up to 8 min 6 s) and the least time reading an article (median up to 2 min 48 s). Partial-Markers followed the same pattern. In contrast, Non-Markers preferred watching videos (50% of their overall activities in week 1), followed by articles (40.1%), discussions (6.98%), and quizzes (2.05%), without marking any activity as completed in week 1. In week 2, all three groups remained mostly interested in articles. Although discussion was the second most frequent activity, learners started to spend less time overall participating in discussions (just over 1 min in the case of Markers). In weeks 3 and 4, Partial-Markers and Non-Markers gradually withdrew from discussions, although they continued to read articles and view videos as before. Markers remained mildly interested in participating in discussions, but still typically spent less than 2 min on a discussion activity in the last 2 weeks.

RQ1
Variation in engagement with elements of the learning design.
In order to analyse variation in learning behaviour across the three groups, and in line with the prior work of Rizvi et al. (2018), Davis et al. (2017) and Liu et al. (2016), we utilised the relative frequency of access for each activity type in relation to the activity distribution in the MOOC. As discussed in the "Literature review" section, the relative access frequency can be representative of learners' interests, or a wish to engage with a particular activity type. Furthermore, relative frequency of access also represents (part of) the general experience of the entire cohort. Figure 4 illustrates the distribution of engagement with course activities for the three groups (raw frequencies are provided in "Appendix" Table 2). We found that while Markers' and Partial-Markers' engagement in assimilative and communication activities was equivalent, Markers were more engaged in assessment activities than Partial-Markers. In contrast, Non-Markers were most engaged with a specific assimilative activity, video watching, but less engaged in the other assimilative and communication activities: reading articles and participating in discussion. Non-Markers were also notably less engaged in assessment activities compared to Markers and Partial-Markers. This may be attributed to Non-Markers' lack of interest in active participation or certification attainment.
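One way to read "relative frequency of access in relation to the activity distribution in the MOOC" is as a ratio between a group's share of accesses for an activity type and that type's share of the course steps. The sketch below follows that reading, which is an assumption; the function names and all counts are hypothetical and are not taken from the study's data.

```python
from collections import Counter
from typing import Dict, Iterable


def shares(labels: Iterable[str]) -> Dict[str, float]:
    """Convert a list of activity-type labels into proportions that sum to 1."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("at least one label is required")
    return {label: n / total for label, n in counts.items()}


def access_relative_to_design(
    accesses: Iterable[str], course_steps: Iterable[str]
) -> Dict[str, float]:
    """Ratio > 1: the type is accessed more than its share of the design suggests."""
    access_share = shares(accesses)
    design_share = shares(course_steps)
    return {
        activity_type: access_share.get(activity_type, 0.0) / share
        for activity_type, share in design_share.items()
    }


# Hypothetical course design and access log for one group of learners.
course_steps = ["Article"] * 2 + ["Video"] * 2
group_accesses = ["Video"] * 3 + ["Article"]
print(access_relative_to_design(group_accesses, course_steps))
```

Under this reading, a value near 1 means the group's attention mirrors the course design, which matches the later observation that some groups' access frequencies were aligned with the MOOC distribution.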

RQ2
Variation in temporal learning paths.
In order to address RQ2 and RQ3, we mapped the learning paths based on the clickstream data and identified the main subgroups within each group. Omitting self-loops (i.e., repetitions) provided more clarity to the process maps. For example, Fig. 5 shows a simplified view of the learning process model for Markers, filtering out some less frequently occurring pathways. Activity access frequency is also denoted alongside each path.
A closer inspection of end-to-end learning pathways confirmed that although a main pathway existed (dark, thick lines on the map), a large number of Markers preferred non-linear, highly unstructured pathways through the course content. For example, Fig. 5 shows 22 Markers skipping an assimilative activity (Article: Activity 1.6) to participate in the subsequent activity (Activity 1.7), which was discussion-based. This non-linear progression was consistently noticed in all three groups but, counterintuitively, persisted mainly in Markers.
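The edges of a process map of this kind can be approximated by counting how often one activity directly follows another across all traces, with self-loops dropped as described above. The sketch below is a simple directly-follows count, not the Fuzzy Miner algorithm used for the published maps; the function name and the sample traces are assumptions.

```python
from collections import Counter
from typing import Iterable, Sequence


def directly_follows(traces: Iterable[Sequence[str]]) -> Counter:
    """Count transitions between consecutive activities, omitting self-loops."""
    edges: Counter = Counter()
    for trace in traces:
        for source, target in zip(trace, trace[1:]):
            if source != target:  # a repeated visit to the same activity is skipped
                edges[(source, target)] += 1
    return edges


# Hypothetical traces: the second learner repeats 1.5 and skips 1.6.
sample_traces = [
    ["1.5", "1.6", "1.7"],
    ["1.5", "1.5", "1.7"],
]
for (source, target), count in directly_follows(sample_traces).most_common():
    print(f"{source} -> {target}: {count}")
```

An edge such as 1.5 -> 1.7 that bypasses an intermediate step is the kind of transition that would appear on a map as a non-linear pathway.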

RQ3
Subgroups identification.
We compared the 15 most common subgroups identified within each of the three primary groups (data available in "Appendix" Table 3). These 15 subgroups account for different amounts of the overall variance in each group: 68.6% for Markers, 46.5% for Partial-Markers, and 89.8% for Non-Markers. This distribution shows that there was more variance in the learning processes among Partial-Markers than the other two groups of learners, because their overall behaviour was captured less accurately by a small number of subgroups. For each subgroup, we computed the number of activities contained in the learning process. We found that a third of Markers (31.4%) followed a long learning process containing 67 distinct activities. In contrast, two thirds of Non-Markers (67.7%) followed a learning process that only contained one activity before they dropped out of the course. In keeping with this pattern, we found that among the top 15 subgroups, Markers tended to have longer learning processes (6 out of 15 with 50 or more activities), Non-Markers had only short learning processes (11 out of 15 with 5 or fewer activities), and Partial-Markers exhibited a mixture of shorter and longer learning processes (2 out of 15 with 50 or more activities; 4 out of 15 with 5 or fewer activities).
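The share of a group that is captured by its most common subgroups can be computed directly from variant counts. The sketch below assumes a mapping from each variant to the number of learners following it; the function name and the counts are hypothetical, and the reported percentages (68.6%, 46.5%, 89.8%) are not reproduced here.

```python
from collections import Counter


def top_k_coverage(variant_counts: Counter, k: int = 15) -> float:
    """Fraction of learners whose trace belongs to one of the k most common variants."""
    total = sum(variant_counts.values())
    if total == 0:
        raise ValueError("variant_counts must contain at least one learner")
    covered = sum(count for _, count in variant_counts.most_common(k))
    return covered / total


# Hypothetical group: one dominant variant and a long tail of singletons.
hypothetical_counts = Counter({("1.1",): 6, ("1.1", "1.2"): 3, ("1.2",): 1})
print(top_k_coverage(hypothetical_counts, k=2))
```

A lower coverage for the same k indicates that learners are spread over more distinct pathways, which is the sense in which Partial-Markers showed more variation than the other two groups.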
To test the robustness of the observed pattern of variation, we performed a set of χ² tests of independence. The results indicated that there was a significant association between type of learning activity and whether a learner was a Marker, Partial-Marker or Non-Marker (χ² = 1279, df = 8, p < 0.001). We also confirmed that the lengths of the learning processes were significantly different across the three groups (χ² = 523, df = 28, p < 0.001).
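For readers who want to trace how such a statistic is obtained, the sketch below computes the Pearson χ² statistic and its degrees of freedom for a contingency table in plain Python. The table layout is an assumption: with five activity types (Video, Article, Test, Quiz, Discussion) and three groups, a 5 × 3 table gives df = (5 - 1)(3 - 1) = 8, which is consistent with the first reported test, and a 15 × 3 table would give df = 28, although the source does not state how the second table was built. The counts used here are hypothetical.

```python
from typing import Sequence, Tuple


def chi_square_independence(table: Sequence[Sequence[float]]) -> Tuple[float, int]:
    """Return (chi-square statistic, degrees of freedom) for an r x c table.

    Assumes every row and every column has a positive total.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, dof


# Hypothetical access counts: rows are activity types, columns are the three groups.
hypothetical_table = [
    [120, 200, 300],  # Video
    [400, 380, 240],  # Article
    [60, 20, 5],      # Test
    [90, 70, 12],     # Quiz
    [150, 110, 40],   # Discussion
]
statistic, dof = chi_square_independence(hypothetical_table)
print(f"chi-square = {statistic:.1f}, df = {dof}")
```

The p-value is then read from the χ² distribution with the given degrees of freedom, for example with a statistics library.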

Discussion and conclusion
The purpose of this exploratory study was to determine the nature and extent of differences in the participatory behaviour and temporal learning paths of MOOC learners, in the light of learning activity types drawn from an established learning design model. Another aim of this investigation was to understand the common pathways followed by substantially large subgroups of learners, referred to as variants in process mining. We found that the progression trend for individual groups remained aligned with our previous work (Rizvi et al. 2018) and with other MOOC literature (Kizilcec et al. 2013; Ferguson and Clow 2015). Our current study employed an established learning design taxonomy to investigate the detailed processes of engagement over time. This study extends our prior work that has identified three primary clusters of engagement in courses offered on the FutureLearn platform (Rizvi et al. 2018). Notwithstanding the distinct patterns of engagement with different types of activities, the results remained very similar to previous studies in formal online learning settings, showing an overall liking of assimilative activities in general and video-based assimilative activities in particular. Taken together, these results provide insights into learners' temporal progression or pathways in the MOOC. Our overall findings are aligned with previous research in MOOC learning environments (Ferguson and Clow 2015). We also noticed that the top subgroups in all groups left the MOOC right after accessing an assimilative activity (either video or article), and very rarely after accessing an assessment activity or participating in a discussion.
The findings also suggest that academics and course designers should give more thought to designing communication and assessment activities for MOOC learning environments, in order to make such activities more appealing to informal learners. Markers' and Partial-Markers' access frequencies for all activity types were either aligned with the MOOC distribution or exceeded expectations. Non-Markers showed very high early drop-out; however, those who continued remained substantially interested in the assimilative activity of video watching. This result indicates that, in general, Non-Markers remained interested in video-based content and not in textual content per se (whether assimilative or communicative).
We found a substantially large number of learners, from all groups, dropping out after participating in one of the assimilative activities. Since activity engagement behaviour differed across the three groups of learners, we can deduce that if the analyses had been done without categorising the learners, the results would have been strongly biased towards the majority class (Partial-Markers in this case). This suggests that, when investigating the temporal and engagement behaviour of learners, it is necessary to first categorise the learners into natural groups.
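The effect of pooling can be illustrated with a small numerical sketch. The access counts below are hypothetical and are not the study's data; they are chosen only so that Partial-Markers form the majority class. The pooled distribution of activity types ends up close to the Partial-Markers' distribution and hides the markedly different profile of the smaller Non-Markers group.

```python
# Hypothetical access counts per group and activity type (NOT the study's data).
counts = {
    "Markers":         {"video": 300,  "article": 300,  "discussion": 200,  "assessment": 200},
    "Partial-Markers": {"video": 4000, "article": 3000, "discussion": 2000, "assessment": 1000},
    "Non-Markers":     {"video": 400,  "article": 50,   "discussion": 25,   "assessment": 25},
}

def shares(activity_counts):
    """Convert raw counts into proportions that sum to 1."""
    total = sum(activity_counts.values())
    return {activity: n / total for activity, n in activity_counts.items()}

# Pool all groups together, as an analysis without categorisation would.
pooled = {}
for group_counts in counts.values():
    for activity, n in group_counts.items():
        pooled[activity] = pooled.get(activity, 0) + n

pooled_shares = shares(pooled)
group_shares = {group: shares(c) for group, c in counts.items()}

print("pooled:", {a: round(s, 3) for a, s in pooled_shares.items()})
for group, s in group_shares.items():
    print(group + ":", {a: round(v, 3) for a, v in s.items()})
```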
The study contributes to the field by interrogating the behaviour of learners while considering categories that go beyond simply distinguishing those who completed a substantial fraction of the course from those who dropped out. This leaves the door open to further research on learners' experiences, i.e. how, while navigating the course, they decide to engage more with one type of activity than another. As mentioned elsewhere, success in MOOCs is relative; still, without a deep knowledge of learners' navigation through the system, it remains hard to distinguish between good and bad decisions.
The findings from this study can benefit practice in MOOC learning design. They suggest that analysing the voluminous data captured and stored in MOOC clickstream logs requires innovative methods, such as process mining and variant mapping, which intrinsically support the exploration of learner behaviour hidden in such data. Despite its exploratory nature, the current study lays the groundwork for our future research into behavioural modelling and mapping within MOOC learning environments. In future, more contextual information or demographic data would help us to establish a greater degree of accuracy on this matter.

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

Appendix
See Tables 2 and 3.