1 Introduction

Massive open online courses (MOOCs) have become increasingly popular in education since 2008 as an open-source learning environment to help learners around the world enhance their knowledge and skills. MOOCs have already had a considerable impact on educational research. The large number of learners has provided opportunities for large-scale educational interventions and has introduced new research questions, methods, and techniques to educational research. For example, educators and researchers have recently started paying more attention to the challenges of MOOCs, such as high dropout rates, low interaction, assessment, and personalization (Zhu et al., 2018, 2019). Using learning analytics (LA) in MOOC research and practice can help researchers and educators understand and address these issues.

Since 2011, LA studies have become increasingly popular in education, psychology, computer science, and data science (Hui & Kwok, 2019). LA is commonly used in studies that aim to understand and support learners’ learning behavior based on large datasets. Research on MOOCs and LA is therefore interconnected: the considerable data generated in MOOCs can be processed using diverse LA technologies and techniques. For example, LA can use different levels of MOOC data (e.g., course level, program level, platform level) to improve teaching and learning (Drachsler & Kalz, 2016; Lemay & Doleck, 2020; Mubarak et al., 2021) and instructional design (Shukor & Abdullah, 2019). However, despite the growing popularity of LA studies in MOOCs, the understanding of the publication outlets, research purposes and methods, primary stakeholders, researchers’ institution locations, subject areas, and research context of MOOC LA research is limited. This limited understanding could hinder future MOOC LA research. Therefore, the present systematic review aims to fill this gap and guide future research. We first present the background of MOOCs and LA in a literature review, followed by the method section describing the systematic review approach, the databases searched, the keywords used in the literature search, and the selection criteria.

2 Literature review

2.1 MOOC systematic review

A wide range of free and open education courses are available around the world and have received considerable attention as the number of MOOCs continues to grow. In 2020, 16,300 courses were offered by 950 universities, with enrollments of more than 180 million learners globally (excluding data from China) (Shah, 2020). These MOOCs have been offered by numerous providers, including Udacity, Coursera, FutureLearn, and edX (Taneja & Goel, 2014; Sari et al., 2020). MOOCs were introduced in 2007 and have continued to expand globally, evolving into various forms, including cMOOCs, xMOOCs, and blended MOOCs (Creed-Dikeogu & Clark, 2013; Kop, 2011; Mota & Scott, 2014; Zhu, 2021).

With the increasing popularity of MOOCs, a substantial number of literature reviews of MOOC research have been conducted. The authors of the present study (2018; 2020) conducted a systematic review of MOOC empirical research focusing on a general analysis, including the topics, methods, publication outlets, and geographical distributions of authors and courses. Similar systematic review studies were also conducted by Joksimović et al. (2018), Calonge and Shah (2016), and Lambert (2020). Moreover, several systematic review studies have also targeted specific subtopics or foci, such as the languages used in MOOCs (Hidalgo & Abril, 2020; Sallam et al., 2020; Zainuddin et al., 2019), learner engagement (Guajardo Leal et al., 2019; Paton et al., 2018; Rincón-Flores et al., 2019), assessment methods (Alturkistani et al., 2020; Tenório et al., 2016; Wei et al., 2020), self-regulated learning (Alonso-Mencía et al., 2020; Lee et al., 2019; Min & Nasir, 2020; Wong et al., 2019), dropout and completion rates (Bezerra & Silva, 2017; Dalipi et al., 2018), MOOCs in higher education (Albelbisi et al., 2018; Al-Rahmi et al., 2019; Calonge & Shah, 2016), social aspects of MOOCs (Lambert, 2020; Rolfe, 2015; Van de Oudeweetering & Agirdag, 2018), and MOOCs in the Asia Pacific region (Albelbisi & Yusop, 2020; Li et al., 2017). Based on these MOOC research reviews, it is clear that the number of MOOC studies has persistently expanded (Martin et al., 2020; Zhu et al., 2018).

While some scholars have doubted the future viability of MOOCs, others have predicted continued growth (Yahoo, 2021). According to this forecast, MOOCs will “grow by $16.01 billion from 2021–2025, progressing at a CAGR of 32% during the forecast period” (Yahoo, 2021). Such optimism is supported by the continuous growth of MOOC enrollment numbers over the past decade. With continued growth, more diverse MOOC research and discussion will be needed to further address concerns such as long-term and short-term progress, the dropout rate, self-directed learning, learner engagement, and course quality (Jona & Naidu, 2014). The present systematic review of empirical MOOC studies aims to serve this purpose.

2.2 Learning analytics overview

The current era of big data has allowed a vast amount of data to be captured and stored digitally (Siemens & Long, 2011), increasing “the volume, variety, velocity and veracity of student data” (Prinsloo & Slade, 2017, p. 8). Using LA techniques to analyze such large datasets can produce results that help institutions and organizations improve their performance and provide personalized, learner-centered education (Asamoah et al., 2017; Jantti & Heath, 2016).

Although definitions of LA vary, a commonly used one from the Society for Learning Analytics Research (SoLAR) describes it as “the measurement, collection, analysis, and reporting of data about learners and their contexts, for purposes of understanding and optimizing learning and the environments in which it occurs” (Siemens & Long, 2011, p. 33). LA involves information retrieval, machine learning techniques, statistics, and data visualization (Siemens, 2012). It is also an interdisciplinary area that draws on computer science, data science, statistics, behavior science, educational psychology, and instructional technology to enhance education (Tzimas & Demetriadis, 2021).

Initially, LA was primarily used to analyze trace data to capture and describe learner behaviors in online learning (Veenman, 2013). Gradually, it became more broadly applied, for example to predict which learners are more likely to drop out of courses (Sclater et al., 2016). Researchers have classified LA approaches into different categories, such as descriptive analytics, predictive analytics, and prescriptive analytics (Berland et al., 2014). Descriptive analytics reports the process or status of student learning using data sources including surveys, assessment results, learning management activities, and learner demographic information. Descriptive LA helps educators understand learning behaviors (Wong & Chong, 2018). Predictive analytics aims to anticipate future learner behaviors and learning success. Prescriptive analytics primarily uses algorithms not only to predict learner success but also to recommend instructional interventions based on the data (Baker & Siemens, 2015). For example, it can provide helpful suggestions to instructors, learners, and policymakers (Hwang et al., 2014). Currently, LA in higher education is mainly used to improve student learning, understand motivation in order to decrease the dropout rate (Colvin et al., 2015; Glick et al., 2019), provide adaptive learning environments to enhance students’ learning trajectories (Ifenthaler & Schumacher, 2016; Ifenthaler et al., 2019), and provide information for institutions to allocate resources and funds for learners’ success (Leitner et al., 2017).
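The distinction between the three categories can be made concrete with a toy example. The sketch below is illustrative only: the learner records, the single feature (weekly clicks), the one-threshold model, and the recommended actions are all hypothetical and are not drawn from any reviewed study.

```python
from statistics import mean

# Hypothetical learner history: (weekly_clicks, completed_course).
history = [(5, False), (8, False), (12, False), (20, True), (25, True), (31, True)]

# Descriptive analytics: summarize what has already happened.
avg_clicks = mean(clicks for clicks, _ in history)

# Predictive analytics: fit a one-feature threshold rule (a decision stump)
# that best separates completers from non-completers in the history.
def fit_threshold(records):
    candidates = sorted({clicks for clicks, _ in records})

    def accuracy(threshold):
        hits = sum((clicks >= threshold) == done for clicks, done in records)
        return hits / len(records)

    return max(candidates, key=accuracy)

threshold = fit_threshold(history)

def predict_completion(clicks):
    return clicks >= threshold

# Prescriptive analytics: turn the prediction into a suggested action.
def recommend(clicks):
    if predict_completion(clicks):
        return "no action"
    return "send reminder and offer tutoring"

print(f"average weekly clicks: {avg_clicks:.1f}")
print(f"learned threshold: {threshold}")
print(f"learner with 10 clicks: {recommend(10)}")
```

In practice, the predictive step would use a richer model and far more data; the point here is only that each category builds on the previous one.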

2.3 Systematic review on learning analytics

The number of systematic reviews of LA in education has increased rapidly in recent years. One of the earliest was by Papamitsiou and Economides (2014), who analyzed 40 empirical LA studies published between 2008 and 2013 to trace the history of the field chronologically. Subsequent systematic reviews have covered diverse foci, such as serious games (Alonso-Fernandez et al., 2019), machine learning (Korkmaz & Correia, 2019), LA interventions (Knobbout & Van Der Stappen, 2020; Larrabee Sønderlund et al., 2019), visual LA (including LA dashboards) (Schwendimann et al., 2016; Vieira et al., 2018), visual LA and self-regulated learning (Matcha et al., 2019), LA and learning outcomes or performance (Blumenstein, 2020; Costa et al., 2020; Foster & Francis, 2020), data sources used in LA (Samuelsen et al., 2019), LA and instructional design (Mangaroska & Giannakos, 2018), ethical issues in LA (Tzimas & Demetriadis, 2021), flipped classrooms (Algayres & Triantafyllou, 2020), mobile learning settings (Pishtari et al., 2020), and LA at different educational levels, such as young children (Crescenzi-Lanna, 2020), higher education (Avella et al., 2016; Ifenthaler & Yau, 2020), and vocational education (Gedrimiene et al., 2020).

For example, Foster and Francis (2020) reviewed 34 articles published between 2007 and 2018 on the effectiveness of using LA to improve learning outcomes in higher education. They found that 75% of the studies reported that LA was effective in improving learners’ learning outcomes. However, the authors suggested that the relationship between LA and learning outcomes needed further investigation. Recently, the ethics of using LA has also gained attention from researchers. Tzimas and Demetriadis (2021) reviewed 53 articles on LA ethics published between 2011 and 2018. They found that empirical guidelines on LA ethics are inadequate and suggested establishing policies or codes to monitor and evaluate the ethics of LA practice.

Although numerous systematic reviews on LA have been conducted, systematic reviews of the use of LA in MOOCs are lacking. The aim of the present study was to systematically review the empirical research literature related to LA and MOOCs to identify the current status and trends in this field. The following research questions guided this study:

  1. What are the publication outlets of MOOC LA research published in the past eleven years?

  2. What are the research purposes and methods utilized in the MOOC LA research published in the past eleven years?

  3. Who are the primary stakeholders of MOOC LA studies published in the past eleven years?

  4. Where are the employers/institutions of MOOC researchers located, and what are the subject areas of the researchers of these MOOC LA studies published in the past eleven years?

  5. Which countries and subjects were studied most in MOOC research in the past eleven years?

3 Methods

3.1 Data collection

To conduct a rigorous systematic review, the present study followed the eight-step process proposed by Okoli and Schabram (2010; see also Okoli, 2015): (1) identify the review purpose; (2) create a protocol and conduct training; (3) define screening criteria; (4) search the literature; (5) extract information; (6) evaluate quality; (7) synthesize data; and (8) report the review results (see Fig. 1).

Fig. 1. The systematic review process

The flowchart in Fig. 1 presents the process of conducting the systematic review. Following the purpose and research questions stated earlier, the research team created a research protocol and detailed the procedures for conducting the review. To increase the validity of the study, a training session regarding the database and reviewing techniques was held with the reviewers. The selection criteria used in the review process are as follows:

  • Studies were related to LA and MOOCs;

  • Studies were published from 2011 to the end of April 2021, because the first learning analytics conference was held in 2011, when the field began to attract researchers’ attention (Khalil & Ebner, 2016), and MOOCs started to gain popularity in 2011 (Zhu, 2021);

  • Studies were published in English;

  • Studies were empirical studies from peer-reviewed journal articles or the proceedings of the International Learning Analytics and Knowledge (LAK) conferences. These conference proceedings were chosen because the conferences serve as major forums for LA research and employ a rigorous review process. Peer-reviewed journals were chosen for their higher standards of research objectivity and credibility (Utah State University Library, 2020) compared to non-peer-reviewed book chapters, blogs, and magazines, which were excluded.

The following steps were used for the literature search.

  1. The first two authors conducted the initial search in various journal databases. One researcher searched the Scopus, Science Direct, Web of Science, and ERIC (EBSCO) databases. The second researcher searched Wiley Interscience journals, Sage journals online, the DBLP (computer science bibliography) database, the Proceedings of the Conference on Learning Analytics and Knowledge, and the Journal of Learning Analytics.

  2. The search keywords “learning analytics and MOOCs” and “learning analytics and massive open online courses” were used in combination to screen the abstracts and titles of research articles on MOOC LA.

  3. The initial search yielded 512 articles. To increase validity, two researchers cross-checked the data and reached a consensus. After duplicates were identified, 214 articles were removed, leaving 298 articles.

  4. Each researcher screened the full text of the 298 articles against the selection criteria, evaluated the articles, and cross-checked the other researchers’ results. A total of 166 articles met the review criteria, with an overall inter-rater agreement of 95.2%.
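The screening counts reported in the search steps can be checked with simple arithmetic. The sketch below uses only the three figures given above (512 initial hits, 214 duplicates removed, 166 included); the variable names and the derived full-text exclusion count are our own illustration, not values reported by the authors.

```python
# Counts reported in the literature search steps.
initial_hits = 512
duplicates_removed = 214
included = 166

# Articles remaining after duplicate removal (reported as 298).
after_dedup = initial_hits - duplicates_removed

# Derived value: articles excluded during full-text screening.
excluded_at_full_text = after_dedup - included

def share(part, whole):
    """Percentage of `whole` represented by `part`, to one decimal place."""
    return round(100 * part / whole, 1)

print(f"after duplicate removal: {after_dedup}")
print(f"excluded at full-text screening: {excluded_at_full_text}")
print(f"share of screened articles included: {share(included, after_dedup)}%")
```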

Guided by the five research questions, the following information was extracted and recorded from each article: publication year, research purposes, general research approaches, data sources, data analysis techniques, stakeholders, author locations, author subject areas, MOOC locations, and MOOC subject areas. The names of the journals and the SCImago index were also recorded. The final stage of the systematic review was reporting the results, including the findings, discussion, limitations, implications, and conclusions.

3.2 Data analysis

To answer Research Question (RQ) #1, we counted the number of MOOC LA publications from each publication outlet. In addition, the quartile rank for each journal was collected from the SCImago website.

For RQ #2, we coded the research purposes using four categories: teaching, learning, research, and others. Teaching refers to studies that used LA techniques primarily to improve teaching practices; learning refers to studies that used LA techniques primarily to improve learners’ learning in practice; research refers to studies that used LA techniques only to conduct research; studies that did not belong to these three categories were categorized as others. Regarding research methods, the authors analyzed the data sources and LA techniques. For data sources, the authors adapted the categories identified by Tashakkori and Teddlie (2003), such as interviews and surveys, and added new ones; the resulting categories included log data, achievement data, demographic data, discussion forums, surveys, assignments, quizzes, videos, social media, and interviews. Log data in the present study refers to the data generated and recorded on learners’ behaviors and activities in the MOOC platform, such as clickstream logs and video/page views. Achievement data refers to data such as learners’ test scores and completion rates.

For data analysis techniques, the present research adapted categories from Khalil and Ebner (2016) and Alonso-Fernandez et al. (2019) and classified the techniques according to whether they were applied to qualitative or quantitative data. Techniques for qualitative data were categorized into traditional qualitative analysis (e.g., thematic analysis, content analysis, and discourse analysis) and LA text analysis techniques (e.g., text mining, semantic and linguistic analysis, and natural language processing). Techniques for quantitative data were categorized into traditional statistics, social network analysis, and algorithm methods. Traditional statistics included regression, correlation analysis, ANOVA, MANOVA, chi-squared tests, and logistic regression. Algorithm methods included supervised learning (e.g., linear and logistic regression, regression and decision trees, support vector machines, Bayesian networks, neural networks, naive Bayes, and Bayesian knowledge tracing), unsupervised learning (e.g., correlation, clustering, factor analysis), and data visualization (e.g., performance metrics, heatmaps of interactions).

For RQ #3 on the primary stakeholders, the authors categorized stakeholders into instructors, learners, providers (i.e., MOOC institutions, MOOC platforms), and instructional designers. Stakeholders are the parties who benefit most from a study’s findings. To answer RQ #4, we counted the countries and subject fields of the affiliations of all authors of the reviewed MOOC LA studies. For RQ #5, we counted the countries of users (students) and subject areas of the MOOCs being studied. For the studies that did not provide specific information on the MOOCs, the authors coded them as NA. Excel 16.49 and Python were used to calculate the numbers and create visual presentations.
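The counting described above can be sketched in a few lines of Python. This is a minimal illustration, assuming each study is stored as a record with one list of codes per dimension. The three records, the field names, and the `tally` helper are hypothetical; only the category labels come from the coding scheme.

```python
from collections import Counter

# Hypothetical coded records; the labels follow the coding scheme,
# but the studies themselves are invented for illustration.
coded_studies = [
    {"id": "S1", "purposes": ["research"],
     "data_sources": ["log data", "achievement data"]},
    {"id": "S2", "purposes": ["teaching", "research"],
     "data_sources": ["log data", "surveys"]},
    {"id": "S3", "purposes": ["learning"],
     "data_sources": ["discussion forums"]},
]

def tally(records, field):
    """Count how many studies carry each code in the given field."""
    counts = Counter()
    for record in records:
        # set() guards against a code being entered twice for one study.
        counts.update(set(record[field]))
    return counts

purpose_counts = tally(coded_studies, "purposes")
source_counts = tally(coded_studies, "data_sources")

for label, n in source_counts.most_common():
    print(f"{label}: n = {n}")
```

Because one study can carry several codes in the same dimension, the category counts produced this way can sum to more than the number of studies.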

4 Results

The authors collected 166 MOOC LA research studies. In terms of publication dates, most of the studies were published from 2016 onward: one article (0.6%) was published in 2011, three articles (1.8%) in 2013, six (3.6%) in 2014, 10 (6.0%) in 2015, 30 (18.1%) in 2016, 24 (14.5%) in 2017, 25 (15.1%) in 2018, 28 (16.9%) in 2019, 32 (19.3%) in 2020, and seven (4.2%) by the end of April 2021 (see Fig. 2). This indicates that the number of MOOC LA studies generally increased through 2020. Because only articles published by the end of April 2021 were included for that year, it is unclear whether the number of publications in 2021 will show an increase or a decrease.

Fig. 2. The number of MOOC learning analytic studies published each year (2011–2021) (n = 166)
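The yearly percentages can be recomputed from the counts. In the sketch below, the 2013 and 2019 counts are taken as 3 and 28. This is an assumption on our part: these are the values consistent with the reported 1.8% and 16.9% and with the total of n = 166. The variable names are illustrative.

```python
# Yearly publication counts; 2013 and 2019 are set to the values
# implied by the reported percentages (an assumption, see lead-in).
counts = {
    2011: 1, 2013: 3, 2014: 6, 2015: 10, 2016: 30,
    2017: 24, 2018: 25, 2019: 28, 2020: 32, 2021: 7,
}

total = sum(counts.values())

# Share of all reviewed studies published in each year, in percent.
percentages = {year: round(100 * n / total, 1) for year, n in counts.items()}

# Share of studies published from 2016 onward.
since_2016 = round(
    100 * sum(n for year, n in counts.items() if year >= 2016) / total, 1
)

for year, pct in percentages.items():
    print(f"{year}: {counts[year]} ({pct}%)")
print(f"total = {total}; published from 2016 onward: {since_2016}%")
```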

4.1 Research Question #1 (RQ #1): What are the publication outlets of MOOC learning analytics research published in the past eleven years?

The 166 MOOC LA studies in this review were published in 72 different journals or proceedings. We also analyzed the rank of the journals in which the studies were published using quartile rankings (Q1–Q4). A quartile ranking places a journal within a specific subject/field according to its impact indicator, from Q1 (the most prestigious) to Q4 (the least prestigious) (Rømer et al., 2020; Tóth & Demeter, 2021). More than 40% of MOOC LA articles were published in top-tier journals (Q1). Specifically, among the journals and proceedings we analyzed, 71 articles were published in Q1 journals, 23 articles in Q2 journals, 13 in Q3 journals, three in Q4 journals, and 54 articles in journals/proceedings that were not assigned any quartile rank (see Fig. 3). The Proceedings of the International Learning Analytics & Knowledge Conference and the Journal of Learning Analytics, which are hosted by the Society for Learning Analytics Research (SoLAR), were the primary publication outlets for the MOOC LA studies.

Fig. 3. Ranks of journals where MOOC learning analytic studies were published (2011–2021) (n = 166)

The top 15 publication outlets included: Proceedings of the Sixth International Conference on Learning Analytics & Knowledge (n = 13), Proceedings of the Seventh International Learning Analytics & Knowledge Conference (n = 10), Journal of Learning Analytics (n = 9; Q1), Journal of Computer Assisted Learning (n = 7; Q1), Proceedings of the Fifth International Conference on Learning Analytics And Knowledge (n = 6), Computers & Education (n = 6; Q1), Computers in Human Behavior (n = 6; Q1), Proceedings of the Eighth International Conference on Learning Analytics and Knowledge (n = 5), International Review of Research in Open and Distance Learning (n = 5; Q1), Proceedings of the Tenth International Conference on Learning Analytics & Knowledge (n = 5), Proceedings of the Fourth International Conference on Learning Analytics And Knowledge (n = 5), Computer Applications in Engineering Education (n = 4; Q2), International Journal of Artificial Intelligence in Education (n = 3; Q1), Technology, Knowledge and Learning (n = 3; Q2), and International Journal of Emerging Technologies in Learning (n = 5; Q3) (see Fig. 4). Apart from the conference proceedings, the Journal of Learning Analytics (n = 9), Journal of Computer Assisted Learning (n = 7), and Computers & Education (n = 6) were the top three journals that published MOOC LA articles.

Fig. 4. Journals that published MOOC LA studies (the top 15 journals/proceedings) (2011–2021) (n = 166)

4.2 RQ #2: What are the research purposes and methods utilized in the MOOC learning analytics research published in the past eleven years?

Among the 166 reviewed MOOC LA studies, the majority used LA techniques primarily for research (n = 108) rather than for educational practice (see Fig. 5). Fewer studies used LA for educational practices such as learning (n = 33) and teaching (n = 32). Given that fewer studies used analytics to directly improve teaching and learning practices than to serve research purposes, how to use LA to improve educational practices needs further research.

Fig. 5. Purposes of MOOC learning analytic studies published (2011–2021) (n = 166)

Regarding the research methods used, among the 166 articles, 102 articles (61.4%) were quantitative, 61 studies (36.7%) used mixed methods, and the remaining three articles (1.8%) were qualitative (see Fig. 6). It is not surprising that a majority of the MOOC LA studies were quantitative studies. Over one-third of the studies also included qualitative data in the research. The three qualitative studies primarily used a qualitative method to explore the MOOC LA phenomenon.

Fig. 6. Research methods used in MOOCs LA studies (2011–2021) (n = 166)

More specifically, among the various research methods, the studies in this review used diverse data sources. The most used data source was log data (n = 105) followed by achievement data (n = 77), demographic data (n = 70), discussion forum (n = 64), survey (n = 36), assignments or quizzes (n = 19), videos (n = 10), social media (n = 8), interviews (n = 6), MOOC descriptions (n = 4), observations (n = 2), and eye-tracking data (n = 2) (see Fig. 7). The results revealed that auto-generated data from MOOCs were the primary data sources in MOOC LA studies.

Fig. 7. Data sources used in MOOCs LA studies (2011–2021) (n = 166)

The top eight specific data analysis techniques used in the MOOC LA studies were statistics (n = 133), machine learning (n = 43), content analysis (n = 23), social network analysis (n = 22), text analysis (n = 17), data visualization (n = 14), thematic analysis (n = 5), and interaction analysis (n = 5) (see Fig. 8). The majority of the studies used statistical techniques, which included descriptive statistics, correlation analysis, regression analysis, and structural equation modeling. The second most used analytic technique was machine learning, which included supervised learning and unsupervised learning approaches.

Fig. 8. Top eight data analysis techniques used in MOOC LA studies (2011–2021) (n = 166) (Note: One article may have used more than one data analysis technique)

4.3 RQ #3: Who are the primary stakeholders of MOOC learning analytics studies published in the past eleven years?

The primary stakeholders of MOOC LA studies were instructors (n = 115), followed by learners (n = 50), providers (n = 32), and instructional designers (n = 20) (see Fig. 9). Thus, it seems that MOOC LA studies aimed to benefit instructors the most.

Fig. 9. The primary stakeholders of MOOC LA studies (2011–2021) (n = 166)

4.4 RQ #4: Where are the employers/institutions of MOOC researchers located, and what are the subject areas of the researchers of these MOOC learning analytics studies published in the past eleven years?

Among the 166 MOOC LA studies, authors of 62 studies were from the USA, followed by authors from China (n = 25), Australia (n = 24), Spain (n = 22), the UK (n = 15), the Netherlands (n = 11), Germany (n = 7), and Canada (n = 6) (see Fig. 10). Therefore, most studies were conducted by authors from developed countries; China was the only developing country among the top eight.

Fig. 10. The number of studies with authors from an individual country (2011–2021) (n = 166)

The number of authors who collaborated in each study also varied, but more than 92% of the MOOC LA studies were collaborations. Most of the 166 studies had three authors (n = 47), followed by four authors (n = 33), five authors (n = 26), two authors (n = 24), six authors (n = 16), a solo author (n = 12), seven authors (n = 3), nine authors (n = 2), eight authors (n = 1), ten authors (n = 1), and eleven authors (n = 1) (see Fig. 11). The most common collaborations included three or four authors in MOOC LA studies.

Fig. 11. The number of studies with different numbers of authors (2011–2021) (n = 166)
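The collaboration share follows directly from the author-count distribution. The sketch below uses the reported counts; the variable names are illustrative.

```python
# Number of studies by number of authors, as reported in the text.
studies_by_author_count = {
    1: 12, 2: 24, 3: 47, 4: 33, 5: 26, 6: 16,
    7: 3, 8: 1, 9: 2, 10: 1, 11: 1,
}

total = sum(studies_by_author_count.values())

# A study counts as a collaboration if it has more than one author.
collaborative = total - studies_by_author_count[1]
collaboration_share = round(100 * collaborative / total, 1)

# Most frequent team size.
most_common_size = max(studies_by_author_count, key=studies_by_author_count.get)

print(f"total studies: {total}")
print(f"collaborative studies: {collaborative} ({collaboration_share}%)")
print(f"most common number of authors: {most_common_size}")
```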

Regarding international collaboration, most studies had authors from a single country (n = 113), followed by studies with authors from two countries (n = 41), three countries (n = 6), and four countries (n = 4) (see Fig. 12). Notably, there were two studies with authors from seven or eight countries. In addition, approximately one-third of the MOOC LA studies were international collaborations.

Fig. 12. The number of studies that had authors from one or more countries (2011–2021) (n = 166)
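The "approximately one-third" figure for international collaboration can be derived from the same kind of tally. The counts below are those reported in the text; the dictionary keys and variable names are our own labels.

```python
# Number of studies by number of author countries, as reported in the text.
studies_by_country_count = {
    "1": 113,
    "2": 41,
    "3": 6,
    "4": 4,
    "7 or 8": 2,
}

total = sum(studies_by_country_count.values())

# A study is an international collaboration if its authors
# come from more than one country.
international = total - studies_by_country_count["1"]
international_share = round(100 * international / total, 1)

print(f"total studies: {total}")
print(f"international collaborations: {international} ({international_share}%)")
```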

Computer science and education were the primary disciplines of the researchers conducting MOOC LA studies. Among the 166 studies, the top five subjects of the authors in the reviewed studies were computer science (n = 32), education (n = 29), engineering (n = 18), business and management (n = 14), and language (n = 7) (see Fig. 13). Moreover, over 58% of the studies (n = 97) involved interdisciplinary collaborations (see Fig. 14).

Fig. 13. The top five subject areas of the authors of MOOC LA studies (2011–2021) (n = 166)

Fig. 14. The number of articles with authors from different subject areas (2011–2021) (n = 166)

4.5 RQ #5: Which countries and subjects were studied most in MOOC research in the past eleven years?

In terms of the countries that were the targets of the most MOOC research, the top eight countries were the USA (n = 47), followed by China (n = 18), Spain (n = 15), Australia (n = 13), the Netherlands (n = 10), the UK (n = 9), Switzerland (n = 6), and Canada (n = 4) (see Fig. 15). This corresponded with the locations of the authors’ institutions. Among these top eight targets of MOOC LA research, all were developed countries except China.

Fig. 15. The top eight countries of MOOC delivery that have been studied in the MOOCs LA studies (2011–2021) (n = 166)

In terms of the subject areas with the most MOOC research, the top five MOOC subjects were computer science (n = 33), followed by education (n = 28), engineering (n = 18), business and management (n = 14), and language (n = 8) (see Fig. 16). This aligns with the authors’ subject backgrounds of computer science and education being the most popular. It makes sense that the authors from different subject areas would teach MOOCs in their subjects and subsequently conduct research on these topics.

Fig. 16. The top five subject areas that have been studied in MOOC LA studies (2011–2021) (n = 166)

5 Discussion

The nature of MOOCs with massive open access provides a good justification for utilizing LA to capture learners’ learning behavior to evaluate or improve their learning process (Coffrin et al., 2014; Wibawa et al., 2021). Since 2011, the number of MOOC learning analytics studies has generally increased, with a sharp rise in 2016 (see Fig. 2). This finding is supported by a study conducted by Khalil and Ebner (2016), which found that the combined terms of Learning Analytics and MOOCs were the most cited in Google Scholar between 2013 and 2015. Both MOOCs and LA gained considerable attention in these years (Zhu et al., 2020; Shi & Cristea, 2018). Furthermore, based on our search, the number of articles published decreased in 2017 and then gradually increased again until 2020. Although DeMatthews et al. (2020) stated that research in education slowed down due to the lack of access to data, MOOC LA studies increased as more learners enrolled in MOOCs during the pandemic, and the log data became easier for researchers and educators to access. The ending point of our data collection for this paper was April 2021, which explains why only seven articles were found in 2021. We expect a much larger number for the entire year of 2021.

Our findings show that 72 different journals and proceedings published the 166 MOOC LA articles in this study. There are several possible explanations for this wide range of journals and conference proceedings. The fields of MOOC and LA are still in the initial phase, and few journals or conferences specifically focus on this field. In addition, LA emerged from different fields, including “business intelligence, web analytics, educational data mining, and recommender systems” (Ferguson, 2012, p.304). Additionally, MOOC research is usually categorized under the umbrella of online learning.

It is also important to highlight that almost one-third (54 articles) of the 166 reviewed articles were published in journals or proceedings that are not assigned to any quartile rank (Q1–Q4). These 54 articles were published in the Learning Analytics and Knowledge (LAK) conference proceedings, which have not been assigned any quartile rank by the SCImago Journal Rank. However, this conference is well established and is managed by the Society for Learning Analytics Research (SoLAR), which has members from, and is supported by, many well-known institutions around the world.

Among the 166 MOOC LA research studies published in the past 11 years, a majority used LA for research purposes rather than for directly improving teaching and learning practices (see Fig. 5). Given the emerging needs for MOOCs and large-scale online courses, it is critical to put more effort into leveraging LA to improve MOOC teaching practices. In addition, among the 166 reviewed studies, about three-fifths (61%) used quantitative methods, and 37% used mixed methods. This finding is consistent with the nature of MOOC learning analytics data, which are mostly numerical, and with the large role that statistics plays in the data analysis techniques (see Fig. 8). The data used in the reviewed studies include log data (e.g., click numbers, view numbers), achievement data (completed tasks or exams), demographic data, surveys, and assignment/quiz scores (see Fig. 7). Although MOOCs provide considerable data with easy access that facilitates LA, data privacy and ethics issues demand more attention in future MOOC LA studies.

The primary stakeholders of MOOC LA research studies are the parties that benefit the most from the findings of these studies. Based on our findings, instructors and learners were the primary stakeholders for these studies, which explains why these stakeholders have been the focus of many MOOC and LA articles (Foster & Francis, 2020; Ifenthaler & Yau, 2020; Kew & Tasir, 2020; Sønderlund et al., 2019; Zhu et al., 2020). Given that researchers have been interested in understanding the high dropout rates of tertiary learners, LA can help us understand these learners’ behavior (Kew & Tasir, 2020; Sønderlund et al., 2019). For instructors, LA has been used as part of their teaching reflection and to increase their awareness of students’ learning (Dazo et al., 2017). Instructors can use LA data to improve instruction and to decide whether students need early intervention (Erdemci & Karal, 2020). LA data may also be used to help MOOC designers and providers refine their courses (Shukor & Abdullah, 2019). However, this study found that the reviewed studies primarily used LA for research rather than to directly inform teachers about interventions to improve teaching and learning. In contrast, pairing researchers with instructional designers can steer work toward LA approaches and interventions that closely fit the specific learning environment (Erkan et al., 2019). Ifenthaler and Yau (2020) explained that a benefit of LA is to maximize the learning experience. Thus, more research focusing on LA interventions for educational practice will be beneficial.

Prior research has found that LA researchers come from multiple disciplines (Suthers & Verbert, 2013; Waheed et al., 2018), and some have conducted international collaborative research on LA (Society for Learning Analytics Research, n.d.). Our findings support these previous findings. Almost all of the researchers were from Western countries, including the US, Australia, Spain, and the UK (see Fig. 10). China was the only Asian country among the top eight countries in our review to publish MOOC LA studies in English. Our finding suggests that a wide gap in the numbers of researchers and publications remains between North America, Europe, and China on the one hand and the rest of the world on the other. Murugesan et al. (2017) summarize several factors that may cause these research and publication gaps, such as a short supply of mentors and funding, a lack of writing ability, and insufficient knowledge of publishing practices. We therefore suggest creating more opportunities for research collaboration and mentoring in the field of MOOC LA to bridge these gaps.

Although most of the studies were collaborations among researchers from the same country, one-third were collaborations among researchers from two or more countries (see Fig. 12), and more than half were interdisciplinary (see Fig. 14). Although LA is used mostly to analyze teaching and learning in the field of education, the research process often requires people with backgrounds in computer science or engineering to collect and analyze the LA data. Moreover, this review found that computer science, education, and engineering are the primary subject areas studied in MOOC LA research. Thus, collaboration among researchers from different subject areas is recommended. Researchers and institutions benefit from research collaboration through knowledge exchange and expertise transfer among researchers, cross-fertilization of research ideas, and a deeper understanding of how to approach a research area. Thus, expanding researchers’ networks is needed to enhance our understanding of the outcomes of MOOC teaching and learning (Carroll et al., 2010; Gorska et al., 2020; John-Steiner et al., 1998).

Regarding the targets of MOOC research, this study found that developed countries and China were the targets of all of the MOOC LA research in the 166 reviewed studies. This finding resonates with our prior systematic review’s finding that MOOCs from developed countries were predominantly studied (Zhu et al., 2020). MOOCs that are offered in languages other than English and created through regional initiatives get less recognition from mass media and scientific journals (King et al., 2018; Launois et al., 2019; Murugesan et al., 2017; Ruipérez-Valiente et al., 2022). As a consequence, fewer frames of reference for MOOC LA are available from these countries. Therefore, more MOOC LA research could shift to the Global South (Zhang et al., 2020). The finding presented here could encourage developing countries to fund MOOC LA related research and teaching practices.

6 Limitations and future research

Despite covering multiple MOOCs and researchers across the globe in the analysis, this study has some limitations. First, only peer-reviewed journal articles and conference proceedings from the Conference on Learning Analytics and Knowledge were included. Thus, some important discussions of MOOC LA in book chapters, institutional reports, newspapers, and dissertations could be missing from our systematic review. Future research could expand the outlets for more in-depth results. Second, future systematic reviews on this topic may also be expanded to include research papers beyond the 2011–2021 time range to enrich the findings on MOOC LA practice. Third, only publications in English were included in the review. Thus, meaningful research articles published in other languages (e.g., Chinese, Spanish) may have been overlooked. Future systematic reviews could include articles published in diverse languages to obtain a more comprehensive picture of MOOC LA status and trends worldwide, which could offer further insights to global MOOC LA research communities.

7 Conclusions

LA has been used extensively in education in general, but it has been used in MOOCs only over the past decade. A systematic review of MOOC LA has been needed to help us understand the emerging trends in this growing area of education. This study presents a systematic review of MOOC LA studies published from 2011 until the end of April 2021. It shows that, within a range of approximately ten years, only 166 empirical LA studies have been conducted in a MOOC setting. Given the large expansion of MOOCs, the number of studies in the MOOC LA field is relatively small, and the field is still new. The number of MOOCs and MOOC learners is still growing rapidly, emphasizing the need for more research on MOOC LA so we can better understand the needs of learners. High-quality journals are still the first option for MOOC LA researchers to disseminate their work, and more than half of the articles were published in prestigious journals (Q1 and Q2 index). However, as MOOC LA research increases, we hope that more research journals will recognize the field and provide publication outlets for this important area. Current MOOC LA studies have primarily been used for research purposes. Thus, future studies on MOOC LA interventions can help improve learning and teaching practices, such as MOOC instructional design, self-directed learning activities, and effective and efficient assessments. In terms of the countries and subject areas of the MOOC locations and authors, the US and European countries and researchers in science fields still dominate this research. China is the only Asian country among the top eight countries publishing top research in MOOC LA in English. More research from various parts of Asia and Africa, and especially from non-science fields, is strongly encouraged. These new perspectives can offer richer discussion and inform valuable decisions for learning.
In addition, MOOC LA research requires knowledge and skills from the fields of education, computer science, and data science. Thus, active interdisciplinary collaboration will increase both the rigor of the studies and the dissemination of knowledge.