1 Introduction

Distance education has become one of the prominent modes of education, especially since the COVID-19 outbreak, when educational institutions worldwide urgently switched to it (Masalimova et al., 2022). Although widely used today, distance education is frequently criticised for shortcomings such as failing to individualise the teaching process and to meet individual needs sufficiently (Meacham et al., 2020). Alongside its advantages, it has some disadvantages: the relationships found in face-to-face education cannot easily be established, students who lack the habit of self-learning cannot be adequately supported, and skills and attitudes cannot be developed effectively in practice-based courses. In addition, despite the increasing demand for distance education, students tend to drop out at a high rate (Bağrıacık Yılmaz et al., 2022). Research on online environments shows that students’ lack of engagement is associated with failure and dropout (Caspari-Sadeghi, 2022). Some students in open and distance education leave for reasons arising from the school and the programme (Okur et al., 2019). Loss of motivation, low satisfaction, individual perceptions, poorly designed courses, and lack of digital competence may also cause students to drop out of distance education (Bawa, 2016).

For this reason, care should be taken to retain students: opportunities for social interaction, including student-student interaction, should be provided, and students at risk should be identified by taking their academic achievement into account (Bağrıacık Yılmaz et al., 2022). To this end, learning analytics, which enables online behaviours to be monitored, measured, and visualised, can be used to identify at-risk students (Brito et al., 2019; Caspari-Sadeghi, 2022; Celik et al., 2022; Lacave et al., 2018; Queiroga et al., 2020). Analysing such data yields knowledge that can improve learning and help institutions strengthen distance learning courses and reduce student dropout rates (Silva et al., 2022). Interest in research on learning analytics in distance education is also increasing (Silva et al., 2022). Indeed, the application of learning analytics in online learning environments is becoming increasingly prevalent among educational researchers because it can help make standardised and measurable decisions about student achievement (Kew & Tasir, 2022).

Learning analytics is a research discipline that uses data analysis to support learners and optimise learning processes and learning environments (Saqr, 2018). As a research area, it can be defined as measuring, collecting, analysing, and interpreting data about the learning context in order to make sense of it (Siemens et al., 2011; Siemens, 2013). A rapidly developing subfield of technology-supported learning with rich outputs, learning analytics is closely related to fields such as educational data mining, web analytics, statistics, and artificial intelligence (Chen et al., 2022). Some systems with embedded learning analytics tools inform educators about how to intervene in students’ education, while others provide students with information about their own progress (Jones, 2019). Learning analytics provides information by analysing learner data and learning processes and aims to develop evidence-based learning systems. It also facilitates personalised learning, enabling students to have a more effective learning experience (Caspari-Sadeghi, 2022), and guides instructors in grading students’ performance and keeping track of their diaries, attendance, and pace (Smith et al., 2012). It encourages trainers and educational institutions to adapt to learners’ needs and ability levels. Learning analytics has rapidly become popular in recent years because of the advantages of data-driven decision-making in teaching and learning (Du et al., 2021). In this context, it can support applications that evaluate students’ progress, motivation, attitude, and satisfaction (Mangaroska & Giannakos, 2019). Taken together, the possibilities offered by learning analytics provide new opportunities for learning (Greller & Drachsler, 2012). In addition, as a data-driven approach, learning analytics has great potential to provide an objective picture of the learning process by revealing the relationships between learning processes and achievement variables (Caspari-Sadeghi, 2022).

One of the purposes of using learning analytics is to understand and improve learning activities in higher education (Corrin et al., 2020). It is essential to understand the decisions made when learning analytics applications are used for guidance in education, as well as the pedagogical and ethical implications of these decisions (Siemens, 2019). In particular, several ethical challenges and questions arise when learning analytics is used to optimise higher education (Long et al., 2011; Slade & Prinsloo, 2013). Ethical principles for learning analytics include data privacy, ownership and control, anonymity, transparency, do no harm, benefit, governance and security, consent, and openness (Corrin et al., 2019).

Although the number of studies using learning analytics in distance education is relatively high and interest in them is increasing, only a limited number of literature reviews have been published (Hantoobi et al., 2021). Various systematic reviews of learning analytics in educational environments exist, but only one, by Kilis and Gülbahar (2016), was found in the context of distance education. That review examined the relevant studies in the ERIC and ScienceDirect databases and presented the findings of 25 studies. The present study differs in that it offers a more comprehensive review and analyses the studies in detail. More studies are therefore needed to investigate the benefits of learning analytics in distance education, to obtain detailed information about data collection processes, and to document the actual results (Kew & Tasir, 2022; Larrabee Sønderlund et al., 2019). This study is important in that it provides guiding findings for future researchers and practitioners. It aims to examine studies on the use of learning analytics in distance education and to present a holistic picture of the literature. In line with this purpose, the study sought to answer the following research questions:

  1. What are the methodological characteristics of studies examining learning analytics in distance education?

    a. How is their distribution according to years?

    b. How is their distribution according to the journals in which they were published?

    c. How is their distribution according to the countries where they were conducted?

    d. How is their distribution according to research methods?

    e. How is their distribution according to sample groups?

    f. How is their distribution according to sample sizes?

    g. How is their distribution according to research topics?

    h. How is their distribution according to the data analysis methods used?

    i. How is their distribution according to the types of data used?

    j. What is their distribution according to the data sources used?

    k. What is their distribution according to the learning environments used?

    l. What is their distribution according to the variables tested?

  2. How does learning analytics affect student learning in distance education?

  3. What are the advantages and challenges of learning analytics in distance education?

2 Method

This study uses a systematic review with descriptive content analysis to examine scientific studies on learning analytics and to identify the current situation. A systematic review is a method that requires careful analysis of the consistency of research findings (Moher et al., 2009; Tranfield et al., 2003). Before the research began, PRISMA guidelines were adopted to support careful planning, consistent execution of the review process, research integrity, and transparency (Moher et al., 2015).

2.1 Data collection

The Web of Science database was used to access the studies in this systematic review. Studies on using learning analytics in distance education were searched among SSCI-indexed journal articles with 12 search queries. The search was completed on 27 November 2023. The queries used in the screening process were “Distance Education AND Learning Analytics”, “Distance Learning AND Learning Analytics”, “Distance Teaching AND Learning Analytics”, “Open Learning AND Learning Analytics”, “Open Teaching AND Learning Analytics”, “Open Education AND Learning Analytics”, “Online Learning AND Learning Analytics”, “Online Teaching AND Learning Analytics”, “Online Education AND Learning Analytics”, “e-learning AND Learning Analytics”, “Electronic Learning AND Learning Analytics” and “Electronic Teaching AND Learning Analytics”. No filter other than the SSCI index and the search queries was applied during the data collection phase.
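The twelve search strings above follow a regular pattern: a delivery-mode term combined with "Learning Analytics" through a Boolean AND. A minimal sketch of how they could be generated is given below. The names `MODE_TERMS` and `build_queries` are illustrative assumptions and are not part of the reported procedure; the study also does not report which Web of Science field tags were used, so none are added here.

```python
# Illustrative sketch: rebuild the 12 search strings reported in Sect. 2.1.
# MODE_TERMS and build_queries are hypothetical names chosen for this example.

MODE_TERMS = [
    "Distance Education", "Distance Learning", "Distance Teaching",
    "Open Learning", "Open Teaching", "Open Education",
    "Online Learning", "Online Teaching", "Online Education",
    "e-learning", "Electronic Learning", "Electronic Teaching",
]


def build_queries(mode_terms, topic="Learning Analytics"):
    """Pair each delivery-mode term with the topic using a Boolean AND."""
    return [f"{term} AND {topic}" for term in mode_terms]


queries = build_queries(MODE_TERMS)
for query in queries:
    print(query)
```

Keeping the term list in one place makes it easy to confirm that exactly twelve distinct queries were run and to rerun the same search at a later date.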

2.2 Data analysis

The studies retrieved with the specified keywords were analysed by two researchers, and the other researcher checked the analyses to increase the reliability of the study. A form was created in Microsoft Excel for the data analysis of each study. The form includes sections that answer the research questions: the name of the study, type of the study, year of publication, journal in which the study was published, country in which the study was conducted, method of the study, size of the sample, subject of the study, data analysis method, tools used, sources of data obtained in the study, educational level of the sample, type of learning environment, variables tested, evidence of learning analytics to support learning, advantages/findings, and challenges. The studies included in the systematic review were read meticulously, and the form sections were filled in separately for each. The data in the completed forms were transformed into codes, categories, graphs, and tables and presented descriptively.
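The step from completed forms to frequency tables can be sketched as follows. This is only an illustration: the authors report using a Microsoft Excel form, not code, and the `CodedStudy` fields, the `tabulate` helper, and the three sample entries are hypothetical and do not come from the reviewed studies.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class CodedStudy:
    """A reduced, hypothetical version of the coding form (a few fields only)."""
    title: str
    year: int
    method: str
    data_source: str


# Made-up entries for illustration; they are not data from the review.
studies = [
    CodedStudy("Study A", 2021, "quantitative", "log data"),
    CodedStudy("Study B", 2022, "mixed", "survey"),
    CodedStudy("Study C", 2021, "quantitative", "log data"),
]


def tabulate(coded_studies, field):
    """Return {code: (count, percentage)} for one form field, most frequent first."""
    counts = Counter(getattr(study, field) for study in coded_studies)
    total = sum(counts.values())
    return {
        code: (n, round(100 * n / total, 1))
        for code, n in counts.most_common()
    }


method_table = tabulate(studies, "method")
print(method_table)
```

The same helper can be applied to any other form field (for example `"year"` or `"data_source"`) to produce the descriptive counts and percentages reported in the findings.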

Using the predetermined search queries in the Web of Science database, the full texts of 4227 studies were accessed, as shown in Fig. 1. Two studies were excluded because they were single-page questionnaires or questionnaire answers. After duplicates were excluded, 1823 studies were screened by title and abstract. Of these, 11 were not in English, 175 were not related to learning analytics, and 1055 were not associated with distance education. The full texts of the remaining 582 studies were assessed for relevance. At this stage, 26 studies were excluded because they were unrelated to learning analytics, 42 because they had limited methods, and 92 because they were unrelated to distance education. In addition, 22 review studies were excluded because they contained limited findings. As a result, 400 articles were included in the systematic review.

Fig. 1 PRISMA flow diagram
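The counts reported for the selection process can be checked with simple arithmetic. The sketch below copies the figures from the text; the variable names are illustrative, and the number of duplicates is not reported in the study, so it is derived here only as an implied value under the assumption that the exclusions were applied in the order described.

```python
# Counts copied from the description of the selection process (Sect. 2.2).
records_accessed = 4227
non_article_exclusions = 2   # single-page questionnaires or questionnaire answers
records_screened = 1823      # remaining after duplicates were removed

# Assumption: duplicates were removed after the two exclusions above.
# The study does not state this number; it is only implied by the other counts.
implied_duplicates = records_accessed - non_article_exclusions - records_screened

screening_exclusions = {
    "not in English": 11,
    "not related to learning analytics": 175,
    "not associated with distance education": 1055,
}
full_text_assessed = records_screened - sum(screening_exclusions.values())

full_text_exclusions = {
    "unrelated to learning analytics": 26,
    "limited methods": 42,
    "unrelated to distance education": 92,
    "review studies with limited findings": 22,
}
included = full_text_assessed - sum(full_text_exclusions.values())

print("Implied duplicates removed:", implied_duplicates)
print("Full texts assessed:", full_text_assessed)
print("Studies included:", included)
```

The derived values match the 582 full texts and 400 included articles stated in the text, so the reported flow is internally consistent.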

3 Findings

3.1 Distribution of the studies based on the years

The distribution of the included studies by year is presented in Fig. 2. According to Fig. 2, the earliest study retrieved was published in 2011, and the number of published studies increased after 2019. The number of studies peaked in 2021 (n = 74), followed by 2022 (n = 69).

Fig. 2 Distribution of the studies based on the years

3.2 Distribution of the studies by journal of publication

The distribution of the studies examined within the scope of the study according to the journals in which they were published was analysed and presented in Table 1. When Table 1 was examined, it was found that most studies were published in “Computers & Education” (n = 33) and “Education and Information Technologies” (n = 26).

Table 1 Distribution of studies according to published journals

3.3 Distribution of the countries where the studies were conducted

Table 2 shows the distribution of the analysed studies according to the countries. According to Table 2, most studies were conducted in China (n = 69), the United States of America (n = 62), and Spain (n = 58).

Table 2 Distribution of the countries where the studies were conducted

3.4 Distribution of the methods used in the studies

Figure 3 shows the distribution of the analysed studies according to their methods. When Fig. 3 is examined, it is seen that quantitative (87%) and mixed (13%) research methods are primarily used in the analysed studies.

Fig. 3 Distribution of methods used in the studies

3.5 Distribution of the studies according to the sample group

Figure 4 shows the distribution of studies using learning analytics in distance education according to the sample group. As shown in Fig. 4, it was determined that university students (n = 245) were mostly preferred as the sample group in the related studies. In 18 studies, the sample group was not specified.

Fig. 4 Distribution of the studies based on the sample group

3.6 Distribution of studies according to sample sizes

The distribution of the studies according to sample sizes is shown in Fig. 5. As shown in Fig. 5, it was determined that the most frequently preferred sample size was between 0 and 499 (n = 189). In 41 studies, it is seen that the sample size is not specified.

Fig. 5 Distribution of studies according to sample sizes

3.7 Distribution of studies by research topic

The distribution of studies according to research topics is shown in Fig. 6. When Fig. 6 is analysed, it is determined that most studies were conducted on improving learning processes (n = 98) and creating interesting and effective learning and teaching strategies (n = 71).

Fig. 6 Distribution of studies by research topic

3.8 Distribution of studies by data analysis methods

The data analysis methods used in the studies are shown in Table 3. When Table 3 is examined, it is seen that the most commonly used data analysis methods are regression analysis (n = 81), correlation analysis (n = 67), special algorithms-models (n = 56), ANOVA (n = 36), and cluster analysis (n = 33).

Table 3 Data analysis methods used in the studies

3.9 Data types used in studies

The data from the analysed studies were coded and presented in 4 categories: learning behaviour data, learning emotional data, learning network data, and learning level data (Wu et al., 2015). Figure 7 shows the data types used in the analysed studies. When Fig. 7 is examined, it is determined that learning behaviour data (51%) collected with the learning platform, including learners and learning resources, are frequently used.

Fig. 7 Data types used in studies

3.10 Data sources used in studies

The data sources used in the analysed studies are given in Fig. 8. When Fig. 8 is analysed, it is seen that the most commonly used data source is student log data (58.38%).

Fig. 8 Data sources used in studies

3.11 Learning environment types used in studies

The types of learning environments used in the analysed studies are given in Fig. 9. When Fig. 9 is analysed, it is determined that the most commonly used learning environments are LMS (48%) and MOOC (21%).

Fig. 9 Learning environments types

3.12 Tested variables in studies

The variables tested in the analysed studies are given in Table 4. When Table 4 is examined, it is determined that student data or daily platform behaviours (n = 165) and students’ learning performances (n = 137) are the most tested variables.

Table 4 Variables tested

3.13 Evidence of learning analytics to support learning in studies

The evidence of learning analytics supporting learning in the analysed studies is given in Fig. 10. When Fig. 10 is analysed, it is seen that the findings that learning analytics support learning (52%) are in the majority.

Fig. 10 Learning analytics evidence to support learning

3.14 Mentioned advantages of using learning analytics

The advantages of using learning analytics in distance education stated in the analysed studies are given in Table 5. When Table 5 is analysed, the most prominent advantages are that learning analytics in distance education improves learning outcomes (n = 145), enables the examination of students’ activities (n = 134), explains students’ learning habits (n = 61), supports the development of teaching strategies (n = 58), increases students’ participation (n = 44), provides in-depth information about students’ learning (n = 44), and enables the prediction of students’ performance (n = 38).

Table 5 Advantages of using learning analytics in distance education

3.15 Mentioned challenges of using learning analytics

Within the scope of this systematic review, the difficulties encountered due to the use of learning analytics in distance education are given in Table 6. When Table 6 is analysed, it can be said that the challenges of using learning analytics in distance education are that learning analytics is inadequate or impractical in some cases (n = 18), it shows negative or ineffective performance for students (n = 16), students have limited interaction (n = 12), it contains limited data (n = 5), learning analytics is an expensive investment (n = 4), and students need additional support (n = 2).

Table 6 Challenges of using learning analytics in distance education

4 Discussion

This study analysed the research trends and main findings of articles on learning analytics in distance education published in SSCI-indexed journals in the Web of Science database. The review showed an increase in such articles after 2019, which can be explained by the transition of many educational institutions to distance education during the global pandemic. A slight decrease in the number of studies was observed after 2021. This may indicate that studies in the field of learning analytics have reached a certain level of saturation. In addition, with the prominence of machine learning algorithms and artificial intelligence concepts in recent years, research on the subject may have been gathered under these headings; future studies on learning analytics may therefore be more closely related to machine learning algorithms. The studies were primarily published in “Computers & Education” and “Education and Information Technologies”. That these are leading, prestigious journals in their fields with high impact factors may explain the high number of publications on learning analytics, a new topic, in them. Most publications on this subject came from China, the USA, and Spain. Quantitative and mixed research designs were frequently used in the analysed studies, which is an expected result since learning analytics involves the analysis of big data. The sample groups consisted mainly of university students, and sample sizes were mostly between 0 and 499. This can be explained by university students frequently using distance education systems (MOOCs, LMS, etc.). Further, one of the primary purposes of using learning analytics is to understand and improve learning activities in higher education (Corrin et al., 2020), which may explain the frequent focus on university students.

The results showed that the most researched topics were improving learning processes and creating engaging and effective learning and teaching strategies. At this point, the advantages of learning analytics in distance education can be listed as enhancing learning processes, supporting engaging and effective learning, creating learning strategies, predicting learning outcomes, predicting students’ dropout or attendance, analysing students’ emotions, and providing feedback to students or instructors. The results support the view that researchers frequently use learning analytics, that the quality of learning environments is improved by analysing students’ data in distance education environments, and that meaningful data about learners can be obtained (Kew & Tasir, 2022). Accordingly, the opportunities provided by learning analytics should be recognised, and learning analytics tools and techniques should be used to improve teaching and learning (Clow, 2013; Drugova et al., 2023).

The studies within this systematic review primarily used analysis methods that examine the relationships between variables, such as regression, correlation, ANOVA, and cluster analysis, to analyse the data obtained from learning analytics. Learning analytics is based on performing relational analyses to reveal how variables affect each other when analysing big data (Sghir et al., 2023). Therefore, this result is expected. Similar to the findings of Ifenthaler and Yau (2020), the most commonly used data source in the studies was student log records. Data on distance education processes carried out through existing digital systems are obtained from system records. The fact that the most used learning environments are LMS and MOOCs supports these results. The variables tested were generally students’ behaviours on the platform, learning performances, learning outcomes, communication status, dropout behaviours, students’ opinions, and ease of use of the platforms. This finding is similar to the results of Sghir et al. (2023). Learning behaviour data collected through learning platforms were frequently used in the studies examined, which is similar to the findings of Kew and Tasir (2022). Therefore, learning analytics tools can be developed and used to make sounder decisions about the teaching process (Kitto et al., 2017; Pantazos & Vatrapu, 2016).

In the studies examined, findings that learning analytics supports learning are predominant. This coincides with the results of the review by Larrabee Sønderlund et al. (2019) of intervention studies examining the effectiveness of learning analytics. Indeed, with the development of learning analytics, it has become easier to identify students’ problems and intervene promptly using these tools in learning environments (Wong & Li, 2020). The use of learning analytics in distance education has been found to provide many advantages, including improving students’ learning outcomes, enabling the examination of students’ learning activities, revealing learning habits, developing teaching strategies, increasing students’ participation in the course, receiving feedback from students, and identifying students who are about to drop the course.

In addition, learning analytics has the disadvantages of being inadequate or impractical in some cases, performing negatively for students, causing limited student interaction, and being an expensive investment. There are also several ethical challenges and questions for learning analytics that affect higher education optimisation, especially around ethical principles such as data privacy, ownership and control, anonymity, transparency, do no harm, benefit, governance and security, consent, and openness (Corrin et al., 2019; Long et al., 2011; Slade & Prinsloo, 2013). The findings on the disadvantages of learning analytics parallel the results of Avella et al. (2016). Despite all these, results show that learning analytics interventions have the potential to expand the scope of application further (Wong & Li, 2020). Learning analytics has excellent potential to advance the innovation of personalised environments in particular (Dessì et al., 2019). Learning analytics can also be useful because its techniques make it possible to collect and analyse large amounts of data, which seemed impossible in the early stages of distance education (Gelan et al., 2018). Therefore, educational institutions may be advised to use learning analytics to identify students who are underperforming or at risk of absenteeism (Larrabee Sønderlund et al., 2019).

5 Conclusion, implications, and limitations

In this study, 400 journal articles from the Web of Science database were analysed in terms of research trends and main findings. The research aimed to contribute by providing a general perspective on the use of learning analytics in distance education. The results showed that most studies were published in 2021, that most appeared in the journals Computers & Education and Education and Information Technologies, and that China, the USA, and Spain were leading in this field. Methodologically, it was concluded that quantitative and mixed methods were mainly used, university students and people from different professions were specifically studied, data were primarily obtained from log data, and regression analysis and correlation analysis were frequently preferred.

It was observed that the studies were conducted to improve learning processes and to create engaging/effective learning and teaching strategies, that the focus was mainly on student behaviour data, and that LMS and MOOCs were frequently used. It is seen that the most studied variables are student data or daily platform behaviours and students’ teaching-learning performances. It has been determined that learning analytics in distance education improves learning outcomes, supports learning, and enables the analysis of students’ activities. However, it has also been found that learning analytics is sometimes inadequate or impractical and performs negatively for students. In the light of these results, recommendations are as follows:

  • Considering the rapid spread of distance education processes in recent years and their active use in lifelong learning processes, increasing the number of studies in this field is recommended.

  • Researchers working on learning analytics in distance education can benefit primarily from Computers & Education and Education and Information Technologies journals, which have the highest number of publications.

  • Researchers who want to collaborate internationally on the use of learning analytics in distance education may consider contacting researchers in countries such as China, the USA, and Spain, which are pioneers in this field.

  • Since quantitative methods are frequently used in the studies conducted in this field, it is suggested that mixed-method studies should be included more in future studies.

  • In the studies conducted, university students were frequently included as a sample. In future studies, MOOC students, primary and high school students, and teachers can also be included.

  • Future studies can focus on predictive analysis of retention/dropout, student feedback, and emotional analysis variables, which have received relatively little attention in the current studies.

  • It is recommended that machine learning and deep learning algorithms, which have been popular in recent years, should be included in data analysis processes for future research using more extensive data.

  • In the existing studies, learning network data and learning level data types are included at a limited level. Future studies can be designed for these data types.