1 Introduction

Today, educational institutions and firms commonly use learning management systems. As a teaching strategy, e-learning has many advantages (Srivastava, 2019; Choudhury & Pattnaik, 2020). It still faces several challenges, however, such as evaluating students in large groups, preventing academic dropout, predicting student performance, and redesigning the educators’ roles. These problems are also common in other environments, e.g., intelligent tutoring systems, which help students learn specific skills individually with the support of intelligent algorithms (see, for instance, Graesser et al. (2018) or Mousavinasab et al. (2021)).

1.1 Learning analytics

Learning analytics is an emerging and promising field of educational research that may help in the learning and teaching process. Analysis of learner data may provide insight into which students are at risk of dropping out or need additional support to further enhance their success (Long & Siemens, 2011). Learning analytics allows objective information on students’ performance to be tracked by using data obtained automatically from their activities (connections, comments, evaluations, shared documents, mouse clicks,...). One of the most widely accepted definitions is the following: “Learning analytics is the measurement, collection, analysis, and reporting of data about learners and their contexts, for the purposes of understanding and optimizing learning and the environments in which it occurs” (Long & Siemens, 2011). The generalized access to computers and smartphones, and the large amount of data generated, have led to the penetration of big-data-based tools into education (Sin & Muthu, 2015).

Learning analytics has been applied at all educational levels. Sung et al. (2017) conducted a digital game-based learning activity to analyse pupils’ behavioural patterns in a primary education science course, revealing differences in those patterns between high achievers and low achievers. Quigley et al. (2017) used learning analytics in secondary education biology classrooms to model a local ecosystem. Leitner et al. (2017) reviewed the learning analytics literature for higher education, addressing the limitations and discussing the most promising research lines. Alonso-Fernández et al. (2020) showed how effective learning analytics based on serious games can be in engaging, motivating, and helping students to learn. Specifically, they proposed a combination of game learning analytics and data mining techniques based on in-game student interactions to predict changes in knowledge. For courses given in a classroom setting, a blended learning approach that mixes computer interaction with traditional attendance-based teaching methods can also benefit from learning analytics (Bonk and Graham, 2006). For purely online courses, learning analytics is generally necessary, and indispensable when the number of students is large.

1.2 Predicting retention and success

Predicting student learning outcomes is one of the main topics in learning analytics research (Hu et al., 2017). Such prediction depends to a large extent on the specific teaching and learning environment, the platform, the type of learning outcomes considered, the features extracted for use in the prediction, and the prediction algorithms and cross-validation techniques applied. One important outcome addressed in the scientific literature is student retention/dropout, i.e., whether the student follows the course or abandons it at some point before finishing. In this promising field, Chui et al. (2020) predicted students at risk of dropout in a virtual learning environment via a support-vector machine based statistical predictive model for early identification. They also summarized the results of several studies that used different datasets as benchmarks and a wide variety of machine learning algorithms. Herodotou et al. (2019) proposed a learning analytics based system that includes machine learning methods for the early identification of students at risk of dropout. The aim in these cases was to anticipate dropout, and thus provide effective personal intervention support (Xing and Du, 2018).

Another relevant outcome is course performance. Data mining techniques based on students’ activities can be used to predict their performance. This can be done from either a purely online or a blended learning perspective. Zafra and Ventura (2009) used multiple instance genetic programming to predict whether a given student will fail or pass a certain course, and to identify activities that positively or negatively affect learning. Macfadyen and Dawson (2010) applied a logistic regression model which identified 15 variables having a significant correlation with student final grades. Hu et al. (2014) used classification and regression trees to determine 13 predictive variables of student performance. Lu et al. (2018) applied linear regression with principal components to 21 variables for the early prediction of students’ final academic performance, identifying 7 factors affecting that performance. Moreno-Marcos et al. (2020) used learning analytics tools to identify the main factors influencing students’ performance.

1.3 Self-regulated learning

Self-regulated learning is critical to success in learning online. It involves systematic efforts such as planning, implementing, regulating, and evaluating the processes of attaining learning goals (Narciss et al., 2007). You (2016) examined the relationship between course data and course achievement, demonstrating that students’ regular study time, late submission of assignments, number of sessions, and evidence of having read the course information packets were significant predictors of their course achievement. The use of learning analytics tools can assist with the task of monitoring learning capacity in self-regulated courses (Papamitsiou and Economides, 2019). Kim et al. (2018) applied k-medoids clustering and Random Forest classification, followed by logistic regression to analyse the profiles of the identified clusters, in order to study students’ self-learning patterns in asynchronous online courses. Kizilcec et al. (2017) addressed this problem from the viewpoint of massive open online courses (MOOCs).

A great deal of motivation is required to succeed in online courses. Motivation is, however, a complex construct, and research on its function in this context is still at an early stage. de Barba et al. (2016) studied how educators and developers of learning management systems can use the assessments of motivation and participation gleaned from learning analytics to design courses. While technology and learning analytics alone cannot solve this problem, it could be addressed by adopting a neurodidactics paradigm, which would mean that the roles of educators and students would have to be redesigned (Anastasia, 2016).

1.4 Neurodidactics and socio-constructivism

Neurodidactics, also called “brain-based learning”, is the union of cognitive science and neuroscience with education research. Its aim is to improve the quality of education by developing more efficient didactic and methodological strategies (Muchiut et al., 2018). It started as a research topic in the 1980s. Hart (1981) observed that brain processes should be taken into account in classrooms, Gardner (1983) introduced the multiple intelligences concept for models of thinking, and Caine and Caine (1990) focused on the brain-based approach to the teaching process.

New technologies, and in particular technologies used for online learning, have the power to make the learning process more appealing, motivating, collaborative, communicative, interactive, and visual, all of which are aspects that are relevant in neurodidactics.

To understand neurodidactics and any tool based on it, one also has to understand its supporting structures, which form part of the socio-constructivist paradigm. As a model, socio-constructivism has certain similarities with Piaget’s “constructivism” (Piaget & Cook, 1952), but it differs from the latter in some fundamental aspects, such as that learning proceeds from the social to the individual and not vice versa, as defined by Vygotsky (see Vygotsky (1986) and Pass (2004)). Socio-constructivism proposes that discovery learning has a positive and pleasing effect on students, and that it implies building new knowledge on the basis of the knowledge they already had. Links are established between what they already knew and the new information that they learn and apply, thus constructing knowledge which is also enriched by everything that surrounds the learner, i.e., their learning environment. In this way, knowledge is built up piece by piece in a gratifying and social way by combining actions (Kahveci and Ay, 2008). It is precisely these motivating and pleasant learning situations that form the link to neurodidactics, according to which the ideal situation for learning, from a physical perspective of the brain, is through activities that produce motivation.

During the learning process, situations that are gratifying directly affect our attention, effort, engagement, social emotions, and learning outcomes. Thus the process can be reduced to two effects that should be linked and recurrent in the learning process: the effects of pleasantness and of motivational desire. Motivation makes us learn, and after that learning process we feel happy and rewarded. This motivates us again to continue learning. The areas of the brain that produce dopamine are next to those that generate endorphins, and both neurotransmitters are related to pleasure. For this reason, didactic and neurological capacities are studied conjointly, explaining how to improve the learning process from a perspective of how the brain functions (Tang, 2017; Bonomo, 2017).

In the socio-constructivist paradigm, for learning to be decisive it has to be relevant and rewarding. From this point of view, neuroscience proposes social learning as a motivating aspect that generates a brain state that is optimal for learning, again achieving a pleasant and rewarding state. Humans are social beings, and nature itself therefore offers the ideal conditions to favour cooperative group learning (Igel & Urquhart, 2012).

All the aspects of education that have been detailed above are applied in NeuroK (https://neurok.es/en), an online educational platform based on the principles of neurodidactics. This platform was used in the present work.

1.5 Research hypothesis

The research hypothesis to be tested was that learning analytics tools can be defined in the context of the neurodidactics paradigm to accurately predict students’ performance while also providing valuable information about the learning process.

1.6 Addressing the problem

To address this problem, the NeuroK platform was chosen as the working environment. A total of 29 features were specifically defined and implemented in NeuroK to analyse the learning process and predict students’ performance. An international course with 698 students was taken as a case study, and several experiments were performed with different types of classifiers in a cross-validation framework so as to identify the most accurate predictive model and the most important features. These tools proved to have high predictive capacity. The features were related to different individual and social aspects of neurodidactics in the learning process. To the best of the authors’ knowledge, no other study has been reported relating machine learning with learning analytics and neurodidactics.

The remainder of this article is structured as follows. Section 2 briefly describes the NeuroK e-learning platform and the innovative learning analytics tools considered. Section 3 covers the methodological aspects of the study, including a description of the course used for testing, the features that were defined, and the statistical techniques applied. Section 4 presents the experimental results, and they are discussed in Section 5. The main conclusions are drawn in Section 6, together with some avenues for future research. Finally, an Appendix presents the descriptions of the features.

2 NeuroK - a social web-based environment

Neurodidactics is a discipline which deals with optimizing the learning and teaching process from the perspective of how the brain functions. It is based on the neuroscience hypothesis that knowledge cannot be forcibly transferred but must be newly created in the brain. In line with this, neurodidactics proposes a learner-centred form of education in which the instructor’s role is to facilitate the students’ learning as a conductor of the learning process, whether attendance-based or online. The students construct knowledge by themselves, activating their long-term memory through motivation (Edelenbosch et al., 2015). This gives them the incentive to learn and the ability to concentrate (Harandi, 2015). There are various ways to motivate – gamification, learning by doing, flipped classroom, cooperative learning, small group learning, peer tutoring, and the use of real life challenges, inter alia (Uskov et al., 2017; Miranda et al., 2021).

NeuroK constitutes the social web-based environment supporting the present work. It was constructed following a brain-based and socio-constructivist approach (Calle-Alonso et al., 2017). Its focus is on collaborative learning, attending to the social dimension of that process. It motivates students to learn by sharing content, participating in discussions, peer tutoring, taking part in cooperative tasks, and guiding themselves along their own group learning path. With its adoption of a socio-constructivist paedagogical strategy, NeuroK enhances the social dimension since it is structured as a social network (with an interface similar to those of Facebook or Twitter) offering social interactions, exchanges of knowledge, collaborative activities, etc. The teacher acts as a guide, focusing on the students rather than just on the content, involving them in a sharing process, and motivating them in accordance with neurodidactics. NeuroK fosters learners’ capacity to think both on their own and as part of a group. The key to a successful learning process is that it should be interesting and enjoyable. In this sense, teachers become the agents who are capable of managing their students’ motivation and encouraging them to discover, share, and learn from each other. The premise is always that of teaching less but learning more.

NeuroK satisfies most of the general principles of socio-constructivism and neurodidactics (Jayasankara Reddy et al., 2021), and has the following characteristics:

  • It moves progressively from individual to cooperative learning (El Mhouti et al., 2014), providing paedagogical tools that support this new context, such as following people, giving favourites or likes, extensive debating, writing and scoring comments, subjects, and tasks, peer reviewing, and mentoring.

  • The students are not constrained by time and place, and can cooperate.

  • The content is not generated by the teacher at one given time to then remain frozen, but is mostly generated by the students and evolves with their group participation.

  • The student (not the teacher or the material) is the centre of the learning process. The teacher guides the students and tries to motivate and engage them with NeuroK tools such as the motivational trigger at the beginning of each learning unit, the general and individual dashboards on which to follow their development, and the highlighting of interesting aspects that might not have been achieved or properly dealt with.

  • The search for meaning is innate, and, with some triggers, the students are motivated to learn and cooperate. NeuroK uses an initial hook prepared by the teacher, and then cooperative activities to motivate the students to participate.

  • Emotions play a fundamental role in learning. Attention is needed for instance, and emotions can strongly affect this aspect. Motivation also can be pursued by using some kind of gamification such as giving points for social interactions.

  • Learning involves conscious and non-conscious brain processes, and social debating as included in NeuroK is a very interesting tool for non-conscious learning.

  • Audiovisual resources are especially stimulating for the brain. In e-learning platforms such as NeuroK, they are mostly used to achieve relevant learning. In fact, 90% of learning is visual, with 85% of the brain wired for visual processing (Clemons, 2005).

The attractive structure of NeuroK, which functions as a social network, and the appealing settings of the courses – beginning with the motivational trigger that opens each unit and the collaborative learning on which the platform is based – mean that the students’ motivation and work spur them on to maintain their attention and not disengage from the learning units. Should disengagement nonetheless occur, NeuroK incorporates a set of tools available to the teacher to determine each student’s degree of participation during the course, helping to reveal the reasons behind the dropout and hence bring the student back into the course. For example, there is a dashboard that includes a table of statistics to monitor personal and group development. Also, two innovative learning analytics tools have been implemented in NeuroK: word clouds (Calle-Alonso et al., 2018a) and social interaction graphs (Calle-Alonso et al., 2018b). They are based on natural language processing and social network analysis, respectively. These learning analytics tools are easy to understand, and help improve motivation, critical thinking, and engagement.

The word clouds provided are visual representations of the terms most frequently used by the students. The information extracted from NeuroK to build the word cloud comes from comments, remarks, ratings, and documents submitted. Every time one of these types of content is posted, the corresponding text is automatically analysed word-by-word. There are three possible word clouds: the student’s cloud, with the words they most frequently used; the target cloud, with the words that the teacher would like their students to talk about; and the mixture of both, with the target words as actually used by the students. Teachers can use this tool to explore what is being discussed in a course, and then make strategic decisions on how to guide the training process, decisions based on the content of what the students have written (Calle-Alonso et al., 2018a).
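
As an illustration only, and not NeuroK’s actual implementation, the following R sketch shows how the word frequencies behind a student cloud and a teacher-defined target cloud could be computed; the sample texts and the target vocabulary are invented for the example.

texts <- c("motivation drives attention",
           "peer review improves motivation",
           "emotions and attention in learning")

words <- unlist(strsplit(tolower(texts), "[^a-z]+"))    # split each post word-by-word
words <- words[nchar(words) > 0]

student_cloud <- sort(table(words), decreasing = TRUE)  # student cloud: word frequencies

target_words <- c("motivation", "attention", "emotions")            # chosen by the teacher
mixed_cloud  <- student_cloud[names(student_cloud) %in% target_words]

# A graphical cloud could then be drawn, e.g. with the wordcloud package:
# wordcloud::wordcloud(names(student_cloud), as.numeric(student_cloud))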

The social interaction graph provides an interactive display of the students on the e-learning platform and how they relate to each other. To this end, the plots are constructed from the different types of relationships that may exist between two nodes or students (comments, mentions, favourites, and ratings), giving a different weight to each relationship depending on its type. These weights are then used to give a score to each relationship, and the greater the score, the thicker the line that represents that relationship on the plot. This tool lets teachers follow the participation and the cohesion of a group of students, and also helps them to identify influencers or key players. They can then take advantage of the broad influence of these key players to put forward new lines of action in order to motivate the rest of the students. In this sense, three centrality measures were applied to assess the influence or social activity of a particular student – degree, betweenness, and closeness centralities (Calle-Alonso et al., 2018b).
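
A minimal sketch of this kind of analysis is given below, using the R igraph package on an invented interaction table; the student names, interaction types, and weights are illustrative assumptions and do not reproduce NeuroK’s internal implementation.

library(igraph)

# One row per interaction between two students; assumed weights per interaction type
interactions <- data.frame(
  from = c("ana", "ana", "ben", "carla", "carla"),
  to   = c("ben", "carla", "carla", "ana", "dan"),
  type = c("comment", "rating", "mention", "favourite", "comment")
)
type_weight <- c(comment = 3, rating = 2, mention = 2, favourite = 1)
interactions$weight <- type_weight[interactions$type]

g <- graph_from_data_frame(interactions, directed = TRUE)

# igraph treats weights as distances in path-based measures, so the interaction
# score is inverted to obtain a distance-like weight for those measures.
E(g)$dist <- 1 / E(g)$weight

centralities <- data.frame(
  degree      = degree(g, mode = "all"),
  betweenness = betweenness(g, weights = E(g)$dist),
  closeness   = closeness(g, mode = "all", weights = E(g)$dist)
)
centralities   # larger values point to potential key players or influencers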

In sum, NeuroK takes all of the students’ actions (uploaded content, interactions, comments, peer evaluations, etc.) into account in order to assign them a score. In combination with the social analytics and content metrics, these features are very important for analysing the students’ development. The data obtained reflect the students’ situation from the perspective of motivation and engagement. The teacher can then act according to each situation, using the tools provided by NeuroK to increase motivation and performance.

3 Methods

This section briefly describes the particular NeuroK course studied, the features proposed, and the predictive statistical models used.

3.1 Course description

A NeuroK online course called “Aprender a motivar” (Learn how to motivate) was taken as a case study. It is aimed at teachers who want to know the basic mechanisms of neurodidactics and their application in the classroom. A total of 800 teachers of La Salle Schools in Spain, Mexico, Colombia, Ecuador, and Puerto Rico signed up, of whom 698 completed the course. The course was divided into 9 learning units and 9 activities. Its design was asynchronous as the participating students were from different countries and time zones.

The records in the course database were subjected to an exhaustive analysis. Once all the features had been identified, they were exported to a CSV file for subsequent analysis. All data were anonymized before they were used in this research.

The students’ performance results and the pass or fail lists of those who completed the course were retrieved. These grades were assigned by the team of teachers through a direct analysis of the students’ work throughout the course, without considering any automatically extracted feature. Specifically, the instructors evaluated the task submissions and how the students had participated and collaborated.

After the course, the students were invited to respond to a survey about that course specifically and also about the platform in general. Three hundred students voluntarily answered the questionnaire. The development of the course was rated very positively, as also was NeuroK as a learning framework. In particular, 92.8% responded that they had learnt quite a lot or very much (43.8% and 49%, respectively), and 92.1% that their general experience with NeuroK was quite good or very good (39.3% and 52.8%, respectively) and that they were grateful to the platform. Also notable was that 91.4% mentioned that they would like to participate in another course with the same methodological approach by way of the NeuroK platform.

3.2 Learning analytics based features

From the students’ interactions, 29 features were extracted automatically by the NeuroK platform, and recorded to be analysed. Some of these features, such as centrality measures or concordance, were obtained from the social interaction graph and the word cloud. The features proposed are presented in Table 1, and are described in detail in the Appendix. The data are available upon request to the first author.

Table 1 Feature names and notation

These features can be divided into the following 8 groups:

  • v1: Number of logins.

  • v2 to v5: Social network metrics.

  • v6 and v7: Communications and contacts.

  • v8 and v9: Word cloud metrics.

  • v10 to v12: Points and evaluations.

  • v13 to v20: Comments, favourites, mentions, and ratings.

  • v21 and v22: Documents and videos.

  • v23 to v29: Everything shared/published/evaluated.

One principle of neurodidactics is that the mind is social (Caine et al., 2009). Features v2 to v7 (social network metrics, communications, and contacts), v13 to v20 (comments, favourites, mentions, and ratings), and v23 to v29 (shared, published, and evaluated content) are all related to social interactions. They offer different perspectives of the social facet of education, and comprise the majority of the features considered. Following the principles of socio-constructivism and neurodidactics, they constitute very important material for NeuroK. The focus of features v1 (logins), v8 to v12 (word cloud metrics, points, and evaluations), and v21 and v22 (documents and videos) is on other aspects related to individual results. Some features provide information on both individual and social aspects. The word cloud metrics, for example, reflect the topics most used by a student, but also by the whole group.

Not all the features have the same importance in predicting success in the course, but they will be ranked in importance through the use of predictive models.

3.3 Statistical methods

Two experiments were performed to test the statistical classification models. The dependent variable was taken to be the grade (pass/fail). First, to provide a baseline for comparison, all 29 features were taken into account. Then, a recursive feature elimination (RFE) (Kuhn, 2011) technique was applied to select the features associated with each of the classification models considered. This feature selection method is based on the Gini criterion (Kubus, 2014), and attempts to maximize the accuracy while minimizing the number of features. At each step in the recursion, the features that explain the least variability in the dependent variable are eliminated, maximizing the accuracy achieved at that step. The results provide the optimal number of features for each classification model.
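
A minimal sketch of this step with the caret package is shown below. It assumes a data frame X holding the 29 features and a factor grade with levels "fail" and "pass"; these object names, and the use of caret’s random-forest-based ranking functions, are illustrative assumptions rather than the exact code used in the study.

library(caret)

set.seed(123)
rfe_ctrl <- rfeControl(functions = rfFuncs,      # random-forest-based feature ranking
                       method = "repeatedcv",    # resampling scheme for the search
                       number = 10, repeats = 10)

rfe_fit <- rfe(x = X, y = grade,
               sizes = 1:29,                     # candidate feature-subset sizes
               rfeControl = rfe_ctrl)

predictors(rfe_fit)   # optimal feature subset found for this base learner
rfe_fit$results       # accuracy obtained for each subset size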

Ten classification models were tested to obtain the greatest accuracy possible in predicting whether students pass or fail the course. These models were: Random Forest (RF), k-Nearest Neighbors (KNN), Logistic Regression (LR), Elastic Net (ENET), Neural Networks (NNETS), TreeBagging (TB), Logit Boost (LB), Bayesian Additive Regression Trees (BART), Boosted Tree-based C5.0 (C5.0), and Oblique Random Forest SVM (ORFSVM). These are some of the most extensively used classification models found in the scientific literature of recent years. They yield high accuracy for many different kinds of problems. See, for example, Probst et al. (2019) for RF, Jiang et al. (2014) for LR, Er et al. (2017) for ENET, Paszke et al. (2016) for NNETS, Gorad et al. (2017) for C5.0, and Correia and Schwartz (2016) for ORFSVM.

To generalize the models’ performance, a repeated stratified k-fold cross-validation was used (Refaeilzadeh et al., 2009). In this technique, the dataset is randomly split into k equally sized subsets, preserving the same pass/fail ratio in each. Of these subsets, k − 1 are used for training, and the remaining one for validation to test the model. This process is repeated k times, with each of the k subsets used once for validation. Finally, the k results are averaged to produce overall statistical measures of the classifier’s performance such as accuracy, area under the ROC curve (AUC), sensitivity, and specificity. Since the study had 698 subjects, taking k = 10 led to subsets of approximately 69 subjects (30/31 passing the course, and 39/38 failing it). In this experiment, the complete 10-fold cross-validation process was repeated 10 times for each model, and the results were averaged.
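
The sketch below illustrates how such a repeated stratified 10-fold cross-validation could be set up in caret for one of the classifiers (RF); the object names X and grade are again assumptions, and the other models would be obtained by changing the method string (e.g. "knn", "glmnet", "treebag").

library(caret)

set.seed(123)
cv_ctrl <- trainControl(method = "repeatedcv",             # folds are stratified by class
                        number = 10, repeats = 10,
                        classProbs = TRUE,
                        summaryFunction = twoClassSummary)  # reports ROC, Sens, Spec

rf_fit <- train(x = X, y = grade,
                method = "rf",     # Random Forest; swap the string for other classifiers
                metric = "ROC",
                trControl = cv_ctrl)

rf_fit$results   # cross-validated AUC (ROC), sensitivity, and specificity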

Hypothesis tests were applied to reveal statistically significant differences: the Mann-Whitney U test to compare the results of the 10 classifiers, since the sample sizes were small and normality could not be assumed, and the independent t-test to compare the means of each feature between the two groups of students (pass/fail). Two-tailed p-values less than 0.05 were considered statistically significant. The statistical analyses were performed using R software (R Core Team, 2021) and the caret package (Kuhn, 2008).
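
For illustration, both tests are available in base R; the vectors below are invented stand-ins (the Participation score group means roughly follow the descriptive statistics reported in Section 4, while the spreads and the accuracy values are assumed), not the study’s actual data.

# Repeated-CV accuracies of two hypothetical classifiers (10 values each)
acc_model_a <- c(0.97, 0.98, 0.99, 0.97, 0.98, 0.99, 0.98, 0.97, 0.99, 0.98)
acc_model_b <- c(0.93, 0.94, 0.95, 0.93, 0.92, 0.94, 0.95, 0.93, 0.94, 0.92)

# Mann-Whitney U test (wilcox.test in R): small samples, normality not assumed
wilcox.test(acc_model_a, acc_model_b, alternative = "two.sided")

# Participation score (v10) by group; means mimic Section 4, spreads are assumed
set.seed(123)
v10_pass <- rnorm(300, mean = 1219.15, sd = 300)
v10_fail <- rnorm(300, mean = 206.41,  sd = 150)

# Independent t-test comparing feature means between passing and failing students
t.test(v10_pass, v10_fail, alternative = "two.sided")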

Figure 1 synthesizes the methodological process followed. All the students’ actions are recorded in a database while they are connected to NeuroK. These data are put into a common format for analysis. Subsequently, the model is trained and a repeated cross-validation is applied to assess performance so that the results are generalizable. Finally, the evaluation prediction is output, and a new iteration with new data corresponding to more information from the students (or even from new courses) can be initiated to feed and update the model.

Fig. 1 The study’s methodological framework

4 Results

The classifiers with the cross-validation scheme presented in Section 3 were applied to the course data in two scenarios to evaluate the predictive power of the proposed features.

4.1 All features

In the first scenario, all the features were considered to assess the models’ performance. The number of predictors (29) is small relative to the number of subjects (698), so the results can be considered reliable. Table 2 lists the accuracy, AUC, sensitivity, and specificity of the considered classification models in a 10-fold cross-validation framework.

Table 2 Accuracy metrics for 10 classification models with 10-fold cross-validation using all features

The results are remarkably good. The accuracies range from 0.9255 to 0.9971, with 7 out of 10 classifiers having accuracies greater than 0.95. The mean ± standard deviation of the 10 accuracies is 0.9703 ± 0.0287. The results for the other overall metric, AUC, are also very good, ranging from 0.9575 to 0.9998 (0.9871 ± 0.0158). The best performance corresponds to RF, with all four metrics greater than 0.99. The results for LB, TB, BART, and C5.0 are also very competitive. All the models except TB and C5.0 are more sensitive than specific, but the differences are very small and not statistically significant (p-value= 0.123), and both the passes and the fails are identified very well.

4.2 Automatic feature selection

In the second scenario, the RFE selection method presented in Section 3 was applied to all the classification models. This procedure recursively eliminates the features that explain the least variability in the dependent variable while maximizing the accuracy. Feature selection and classification were performed jointly, so the accuracy metrics correspond to the variables selected for each model. It should be noted that this is not a subjective selection made by a single expert in education, whose own experience would bias the choice. Instead, the method is completely objective, based solely on data variability.

Table 3 presents the accuracy metrics calculated using the optimal features given by RFE for each classifier.

Table 3 Accuracy metrics for the 10 classification models with 10-fold cross-validation using the RFE-given optimal features

The accuracies (and AUC) range from 0.9398 to 0.9986 (and 0.9675 to 0.9999), with mean and standard deviation given by 0.9797 ± 0.0213 (and 0.9927 ± 0.0108). Again, the best performance corresponds to RF, with all four metrics greater than 0.99, using only 7 features. TB, LB, BART, and C5.0 also give accuracies greater than 0.99 with their optimal features. All the approaches except LB and C5.0 are more sensitive than specific, but the differences are not statistically significant (p-value= 0.195). Again, the students who pass and those who fail are very well identified.

The gray colour highlighting in Table 4 corresponds to the optimal features obtained by RFE for each classification model. There is great variability in the number of optimal features selected for each model. The optimal numbers of features range from 3 to 27 (out of 29), with mean 9.90 (± 8.58). Observe that all the features were selected at least once, which means that all 29 features have some usefulness. Notably, Learning activities rating (v12) and Peer reviews performed (v29) were selected by all the models, and Participation score (v10) was selected by 8 out of 10 classifiers. These three features are therefore very important for predicting success in the course. They are related to evaluations of group activities and to scoring participation in the social interactions (favourites, likes, responses, proposals,...), highlighting the important role of social aspects as proposed in the neurodidactics paradigm.

Table 4 Optimal features given by RFE for each classifier. Gray-shaded cells represent the features selected for the corresponding classifier

A trade-off needs to be attained between predictive power and the number of features. This avoids unnecessary or redundant predictors that may add noise, while also reducing the computational effort. The best models giving high predictive power with the fewest features are RF, TB, and BART, since they provide accuracies of 0.9986, 0.9971, and 0.9900, respectively, with reduced numbers of optimal features (7, 8, and 5, respectively, of which 4 are common, from the original 29 features). Specifically, RF also provides the best AUC, sensitivity, and specificity. Finally, NNETS and ORFSVM are also very competitive, with accuracies close to 0.99 while considering only 3 features.

The mean accuracy in this scenario is 0.0093 greater than that of the scenario corresponding to all the features. While this difference is not statistically significant (p-value= 0.545), the great advantage is that, for all the classifiers, the number of features was reduced without losing accuracy (even increasing it).

The best model found, RF, had 7 features extracted automatically with RFE: Influence (v5), Participation score (v10), Learning activities rating (v12), Submissions (v25), Published content (v26), Learning activities completed (v28), and Peer reviews performed (v29). These features provide important insights into neurodidactics. Social interaction achievements are reflected in Influence (v5). Participation score (v10) is the most general metric of development in the course. Learning activities rating (v12) and Peer reviews performed (v29) are the interactions in which students rate or are rated by the group. Finally, Submissions (v25), Published content (v26), and Learning activities completed (v28) correspond to the information shared with the group to be commented on or reviewed. The automatic feature selection yielded a really interesting combination of attributes related to the social and collaborative learning developed in NeuroK.

Motivation is decisive although it is very hard to measure. These results, however, have made it possible to propose some very good features related to relevant aspects in neurodidactics for a significant, pleasant, and engaging learning process:

  • Good general development in the course (Participation score).

  • Differential social interaction achievement (Influence).

  • Information shared with the group to be commented on or reviewed (Submissions, Published content, Learning activities completed).

  • Interactions in which students rate or are rated, improving the activities of the whole group (Learning activities rating by other students, Peer reviews performed).

Table 5 presents the main descriptive statistics of these 7 features according to the response variable together with the independent t-test p-values. All seven hypothesis tests gave statistically significant differences between the means of the features of those students who failed and those who passed. Figure 2 shows box plots representing these differences. In sum, these 7 features discriminate the students by performance very well.

Table 5 Descriptive statistics for v5, v10, v12, v25, v26, v28, and v29 separated by performance and p-values for the independent t-tests
Fig. 2 Bar chart for the grades and box plots for the 7 features extracted automatically by RFE (v5, v10, v12, v25, v26, v28, and v29)

In the following section, we shall explore the significance of these results.

5 Discussion

Predicting performance is not an easy task in online education. The main problem lies in identifying features that provide high predictability. Such predictability can help the instructors to evaluate courses with many students, to monitor poorly performing students at risk of dropout, and to motivate students to succeed. Instructors may use learning analytics tools to grade students, which is especially relevant in online courses with large numbers of students.

A total of 29 features were defined, implemented, and tested. They were defined to cover fundamental aspects of online learning related to key facets of neurodidactics for meaningful learning, with some of them also being related to aspects of traditional learning. Given the generalized level of accuracy obtained with the 10 classifiers, the features defined are good predictors of performance in terms of the students’ grades. The classifiers considered are in common use in the scientific literature. It should be noted that using 10 methods was simply a way to reinforce the emerging picture of the usefulness of the features considered.

From the statistical perspective, the accuracies obtained in this learning analytics study ranged from 0.9255 to 0.9986, and the AUCs from 0.9575 to 0.9999 in the two scenarios. Other results have been reported in similar work on students’ performance. For example, Hu et al. (2014) compared several data mining techniques, including logistic regression, C4.5 (Quinlan, 1993), and a classification and regression tree (CART) (Breiman et al., 1984). They achieved accuracies ranging from 0.754 to 0.953 with three datasets. However, neither this nor other results can be fairly compared with the present case, because data come from different features, different learning management systems, different classifiers, and different cross-validation frameworks. Nevertheless, what is important is that the methodological approach is statistically rigorous, and can be applied to give accurate results in real situations.

The best model found was RF, with 7 features extracted automatically by RFE. Its accuracy, AUC, sensitivity, and specificity were 0.9986, 0.9999, 1.000, and 0.9974, respectively. This means that all the students who passed the course were classified correctly, and that on average just one student who failed was wrongly classified. This model’s accuracy metrics and the limited set of features it selects are the reasons for its being considered the best, although other models obtained with RFE are also very competitive (especially some that are based on an ensemble architecture). Note that RF is a powerful decision-tree-based classification algorithm that gives good results in many different contexts (see, for example, Fernández-Delgado et al. (2014)).

The 7 features that RF selected with RFE were: Influence (v5), Participation score (v10), Learning activities rating (v12), Submissions (v25), Published content (v26), Learning activities completed (v28), and Peer reviews performed (v29). All seven features gave statistically significant differences between those students who failed and those who passed. This selection is supported by both the accuracy metrics and the paedagogically meaningful information that they provide.

Educationally, one can gain invaluable insight from the best model’s selected features. Of the social network metric features (v2–v5), Influence is the most decisive. It measures collaboration with other students through the social network. It is calculated from NeuroK’s social interaction graph, and allows the positive extremes, i.e., the key players or influencers, to be identified. It emphasizes the students’ role in the social network, since their labour in disseminating knowledge is invaluable in helping the group in its learning process. Social intelligence is the ability to process the intentions of others (Brothers, 1990), and the areas of the brain that process social events also process cognitive events. According to neurodidactics, this Influence feature should be very important for predicting success, as was confirmed numerically in this experiment. Hernández-García et al. (2015) applied social learning analytics to message boards in online distance learning, examining the relationship between social network analysis parameters and the students’ performance. While they concluded that this approach can help reveal the interactions occurring in online distance education, they advised against relying exclusively on social network parameters for predictive purposes. Their box plots identified a number of outliers of interest, these being the students with the largest values. Indeed, key players or influencers are relevant in the online learning and teaching process to disseminate information, among other tasks (Probst et al., 2013).

Motivation and collaborative learning form the basis of the principles of neurodidactics, enhancing participation. For this reason, the features measuring participation were good predictors of the students’ grades, and high accuracy was achieved with just a reduced set of them. Since the platform used encourages participation and collaboration, some of these measures identified these actions performed by each student in the learning discussions.

Participation score (v10) was selected by 8 of the 10 models considered. It represents the points a student automatically obtained during the course from different activities representing their overall participation. A similar general score feature could be obtained from any learning management system, but NeuroK includes information about social interactions or flipped classroom activities from a socio-constructivist perspective. There were statistically significant differences between the mean scores of students who failed and those who passed (206.41 vs 1219.15). This feature must be combined with others to provide good discrimination.

Learning activities rating (v12) and Peer reviews performed (v29) are both based on peer reviews, but the former corresponds to receiving ratings and the latter to giving ratings. Peer review is an educational resource. It is not only cooperative learning but also cooperative evaluation, which is in essence another way of learning from companions. Peer learning and evaluation yield good results. When individuals share knowledge, what they think may change substantially. From a brain-based learning perspective, combined learning as a group is greater than the sum of its parts (Tokuhama-Espinosa, 2010).

Submissions (v25), Published content (v26), and Learning activities completed (v28) represent a part of the learning process that in traditional learning management systems would be autonomous, but in NeuroK, submitting activities, sharing content, and completing tasks finish with a peer review and/or a group debate about the evaluations given and received and the material shared. All the activities were designed to follow the guidelines of neurodidactics and socio-constructivism. All the students who passed the course had completed all the learning activities, whereas only 15.5% of those who failed had done so.

The last feature selected is Peer reviews performed (v29) which represents the number of other students’ learning activities that are evaluated by the given student. This is different from the Learning activities rating feature (v12) which represents the evaluations that other students make about the given student’s learning activities. Therefore, both features are based on peer reviews, but from complementary perspectives – receiving (v12) and performing (v29).

Emotions are critical to the brain-based learning process, and using social networks has been demonstrated to arouse positive emotions, which thus produces a pleasant sensation. Since this is an e-learning platform, the students can be comfortable at home surrounded by their favourite aroma and/or music, which is also favourable for increasing performance (Wilmes et al., 2008). Moreover, they can (and are recommended to) do some mild exercise every hour.

In sum, automatic feature selection seems to be a good alternative for predicting students’ performance in the experiments that were conducted. The approaches tested selected reasonable features in both number and content, obviating the need to use all of them, which would increase the computational effort. It also obviates the need for features to be selected by experts, which would introduce subjectivity, and avoids the loss of interpretability that would result from using principal components instead. Indeed, high levels of accuracy were attained with just a limited number of interpretable features. Educationally, the features extracted fit properly with the neurodidactics paradigm, being closely related to such concepts as peer evaluation (Gokkurt et al., 2012), cooperative learning (Roseth et al., 2006), and learning by doing and participating (Reese, 2011).

To the best of the authors’ knowledge, no other study has been reported relating learning analytics and neurodidactics with machine learning, so this is a first step towards relating learning analytics tools and neurodidactics. We have proposed some features that address educational aspects related to neurodidactics, and a robust machine-learning procedure for selecting features and predicting students’ performance. This proposal is methodologically relevant, and can be used with other datasets. Indeed, the main limitation of this study is that it was tested on a single course, so the generalizability of the features that were found to other courses and other learning management systems has yet to be studied. It would be interesting to implement the proposed features or equivalent ones in different learning management systems so that the approach could be applied to and tested with other courses. The students’ interaction will differ from one course to another. This would introduce between-course variability in the data, which might mean that the selected features are not the same. In any case, however, the information collected from each course could be used and contextualized in accordance with the different kinds of activities performed. Within-course variability is different: it corresponds to the same course repeated over a number of years, i.e., one teacher directing the learning process with the students changing every year. In this case, there will be less data variability, and historical data could be used. Sharing data publicly would help the education research community determine which features are most relevant in different scenarios.

6 Conclusions

Learning analytics based features automatically extracted from students’ interactions on the NeuroK platform can be used to predict success with high accuracy. A total of 29 features were tested with different classifiers in a variable-reduction framework. Accuracies greater than 0.99 were obtained. The best performance was obtained with an RF model that selected 7 relevant features, all of them having a clear interpretation in the teaching and learning process and being related to key concepts of neurodidactics and socio-constructivism. The findings suggest that neuroscience has the potential to support students’ success.

The experiments conducted showed how these automatically extracted features can be used with the classifiers considered and the feature selection method to provide a reliable learning analytics tool with the aim of predicting student success in an online course following the principles of neurodidactics. The implementation of these or other related features in different learning management systems is encouraged in order to apply the proposed machine learning approach and increase the number of experiments so that they can be compared. Sharing data would also be interesting.

Future research could consider the use of these features from an early stage, and then track in real time their progression with regression models to detect the likelihood of students’ dropout or failure. This would allow preventive actions to be taken during the development of the course.