1 Introduction

With the rapid development of online education in recent years, research attention has shifted to data-driven learning prediction models that offer new insights into how students learn and how their learning performance can be improved (Picciano 2014; Siemens and Baker 2012). With the development of educational data mining (Ahmad et al. 2015; Baker and Yacef 2009) and learning analytics (Siemens and Long 2011), relevant data cleaning, mining, and analytics techniques have been used to understand, report on, and optimize online learning and learning environments. For example, data-intensive approaches have been applied to predict student exam performance (Agudo-Peregrina et al. 2014), create learning prediction models (Paquette et al. 2015), and develop feedback dashboards (Jivet et al. 2018). These applications have significantly improved the understanding of students' learning processes, performance, and contexts in online education (Ouyang et al. 2022; Verbert et al. 2012).

Academic performance prediction is one of the most important tasks in online education; it typically estimates students' learning performance from learning information using artificial intelligence (AI) algorithms (Tomasevic et al. 2020). Different types of AI algorithms have been used to develop prediction models, e.g., evolutionary computation (Fung et al. 2004; Takagi 2001), deep learning (Fok et al. 2018), decision trees (Kabra and Bichkar 2011), and Bayesian networks (Sharabiani et al. 2014). However, existing prediction models struggle to obtain quantitative relations between the inputs (i.e., learning information and data) and outputs (i.e., academic performance) because of two dilemmas. First, there is a lack of criteria for selecting and transforming learning data (including process and performance data) into explainable parameters, owing to the complexity of teaching and learning contexts and processes (Hussain et al. 2019; Madhavan and Richey 2016; Uccio et al. 2020). Second, it is difficult to establish high-precision relations between the learning inputs and performance outputs (Chassignol et al. 2018; Godwin and Kirn 2020; Tomasevic et al. 2020). It is therefore of research interest to address these challenges and develop accurate performance prediction models for online education contexts.

To address these two gaps, this study uses an advanced AI technique, evolutionary computation (EC), to develop a prediction model of student academic performance and to test the model's precision. We first identify students' learning data from the entire learning process and establish specific criteria to define the variables that characterize the learning processes. Next, we develop a quantitative prediction model using a robust branch of EC, namely genetic programming (GP). The prediction model accurately and efficiently predicts the students' academic performance. Finally, analytical and pedagogical implications are drawn from the empirical results to guide the development of performance prediction models and the design of online engineering courses. The original contributions of this study are as follows:

  • Processing learning process and summative data and establishing specific criteria for selecting these data for prediction;

  • Developing an AI model to predict students’ academic performance; and

  • Providing design guidance and assistance for online engineering education.

2 Literature review

2.1 Existing studies on academic performance prediction

Academic performance prediction is critical for online education since it helps identify students who are likely to fail, provide student-centered learning pathways, and optimize instructional design and development (Asif et al. 2017; Chen et al. 2020; McArthur et al. 2005; Mozer et al. 2019; Roll and Wylie 2016). Different AI algorithms have been used in existing studies to predict students' examination performance through classification and regression (Tomasevic et al. 2020). For example, Kotsiantis et al. (2003) applied multiple machine learning (ML) techniques (e.g., Naïve Bayes, k-nearest neighbors) to categorize students as "pass" or "fail". Minaei-Bidgoli et al. (2003) used several learning algorithms to classify student results into different categories, including (a) "pass" or "fail", (b) high, middle, and low levels, and (c) nine classes based on the achieved grades. Marquez-Vera et al. (2012) used genetic programming and other data mining algorithms to predict student failure at school. Marquez-Vera et al. (2015) developed a prediction model for early dropout in high school using data mining methods. More recently, Cano and Leonard (2019) developed an early warning system for underrepresented student populations. In regression settings, the research problem is to predict the exact scores that students may earn (Drucker et al. 1997). In summary, most existing studies have focused on performance classification and on the regression problem of identifying explicit scores.

Other than applying AI algorithms for classification and regression, AI-enabled prediction models have been developed to predict academic performance from specific input variables that characterize student learning. A review summarized the performance prediction models into three categories: similarity-based, model-based, and probabilistic approaches (Tomasevic et al. 2020). The review highlighted two critical components of developing prediction models: (a) identifying variables that characterize the learning processes and performances, and (b) analyzing the learning data with appropriate AI algorithms. However, there are gaps in the current development of prediction models related to data identification and data analytics. First, regarding data identification, researchers tend to feed all available student information (e.g., age, gender, religion, place of living, job, grades) into the prediction models (Asif et al. 2017), rather than using data that reflect the specific learning process (Suthers and Verbert 2013). In other words, the identification of student data in most existing prediction models is not underpinned by specific standards that characterize students' learning processes. For example, the most frequently used input data in prediction models include students' prior performance, engagement level, and demographic information (Lam et al. 1999; Nicholls et al. 2010; Tomasevic et al. 2020). However, the prediction results usually indicate that no single classifier or variable plays a more significant role than the others in academic performance (Asif et al. 2017; Oskouei and Askari 2014; Yehuala 2015). One way to address this issue is to deliberately choose student data that are underpinned by a learning theory and thus reflect the specific learning process.
Because one of the goals of the performance prediction models is to optimize student-centered learning pathways, the choice of students’ input data should be guided by the student-centered learning principle and reflect the student-centered learning processes (Ouyang & Jiao, 2021). There are emerging studies that focus on using online learning behavior data from the process-oriented perspective to accurately predict academic performance, rather than merely using student information data (e.g., demographics) or performance data (e.g., final grades) (Bernacki et al. 2020). Echoing this research trend, this research designs a collaborative learning mode in online courses and deliberately chooses student data from the collaborative process to make academic performance predictions.

Second, regarding data analytics, ML algorithms have been widely used to develop prediction models of students' academic performance, e.g., artificial neural networks (ANN), support vector machines (SVM), and decision trees (DT) (Chassignol et al. 2018; Fernandes et al. 2019; Fok et al. 2018). Existing studies have dedicated efforts to estimating the implicit correlations between the observed learning data and the predicted performance (Tomasevic et al. 2020). ANN prediction models typically consist of a series of connected artificial neurons that simulate neurons in biological brains, and are critically affected by many factors, e.g., the learning rate, objective function, and weight initialization (Chen et al. 2020). SVM has been used for both classification and regression of learning data; it efficiently obtains nonlinear classifications by implicitly mapping input variables into a higher-dimensional space (Drucker et al. 1997). DT builds prediction models using a tree-shaped graph that represents possible decisions and their corresponding consequences (Chaudhury and Tripathy 2017). However, quantitative prediction models have not yet been developed with these AI algorithms to accurately identify the exact relations between the input variables of the learning process and the output academic performance. This study fills this gap by developing an EC model that explicitly represents the quantitative relations among multiple learning variables in order to predict students' academic performance in online education.

2.2 Quantitative prediction using evolutionary computation (EC)

This study uses genetic programming (GP), a branch of evolutionary computation (EC), to develop the prediction model. Inspired by the evolutionary process in the natural world (survival of the fittest), EC is a powerful subdivision of the AI domain that includes evolutionary strategies (ESs) and evolutionary programming (EP); these techniques are collectively known as evolutionary algorithms (EAs). EAs are powerful tools for addressing complex datasets and identifying the quantitative relations between input and output variables (Pena-Ayala 2014). GP is a specialization of EAs that offers high model transparency and knowledge extraction, supporting the conceptualization of phenomena and the derivation of mathematical structures for complex problems. GP evolves prediction models with tree-like structures that can be recursively evaluated; it therefore typically uses programming languages that naturally embody tree structures. The inner nodes of such tree-like programs are operator functions, while the terminal nodes are operands. This model structure facilitates evolving and evaluating millions of mathematical expressions that correlate the input variables with the output. GP starts from an initial population of candidate solutions, which is improved by a series of genetic operators such as recombination, mutation, and reproduction (Saa 2016). The candidate solutions are encoded representations assessed by a fitness function (Xing et al. 2015). The initial population is generated randomly, and the candidates are evaluated against the fitness function; the candidates with the best fitness have a higher chance of becoming parents of the next generation (Timms 2016). Consequently, improving the initial population is the key process for obtaining the fittest solution with the most efficient convergence in GP.
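The evolutionary loop described above (random initial population, fitness evaluation, survival of the fittest candidates, genetic operators) can be sketched with a minimal tree-based symbolic regression toy. This is an illustrative sketch, not the authors' implementation: the function set, mutation-only operator, and truncation selection are simplifying assumptions.

```python
import random

random.seed(42)

# Function set (inner nodes) and terminal set (leaves) of the expression trees
OPS = {"+": lambda a, b: a + b,
       "-": lambda a, b: a - b,
       "*": lambda a, b: a * b}
TERMINALS = ["x", 1.0, 2.0]

def random_tree(depth=3):
    """Grow a random expression tree: inner nodes hold operators, leaves hold operands."""
    if depth == 0 or random.random() < 0.3:
        return random.choice(TERMINALS)
    op = random.choice(list(OPS))
    return (op, random_tree(depth - 1), random_tree(depth - 1))

def evaluate(tree, x):
    """Recursively evaluate a tree for a given input value x."""
    if tree == "x":
        return x
    if isinstance(tree, float):
        return tree
    op, left, right = tree
    return OPS[op](evaluate(left, x), evaluate(right, x))

def fitness(tree, xs, ys):
    """Mean squared error against the target data (lower is fitter)."""
    return sum((evaluate(tree, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def mutate(tree, depth=2):
    """Genetic operator: replace a random subtree with a freshly grown one."""
    if not isinstance(tree, tuple) or random.random() < 0.3:
        return random_tree(depth)
    op, left, right = tree
    if random.random() < 0.5:
        return (op, mutate(left, depth), right)
    return (op, left, mutate(right, depth))

# Target relation to rediscover: y = x^2 + 1, sampled on a small grid
xs = [i / 4 for i in range(-8, 9)]
ys = [x * x + 1 for x in xs]

population = [random_tree() for _ in range(200)]
for gen in range(60):
    population.sort(key=lambda t: fitness(t, xs, ys))
    survivors = population[:50]  # truncation selection: fittest candidates become parents
    population = survivors + [mutate(random.choice(survivors)) for _ in range(150)]

best = min(population, key=lambda t: fitness(t, xs, ys))
print(fitness(best, xs, ys))
```

Because the survivors are carried over unchanged each generation, the best fitness in the population never degrades, mirroring the convergence behavior described above.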

Compared with its AI counterparts, GP has the advantage of producing highly nonlinear prediction functions without requiring predefined relations among the variables (Xing et al. 2015). The prediction models evolved by traditional GP typically take tree-shaped structures and are programmed in a functional programming language (e.g., LISP) (Zaffer et al. 2017). GP can therefore be used to develop quantitative prediction models that establish the exact relations between the input variables and the output response when predicting students' learning performance (Martin and Betser 2020). However, the application of GP in online education to obtain quantitative prediction models has yet to be explored, especially in combination with criteria for analyzing and converting learning processes into input data. GP and other AI methods rely on data alone to develop the model structure, which underscores the importance of collecting datasets with wide ranges of predictor variables to develop more robust models. For cases where the database is not large enough, advanced validation methods (e.g., k-fold cross-validation) can be deployed to verify the efficacy of these methods.
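For a small database such as the 35-student cohort used later in this study, k-fold cross-validation partitions the data so that every observation serves as test data exactly once. A minimal sketch using only the standard library (the fold count k = 5 and the shuffling seed are illustrative choices):

```python
import random

random.seed(1)

n_students, k = 35, 5
indices = list(range(n_students))
random.shuffle(indices)

# Partition the 35 students into 5 disjoint folds of 7 students each
folds = [indices[i::k] for i in range(k)]

for fold_id, test_idx in enumerate(folds):
    held_out = set(test_idx)
    train_idx = [i for i in indices if i not in held_out]
    # train the prediction model on train_idx, evaluate it on test_idx
    print(fold_id, len(train_idx), len(test_idx))
```

Averaging the per-fold test errors gives a validation estimate that does not reuse the same students for both fitting and evaluation.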

2.3 Requirements for the quantitative prediction models in online education

The main challenge in this study is to define the main requirements of the GP prediction model (Nguyen et al. 2020). The prediction model should accurately predict students' learning performance with respect to five considerations: interpretability, accuracy, speed, robustness, and scalability. In particular, interpretability refers to the criteria established to interpret the learning data from the online courses, which serve as the foundation of AI prediction models. Accuracy refers to the correctness of the AI models in predicting students' academic performance, which can typically be validated against the prediction results of other models. Speed refers to the low computational cost of obtaining the prediction results, which is particularly important for AI models designed to process real-time learning data (e.g., the learning-feedback-adjusting process in online education). Robustness refers to the reliability of predicting students' learning performance from noisy data, which is critical for effectively mining the learning data. Scalability refers to the capability of obtaining prediction results from large volumes of multimodal learning data. The prediction model developed in this study aims to achieve all five characteristics. In addition, previous studies usually used the same student dataset for cross-validation when evaluating the prediction results; in other words, the same student cohort was used to develop and validate the prediction models (Asif et al. 2017). From an educational perspective, however, it is more appropriate to build the prediction model using a dataset from one group of students and to evaluate the five characteristics of the model on another group (Asif et al. 2017). To achieve a more accurate assessment, this study uses different datasets to build and evaluate the prediction model.

3 Research methodology

3.1 Research context, participants and course design

The research context is an online engineering course, Smart Marine Metastructures, offered in Spring 2020 (8 weeks) in the Ocean College at Zhejiang University in China. Smart Marine Metastructures was taught by the first author as a completely online course hosted through the Blackboard online platform (XueZaiZheDa http://course.zju.edu.cn/) and DingTalk (China's version of Zoom), where the course syllabus, materials, and other resources were uploaded and recorded. Participants were 35 full-time graduate students from the Ocean College at the university; all were Chinese, aged 22 to 27 years (female: N = 11; male: N = 24). Prerequisite courses were required to provide basic background knowledge in structural analysis and artificial intelligence.

This course was designed as a collaborative learning course by the interdisciplinary research team. Grounded in the social perspectives of learning (Vygotsky 1978), collaborative learning is defined as a process in which a small group of people participates in coordinated activities to maintain mutual understandings of problems, advance joint meaning-making, and create new knowledge or relevant artifacts (Dillenbourg 1999; Goodyear et al. 2014; Roschelle and Teasley 1995). In engineering education, learning to be an engineer means learning to participate in engineering discourses: the words, discourses, and narratives through which engineers think and communicate (Betser and Martin 2018; Rojas 2001). As a student-centered approach, collaborative learning can foster engineering learners' cognitive thinking during the intrapersonal process as well as knowledge construction through social interactions (Damşa 2014; Liu and Matthews 2005; Ouyang and Chang 2019). This research followed the ethical and legal requirements of the university's research ethics committee. All the students were well informed and agreed to participate in the experiments reported in this study.

Following the collaborative learning mode, the instructor (the first author) designed three components for this course: the online lecture, the group discussion, and the literature review writing. The first component, the online lecture, included a basic introduction and advanced learning. The instructor introduced the main concepts and theories of the field in the first two weeks. Then, during the advanced learning process in the following six weeks, the instructor introduced three progressively advanced content modules, i.e., advanced marine metastructures, artificial intelligence in engineering, and structural health monitoring in marine engineering. Each module served as prerequisite knowledge for the next, such that students could progressively build their understanding of the course content. The second component was the group discussion, in which students explored the concepts, exchanged understandings, and constructed meanings in small groups through DingTalk. Students autonomously formed five groups (seven students per group) based on their research interests (see Table 1). The instructor provided prompting questions in advance to guide students' inquiry into the research topics, but was not engaged in the discussion process. The third component was the literature review writing, completed as group work. The writing process included four steps: developing the outline of the literature review, writing the initial drafts, making revisions based on the instructor's feedback, and finalizing the literature review. Students in the same group received the same grade. At the end of the semester, the small groups delivered online oral presentations to the class based on their literature review write-ups. Students peer-assessed the oral presentations based on a rubric, and the instructor evaluated the groups' literature review write-ups based on a rubric. After the course, the students completed a self-reflection to evaluate their acquisition of knowledge in the course compared with their knowledge prior to it.

Table 1 Comparison of the research background, group discussion topics and semester reports in the five groups

3.2 Research purpose and questions

The research purpose is to obtain a quantitative model that predicts students' learning performance in online learning, which can be used to analyze the contributions of the input variables to academic performance and, therefore, to optimize the design of online courses based on the results. Taking together the existing studies and the requirements for prediction models, this study addresses two research challenges: learning data identification, due to the lack of criteria, and learning data analytics, due to the lack of appropriate AI algorithms. The research questions are summarized as:

  1. How to identify the dominant learning variables that significantly affect the student learning performance?

  2. How to develop the robust quantitative prediction model to predict students’ performance with a reasonable accuracy?

  3. How to optimize the online course based on the performance prediction results generated by the model?

To resolve these research questions, we propose characterization criteria to identify the learning processes and obtain the learning data, and then use those data to develop the quantitative prediction model with the GP algorithm.

3.3 Identification of criteria, variables definition and preliminary data analytics

The criteria are defined to categorize and analyze the learning results obtained from the 35 graduate students in the online ocean engineering course. Student performance variables are identified from both summative and process perspectives, including pre-course prerequisite knowledge, participation performance, procedural performance, summative performance, and post-course knowledge acquisition. Students' pre-course prerequisite knowledge and post-course knowledge acquisition are self-evaluated through a questionnaire. The questionnaire includes ten questions; each question asks students to evaluate their knowledge level on one main topic covered in this course. All responses are measured on a 5-point scale: 1 point (do not understand the topic), 2 points (understand the topic but need external assistance for a clear explanation), 3 points (can directly explain the topic without any assistance), 4 points (can directly explain the topic and its applications in research and practice without any assistance), and 5 points (understand the topic, can elaborate its applications in research and practice, and am able to apply it in my own study). Participation performance is measured as students' interaction frequency in the class-level and group-level discussions. Procedural performance includes students' performance in group discussions, write-ups, and group presentations, evaluated through quantitative content analysis. The oral and written content from discussions, write-ups, and presentations was recorded and transcribed into text, and two trained raters coded the text according to a coding scheme (see Table 2). The coding scheme includes three levels of knowledge contribution, namely superficial-, medium-, and deep-level knowledge.
A weighted score (i.e., NSK + 2NMK + 3NDK, where NSK, NMK, and NDK denote the numbers of superficial-, medium-, and deep-level knowledge contributions, respectively) is calculated for each student as the procedural performance. The summative performance is the instructor's evaluation of students' final write-up of the literature review. Note that the final learning effectiveness of the students, Lrneff (i.e., the output variable in the GP prediction), is defined as the total grade of each student.
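The weighted procedural score is a direct linear combination of the coded contribution counts. A one-line sketch (the function and argument names are ours, introduced for illustration):

```python
def procedural_score(n_superficial, n_medium, n_deep):
    """Weighted procedural-performance score: NSK + 2*NMK + 3*NDK."""
    return n_superficial + 2 * n_medium + 3 * n_deep

# e.g., a student coded with 4 superficial, 3 medium, and 2 deep contributions
print(procedural_score(4, 3, 2))  # -> 16
```

The weights 1, 2, and 3 reward deeper knowledge contributions more heavily, matching the three-level coding scheme in Table 2.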

Table 2 The content analysis coding scheme (Ouyang and Chang 2019)

Next, we analyze the learning data to define the variables for developing the GP prediction model. According to the academic performance criteria created in the previous section, a total of 8 input variables are used in the EC model, categorized into five types: the prerequisite knowledge (i.e., students' background PKbg), the participation frequency in the class and group discussions (i.e., Parclass and Pargroup), the procedural performance (i.e., discussion performance Perfdis, write-up performance Perfwrite, and presentation performance Perfprez), the summative performance (i.e., summative evaluation of the final write-up Perfsum), and the knowledge acquisition (i.e., self-evaluation of knowledge acquisition after the course KAkn) (see Table 3).

Table 3 Input and output variables

We use a traditional linear regression approach for preliminary data analysis. Similar patterns (either evenly distributed or strongly fluctuating) are observed on every variable across the five groups, which indicates that the students in the entire class are learning with similar effectiveness (see Fig. 1). However, the results show that the linear regression approach is ineffective for analyzing the complex learning data in the online engineering course (Multiple R = 0.479, R2 = 0.230, Std. error = 6.240). According to the linear regression of Lrneff in the five groups, the deviation of the first group G1 is the smallest while that of the fifth group G5 is the largest. Consequently, a more advanced approach is needed to develop the quantitative prediction model. In the next section, an advanced AI technique, i.e., GP, is applied to develop the predictive model for the learning data.
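The preliminary analysis above amounts to an ordinary least-squares fit reporting Multiple R, R2, and the standard error. The following sketch reproduces those statistics on synthetic stand-in data (the actual course dataset is not reproduced here, so the numbers will not match the reported values):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for the 35-student dataset: 8 input variables, one output
X = rng.uniform(0, 100, size=(35, 8))  # normalized learning variables
y = rng.uniform(60, 100, size=35)      # total grades (Lrn_eff)

# Ordinary least squares with an intercept column
A = np.column_stack([np.ones(len(X)), X])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
y_hat = A @ coef

ss_res = np.sum((y - y_hat) ** 2)
ss_tot = np.sum((y - y.mean()) ** 2)
r2 = 1 - ss_res / ss_tot          # coefficient of determination (R^2)
multiple_r = np.sqrt(r2)          # "Multiple R" as reported by spreadsheet tools
std_error = np.sqrt(ss_res / (len(y) - X.shape[1] - 1))  # residual standard error
print(round(r2, 3), round(multiple_r, 3), round(std_error, 3))
```

A low R2 from such a fit, as in the reported Multiple R = 0.479 and R2 = 0.230, indicates that a linear combination of the variables explains little of the variance in Lrneff, motivating the nonlinear GP model.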

Fig. 1
figure 1

The input and output variables of the student groups in the online engineering course. (All input scores are normalized to a full score of 100 to ensure that the influence of certain variables is not eliminated)

4 Development of the performance prediction model using the GP algorithm

4.1 Definition of the GP model

According to the previous discussion, the output of the prediction models is defined as the learning effectiveness Lrneff of the graduate students in the online engineering course. The input variables are classified into five categories, i.e., the prerequisite, participation, procedural performance, summative performance, and knowledge acquisition, as summarized in Table 3. As a consequence, the quantitative prediction model (i.e., objective function) can be written as

$${\text{Lrn}}_{{{\text{eff}}}} = f\left( {\underbrace {{{\text{PK}}_{{{\text{bg}}}} }}_{{{\text{prerequisite}}}},\underbrace {{{\text{Par}}_{{{\text{class}}}} ,{\text{Par}}_{{{\text{group}}}} }}_{{{\text{participation}}}},\underbrace {{{\text{Perf}}_{{{\text{dis}}}} ,{\text{Perf}}_{{{\text{write}}}} ,{\text{Perf}}_{{{\text{prez}}}} }}_{{{\text{procedural}} {\text{perf}}}},\underbrace {{{\text{Perf}}_{{{\text{sum}}}} }}_{{{\text{summative}}\;{\text{perf}}}},\underbrace {{{\text{KA}}_{{{\text{kn}}}} }}_{{{\text{knowledge}}}}} \right)$$
(1)

where f represents the unknown, highly nonlinear relationship between the output and inputs, which can only be determined using AI techniques. The GP prediction model presented in Eq. (1) is defined in terms of 8 input variables, which include both process and summative learning data.

4.2 Training and testing of the GP model

The GP algorithm directly learns from the learning data and extracts the subtle functional relations between the independent input variables (see Table 3). The GP model efficiently considers the interactions between the output variable Lrneff and the input variables. Because of the limited learning data (i.e., only 35 datapoints) in the online engineering course, k-fold cross-validation is applied for the model development, with k = 5 in this study. A series of preliminary runs revealed the influences of PKbg, Parclass, Pargroup, Perfdis, Perfwrite, Perfprez, Perfsum, and KAkn on improving the prediction performance of the GP model. Extensive preliminary analyses were performed to tune the GP parameters, including the crossover rate, initial population size, mutation rate, program head size, etc. In particular, more than fifty different combinations of the factors were considered for deriving the best GP prediction model for Lrneff. Details of the training parameters and ranges used in this study are listed in Table 4. Parameter ranges were selected based on a trial study and according to previous studies (Roy et al. 2010; Gandomi and Alavi 2012; Zhang et al. 2021). Three replications were conducted for every factor combination, and the GP algorithm was run until no significant improvement was observed in Lrneff. The simulations were conducted on a desktop computer with CPU: Intel® Xeon® CPU E5-1650 v4 @ 3.60 GHz, GPU: NVIDIA Quadro K420, RAM: 31.9 GB. The total training time on the current dataset was 15 min and 32 s for the optimal model. The best prediction model was selected from all models as the one with the highest accuracy and lowest loss. The gene trees of the best model are illustrated in Fig. 2. The individual gene expressions and the final simplified model are shown in Eqs. (2)-(9).

$${\text{Gene}}_{1} = - 6.5 \cos \left( {{\text{Perf}}_{{{\text{prez}}}} } \right),$$
(2)
$${\text{Gene}}_{2} = - 104200\frac{{\cos \left( {{\text{Perf}}_{{{\text{sum}}}} } \right)}}{{{\text{Perf}}_{{{\text{dis}}}}^{3} }} ,$$
(3)
$${\text{Gene}}_{3} = 7.1 \cos \left( {\cos \left( {\sqrt {{\text{PK}}_{{{\text{bg}}}} } } \right)} \right)^{{\cos \left( {{\text{KA}}_{{{\text{kn}}}} \cos \left( {{\text{Perf}}_{{{\text{sum}}}} } \right)} \right)}} ,$$
(4)
$${\text{Gene}}_{4} = - 676.4\cos \left( {\cos \left( {\log \left( {{\text{Par}}_{{{\text{class}}}} } \right)} \right)} \right)^{{\cos \left( {{\text{Perf}}_{{{\text{prez}}}} } \right)}} ,$$
(5)
$${\text{Gene}}_{5} = - 0.9{\text{KA}}_{{{\text{kn}}}} \cos \left( {\cos \left( {\log \left( {{\text{KA}}_{{{\text{kn}}}} } \right)} \right)} \right)\cos \left( {{\text{KA}}_{{{\text{kn}}}} } \right),$$
(6)
$${\text{Gene}}_{6} = 0.7{\text{Par}}_{{{\text{class}}}} \cos \left( {{\text{KA}}_{{{\text{kn}}}} } \right),$$
(7)

and

$${\text{Bias}} = 754.5.$$
(8)
Table 4 GP training parameter settings
Fig. 2
figure 2

Individual gene trees for developed GP model

The simplified GP model is given as:

$${\text{Lrn}}_{{{\text{eff}}}} = 754.5 - 6.5\cos \left( {{\text{Perf}}_{{{\text{prez}}}} } \right) - \frac{{104200\cos \left( {{\text{Perf}}_{{{\text{sum}}}} } \right)}}{{{\text{Perf}}_{{{\text{dis}}}}^{3} }} + 7.1\cos \left( {\cos \left( {\sqrt {{\text{PK}}_{{{\text{bg}}}} } } \right)} \right)^{{\cos \left( {{\text{KA}}_{{{\text{kn}}}} \cos \left( {{\text{Perf}}_{{{\text{sum}}}} } \right)} \right)}} - 676.4\cos \left( {\cos \left( {\log \left( {{\text{Par}}_{{{\text{class}}}} } \right)} \right)} \right)^{{\cos \left( {{\text{Perf}}_{{{\text{prez}}}} } \right)}} - 0.9{\text{KA}}_{{{\text{kn}}}} \cos \left( {\cos \left( {\log \left( {{\text{KA}}_{{{\text{kn}}}} } \right)} \right)} \right)\cos \left( {{\text{KA}}_{{{\text{kn}}}} } \right) + 0.7{\text{Par}}_{{{\text{class}}}} \cos \left( {{\text{KA}}_{{{\text{kn}}}} } \right),$$
(9)

where Pargroup and Perfwrite are omitted due to the similarity in the learning data. Equation (9) defines the quantitative relations between the input variables and the learning effectiveness of the students in the online engineering course. It can be seen that the GP model is a complex combination of variables and operators that predicts Lrneff. The fittest solution was obtained by the GP model after evaluating millions of preliminary linear and nonlinear candidate expressions via the evolutionary process.
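Since Eq. (9) is a closed-form expression, it can be evaluated directly. A sketch of the simplified model as a function (trigonometric and logarithmic arguments are assumed to be in radians and natural log, as in standard math libraries; the example input values are illustrative, not drawn from the dataset):

```python
import math

def lrn_eff(pk_bg, par_class, perf_dis, perf_prez, perf_sum, ka_kn):
    """Simplified GP prediction model of Eq. (9); inputs are the normalized scores."""
    term1 = -6.5 * math.cos(perf_prez)
    term2 = -104200 * math.cos(perf_sum) / perf_dis ** 3
    term3 = 7.1 * math.cos(math.cos(math.sqrt(pk_bg))) ** math.cos(ka_kn * math.cos(perf_sum))
    term4 = -676.4 * math.cos(math.cos(math.log(par_class))) ** math.cos(perf_prez)
    term5 = -0.9 * ka_kn * math.cos(math.cos(math.log(ka_kn))) * math.cos(ka_kn)
    term6 = 0.7 * par_class * math.cos(ka_kn)
    return 754.5 + term1 + term2 + term3 + term4 + term5 + term6  # bias of Eq. (8)

# Illustrative call using the mean PK_bg, Perf_dis, Perf_prez mentioned later in the text
print(lrn_eff(pk_bg=43, par_class=50, perf_dis=55, perf_prez=89, perf_sum=90, ka_kn=80))
```

Note that the power terms are well defined for real inputs because cos(cos(x)) is always positive, and the model requires par_class, pk_bg, ka_kn > 0 and perf_dis != 0.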

In order to benchmark the prediction power of GP against other conventional AI methods, ANN and SVM prediction models are developed using the same database. MATLAB version 2019a is used to develop the ANN and SVM models, and the performance of the GP, ANN, and SVM models is then evaluated. Figure 3 presents comparisons of the measured and predicted learning effectiveness using the GP, ANN, and SVM prediction models, together with the correlation coefficient (R) and mean squared error (MSE) performance indexes for each model on the training and testing data. GP outperforms SVM on both the training and testing data; it outperforms ANN on the training data and provides comparable performance with ANN on the testing data. However, it should be noted that ANNs suffer from some major shortcomings. First, the knowledge extracted by an ANN is stored in a set of weights that cannot be readily interpreted. In ANN-based simulations, the weights and biases are randomly initialized for each run, and these assignments can considerably change the performance of a newly trained network even if all previous parameter settings and the architecture are kept constant. This complicates the selection of an optimal architecture and parameter settings. Moreover, the structure and network parameters of an ANN must be specified in advance, usually through a time-consuming trial-and-error procedure. The GP method presented in this study overcomes these shortcomings. One of the major distinctions of EC lies in its ability to model the learning behavior without requiring a predefined form of the underlying relationships; the number and combination of terms are evolved automatically during model calibration in GP, unlike in ANNs. Another advantage of the proposed GP method over ANN and most other AI methods is its ability to extract explicit functional relationships for the investigated system.
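The R and MSE indexes reported in Fig. 3 can be computed directly from measured and predicted values; a minimal sketch (the function name and interface are illustrative, not the authors' MATLAB code):

```python
import numpy as np

def performance_indexes(measured, predicted):
    """Return the correlation coefficient (R) and mean squared error (MSE)
    between measured and predicted learning effectiveness values."""
    measured = np.asarray(measured, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    r = np.corrcoef(measured, predicted)[0, 1]   # Pearson correlation coefficient
    mse = np.mean((measured - predicted) ** 2)   # mean squared error
    return r, mse
```

The same two indexes would be evaluated separately on the training and testing splits for each of the GP, ANN, and SVM models.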

Fig. 3

Comparisons of the measured and predicted learning effectiveness using the GP, ANN, and SVM models: a GP model on the training data, b ANN model on the training data, c SVM model on the training data, d GP model on the testing data, e ANN model on the testing data, and f SVM model on the testing data

A parametric analysis is performed to verify the robustness of the developed models. In this analysis, one parameter is varied within a practical range while the other parameters are kept at constant values. Figure 4 presents the parametric analysis results for the learning effectiveness Lrneff with respect to the input variables PKbg, Parclass, Perfdis, Perfprez, Perfsum, and KAkn. Lrneff is highly sensitive to participation in class Parclass, summative performance Perfsum, and knowledge acquisition KAkn; knowledge acquisition affects the learning effectiveness in a sinusoidal trend. The learning effectiveness increases with higher participation in class and summative performance, and increasing presentation (Perfprez) and discussion (Perfdis) performance also results in higher learning effectiveness. On the other hand, the learning effectiveness decreases with higher prerequisite knowledge (PKbg).
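The one-at-a-time procedure described above can be sketched in a few lines; the `model` callable and argument names here are hypothetical placeholders for the trained predictor:

```python
import numpy as np

def one_at_a_time(model, baseline, var_index, low, high, n=50):
    """Vary one input over [low, high] while all other inputs are held
    at their baseline (e.g., mean) values; return the swept values and
    the corresponding model outputs."""
    sweep = np.linspace(low, high, n)
    outputs = np.empty(n)
    for i, v in enumerate(sweep):
        x = np.array(baseline, dtype=float)  # start from the baseline point
        x[var_index] = v                     # perturb only one variable
        outputs[i] = model(x)
    return sweep, outputs
```

Plotting `outputs` against `sweep` for each input variable in turn reproduces the kind of sensitivity curves shown in Fig. 4.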

Fig. 4

Sensitivity analysis obtained using the GP prediction model for the learning effectiveness Lrneff with respect to the variables of a PKbg, b Parclass, c Perfdis, d Perfprez, e Perfsum, and f KAkn (Pargroup and Perfwrite are omitted)

To simplify the analysis while investigating the most dominant variables (i.e., Parclass, Perfsum, and KAkn) in the optimal design of the online education course, PKbg, Perfdis, and Perfprez are fixed at the mean values of the 35 students (i.e., PKbg = 43, Perfdis = 55, and Perfprez = 89). Substituting these constants into Eq. (9), the GP prediction model reduces to:

$${\text{Lrn}}_{{{\text{eff}}}} = 754.4 - 0.6\cos \left( {{\text{Perf}}_{{{\text{sum}}}} } \right) + 7.1 \cdot 0.6^{{\cos \left( {{\text{KA}}_{{{\text{kn}}}} \cos \left( {{\text{Perf}}_{{{\text{sum}}}} } \right)} \right)}} - 676.4\cos \left( {\cos \left( {\log {\text{Par}}_{{{\text{class}}}} } \right)} \right)^{0.02} - 0.9 {\text{KA}}_{{{\text{kn}}}} \cos \left( {\cos \left( {\log {\text{KA}}_{{{\text{kn}}}} } \right)} \right)\cos \left( {{\text{KA}}_{{{\text{kn}}}} } \right) + 0.7 {\text{Par}}_{{{\text{class}}}} \cos \left( {{\text{KA}}_{{{\text{kn}}}} } \right),$$
(10)
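For reference, the reduced model of Eq. (10) can be implemented directly; this sketch assumes natural logarithms and radian arguments for the trigonometric terms, which the text does not state explicitly:

```python
import numpy as np

def lrn_eff_reduced(par_class, perf_sum, ka_kn):
    """Reduced GP model of Eq. (10), with PKbg, Perfdis, and Perfprez
    already fixed at their mean values. Assumes natural log and radians."""
    cos, log = np.cos, np.log
    return (754.4
            - 0.6 * cos(perf_sum)
            + 7.1 * 0.6 ** cos(ka_kn * cos(perf_sum))
            - 676.4 * cos(cos(log(par_class))) ** 0.02
            - 0.9 * ka_kn * cos(cos(log(ka_kn))) * cos(ka_kn)
            + 0.7 * par_class * cos(ka_kn))
```

The exact numerical value returned depends on the log base and angle-unit conventions of the original GP run, so this implementation should be checked against the authors' tool before reuse.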

4.3 Validation of the GP model

To validate the proposed GP prediction model, we apply the model to another online course—Information Technologies and Education—designed by the same research team (taught by the second author), using the same pedagogy (i.e., collaborative learning mode) and technologies (i.e., XueZaiZheDa and DingTalk) as the online engineering course. The validation course is a graduate-level, 8-week course offered in the 2020 summer semester by the Educational Technology (ET) program at the same university. This course focuses on learning theories, instructional design, educational technologies, emerging tools, and trending topics related to the application of information technologies in education. Nineteen graduate students (female: 10; male: 9) from the College of Education enrolled in this course. Learning data were collected in the same way as in the online engineering course.

Since the validation course used the same pedagogy and technology, we use the learning effectiveness Lrneff of the 19 students in this course to validate the proposed GP prediction model. Note that the learning effectiveness Lrneff in the validation course was measured as the total grades of the students in the class. Figure 5 compares the learning effectiveness Lrneff between the actual performance in the validation course and the results generated by the GP model. The GP model accurately captures the distribution pattern of Lrneff across all 19 students in the validation course. The maximum difference between the GP model and the validation course is 5%, which demonstrates the accuracy and efficiency of the developed prediction model. In addition, the GP model accurately predicts the learning effectiveness of the students below the average of the validation course, i.e., \({\text{Lrn}}_{{{\text{eff}}}} \le {\text{Lrn}}_{{{\text{eff}}}}^{{{\text{mean}}}}\). Therefore, the GP model is able to identify students with inadequate Lrneff; this information can be provided to the instructor to support further interventions for those low-performing students.
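The screening step described above—flagging students at or below the class mean as candidates for instructor intervention—can be sketched as follows (names are illustrative):

```python
import numpy as np

def flag_low_performers(student_ids, predicted_lrn_eff):
    """Return the IDs of students whose predicted learning effectiveness
    falls at or below the class mean, as candidates for intervention."""
    predicted = np.asarray(predicted_lrn_eff, dtype=float)
    mean = predicted.mean()
    return [sid for sid, p in zip(student_ids, predicted) if p <= mean]
```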

Fig. 5

Comparison of the learning effectiveness Lrneff between the GP prediction model and the existing validation course

4.4 Optimization of online learning using the GP model

Here, we study the Lrneff function in Eq. (10) to obtain the optimal online course design (i.e., the maximum learning effectiveness) for online engineering education. By investigating the contributions of the dominant variables (i.e., Parclass, Perfsum, and KAkn) to Lrneff, we identify the most important variable affecting the learning effectiveness. The Lrneff of students can then be effectively improved by increasing this variable, which provides helpful guidance to instructors in online engineering education.

The extremum of the Lrneff function in Eq. (10) can be determined with respect to Parclass, Perfsum, and KAkn as

$$\left\{ {\begin{array}{*{20}l} {\frac{{\partial {\text{Lrn}}_{{{\text{eff}}}} }}{{\partial {\text{Par}}_{{{\text{class}}}} }} = 0} \hfill \\ {\frac{{\partial {\text{Lrn}}_{{{\text{eff}}}} }}{{\partial {\text{Perf}}_{{{\text{sum}}}} }} = 0} \hfill \\ {\frac{{\partial {\text{Lrn}}_{{{\text{eff}}}} }}{{\partial {\text{KA}}_{{{\text{kn}}}} }} = 0} \hfill \\ \end{array} } \right.,$$
(11)

and the Hessian matrix is used to determine the maximum Lrneff as

$${\text{HM}}_{{{\text{Lrn}}_{{{\text{eff}}}} }} = \left[ {\begin{array}{*{20}c} {\frac{{\partial^{2} {\text{Lrn}}_{{{\text{eff}}}} }}{{\partial {\text{Par}}_{{{\text{class}}}}^{2} }}} & {\frac{{\partial^{2} {\text{Lrn}}_{{{\text{eff}}}} }}{{\partial {\text{Par}}_{{{\text{class}}}} \partial {\text{Perf}}_{{{\text{sum}}}} }}} & {\frac{{\partial^{2} {\text{Lrn}}_{{{\text{eff}}}} }}{{\partial {\text{Par}}_{{{\text{class}}}} \partial {\text{KA}}_{{{\text{kn}}}} }}} \\ {\frac{{\partial^{2} {\text{Lrn}}_{{{\text{eff}}}} }}{{\partial {\text{Par}}_{{{\text{class}}}} \partial {\text{Perf}}_{{{\text{sum}}}} }}} & {\frac{{\partial^{2} {\text{Lrn}}_{{{\text{eff}}}} }}{{\partial {\text{Perf}}_{{{\text{sum}}}}^{2} }}} & {\frac{{\partial^{2} {\text{Lrn}}_{{{\text{eff}}}} }}{{\partial {\text{Perf}}_{{{\text{sum}}}} \partial {\text{KA}}_{{{\text{kn}}}} }}} \\ {\frac{{\partial^{2} {\text{Lrn}}_{{{\text{eff}}}} }}{{\partial {\text{Par}}_{{{\text{class}}}} \partial {\text{KA}}_{{{\text{kn}}}} }}} & {\frac{{\partial^{2} {\text{Lrn}}_{{{\text{eff}}}} }}{{\partial {\text{Perf}}_{{{\text{sum}}}} \partial {\text{KA}}_{{{\text{kn}}}} }}} & {\frac{{\partial^{2} {\text{Lrn}}_{{{\text{eff}}}} }}{{\partial {\text{KA}}_{{{\text{kn}}}}^{2} }}} \\ \end{array} } \right].$$
(12)
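When analytic derivatives are unwieldy, the stationarity and second-order conditions of Eqs. (11) and (12) can be checked numerically; a minimal finite-difference sketch (generic, not tied to the GP model itself):

```python
import numpy as np

def numerical_hessian(f, x, h=1e-4):
    """Central finite-difference Hessian of a scalar function f at point x."""
    x = np.asarray(x, dtype=float)
    n = x.size
    H = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            ei, ej = np.zeros(n), np.zeros(n)
            ei[i], ej[j] = h, h
            # standard 4-point central difference for mixed partials
            H[i, j] = (f(x + ei + ej) - f(x + ei - ej)
                       - f(x - ei + ej) + f(x - ei - ej)) / (4 * h * h)
    return H

def is_local_maximum(f, x, h=1e-4):
    """A stationary point is a local maximum when the Hessian is
    negative definite (all eigenvalues strictly negative)."""
    return bool(np.all(np.linalg.eigvalsh(numerical_hessian(f, x, h)) < 0))
```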

Substituting Eq. (10) into Eqs. (11) and (12), however, we encounter difficulty in analytically solving for the maximum Lrneff due to the complex nature of the objective function. As a consequence, a numerical method is used to maximize Lrneff by discretizing the objective function and variables. Figure 6 shows the flowchart for maximizing the learning effectiveness function Lrneff using the analytical and numerical methods. Eventually, we obtain the maximum learning effectiveness as \({\text{Lrn}}_{{{\text{eff}}}}^{\max } = 132.3\), with the corresponding optimal variables \({\text{Par}}_{{{\text{class}}}} = 80\), Perfsum = 78.6, and KAkn = 91.2.
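The discretized maximization can be sketched as a simple grid search; a hypothetical quadratic objective stands in for Eq. (10) in this example:

```python
import itertools
import numpy as np

def grid_search_max(objective, bounds, n=60):
    """Discretize each variable over its practical range [lo, hi] and
    return the grid point with the highest objective value."""
    axes = [np.linspace(lo, hi, n) for lo, hi in bounds]
    best_x, best_val = None, -np.inf
    for point in itertools.product(*axes):   # exhaustive scan of the grid
        val = objective(point)
        if val > best_val:
            best_x, best_val = point, val
    return np.array(best_x), best_val

# Stand-in objective with a known optimum at (1, 2):
f = lambda x: -(x[0] - 1) ** 2 - (x[1] - 2) ** 2
x_opt, v_opt = grid_search_max(f, [(0, 3), (0, 4)])
```

A finer grid (larger `n`) tightens the result at the cost of more evaluations; gradient-free optimizers can be substituted when the number of variables grows.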

Fig. 6

Flowchart to maximize the learning effectiveness function Lrneff using the analytical and numerical methods

Figure 7 presents the distributions of the learning effectiveness Lrneff with respect to Parclass, Perfsum, and KAkn. Figure 7a shows the influences of KAkn and Perfsum on Lrneff at Parclass = 80. Lrneff fluctuates strongly with KAkn, while it is much less affected by Perfsum; KAkn therefore plays a far more significant role in Lrneff. Figure 7b shows the influences of KAkn and Parclass on Lrneff at Perfsum = 78.6. Although the similar finding is obtained that KAkn greatly affects the learning effectiveness, Parclass is likely to play a role as well. Figure 7c shows the influences of Perfsum and Parclass on Lrneff at KAkn = 91.2; comparing Parclass and Perfsum, class participation is found to be more important. As a consequence, the significance of the three variables to the learning effectiveness can be ranked as \({\text{KA}}_{{{\text{kn}}}} > {\text{Par}}_{{{\text{class}}}} > {\text{Perf}}_{{{\text{sum}}}}\).

Fig. 7

Distributions of the learning effectiveness Lrneff with respect to a KAkn and Perfsum at Parclass = 80, b KAkn and Parclass at Perfsum = 78.6, and c Perfsum and Parclass at KAkn = 91.2

5 Discussion

5.1 Addressing the research questions

To answer the three research questions of this study, we first identify the dominant variables that significantly affect student learning performance. The learning effectiveness (i.e., academic performance) function Lrneff obtained by the GP prediction model demonstrates that the dominant variables affecting student learning in the online engineering course are knowledge acquisition KAkn, followed by participation in class Parclass and summative performance Perfsum. Furthermore, prerequisite knowledge PKbg tends not to play a key role, which indicates that students with different levels of background knowledge could reach similar learning effectiveness after course learning. Regarding the second research question, we apply the developed prediction model to another online course, and the results indicate a reasonable accuracy of the model for predicting students’ learning performance (with a maximum difference of 5%), demonstrating the accuracy and efficiency of the developed prediction model. According to the results, students’ self-evaluation of knowledge acquisition, class-level participation frequency, and the instructor’s summative evaluation serve as critical indicators of particularly good or poor performance. Finally, the results indicate that the online course can be optimized based on the performance prediction results generated by the reported model.

5.2 Pedagogical implications

Academic performance prediction is a difficult problem to solve due to the large number of factors or characteristics that can influence students’ performance (Romero et al. 2013). Based on the empirical research results, we conclude that the instructional design of an online course should take into consideration students’ self-evaluation, discussion participation, and the instructor’s summative evaluation. First, because students’ self-evaluation is a key predictor of student performance, online instructors can use self-evaluation as a formative rather than a summative tool to foster student motivation for high achievement in course design (Arthur 1995). Second, consistent with previous research (Ouyang et al. 2020; Romero et al. 2013), our results show that students’ participation in class discussions is a critical indicator of their learning performance and effectiveness. It is reasonable to conclude that students who obtain higher scores in the course are those who participate more actively in class discussions, while students who obtain lower scores participate less actively. However, our results indicate that group discussion participation is not a critical indicator of student performance, which could be explained by Chinese students’ cultural tendency to put more emphasis on their performance under instructor presence rather than on peer collaborative learning (Ouyang et al. 2021; Zhang 2013). Third, as previous research indicates, the instructor’s summative evaluation plays an important role in predicting student performance; however, it is less important than students’ self-evaluation of their own learning. Taken together, this research provides pedagogical implications for online course design related to the critical factors of student evaluation, instructor presence, and cultural background.

5.3 Analytical implications

Previous research indicates that difficulties in identifying, from the learning process, the dominant variables that significantly affect student performance are obstacles to the development of quantitative prediction models. Most previous studies rely on data that are not directly generated from the learning process (e.g., demographic data or other personal information); such data collection and analytics significantly decrease the effectiveness of data-driven supports and increase the time and personnel required to manage such initiatives (Bernacki et al. 2020). This study proposes an AI-enabled prediction model using the advanced EC technique to accurately predict student performance based on learning variables generated from the students’ collaborative learning process. We argue that the identification of data variables should be grounded in theoretical underpinnings rather than non-malleable student factors or fixed information. In this way, we obtain a robust prediction model and bridge the gap between data-driven approaches and learning theories (Suthers and Verbert 2013). A challenge we faced during the prediction model development is data cleaning and analytics: we manually coded students’ oral and written content as one element of the input variables; future work should integrate automatic content analytics to enable real-time prediction models. In addition, the generalizability of the prediction model should be strengthened in two ways: first, by building the model on one dataset and evaluating it with another; second, by enlarging the validation of the model through multiple iterations of different online courses.

5.4 Limitations

Limitations of the reported GP prediction model can be summarized with respect to data, algorithm, ethics, and generalizability. First, the number of participants used to develop the GP prediction model is relatively small. The proposed model can be enhanced by taking into account more data (e.g., more participants or multiple rounds of experiments). Second, the GP algorithm, similar to other supervised AI methods (e.g., ANNs), cannot learn in a real-time, incremental, or interactive manner. However, a GP model well trained on a larger database can be a viable option for offline analysis. Third, ethical issues, such as the potential influence of AI-enabled models on student learning outcomes, should be considered by researchers. Future work needs to focus on providing real-time predictions, timely warnings, and advice during online engineering education to ensure that students are positively influenced by AI prediction models (e.g., Asif et al. 2017). Finally, future work should deepen the generalizability of the prediction model through multiple iterations of empirical research in different educational contexts and consider the influence of other external factors such as holidays, family events, and social relations.

6 Conclusions

In higher education, the design of prediction models and early warning systems has become a critical enterprise (Bernacki et al. 2020). However, existing prediction models suffer from issues related to learning data identification and analytics. This study addressed these issues by developing an AI model for the quantitative prediction of academic performance in online engineering education. Like the emerging performance prediction approaches (Bernacki et al. 2020), the learning data identified in the current prediction model overcame the weaknesses of prior approaches that rely on non-malleable factors (e.g., student demographics) or confounded factors (e.g., early performance). In addition, the current prediction model analyzed the quantifiable contributions of the dominant variables in the online engineering course to make an accurate prediction. The main findings indicated that the dominant variables in the online engineering course were knowledge acquisition, followed by participation in class and summative performance, while prerequisite knowledge tended not to play a key role. Based on the prediction results, we provided pedagogical and analytical implications for online course design and prediction model development. The AI-based quantitative prediction model can be used to evaluate and predict learning performance in online engineering education.