Abstract
With the COVID-19 outbreak, schools and universities have massively adopted online learning to ensure the continuity of the learning process. However, in such a setting, instructors lack efficient mechanisms to evaluate learning gains and to gain insight into the difficulties learners encounter. In this work, we tackle the problem of predicting learner performance in online learning using a deep learning-based approach. Our solution allows stakeholders involved in online learning to anticipate learner outcomes ahead of the final assessment, offering the opportunity to take proactive measures to assist learners. We propose a two-pathway deep learning model that classifies learner performance from learners' interactions during online sessions, recorded as clickstreams. We also propose to transform these time series of clicks into images using the Gramian Angular Field. The model additionally makes use of the available demographic and assessment information. We evaluate our approach on the Open University Learning Analytics Dataset. A comprehensive comparative study is conducted against state-of-the-art approaches under different experimental settings. We also demonstrate the importance of including demographic and assessment data in the prediction process.
1 Introduction
Education is the cornerstone of civilization, and its growing importance has made it a priority in everyone's daily life. Recent technological advances have given rise to a new medium for learning and tutoring, referred to by practitioners as E-learning or online learning [28]. Online learning comes with several advantages: flexible enrollment management, lower course costs and time saved for both students and teachers have contributed to its popularity in education [29]. It can also be offered to everyone, without university enrollment requirements and at an affordable cost. In addition, learners can arrange their schedules at their own convenience. Furthermore, E-learning platforms hold massive amounts of user data; hence, machine learning, and specifically recommendation systems, are deployed to enhance the learning experience by personalizing content for each learner. The online education market is forecast to reach $350 billion by 2025, and this forecast does not account for the growth impact of COVID-19 on the online learning market [32]. E-learning platforms are continuously proliferating; well-known examples include Udemy^{Footnote 1}, Coursera^{Footnote 2}, Udacity^{Footnote 3} and edX^{Footnote 4}. Recently, the COVID-19 pandemic forced educational institutions to recognize the importance of online teaching, and E-learning has become the de facto solution for maintaining the continuity of the educational process around the world [31, 38]. This opens the door to substantial research opportunities [30].
Dossou [1] conducted a study to estimate the probability of a "de facto" E-learning implementation in the educational sector during the COVID-19 pandemic. The findings show that this probability is 0.65 worldwide, 0.87 in high-income, 0.70 in upper-middle-income, 0.52 in lower-middle-income and 0.29 in low-income economies. Unlike in-class learning, online sessions involve less interaction between instructor and students. Consequently, the instructor has fewer cues at hand, usually derived from cognition, sentiment, visual contact, etc., to accurately assess student performance during the session. Jordan [2] reported that the completion rate of Massive Open Online Courses (MOOCs) is low, ranging from 0.7% to 52.1% with a median of 12.6%. Similar conclusions were drawn in [3] in a study of online courses offered by Open University UK and China.
In this work, we address the problem of predicting learner performance in E-learning with a novel strategy. Unlike state-of-the-art techniques, where Long Short-Term Memory (LSTM) networks are commonly adopted, we transform the clickstream time series (the interactions of learners during online sessions) into images using the Gramian Angular Field. This allows us to benefit from the power of Convolutional Neural Networks (CNNs) while avoiding LSTM shortcomings such as the vanishing/exploding gradient problem. In addition, the proposed model incorporates learners' demographic and online assessment data through a second pathway of fully connected layers. Both pathways are aggregated and followed by a classifier that outputs the probabilities of the classes Distinction, Pass, Fail and Withdrawn. The model also uses Batch Normalization for regularization, which helps it cope with the problem of imbalanced classes. We demonstrate the benefit of the proposed approach through comprehensive experiments and comparisons against state-of-the-art approaches.
The rest of the paper is organized as follows. Sections 1.1 and 1.2 review research works that address the problem of predicting learner performance in E-learning. Section 2 details the dataset used in this study. Section 3 details the proposed approach. Experimental results and the validation of our approach are presented in Section 4. We conclude and present future directions in Section 5.
1.1 Machine learning techniques for student performance prediction
Machine learning has revolutionized various sectors, industries and services. Educational data mining has emerged as a new field aiming to support decision makers [33, 34]. With MOOCs and online learning becoming increasingly popular, multiple research efforts have applied state-of-the-art machine learning techniques to several research questions, including but not limited to student performance prediction. Al-Shabandar et al. [4] conducted an exploratory data analysis on MOOC data with the objective of predicting student outcomes in a course. The authors identified a strong correlation between clickstream actions and the outcome of online learners. In addition, multiple machine learning algorithms, including Decision Tree (DT), Random Forest (RF), Support Vector Machine (SVM), Naive Bayes (NB), Feedforward Neural Network (NN), Linear Discriminant Analysis (LDA) and Self-Organizing Map (SOM), were used to predict learner outcomes. RF achieved the best performance, with 98.8% accuracy when trained on all features and 98.5% accuracy when trained on the most important features. In [5], a semi-automated peer-assessment system was used to collect data in two undergraduate computer courses. Students asked questions about topics addressed during the sessions and answered questions from their peers; the answers were then rated. Several features were extracted and used to train multiple linear regression models to predict student scores, which range from 18 to 30 out of 30. Root Mean Square Error (RMSE) was used for evaluation: the final trained model achieved an RMSE of 2.93 and 3.44 for the first and second course, respectively. The authors argued that this prediction could provide insights into online course attrition rates. Azizah et al. [6] conducted a comparative study of C4.5 [35] and Naive Bayes for predicting student performance in a virtual learning environment.
The data consist of the students' web history and the total number of their webpage interactions. Both algorithms achieved comparable performance, correctly classifying 63% of the data instances. In [7], the authors adopted an ensemble learning approach to predict students' academic performance based on their socioeconomic status and historical grades. Ensemble approaches have shown significant performance improvements over single classifiers in several learning tasks. The idea is to aggregate the decisions of multiple classifiers, so that the decision is taken collectively by relying on the "wisdom of the crowd". The authors used three classification algorithms: DT, k-Nearest Neighbors (kNN) and Aggregating One-Dependence Estimators (AODE). The ensemble was tested on three datasets and achieved 87% average accuracy, the best compared to the single models. Peach et al. [8] adopted a different strategy by applying unsupervised learning. Specifically, given time series of student engagement during online sessions, the objective is to identify groups, or clusters, of learners with similar temporal behavior. Dynamic Time Warping is first used to compute the pairwise similarity between time series of learner actions. Then, a multiscale graph clustering algorithm is applied to identify the groups. The findings show distinct engagement patterns of learners with different levels of regularity, adherence to the preplanned course structure and task completion. The results also revealed that low-performing learners are grouped in a distinct cluster and are hence accurately identified.
Although machine learning algorithms and techniques are popular choices in educational data mining and have shown great performance, their applicability is still limited to small datasets.
1.2 Deep learning for student performance prediction
Since the AlexNet breakthrough at the 2012 ImageNet competition [9], Deep Learning (DL) has been the leading technology in many applications, including computer vision, healthcare, security and self-driving cars. Recently, DL has attracted practitioners and researchers in educational data mining [39]. Aljohani et al. [10] used weekly clickstream data of students during online learning sessions to predict, using deep learning, whether learners will pass or fail. The data are formed by sequentially stacking the weekly clickstreams and are part of the well-known Open University Learning Analytics (OULA) dataset [11]. The deep neural network consists of a stack of LSTM [12] layers. LSTM is a type of recurrent neural network well suited to sequential data such as time series and text. To overcome the notorious problem of overfitting, dropout [13] is used. This model achieved 95.23% accuracy when predicting pass/fail in the last week of the online course, outperforming SVM, Logistic Regression and an Artificial Neural Network (ANN). In [14], the authors used 54 features from the OULA dataset to train a DL model to predict student performance. As a preprocessing step, a sparse feature reduction technique is applied to select the subset of features that most affect student performance. Then, Singular Value Decomposition is applied to address data sparsity. Finally, min-max scaling is applied for normalization. The preprocessed data are used to train an ANN with three hidden layers of 50, 20 and 10 neurons. In their experiments, the authors established four classification scenarios: pass/fail, distinction/fail, distinction/pass and withdrawn/pass. Results showed that withdrawn and pass students are highly distinguishable, with 94% accuracy. Lower accuracy was obtained for the other formulations, with distinction/pass having the lowest accuracy, 80%.
The findings also showed that for students achieving distinction, age, region and having special needs negatively affect their performance; students in rural areas face difficulty accessing the online learning system due to connectivity problems. For students with fail and withdrawal outcomes, the overall credits, highest education achieved and region significantly affected their performance. Age was also associated with the withdrawal outcome: mature learners are less likely to withdraw than younger ones. Although this study is comprehensive, its formulation is still restricted to binary classification, which can be explained by the class imbalance in the data making the classification task more challenging. DL has also made it possible to learn from a variety of data at the same time, i.e. to combine different data modalities to strengthen the learning process and achieve better performance. Such a learning strategy is widely used in computer vision [16] and health applications [17]. Using demographic data along with the clickstream from the OULA dataset, Karimi et al. [15] proposed deep learning models that learn from both modalities, termed DOPP and DOPP-FCN. In the literature, "demographic" is a broad term that may refer to anything from age, gender, location and nationality to income. In their study, Karimi et al. considered the gender, age, highest education level and special needs status of students. The authors proposed a network with two pathways. The first is a stack of fully connected layers dedicated to extracting hidden representations from demographic data. The second is an LSTM sequence that learns from the clickstream, modeled as a time series. Both pathways are merged by simple concatenation, followed by a classifier that outputs the probability of each student outcome.
Binary and multiclass classification experiments were conducted and the F1 score was evaluated for the different courses. The proposed model achieved an F1 score above 0.85 for binary classification and 0.54 for multiclass classification; the latter is significantly lower due to the class imbalance problem. Experiments also showed that accounting for demographic information boosted the model performance. A similar approach is reported in [18]. He et al. [19] proposed a three-pathway deep neural network in which, in addition to the demographic and clickstream pathways, a third pathway handles assessment data. The assessment stream consists of learners' evaluation outcomes during the semester in Tutor Marked Assessment (TMA), Computer Marked Assessment (CMA) and the Final Exam (Exam). Results showed that when training uses data up to the 5th week, the proposed model achieves 60% accuracy. As the course progresses, more data become available and accuracy improves, exceeding 90% when all data are used.
DL approaches have shown significant performance improvements over traditional machine learning algorithms. Nevertheless, class imbalance is a recurrent problem that remains challenging to overcome.
2 Open university learning analytics dataset
Virtual learning environments are commonly used by online education providers to collect data about learners' interactions during online sessions and gain better insight into their learning behavior. In this study, we use the OULA dataset. This dataset contains information about 32,593 learners monitored for 9 months. It includes learners' demographics: area of residence, gender, age, highest education level on entry to the module and special needs status. These learners were enrolled in 7 different courses. Each course was taught at least twice, with presentations starting in different months of 2013 and 2014. Among the seven courses, this study focuses on the three with the highest enrollment. Table 1 describes these courses, codenamed BBB, DDD and FFF.
The dataset also contains assessment information and interaction data, i.e. the interactions of learners during online sessions, logged as daily click counts for each course. Twenty types of click actions are recorded, including completing quizzes, visiting URLs and resources, and filling in questionnaires. A learner can have four possible outcomes in a course: Distinction, Pass, Fail and Withdrawn. The exploratory data analysis conducted by He et al. [19] showed that learners with frequent interaction and high assessment scores are highly likely to pass the course, while learners who fail the course rarely interact during online sessions. This observation is visually confirmed by Fig. 1, which illustrates sample click time series per learner outcome.
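To illustrate how daily click logs can be turned into one time series per learner, the sketch below sums each learner's clicks per day. It assumes records of the form (id_student, date, sum_click), mirroring the studentVle table of the public OULAD release, where date is the day relative to the module start and can be negative for activity before the start. The paper does not describe its own aggregation, so the function name, the fixed window of n_days and the choice to sum over all activity types are assumptions.

```python
from collections import defaultdict

def daily_click_series(rows, n_days):
    """Build one daily click-count series per learner (hypothetical helper).

    rows: iterable of (id_student, date, sum_click) tuples; assumed layout
          mirroring OULAD's studentVle table.
    n_days: length of the retained window, starting at day 0.
    """
    series = defaultdict(lambda: [0] * n_days)
    for student, date, clicks in rows:
        if 0 <= date < n_days:  # drop pre-start (negative) and later days
            # Several resources/activity types on the same day are summed.
            series[student][date] += clicks
    return dict(series)

rows = [(1, 0, 3), (1, 0, 2), (1, 2, 4), (2, 1, 1), (2, -5, 7)]
clicks_per_learner = daily_click_series(rows, 3)
```

Records outside the window are simply dropped, so a learner whose records all fall outside it does not appear in the result.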
3 Proposed approach
Figure 2 illustrates the proposed deep learning model for predicting learner outcomes. As a preprocessing step, the clickstream data, i.e. time series of clicks, are transformed into images using the Gramian Angular Summation Field. The demographic and assessment data are presented in tabular format. In the forward pass, the images are fed into the first pathway, which consists of a sequence of CNN blocks. In each CNN block, the input is convolved with a set of filters, followed by Batch Normalization and a nonlinear transformation, typically a ReLU function. The output of each block is then downsized using a pooling layer, and this process is repeated through the remaining CNN blocks. In the second pathway, the demographic and assessment tabular data are fed into a sequence of fully connected dense layers. The outputs of both pathways are then merged by simple concatenation. We opt for this merging strategy for simplicity and reduced computation, although more elaborate techniques can be adopted (e.g. Compact Bilinear Pooling). The merged representation goes through a sequence of fully connected layers, and finally a Softmax classifier outputs the predicted class, i.e. the learner's outcome. The filters and the weights of the fully connected layers are the parameters updated during training, i.e. in the backward pass, also called backpropagation. This process aims to minimize the error between the true label of the input data and the model output. A typical loss for classification tasks is the cross-entropy loss.
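For reference, the categorical cross-entropy between one-hot labels and predicted class probabilities can be sketched in plain Python as below. This is a minimal illustration rather than the authors' training code; the class ordering in the usage line is hypothetical, and eps is an assumed clipping constant that keeps the logarithm finite.

```python
import math

def categorical_cross_entropy(y_true, y_prob, eps=1e-12):
    """Mean cross-entropy over a batch.

    y_true: one-hot label vectors, e.g. [0, 1, 0, 0].
    y_prob: predicted probability vectors (e.g. softmax outputs), same shape.
    eps: lower bound on probabilities so that log(0) is never evaluated.
    """
    total = 0.0
    for t, p in zip(y_true, y_prob):
        total -= sum(ti * math.log(max(pi, eps)) for ti, pi in zip(t, p))
    return total / len(y_true)

# Hypothetical class order: Distinction, Pass, Fail, Withdrawn.
loss = categorical_cross_entropy([[0, 1, 0, 0]], [[0.1, 0.7, 0.1, 0.1]])
```

Only the probability assigned to the true class contributes, so the loss here is -ln(0.7); with two classes the same formula reduces to the binary cross-entropy.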
In the following, we present the technical details of time series imaging using the Gramian Angular Field and describe the layers used to build the model.
3.1 Imaging time series: the Gramian Angular Field
Transforming time series into images has improved performance in several applications [20, 36, 37]. The intuition is to exploit spatial features by projecting the raw time series into another space and then applying a trigonometric transformation. This approach was proposed by Wang et al. [20] and applied to classification tasks on 20 datasets, including electrocardiogram and human motion data. The obtained images are used to train a Tiled deep CNN [21] for classification. Hatami et al. [22] applied the Gramian Angular Field [20] and the Recurrence Plot [23] for time series imaging and used the resulting images to train a deep CNN. This pipeline showed better performance than traditional ones in which handcrafted features, e.g. Scale-Invariant Feature Transform (SIFT) [24], Gabor and Local Binary Patterns (LBP) [25] features, are extracted and classified. De Santo et al. [40] encoded time series as images using several techniques, including the Gramian Angular Field and the Recurrence Plot, for predictive maintenance. Hong et al. [42] applied the time series imaging paradigm to predictive maintenance in the context of photovoltaic arrays. In [41], the authors used imaging techniques to accurately predict natural gas consumption. Ding et al. [43] used the Gramian Angular Field for fast and accurate fault detection in Direct Current electricity grids. Kong et al. [45] transformed financial time series into images to represent temporal characteristics and reveal intrinsic feature details for better prediction performance. Imaging time series has also been successfully applied to cancer prediction [44].
The pipeline of imaging time series using the Gramian Angular Field is depicted in Fig. 3. Given a time series \(ts=[ts_1,ts_2,\cdots ,ts_N]\), normalization is applied:
$$\begin{aligned} \hat{ts}_i = \frac{(ts_i - \max (ts)) + (ts_i - \min (ts))}{\max (ts) - \min (ts)} \end{aligned}$$(1)
where \(\hat{ts}_i\) is the i\(^{\text {th}}\) element of the normalized time series \(\hat{ts}\). The normalization brings the range of values to \([-1,1]\). \(\hat{ts}\) is then transformed into the polar coordinate system by encoding each value as an angular cosine, with its timestamp i as the radius:
$$\begin{aligned} \phi _i = \arccos (\hat{ts}_i), \quad -1 \le \hat{ts}_i \le 1; \qquad r_i = \frac{i}{R}, \quad i = 1,\dots ,N \end{aligned}$$(2)
where arccos is the inverse cosine and R is a constant that controls the span of the polar coordinate system. This mapping is bijective: each time series value corresponds to one and only one point in the polar coordinate system. After this transformation, the angular property can be exploited to identify the temporal correlation between time intervals and obtain the images. This is achieved using the Gramian Angular Summation Field (GASF). A GASF image is a matrix of the form:
$$\begin{aligned} GASF = \begin{bmatrix} \cos (\phi _1+\phi _1) & \cdots & \cos (\phi _1+\phi _N) \\ \vdots & \ddots & \vdots \\ \cos (\phi _N+\phi _1) & \cdots & \cos (\phi _N+\phi _N) \end{bmatrix} \end{aligned}$$(3)
It can also be written as:
$$\begin{aligned} GASF = \hat{ts}^{tr} \cdot \hat{ts} - \left( \sqrt{Id - \hat{ts}^2} \right) ^{tr} \cdot \sqrt{Id - \hat{ts}^2} \end{aligned}$$(4)
where tr is the transpose operator and Id is the unit row vector \([1,1,\cdots ,1]\). Each \(GASF_{i,j}\) represents the temporal correlation through the summation of angular directions. The main diagonal of GASF is a special case that contains the original angular information. It can be used to approximately reconstruct the time series from the high-level features learned by a deep neural network [21]. Fig. 4 illustrates the imaging of sin(x) and sin(2x) with the GASF transform; the difference in patterns between the two images is clearly visible.
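The transformation can be sketched in a few lines of plain Python. The sketch below rescales a series to \([-1,1]\) with the min-max formula, computes the angles \(\phi_i = \arccos(\hat{ts}_i)\), and builds the matrix of \(\cos(\phi_i + \phi_j)\); it also computes the equivalent closed form \(\hat{ts}_i \hat{ts}_j - \sqrt{1-\hat{ts}_i^2}\sqrt{1-\hat{ts}_j^2}\) as a cross-check. The radius i/R preserves temporal order in the polar representation but does not enter the GASF values, so it is omitted. The function names are hypothetical, and the handling of constant series (for example a learner with no clicks at all), which the text does not discuss, is an assumption: the sketch simply raises an error.

```python
import math

def min_max_to_unit_interval(ts):
    """Rescale a series to [-1, 1]: min maps to -1, max maps to 1."""
    lo, hi = min(ts), max(ts)
    if hi == lo:
        raise ValueError("constant series cannot be rescaled")
    return [((x - hi) + (x - lo)) / (hi - lo) for x in ts]

def gasf(ts):
    """Gramian Angular Summation Field: GASF[i][j] = cos(phi_i + phi_j)."""
    scaled = min_max_to_unit_interval(ts)
    # Clamp to [-1, 1] to guard acos against floating-point overshoot.
    phi = [math.acos(max(-1.0, min(1.0, x))) for x in scaled]
    return [[math.cos(pi + pj) for pj in phi] for pi in phi]

def gasf_closed_form(ts):
    """Same matrix via x_i * x_j - sqrt(1 - x_i^2) * sqrt(1 - x_j^2)."""
    x = min_max_to_unit_interval(ts)
    s = [math.sqrt(max(0.0, 1.0 - v * v)) for v in x]
    return [[xi * xj - si * sj for xj, sj in zip(x, s)]
            for xi, si in zip(x, s)]

clicks = [0, 3, 1, 5, 2]   # hypothetical daily click counts
image = gasf(clicks)       # 5 x 5 symmetric matrix with values in [-1, 1]
```

The two functions agree because \(\cos(a+b) = \cos a \cos b - \sin a \sin b\) and, for \(a = \arccos(x) \in [0, \pi]\), \(\sin a = \sqrt{1-x^2}\).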
3.2 Deep learning layers
3.2.1 Convolution layer
A convolution layer consists of a set of filters whose parameters are learned during the training of the neural network. These filters are small, typically \(3 \times 3\), \(5 \times 5\) or \(7 \times 7\), and are convolved with the data coming from the previous layer. Given an input I of size \(M \times N\) and a filter K of size \(m \times n\), the convolution of I by the filter K is expressed as:
$$\begin{aligned} O(i,j) = \sum _{u=0}^{m-1} \sum _{v=0}^{n-1} I(i+u, j+v)\, K(u,v) \end{aligned}$$(5)
In other words, the filter is slid across the width and height of I, and the dot product between K and the overlapping region of I is computed at every position (i, j). In this way, each O(i, j) is connected only to a small local region of the input I. The outputs obtained with all filters are stacked along the depth dimension.
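The operation can be sketched directly in plain Python for a single-channel input and a single filter, with stride 1 and no padding ("valid" mode). As is customary in deep learning frameworks, the kernel is not flipped, so strictly speaking this is a cross-correlation; the function name is hypothetical.

```python
def conv2d_valid(image, kernel):
    """'Valid' 2-D convolution as used in CNNs (no kernel flip), stride 1.

    image: M x N list of lists; kernel: m x n list of lists.
    Returns an (M - m + 1) x (N - n + 1) output O with
    O[i][j] = sum_u sum_v image[i + u][j + v] * kernel[u][v].
    """
    M, N = len(image), len(image[0])
    m, n = len(kernel), len(kernel[0])
    out = []
    for i in range(M - m + 1):
        row = []
        for j in range(N - n + 1):
            # Dot product between the kernel and the m x n patch at (i, j).
            row.append(sum(image[i + u][j + v] * kernel[u][v]
                           for u in range(m) for v in range(n)))
        out.append(row)
    return out

image = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
summed = conv2d_valid(image, [[1, 1], [1, 1]])  # sum of each 2 x 2 patch
```

A 3 x 3 input with a 2 x 2 filter yields a 2 x 2 output; frameworks typically add zero padding ("same" mode) when the spatial size must be preserved.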
3.2.2 Batch normalization
Batch normalization [26] was introduced to counter the change in the distribution of layer inputs from layer to layer during training. This problem was previously addressed by lowering the learning rate and carefully initializing each layer's parameters. Batch normalization instead normalizes each layer's inputs, which enables higher learning rates and hence accelerates learning. It also acts as a regularizer, strengthening the network by reducing overfitting.
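A minimal sketch of the training-time computation for a batch of feature vectors: each feature is standardized with the batch mean and variance, then scaled and shifted by the learnable parameters gamma and beta. At inference time, running averages of the mean and variance are used instead, and in convolutional layers the statistics are computed per channel over the batch and all spatial positions; neither is shown here. The function name and the default eps are illustrative.

```python
import math

def batch_norm(batch, gamma, beta, eps=1e-5):
    """Training-mode batch normalization.

    batch: list of B feature vectors of length F.
    gamma, beta: learnable scale and shift, one value per feature.
    eps: small constant that keeps the division finite for constant features.
    """
    B, F = len(batch), len(batch[0])
    out = [[0.0] * F for _ in range(B)]
    for f in range(F):
        col = [row[f] for row in batch]
        mean = sum(col) / B
        var = sum((x - mean) ** 2 for x in col) / B  # biased batch variance
        for b in range(B):
            x_hat = (col[b] - mean) / math.sqrt(var + eps)
            out[b][f] = gamma[f] * x_hat + beta[f]
    return out

normalized = batch_norm([[1.0], [3.0]], gamma=[1.0], beta=[0.0], eps=0.0)
```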
3.2.3 Activation function
The activation function defines the output of a neuron. It is a mathematical function applied to the neuron's input value. The obtained value determines whether the neuron should be activated ("fired") or not, i.e. whether the neuron's input is relevant for the model's prediction. Multiple activation functions have been proposed and studied in the literature. Historically, the sigmoid and hyperbolic tangent functions were used. More recently, more efficient activation functions have been proposed, where efficiency refers to achieving better learning performance and avoiding the notorious vanishing gradient problem during the minimization of the network loss. \(ReLU(x)=max(0,x)\) is a widely used activation function that reduces the likelihood of vanishing gradients.
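The contrast can be made concrete with the local derivatives used in backpropagation. The sigmoid derivative never exceeds 0.25, so a product of such factors across many layers shrinks quickly, whereas the ReLU derivative is exactly 1 for positive inputs. The sketch below ignores the weights and compares the product of ten such factors; it is a simplified illustration, not a full analysis of gradient flow.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1.0 - s)  # maximum value 0.25, reached at x = 0

def relu(x):
    return max(0.0, x)

def relu_grad(x):
    return 1.0 if x > 0 else 0.0  # subgradient 0 chosen at x = 0

# Backpropagating through d layers multiplies d local derivatives.
# Even at the sigmoid's steepest point, 10 layers shrink the gradient
# to 0.25**10 (about 1e-6); active ReLU units pass it through unchanged.
depth = 10
sigmoid_chain = sigmoid_grad(0.0) ** depth
relu_chain = relu_grad(1.0) ** depth
```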
3.2.4 Pooling
Pooling layers are periodically inserted between convolution layers. Their purpose is to reduce the spatial size of the output, known as the representation, which reduces the number of parameters of the deep network. The most common type is max pooling: a filter, typically \(2 \times 2\) with a stride of 2, is applied and the representation is downsampled by keeping only the maximum value within the filter, hence discarding 75% of the values. Other pooling operations, such as average pooling, can also be applied.
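A minimal sketch of non-overlapping 2 x 2 max pooling (stride 2) on a single feature map; each window keeps one of its four values, which is where the 75% figure comes from. The function name is hypothetical, and incomplete border windows are simply dropped, as in "valid" pooling.

```python
def max_pool2d(fmap, size=2):
    """Non-overlapping max pooling (stride = size) on one feature map.

    Rows/columns that do not fill a complete window are dropped.
    """
    H, W = len(fmap), len(fmap[0])
    return [
        [max(fmap[i + u][j + v] for u in range(size) for v in range(size))
         for j in range(0, W - size + 1, size)]
        for i in range(0, H - size + 1, size)
    ]

fmap = [[1, 3, 2, 0],
        [4, 2, 1, 5],
        [0, 1, 7, 2],
        [6, 2, 3, 1]]
pooled = max_pool2d(fmap)  # 4 x 4 -> 2 x 2
```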
3.2.5 Classifier: Softmax
The top layer of a deep network is commonly a softmax. Given \(x=(x_1, x_2,...,x_K) \in \mathbb {R}^K\), let:
$$\begin{aligned} \sigma (x)_i = \frac{e^{x_i}}{\sum _{j=1}^{K} e^{x_j}}, \quad i = 1,\dots ,K \end{aligned}$$(6)
The exponential function is applied to each element \(x_i\) of the input x and normalized by dividing it by the sum of all the exponentials. The output layer consists of C neurons, where C is the number of data classes (so K = C here). The i\(^{th}\) output is the probability that the input belongs to the i\(^{th}\) class.
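A numerically stable sketch of the softmax: subtracting the largest element before exponentiating leaves the result unchanged (the common factor cancels in the ratio) but avoids overflow for large inputs. The logits in the usage line are made-up values, and the class order is hypothetical.

```python
import math

def softmax(x):
    """Softmax of a list of real numbers, shifted by max(x) for stability."""
    m = max(x)
    exps = [math.exp(v - m) for v in x]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for (Distinction, Pass, Fail, Withdrawn).
probs = softmax([2.0, 1.0, 0.1, -1.0])
predicted = max(range(len(probs)), key=probs.__getitem__)  # most likely class index
```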
3.2.6 Merge
The merge layer receives the output of each pathway and concatenates them along a specified dimension. Other possible merging approaches include summation, product and Compact Bilinear Pooling.
3.2.7 Fully connected dense layer
The fully connected dense layer applies a linear transformation of the form \(Wx+b\), where W and b are the weight matrix and bias vector, followed by an activation function. The layer consists of units called neurons, each connected to all neurons of the previous layer, hence the term fully connected. The number of neurons in a layer is commonly known as the layer size. A fully connected dense layer is data agnostic, i.e. it makes no assumptions about the input data.
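The sketch below applies one fully connected layer, \(ReLU(Wx+b)\), to the concatenation of two pathway outputs, mirroring the merge step described above. All feature values, weights and biases are made-up numbers chosen only to keep the arithmetic easy to follow, and the function name is hypothetical.

```python
def dense(x, W, b, activation=lambda v: max(0.0, v)):
    """Fully connected layer: activation(W x + b), ReLU by default.

    W: one row per output neuron, each of length len(x).
    b: one bias per output neuron.
    """
    return [activation(sum(w_ij * x_j for w_ij, x_j in zip(w_i, x)) + b_i)
            for w_i, b_i in zip(W, b)]

# Merge layer: simple concatenation of the two pathway outputs.
image_features = [0.5, -1.0]   # hypothetical flattened CNN-pathway output
tabular_features = [2.0]       # hypothetical dense-pathway output
merged = image_features + tabular_features

W = [[1.0, 0.0, 1.0],          # hypothetical 2 x 3 weight matrix
     [0.0, 1.0, -1.0]]
b = [0.0, 0.5]
out = dense(merged, W, b)      # second neuron's negative sum is zeroed by ReLU
```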
4 Experimental results
In this section, we assess the performance of the proposed approach and validate it on the OULA dataset. Our experimental protocol is as follows:

We compare the proposed approach against Support Vector Machine (SVM) with a radial basis function kernel, Logistic Regression (LR), the Deep Online Performance Prediction model (DOPP) [15] and its variant with fully connected layers (DOPP-FCN) [15].

We assess the model under two formulations: binary and multiclass classification. In the binary setting, Pass and Distinction are merged into a single Pass class, and Fail and Withdrawn into a single Fail class. In the multiclass setting, each outcome is a class on its own, i.e. a 4-class classification problem.

As the courses last 39 weeks, we evaluate the proposed approach at different weeks, e.g. the 5th, 10th and 15th. We expect model accuracy to improve as the course progresses and more data become available.

We also evaluate the model under two settings, intra-course and inter-course outcome evaluation: we train the model on data from one specific course only and evaluate it on data from that course and from the other courses.

We demonstrate the importance of including learners' assessment and demographic information in the learning process by reporting performance with and without this information.
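The two label formulations and the week cutoff from this protocol can be expressed as simple mappings. The sketch assumes the outcome strings used in OULAD (Distinction, Pass, Fail, Withdrawn) and a day-indexed click series starting at day 0, so that the data available at week w are the first 7w days; the paper does not spell out its exact cutoff rule, so this is an assumption, and all names are hypothetical.

```python
# Binary formulation: Pass/Distinction -> 1 (success), Fail/Withdrawn -> 0 (failure).
BINARY_LABEL = {"Distinction": 1, "Pass": 1, "Fail": 0, "Withdrawn": 0}

# Multiclass formulation: each outcome is its own class (the order is arbitrary).
MULTICLASS_LABEL = {"Distinction": 0, "Pass": 1, "Fail": 2, "Withdrawn": 3}

def truncate_to_week(daily_clicks, week):
    """Keep only the clicks logged during the first `week` weeks."""
    return daily_clicks[: 7 * week]

outcomes = ["Pass", "Withdrawn", "Distinction", "Fail"]
binary_targets = [BINARY_LABEL[o] for o in outcomes]
multiclass_targets = [MULTICLASS_LABEL[o] for o in outcomes]
```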

For the binary classification case, we report both Accuracy and F1 score. For the multiclass case, we are interested in predicting the critical cases, i.e. students at risk of failure (Withdrawn and Fail), which represent the minority classes in the data. Hence, we report Recall in addition to the F1 score, defined as:
$$\begin{aligned} \mathrm {F1} = \frac{2 \cdot \mathrm {Recall} \cdot \mathrm {Precision}}{\mathrm {Precision} + \mathrm {Recall}} \end{aligned}$$(7)
where:
$$\begin{aligned} \mathrm {Precision} = \frac{TP}{TP+FP} \end{aligned}$$(8)
$$\begin{aligned} \mathrm {Recall} = \frac{TP}{TP+FN} \end{aligned}$$(9)
and TP, FP and FN are the numbers of true positive, false positive and false negative samples, respectively.
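The metrics can be computed from raw counts as follows. For the multiclass case, the counts are taken one-vs-rest for a chosen class, e.g. Fail, which is how per-class Recall and F1 for the minority classes can be obtained. This is an illustrative sketch; returning 0.0 when a denominator is zero is an assumed convention, not one stated in the text.

```python
def precision_recall_f1(tp, fp, fn):
    """Precision, Recall and F1 from counts; 0.0 where a denominator is 0."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

def counts_for_class(y_true, y_pred, positive):
    """One-vs-rest TP, FP, FN counts for a single class label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    return tp, fp, fn

y_true = ["Fail", "Fail", "Pass", "Withdrawn"]
y_pred = ["Fail", "Pass", "Fail", "Withdrawn"]
fail_scores = precision_recall_f1(*counts_for_class(y_true, y_pred, "Fail"))
```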
We use the data of three courses, BBB, DDD and FFF, with 80% used for training and 20% for testing; 20% of the training data are held out for validation to monitor the model's behavior during learning. The model configuration consists of 3 CNN blocks with 16, 32 and 32 filters, respectively, and 2 dense layers of 128 and 32 neurons, respectively. For binary classification, the loss is the binary cross-entropy, while for multiclass classification we use the categorical cross-entropy loss. The model is trained with the Adam optimizer [27], using a learning rate of 0.00001 and a batch size of 32 in all our experiments.
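To make the configuration concrete, the sketch below traces the spatial size and convolution parameter counts of the image pathway. The text does not state the GASF image size, kernel size or padding, so these are assumptions: square 3 x 3 kernels with "same" padding (spatial size preserved), 2 x 2 max pooling with stride 2 after each block, a single-channel image and a hypothetical 64 x 64 input. Batch normalization parameters are excluded.

```python
def cnn_pathway_shapes(image_size, filters=(16, 32, 32), kernel=3, channels=1):
    """Trace the CNN pathway under the assumptions stated above.

    Returns the per-block convolution parameter counts (weights + biases)
    and the size of the flattened output of the last block.
    """
    size, in_ch, params = image_size, channels, []
    for out_ch in filters:
        # kernel x kernel x in_ch weights per filter, plus one bias per filter.
        params.append((kernel * kernel * in_ch + 1) * out_ch)
        size //= 2  # 2 x 2 max pooling with stride 2 halves each side
        in_ch = out_ch
    return params, size * size * in_ch

block_params, flat_size = cnn_pathway_shapes(64)
```

Under these assumptions, the flattened image features (2,048 values for a 64 x 64 input) would be what the merge layer concatenates with the output of the tabular pathway.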
4.1 Gramian angular summation field of clickstream
Figures 5, 6, 7 and 8 illustrate the GASF images of learners' clickstream time series for each outcome. Visual inspection clearly shows different patterns between success (Pass/Distinction) and failure (Fail/Withdrawn). We also notice visual similarity between the success outcomes, Pass and Distinction, and likewise between the Fail and Withdrawn outcomes. This suggests that the multiclass classification task is more challenging than the binary one.
4.2 Binary classification
Figures 9 and 10 depict the accuracy and F1 score on the three courses BBB, DDD and FFF for binary classification (Fail vs. Success), with classification conducted at different weeks from week 5 to week 39. As the courses progress, the performance of all models improves as more click data become available, and by the end of the course the models reach their best accuracy and F1 score. Overall, the proposed approach achieved the best performance. For the BBB course, it significantly outperforms SVM, LR and DOPP-FCN. For the DDD course, DOPP achieves the best performance up to week 30, after which it is outperformed by our approach at weeks 35 and 39. For the FFF course, SVM fails to accurately classify learner performance, and our approach achieves the best classification performance at weeks 10, 15, 20, 25, 35 and 39.
4.3 Multiclass classification
We report in Figs. 11 and 12 the Recall and F1 scores for classifying learner performance into Fail, Pass, Withdrawn and Distinction. Both scores are lower than in the binary classification task, as more confusion between Pass/Distinction and Fail/Withdrawn is introduced. The proposed model achieved the best Recall and F1 scores for all three courses, indicating less confusion between the four classes and better detection of the minority classes, Withdrawn and Fail. The introduction of batch normalization contributed to reducing overfitting and guided the model toward better separation between classes. SVM yielded the lowest performance, with an F1 score not exceeding 0.5 at any week, while DOPP showed competitive results. As in the binary classification task, classification performance improves as the course progresses and more click data become available.
4.4 Intra-course and inter-course evaluation
In this experiment, we train the proposed model on data from a specific course and test it on data from the same course and from the other courses, for both the binary and multiclass settings, at week 20. The results, detailed in Tables 2, 3, 4 and 5, show that our approach achieves good performance in the intra-course experiments and outperforms DOPP. It also achieves good results when trained on data from one course and tested on data from another: the proposed approach successfully extracted common features that transfer across courses, although performance is lower than in the intra-course experiments.
4.5 Importance of including extra information of learners
To demonstrate the importance of including the extra information, we assess the performance of the proposed model when trained with and without the non-click data. Binary classification experiments are conducted at weeks 20 and 39. The results, illustrated in Figs. 13 and 14, show that training with the extra non-click data improves classification performance for all courses.
5 Conclusion
We addressed the problem of predicting learners' outcomes in an online learning environment based on their interactions during online sessions, together with demographic and assessment data. Our approach treats the clicks as time series and uses the Gramian Angular Summation Field to transform them into images. The proposed model, trained on both the click images and the extra data, achieved competitive performance when tested at different weeks of the courses. The findings also suggest that interaction is closely associated with students' academic outcomes. Hence, more attention and research effort should be dedicated to developing and implementing new learning techniques and methodologies that keep learners engaged in online sessions. This aspect is critical as E-learning becomes a viable learning option. In future work, we will investigate residual architectures as a potential upgrade to the proposed model, with the objective of reducing the confusion between classes. We also plan to investigate other imaging techniques, such as the Recurrence Plot, and their combination with the Gramian Angular Field. A further direction is to apply state-of-the-art attention models, which are well suited to sequential data and have achieved breakthroughs in natural language processing.
References
[1] Dossou S (2020) E-Learning “De Facto” Implementation Probabilities in Educational Sector: A Preliminary Estimation if Confinement Should Be Extended in Covid-19 Crisis Context. J Public Adm Governance 10(3)
[2] Jordan K (2015) Massive open online course completion rates revisited: Assessment, length and attrition. Int Rev Res Open Distance Learn 16(3)
[3] Jha N, Ghergulescu I, Moldovan AN (2019) OULAD MOOC Dropout and Result Prediction using Ensemble, Deep Learning and Regression Techniques. In: Proceedings of the 11th International Conference on Computer Supported Education, Volume 2: CSEDU, SciTePress, pp 154–164
[4] Al-Shabandar R, Hussain A, Laws A, Keight R, Lunn J, Radi N (2017) Machine learning approaches to predict learning outcomes in Massive open online courses. In: 2017 International Joint Conference on Neural Networks (IJCNN), pp 713–720
[5] Ashenafi MM, Riccardi G, Ronchetti M (2015) Predicting students’ final exam scores from their course activities. In: 2015 IEEE Frontiers in Education Conference (FIE), pp 1–9
[6] Azizah EN, Pujianto U, Nugraha E, Darusalam (2018) Comparative performance between C4.5 and Naive Bayes classifiers in predicting student academic performance in a Virtual Learning Environment. In: 2018 4th International Conference on Education and Technology (ICET), pp 18–22
[7] Pandey M, Taruna S (2016) Towards the integration of multiple classifier pertaining to the Student’s performance prediction. Perspect Sci 8:364–366 Recent Trends in Engineering and Material Sciences
[8] Peach RL, Yaliraki SN, Lefevre D, Barahona M (2019) Data-driven unsupervised clustering of online learner behaviour. arXiv 1902.04047
[9] Krizhevsky A, Sutskever I, Hinton GE (2012) ImageNet Classification with Deep Convolutional Neural Networks. In: Pereira F, Burges CJC, Bottou L, Weinberger KQ (eds) Advances in Neural Information Processing Systems, Curran Associates, Inc., vol 25, pp 1097–1105
[10] Aljohani NR, Fayoumi A, Hassan SU (2019) Predicting At-Risk Students Using Clickstream Data in the Virtual Learning Environment. Sustainability 11(24):1–12
[11] Kuzilek J, Hlosta M, Zdrahal Z (2017) Open University Learning Analytics dataset. Sci Data 4(1):170171
[12] Hochreiter S, Schmidhuber J (1997) Long Short-Term Memory. Neural Comput 9(8):1735–1780
[13] Srivastava N, Hinton G, Krizhevsky A, Sutskever I, Salakhutdinov R (2014) Dropout: A Simple Way to Prevent Neural Networks from Overfitting. J Mach Learn Res 15(56):1929–1958
[14] Waheed H, Hassan SU, Aljohani NR, Hardman J, Alelyani S, Nawaz R (2020) Predicting academic performance of students from VLE big data using deep learning models. Comput Hum Behav 104:106189
[15] Karimi H, Huang J, Derr T (2020) A Deep Model for Predicting Online Course Performance. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence (AAAI-20), Workshop on Artificial Intelligence for Education
[16] Ngiam J, Khosla A, Kim M, Nam J, Lee H, Ng AY (2011) Multimodal Deep Learning. In: Proceedings of the 28th International Conference on International Conference on Machine Learning, Omnipress, ICML’11 pp 689–696
[17] Ben Said A, Mohamed A, Elfouly T, Harras K, Wang ZJ (2017) Multimodal Deep Learning Approach for Joint EEG-EMG Data Compression and Classification. In: 2017 IEEE Wireless Communications and Networking Conference (WCNC), pp 1–6
[18] Qiao C, Hu X (2020) A joint neural network model for combining heterogeneous user data sources: An example of at-risk student prediction. J Assoc Inf Sci Technol 71:1192–1204
[19] He Y, Chen R, Li X, Hao C, Liu S, Zhang G, Jiang B (2020) Online At-Risk Student Identification using RNN-GRU Joint Neural Networks. Information 11(10):474
[20] Wang Z, Oates T (2015) Imaging Time-Series to Improve Classification and Imputation. In: Proceedings of the 24th International Conference on Artificial Intelligence, AAAI Press, IJCAI’15, pp 3939–3945
[21] Ngiam J, Chen Z, Chia D, Koh P, Le Q, Ng A (2010) Tiled convolutional neural networks. In: Lafferty J, Williams C, Shawe-Taylor J, Zemel R, Culotta A (eds) Advances in Neural Information Processing Systems, Curran Associates, Inc., vol 23, pp 1279–1287
[22] Hatami N, Gavet Y, Debayle J (2017) Classification of Time-Series Images Using Deep Convolutional Neural Networks. arXiv 1710.00886
[23] Eckmann JP, Oliffson Kamphorst S, Ruelle D (1987) Recurrence Plots of Dynamical Systems. Europhys Lett (EPL) 4(9):973–977
[24] Lowe DG (2004) Distinctive Image Features from Scale-Invariant Keypoints. Int J Comput Vis 60(2):91–110
[25] Pietikäinen M, Hadid A, Zhao G, Ahonen T (2011) Computer Vision Using Local Binary Patterns. In: Computational Imaging and Vision, Springer, London
[26] Ioffe S, Szegedy C (2015) Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift. In: Bach F, Blei D (eds) Proceedings of the 32nd International Conference on Machine Learning, PMLR, Lille, France, Proceedings of Machine Learning Research, vol 37, pp 448–456
[27] Kingma DP, Ba J (2017) Adam: A Method for Stochastic Optimization. arXiv 1412.6980
[28] Kuzmanović M, Andjelković Labrović J, Nikodijević A (2019) Designing e-learning environment based on student preferences: conjoint analysis approach. Int J Cogn Res Sci Eng Educ 7(3):37–47
[29] Dyment J, Downing J, Hill A, Smith H (2018) ‘I did think it was a bit strange taking outdoor education online’: exploration of initial teacher education students’ online learning experiences in a tertiary outdoor education unit. J Adventure Educ Outdoor Learn 18(1):70–85
[30] Hamann K, Pollock PH, Smith GE, Wilson BM (2017) Distance education and the scholarship of teaching and learning in political science. Politics 37(2):229–238
[31] Toquero CM (2020) Challenges and Opportunities for Higher Education amid the COVID-19 Pandemic: The Philippine Context. Pedagogical Res 5(4):em0063
[32] Tannenbaum D (2019) https://www.teachthought.com/thefutureoflearning/elearningonlinetotalvalue/
[33] Baker R (2010) Data mining for education. International encyclopedia of education 7(3):112–118
[34] Mohamad SK, Tasir Z (2013) Educational Data Mining: A Review. Procedia Soc Behav Sci 97(6):320–324 The 9th International Conference on Cognitive Science
[35] Quinlan JR (1993) C4.5: programs for machine learning. The Morgan Kaufmann series in machine learning, Morgan Kaufmann Publishers, San Mateo, Calif
[36] Li X, Kang Y, Li F (2020) Forecasting with time series imaging. Expert Syst Appl 160(1):113680
[37] Yang CL, Chen ZX, Yang CY (2020) Sensor Classification Using Convolutional Neural Network by Encoding Multivariate Time Series as Two-Dimensional Colored Images. Sensors 20(1):168
[38] Ali S, Hafeez Y, Abbas MA, Aqib M, Nawaz A (2021) Enabling remote learning system for virtual personalized preferences during COVID-19 pandemic. Multim Tools Appl 80(24):33329–33355
[39] Gupta SK, Ashwin TS, Guddeti RMR (2019) Students’ affective content analysis in smart classroom environment using deep learning techniques. Multim Tools Appl 78(18):25321–25348
[40] De Santo A, Ferraro A, Galli A, Moscato V, Sperlì G (2022) Evaluating time series encoding techniques for Predictive Maintenance. Expert Syst Appl 210:118435
[41] Du J, Zheng J, Liang Y, Lu X, Klemeš JJ, Varbanov PS, Shahzad K, Rashid MI, Ali AM, Liao Q, Wang B (2022) A hybrid deep learning framework for predicting daily natural gas consumption. Energy 257:124689
[42] Hong YY, Pula RA (2022) Detection and classification of faults in photovoltaic arrays using a 3D convolutional neural network. Energy 246:123391
[43] Ding C, Wang Z, Ding Q, Yuan Z (2022) Convolutional neural network based on fast Fourier transform and gramian angle field for fault identification of HVDC transmission line. Sustainable Energy, Grids and Networks 32:100888
[44] Qi Y, Zhang G, Yang L, Liu B, Zeng H, Xue Q, Liu D, Zheng Q, Liu Y (2022) High-Precision Intelligent Cancer Diagnosis Method: 2D Raman Figures Combined with Deep Learning. Anal Chem 94(17):6491–6501
[45] Kong X, Luo C (2022) A novel ConvLSTM with multi-feature fusion for financial intelligent trading. Int J Intell Syst n/a(n/a):1–23
Acknowledgements
This publication was made possible by RRC02 grant #0825210045 from the Qatar National Research Fund (a member of Qatar Foundation). The statements made herein are solely the responsibility of the authors.
Funding
Open Access funding provided by the Qatar National Library.
Ethics declarations
Conflict of interest
All of the authors declare no conflicts of interest regarding the publication of this paper. We declare that we do not have any commercial or associative interest that represents a conflict of interest in connection with the work submitted.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Ben Said, A., Abdel-Salam, AS.G. & Hazaa, K.A. Performance prediction in online academic course: a deep learning approach with time series imaging. Multimed Tools Appl 83, 55427–55445 (2024). https://doi.org/10.1007/s11042-023-17596-9
DOI: https://doi.org/10.1007/s11042-023-17596-9