Abstract
Learning outcomes can be predicted with machine learning algorithms that analyze students’ online behavior data. However, few predictive models generalize across a large number of blended courses in different disciplines and cohorts. In this study, we examined learning outcomes using learning data from all of the blended courses offered at a Chinese university and proposed a new method for classifying blended courses. Students were first clustered on the basis of their online learning behaviors in blended courses using the expectation–maximization algorithm. Each blended course was then classified according to the student cluster that accounted for the highest proportion of students in that course. The advantage of this method is that the classification criteria are clearly defined in terms of students’ online behavior data, so machine learning systems can readily apply them to classify blended courses algorithmically from log data collected by a learning management system. Drawing on this classification, we also proposed and validated a general model that uses the random forest algorithm to predict learning outcomes from students’ online behaviors in blended courses across different disciplines and cohorts. The findings indicated that after blended courses were classified on the basis of students’ online behavior, prediction accuracy in each category increased. The overall accuracies for course categories I (380 of the 661 courses retained after screening), L (14 courses), A (237 courses), V (8 courses), and H (22 courses) were 38.2%, 48.4%, 42.3%, 42.4%, and 74.7%, respectively.
These results suggest that a prerequisite for accurately predicting students’ learning outcomes in a blended course is that most students be highly engaged in a variety of online learning activities rather than focused on only one type of activity, such as watching online videos or submitting online assignments. In category H courses, the prediction model achieved accuracies of 80.6%, 85.3%, 63%, 54.8%, and 14.3% for grades A, B, C, D, and F, respectively. The results demonstrated the potential of the proposed model for accurately predicting learning outcomes in blended courses. Finally, we found that no single online learning behavior had a dominant effect on the prediction of students’ final grades.
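The course classification step described in the abstract (assigning each course to the student cluster with the highest proportion in that course) can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes each student has already received a cluster label from a prior clustering step such as expectation–maximization, and it reuses the category letters I, A, and H as student cluster labels purely for illustration. The function name `classify_courses`, the toy enrollment data, and the alphabetical tie-breaking rule are all hypothetical.

```python
from collections import Counter, defaultdict


def classify_courses(enrollments):
    """Assign each course the cluster label held by the largest share of its students.

    enrollments: iterable of (course_id, student_cluster) pairs. The cluster
    labels are assumed to come from an earlier clustering step (not shown).
    Returns a dict mapping course_id -> (cluster_label, proportion).
    """
    counts = defaultdict(Counter)
    for course_id, cluster in enrollments:
        counts[course_id][cluster] += 1

    result = {}
    for course_id, counter in counts.items():
        total = sum(counter.values())
        # Pick the most frequent cluster; ties are broken alphabetically by
        # label, which is an assumption made here, not a rule from the study.
        label, n = min(counter.items(), key=lambda kv: (-kv[1], kv[0]))
        result[course_id] = (label, n / total)
    return result


# Hypothetical toy data: (course, cluster label of one enrolled student).
enrollments = [
    ("course_1", "H"), ("course_1", "H"), ("course_1", "I"),
    ("course_2", "I"), ("course_2", "A"), ("course_2", "I"), ("course_2", "I"),
]

course_types = classify_courses(enrollments)
for course_id, (label, share) in sorted(course_types.items()):
    print(f"{course_id}: category {label} ({share:.0%} of students)")
```

In this toy data, two of the three students in `course_1` fall in cluster H, so the course is labeled H. A prediction model such as a random forest could then be trained separately within each resulting course category.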
Funding
This work was supported by the China National Social Science Grant project "Theory Building and Empirical Study on Blended Learning" (BCA180084).
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Luo, Y., Han, X. & Zhang, C. Prediction of learning outcomes with a machine learning algorithm based on online learning behavior data in blended courses. Asia Pacific Educ. Rev. (2022). https://doi.org/10.1007/s12564-022-09749-6
DOI: https://doi.org/10.1007/s12564-022-09749-6
Keywords
- Data science applications in education
- Evaluation methodologies
- Postsecondary education
- Distributed learning environments
- Interdisciplinary projects