Prediction of learning outcomes with a machine learning algorithm based on online learning behavior data in blended courses

Abstract

Learning outcomes can be predicted with machine learning algorithms that assess students’ online behavior data. However, there have been few generalized predictive models for a large number of blended courses in different disciplines and in different cohorts. In this study, we examined learning outcomes in terms of learning data in all of the blended courses offered at a Chinese university and proposed a new classification method of blended courses, in which students were primarily clustered on the basis of their online learning behaviors in blended courses using the expectation–maximization algorithm. Then, the blended courses were classified on the basis of the cluster of students who were present in the course and had the highest proportion. The advantage of this method is that the criteria used for classification of the blended courses are clearly defined on the basis of students' online behavior data, so it can easily be used by machine learning systems to algorithmically classify blended courses based on log data collected from a learning management system. Drawing on the classification of the blended courses, we also proposed and validated a general model using the random forest algorithm to predict learning outcomes based on students’ online behaviors in blended courses with different disciplines and different cohorts. The findings of this study indicated that after blended courses were classified on the basis of students’ online behavior, prediction accuracy in each category increased. The overall accuracies for Course I (380 courses out of 661 after screening), L (14 courses out of 661 after screening), A (237 courses out of 661 after screening), V (8 courses out of 661 after screening), and H (22 courses out of 661 after screening) were 38.2%, 48.4%, 42.3%, 42.4%, and 74.7%, respectively. 
These results suggest that a prerequisite for accurately predicting students' learning outcomes in a blended course is that most students are highly engaged in a variety of online learning activities, rather than focused on only one type of activity, such as only watching online videos or only submitting online assignments. The prediction model achieved accuracies of 80.6%, 85.3%, 63%, 54.8%, and 14.3% for grades A, B, C, D, and F in Course H, respectively. The results demonstrate the potential of the proposed model for accurately predicting learning outcomes in blended courses. Finally, we found that no single online learning behavior had a dominant effect on the prediction of students' final grades.
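The course classification step described in the abstract (label each course by the student cluster with the highest proportion among its students) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `classify_courses`, the toy enrollment data, the reuse of the letters I, A, and H as student cluster names, and the alphabetical tie-breaking rule are all assumptions made for the example. The cluster labels are assumed to come from an earlier clustering step, such as the expectation–maximization clustering mentioned above.

```python
from collections import Counter, defaultdict


def classify_courses(enrollments):
    """Label each course with the student cluster that holds the largest share.

    enrollments: iterable of (course_id, student_cluster) pairs, one pair per
    student enrolled in a course. student_cluster is assumed to come from a
    prior clustering step on online behavior features.
    Returns {course_id: (dominant_cluster, proportion_of_students)}.
    """
    # Count how many students of each cluster appear in each course.
    counts = defaultdict(Counter)
    for course_id, cluster in enrollments:
        counts[course_id][cluster] += 1

    labels = {}
    for course_id, counter in counts.items():
        total = sum(counter.values())
        # Pick the most frequent cluster; ties are broken alphabetically
        # (an assumption, the source does not describe tie handling).
        cluster, n = min(counter.items(), key=lambda kv: (-kv[1], kv[0]))
        labels[course_id] = (cluster, n / total)
    return labels


# Hypothetical toy data: two courses with students already assigned to clusters.
enrollments = [
    ("course_1", "H"), ("course_1", "H"), ("course_1", "I"),
    ("course_2", "I"), ("course_2", "I"), ("course_2", "A"), ("course_2", "I"),
]
course_labels = classify_courses(enrollments)
```

In this toy data, `course_1` would be labeled with cluster H (two of its three students) and `course_2` with cluster I (three of four). A separate predictive model, such as the random forest mentioned in the abstract, could then be trained within each course category.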

Funding

China National Social Science Grant project titled Theory Building and Empirical Study on Blended Learning (BCA180084).

Author information

Corresponding author

Correspondence to Xibin Han.

About this article

Cite this article

Luo, Y., Han, X. & Zhang, C. Prediction of learning outcomes with a machine learning algorithm based on online learning behavior data in blended courses. Asia Pacific Educ. Rev. (2022). https://doi.org/10.1007/s12564-022-09749-6

  • DOI: https://doi.org/10.1007/s12564-022-09749-6

Keywords

  • Data science applications in education
  • Evaluation methodologies
  • Postsecondary education
  • Distributed learning environments
  • Interdisciplinary projects