Are MOOC Learning Analytics Results Trustworthy? With Fake Learners, They Might Not Be!

  • Giora Alexandron
  • Lisa Y. Yoo
  • José A. Ruipérez-Valiente
  • Sunbok Lee
  • David E. Pritchard


The rich data that Massive Open Online Course (MOOC) platforms collect on the behavior of millions of users provide a unique opportunity to study human learning and to develop data-driven methods that can address the needs of individual learners. This type of research falls within the emerging field of learning analytics. However, learning analytics research tends to ignore the reliability of results based on MOOC data, which is typically noisy and generated by a largely anonymous crowd of learners. This paper provides evidence that learning analytics in MOOCs can be significantly biased by users who abuse the anonymity and open nature of MOOCs, for example by setting up multiple accounts; because of their number and their aberrant behavior, such users can distort the results. We identify these users, whom we denote fake learners, using dedicated algorithms. Our methodology for measuring the bias caused by fake learners’ activity combines the ideas of Replication Research and Sensitivity Analysis: we replicate two highly cited learning analytics studies with and without the fake learners’ data and compare the results. In one study, the results were relatively stable in the presence of fake learners; in the other, removing the fake learners’ data significantly changed the results. These findings raise concerns about the reliability of learning analytics in MOOCs and highlight the need to develop more robust, generalizable, and verifiable research methods.
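
The gist of this replication-plus-sensitivity-analysis methodology can be sketched in a few lines of code: run the same analysis twice, once on the full dataset and once with the flagged fake-learner accounts removed, then compare the two results. The sketch below is a minimal illustration under assumed inputs, not the paper’s actual pipeline; the column names, the toy data, and the helper compare_with_without_fake are all hypothetical.

```python
# Minimal sketch of the sensitivity-analysis idea: compare an analysis run on
# the full dataset with the same analysis run after removing fake learners.
# Assumes a pandas DataFrame of per-learner records and a set of account IDs
# already flagged as fake learners by a detection algorithm (hypothetical names).
import pandas as pd


def compare_with_without_fake(events, fake_ids, analysis):
    """Run `analysis` on the full data and on a copy with fake learners
    removed; return both results so their difference can be inspected."""
    full_result = analysis(events)
    clean_result = analysis(events[~events["user_id"].isin(fake_ids)])
    return full_result, clean_result


if __name__ == "__main__":
    # Toy data: correctness of first attempts per problem. The harvesting
    # account ("fake1") looks implausibly perfect, which is what biases
    # aggregate statistics such as per-problem difficulty.
    events = pd.DataFrame({
        "user_id": ["a", "a", "b", "b", "fake1", "fake1"],
        "problem": ["p1", "p2", "p1", "p2", "p1", "p2"],
        "correct": [1, 0, 0, 1, 1, 1],
    })
    analysis = lambda df: df.groupby("problem")["correct"].mean()
    full, clean = compare_with_without_fake(events, {"fake1"}, analysis)
    print("With fake learners:\n", full)
    print("Without fake learners:\n", clean)
```

Any downstream learning analytics result (here, per-problem correctness rates) can be substituted for the analysis callback; the methodology asks only whether the result is stable when the fake learners’ data are excluded.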


Keywords: Learning analytics · MOOCs · Replication research · Sensitivity analysis · Fake learners



GA’s research is supported by the Israeli Ministry of Science and Technology under project no. 713257.



Copyright information

© International Artificial Intelligence in Education Society 2019

Authors and Affiliations

  1. Weizmann Institute of Science, Rehovot, Israel
  2. Massachusetts Institute of Technology, Cambridge, USA
  3. University of Houston, Houston, USA
