Abstract
Cluster analysis is a group of statistical methods with great potential for analyzing the vast amounts of web server-log data in order to understand how students learn from hyperlinked information resources. In this methodological paper, we introduce cluster analysis for educational technology researchers and illustrate its use through two examples of mining click-stream server-log data that reflect student use of online learning environments. Cluster analysis can help researchers develop profiles grounded in learner activity during a learning session, such as the sequence in which tasks and information are accessed or the time spent on a given activity or resource. The examples illustrate the use of a hierarchical clustering method (Ward’s clustering) and a non-hierarchical clustering method (k-Means clustering) to analyze characteristics of learning behavior while learners engage in a problem-solving activity in an online learning environment. The article concludes with a discussion of the advantages and limitations of cluster analysis as a data mining technique in educational technology research.
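As a rough illustration of the two methods named in the abstract, the sketch below groups a handful of learners by two activity measures. It is not the authors' analysis: the data values, the choice of features (minutes on task and number of resource pages opened), the function names, and the choice of two clusters are all assumptions made for this example. Features are standardized first because they are on different scales.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def centroid(points):
    """Column-wise mean of a list of feature vectors."""
    return [sum(col) / len(points) for col in zip(*points)]

def standardize(rows):
    """Z-score each column (assumes no column is constant)."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    sds = [(sum((x - m) ** 2 for x in c) / len(c)) ** 0.5
           for c, m in zip(cols, means)]
    return [[(x - m) / s for x, m, s in zip(r, means, sds)] for r in rows]

def ward_clusters(data, k):
    """Hierarchical (agglomerative) clustering with Ward's criterion.

    Starts with one cluster per row and repeatedly merges the pair whose
    merge adds the least within-cluster sum of squares, until k remain.
    Returns a list of clusters, each a sorted list of row indices.
    """
    clusters = [[i] for i in range(len(data))]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                a, b = clusters[i], clusters[j]
                ca = centroid([data[t] for t in a])
                cb = centroid([data[t] for t in b])
                # Increase in within-cluster sum of squares if a and b merge
                cost = len(a) * len(b) / (len(a) + len(b)) * sq_dist(ca, cb)
                if best is None or cost < best[0]:
                    best = (cost, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [sorted(c) for c in clusters]

def kmeans(data, k, seed=0, max_iter=100):
    """Non-hierarchical k-means (Lloyd's iterations); returns a label per row."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(data, k)]  # random rows as starting centers
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centers[c]))
                      for p in data]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(data, labels) if lab == c]
            if members:  # keep the old center if a cluster empties out
                centers[c] = centroid(members)
    return labels

# Hypothetical learners: [minutes on task, resource pages opened]
learners = [[5, 2], [6, 3], [7, 2], [40, 15], [42, 18], [38, 16]]

z = standardize(learners)
ward_groups = ward_clusters(z, k=2)
kmeans_labels = kmeans(z, k=2)
print("Ward groups (row indices):", ward_groups)
print("k-means labels:", kmeans_labels)
```

The key practical difference is visible in the function signatures: the hierarchical method builds groups by successive merges and can be cut at any number of clusters afterward, while k-means needs the number of clusters up front and depends on its starting centers. For real server-log data, an established library implementation would usually be preferable to this naive sketch.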


Cite this article
Antonenko, P.D., Toy, S. & Niederhauser, D.S. Using cluster analysis for data mining in educational technology research. Education Tech Research Dev 60, 383–398 (2012). https://doi.org/10.1007/s11423-012-9235-8
DOI: https://doi.org/10.1007/s11423-012-9235-8