In this article we describe the emerging area of text classification research focused on the problem of collaborative learning process analysis, both from a broad perspective and more specifically in terms of a publicly available tool set called TagHelper tools. Analyzing the variety of pedagogically valuable facets of learners’ interactions is a time-consuming and effortful process. Improving automated analyses of such highly valued processes of collaborative learning by adapting and applying recent text classification technologies would make it a less arduous task to obtain insights from corpus data. This endeavor also holds the potential for enabling substantially improved on-line instruction, both by providing teachers and facilitators with reports about the groups they are moderating and by triggering context-sensitive collaborative learning support on an as-needed basis. In this article, we report on an interdisciplinary research project that has been investigating the effectiveness of applying text classification technology to a large CSCL corpus that has been analyzed by human coders using a theory-based multi-dimensional coding scheme. We report promising results and include an in-depth discussion of important issues such as reliability, validity, and efficiency that should be considered when deciding on the appropriateness of adopting a new technology such as TagHelper tools. One major technical contribution of this work is a demonstration that an important part of making text classification technology effective for this purpose is designing and building linguistic pattern detectors, otherwise known as features, that can be extracted reliably from texts and that have high predictive power for the categories of discourse actions that the CSCL community is interested in.
TagHelper tools can be downloaded from http://www.cs.cmu.edu/~cprose/TagHelper.html.
This work has grown out of an initiative jointly organized by the American National Science Foundation and the Deutsche Forschungsgemeinschaft to bring together educational psychologists and technology experts from Germany and from the USA to build a new research network for technology-supported education. This work was supported by the National Science Foundation grant number SBE0354420 to the Pittsburgh Science of Learning Center, Office of Naval Research, Cognitive and Neural Sciences Division Grant N00014-05-1-0043, and the Deutsche Forschungsgemeinschaft. We would also like to thank Jaime Carbonell, William Cohen, Pinar Dönmez, Gahgene Gweon, Mahesh Joshi, Emil Albright, Edmund Huber, Rohit Kumar, Hao-Chuan Wang, Gerry Stahl, Hans Spada, Nikol Rummel, Kenneth Koedinger, Erin Walker, Bruce McLaren, Alexander Renkl, Matthias Nueckles, Rainer Bromme, Regina Jucks, Robert Kraut, and our very helpful anonymous reviewers for their contributions to this work.
Rosé, C., Wang, YC., Cui, Y. et al. Analyzing collaborative learning processes automatically: Exploiting the advances of computational linguistics in computer-supported collaborative learning. Computer Supported Learning 3, 237–271 (2008). https://doi.org/10.1007/s11412-007-9034-0
- Collaborative process analysis
- Machine learning
- Analysis tools