
Analyzing collaborative learning processes automatically: Exploiting the advances of computational linguistics in computer-supported collaborative learning

Abstract

In this article we describe the emerging area of text classification research focused on the problem of collaborative learning process analysis, both from a broad perspective and more specifically in terms of a publicly available tool set called TagHelper tools. Analyzing the variety of pedagogically valuable facets of learners’ interactions is a time-consuming and effortful process. Improving automated analyses of such highly valued processes of collaborative learning by adapting and applying recent text classification technologies would make it a less arduous task to obtain insights from corpus data. This endeavor also holds the potential to enable substantially improved on-line instruction, both by providing teachers and facilitators with reports about the groups they are moderating and by triggering context-sensitive collaborative learning support on an as-needed basis. Specifically, we report on an interdisciplinary research project that has been investigating the effectiveness of applying text classification technology to a large CSCL corpus that has been analyzed by human coders using a theory-based, multi-dimensional coding scheme. We report promising results and include an in-depth discussion of important issues, such as reliability, validity, and efficiency, that should be considered when deciding on the appropriateness of adopting a new technology such as TagHelper tools. One major technical contribution of this work is a demonstration that an important step towards making text classification technology effective for this purpose is designing and building linguistic pattern detectors, otherwise known as features, that can be extracted reliably from texts and that have high predictive power for the categories of discourse actions the CSCL community is interested in.
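The abstract’s core technical point is that hand-built linguistic pattern detectors serve as features for a trained classifier whose output is checked against human coding. A minimal sketch of that pipeline follows. It is not TagHelper itself; it assumes Python with scikit-learn and SciPy, and the toy contributions, codes, and the single regular-expression “pattern detector” are hypothetical illustrations.

    # Minimal sketch (not TagHelper): classify discussion contributions using
    # word unigrams plus one hand-crafted "linguistic pattern detector", then
    # report agreement with the human codes as Cohen's kappa.
    import re

    from scipy.sparse import csr_matrix, hstack
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import cohen_kappa_score

    # Toy corpus: each contribution is hand-coded as "claim" or "counter".
    texts = [
        "I think the outcome is best explained by low prior knowledge.",
        "But your explanation ignores the teacher's feedback.",
        "In my view the theory predicts exactly this result.",
        "I disagree, because the evidence points the other way.",
    ]
    human_codes = ["claim", "counter", "claim", "counter"]

    # Surface features: word unigram counts.
    vectorizer = CountVectorizer(lowercase=True)
    X_words = vectorizer.fit_transform(texts)

    # One example pattern detector: does the contribution open with a
    # disagreement marker?  A realistic feature set would contain many such
    # theory-motivated detectors alongside the surface features.
    disagree = re.compile(r"^\s*(but|however|i disagree)\b", re.IGNORECASE)
    X_pattern = csr_matrix([[1 if disagree.search(t) else 0] for t in texts])

    X = hstack([X_words, X_pattern])

    # Train and apply a classifier.  Here it is applied to its own training
    # data only to show the shape of the pipeline; a real evaluation would use
    # cross-validation over a held-out, human-coded corpus.
    clf = LogisticRegression(max_iter=1000).fit(X, human_codes)
    predicted = clf.predict(X)

    print("kappa:", cohen_kappa_score(human_codes, predicted))

On this toy data the model simply reproduces the human codes, so kappa is 1.0; the point is the shape of the pipeline, in which surface features and purpose-built pattern detectors feed a learned model whose labels are then compared against human coding with an agreement statistic such as Cohen’s kappa.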



Notes

  1. TagHelper tools can be downloaded from http://www.cs.cmu.edu/~cprose/TagHelper.html.


Acknowledgement

This work has grown out of an initiative jointly organized by the American National Science Foundation and the Deutsche Forschungsgemeinschaft to bring together educational psychologists and technology experts from Germany and from the USA to build a new research network for technology-supported education. This work was supported by the National Science Foundation grant number SBE0354420 to the Pittsburgh Science of Learning Center, Office of Naval Research, Cognitive and Neural Sciences Division Grant N00014-05-1-0043, and the Deutsche Forschungsgemeinschaft. We would also like to thank Jaime Carbonell, William Cohen, Pinar Dönmez, Gahgene Gweon, Mahesh Joshi, Emil Albright, Edmund Huber, Rohit Kumar, Hao-Chuan Wang, Gerry Stahl, Hans Spada, Nikol Rummel, Kenneth Koedinger, Erin Walker, Bruce McLaren, Alexander Renkl, Matthias Nueckles, Rainer Bromme, Regina Jucks, Robert Kraut, and our very helpful anonymous reviewers for their contributions to this work.

Author information

Corresponding author

Correspondence to Carolyn Rosé.


About this article

Cite this article

Rosé, C., Wang, Y.-C., Cui, Y. et al. Analyzing collaborative learning processes automatically: Exploiting the advances of computational linguistics in computer-supported collaborative learning. Computer Supported Learning 3, 237–271 (2008). https://doi.org/10.1007/s11412-007-9034-0


Keywords

  • Collaborative process analysis
  • Machine learning
  • Analysis tools