The ACODEA framework: Developing segmentation and classification schemes for fully automatic analysis of online discussions

  • Jin Mu
  • Karsten Stegmann
  • Elijah Mayfield
  • Carolyn Rosé
  • Frank Fischer


Research related to online discussions frequently faces the problem of analyzing huge corpora. Natural Language Processing (NLP) technologies may allow automating this analysis. However, state-of-the-art machine learning and text mining approaches yield models that do not transfer well between corpora related to different topics. Segmentation is also a necessary step, but trained models are frequently very sensitive to the particulars of the segmentation that was used when the model was trained. Therefore, in prior published research on text classification in a CSCL context, the data was segmented by hand. We discuss work towards overcoming these challenges. We present a framework for developing coding schemes optimized for automatic segmentation and for context-independent coding that builds on this segmentation. The key idea is to extract the semantic and syntactic features of each single word, using part-of-speech tagging and named-entity recognition, before the raw data is segmented and classified. Our results show that the coding on the micro-argumentation dimension can be fully automated. Finally, we discuss how fully automated analysis can enable context-sensitive support for collaborative learning.
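The key idea can be illustrated with a minimal sketch. It is not the authors' implementation: the lexicons, the tag names, the example sentence, and the functions `abstract_tokens` and `segment` are all hypothetical, and a real pipeline would use a trained part-of-speech tagger and named-entity recognizer instead of lookup tables. The sketch only shows the principle of replacing each word with a topic-independent tag and then segmenting on the tag sequence rather than on the raw words.

```python
import re

# Hypothetical toy lexicons (assumptions for illustration only).
# A real system would obtain these labels from trained POS and NER models.
POS_LEXICON = {
    "the": "DET", "a": "DET",
    "because": "CONJ", "and": "CONJ", "but": "CONJ",
    "i": "PRON", "he": "PRON", "she": "PRON",
    "failed": "VERB", "is": "VERB", "was": "VERB", "think": "VERB",
    "exam": "NOUN", "teacher": "NOUN",
    "lazy": "ADJ", "unfair": "ADJ",
}
ENTITY_LEXICON = {"michael": "PERSON"}
PUNCTUATION = {".", ",", "!", "?", ";"}


def abstract_tokens(text):
    """Return (tokens, tags): each word mapped to an entity or POS tag."""
    tokens = re.findall(r"[A-Za-z]+|[.,!?;]", text)
    tags = []
    for tok in tokens:
        key = tok.lower()
        if tok in PUNCTUATION:
            tags.append("PUNCT")
        elif key in ENTITY_LEXICON:
            # Entity tags take priority: they hide topic-specific names.
            tags.append(ENTITY_LEXICON[key])
        else:
            tags.append(POS_LEXICON.get(key, "UNK"))
    return tokens, tags


def segment(tags):
    """Split a tag sequence into segments.

    Assumed toy rule: punctuation closes a segment (and is dropped),
    a conjunction opens a new one.
    """
    segments, current = [], []
    for tag in tags:
        if tag == "PUNCT":
            if current:
                segments.append(current)
            current = []
        elif tag == "CONJ":
            if current:
                segments.append(current)
            current = [tag]
        else:
            current.append(tag)
    if current:
        segments.append(current)
    return segments


tokens, tags = abstract_tokens("Michael failed the exam because he was lazy.")
print(segment(tags))
```

Because the segments consist of tags rather than words, a classifier trained on them would not depend directly on the vocabulary of one discussion topic, which is the motivation the abstract gives for this preprocessing step.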


Keywords: Online discussion, Automatic content analysis, Text classification



Copyright information

© International Society of the Learning Sciences, Inc.; Springer Science + Business Media, LLC 2012

Authors and Affiliations

  • Jin Mu (1), corresponding author
  • Karsten Stegmann (1, 2)
  • Elijah Mayfield (3)
  • Carolyn Rosé (3)
  • Frank Fischer (1)

  1. Ludwig-Maximilians-Universität München, Empirische Pädagogik und Pädagogische Psychologie, Munich, Germany
  2. Universität Koblenz-Landau, Institut Erziehungswissenschaft/Philosophie, Landau, Germany
  3. Language Technologies Institute, Carnegie Mellon University, Pittsburgh, USA