A Noun Phrase Analysis Tool for Mining Online Community Conversations

  • Caroline Haythornthwaite
  • Anatoliy Gruzd
Conference paper


Online communities are creating a growing legacy of texts in bulletin board postings, chat, blogs, and similar media. These texts record conversation, knowledge exchange, and variation in focus as groups grow, mature, and decline; they represent a rich history of group interaction and an opportunity to explore the purpose and development of online communities. However, the quantity of data created by these communities is vast, and examining community processes in a timely manner requires automated analysis. This raises questions about how to conduct automated analyses and what we can gain from them: Can we gain an idea of community interests, priorities, and operation from automated examinations of the texts of postings and patterns of posting behavior? Can we mine stored texts to discover patterns of language and interaction that characterize a community?


Noun Phrase; Natural Language Processing; Online Community; Bulletin Board; Unique Message
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag London Limited 2007

Authors and Affiliations

  • Caroline Haythornthwaite (1)
  • Anatoliy Gruzd (1)
  1. University of Illinois at Urbana-Champaign, USA
