Fuzzy Limits: Researching Discourse in the Internet with Corpora

Living reference work entry


The Internet has provided us with an unprecedented amount of linguistic data. For those who research discourse and communication, it is an unexpected gift with huge potential. However, this gift comes with important challenges. First, large corpora push us toward quantitative methods in fields where we were used to qualitative approaches. To address this, new strategies that combine both approaches are being developed, such as Corpus-Assisted Discourse Studies (Baker et al. Discourse Soc 19(3):273–305, 2008; Partington et al. Patterns and meanings in discourse. John Benjamins Publishing Company, Amsterdam, 2013).
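As a minimal illustration of the kind of quantitative step that corpus-assisted approaches bring into discourse analysis, the sketch below ranks the words that are unusually frequent in a study corpus compared with a reference corpus (keyword, or keyness, analysis). It uses the log-likelihood (G²) statistic that is common in corpus linguistics. It assumes both corpora are already tokenized into non-empty lists of strings. The function names `log_likelihood` and `keywords` and the toy tokens are hypothetical, and this is not presented as the procedure used in the chapter.

```python
import math
from collections import Counter


def log_likelihood(freq_study: int, freq_ref: int, size_study: int, size_ref: int) -> float:
    """Log-likelihood (G2) keyness score for one word in two corpora."""
    total_freq = freq_study + freq_ref
    total_size = size_study + size_ref
    # Expected frequencies if the word were equally common in both corpora.
    expected_study = size_study * total_freq / total_size
    expected_ref = size_ref * total_freq / total_size
    g2 = 0.0
    for observed, expected in ((freq_study, expected_study), (freq_ref, expected_ref)):
        if observed > 0:  # a zero observation contributes nothing (0 * ln 0 -> 0)
            g2 += observed * math.log(observed / expected)
    return 2 * g2


def keywords(study_tokens: list[str], ref_tokens: list[str], top_n: int = 10) -> list[tuple[str, float]]:
    """Rank words over-represented in the study corpus by log-likelihood.

    Hypothetical helper; assumes both inputs are already tokenized and non-empty.
    Tokenizing Internet data is itself a non-trivial step not shown here.
    """
    study = Counter(study_tokens)
    ref = Counter(ref_tokens)
    size_study = sum(study.values())
    size_ref = sum(ref.values())
    scored = []
    for word, f_study in study.items():
        f_ref = ref.get(word, 0)
        # Keep only positive keywords: relatively more frequent in the study corpus.
        if f_study / size_study > f_ref / size_ref:
            scored.append((word, log_likelihood(f_study, f_ref, size_study, size_ref)))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


if __name__ == "__main__":
    # Toy, invented token lists purely for illustration.
    study_sample = "refugees welcome refugees vote vote today".split()
    reference_sample = "vote today economy jobs vote tax".split()
    for word, score in keywords(study_sample, reference_sample):
        print(f"{word}\t{score:.2f}")
```

The statistic only flags candidate words; in a corpus-assisted study these candidates would still need a qualitative reading of their contexts (for instance, through concordance lines) before any claim about discourse is made.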

Second, traditional units of analysis need to be redefined. Communication on the Internet has its own characteristics, and some of them do not fit previous definitions. For discourse analysis, there are two main reasons for this. On the one hand, current interactions are multimedia: video, image, and sound are not necessarily subordinated to text on the Internet, and researchers ‘need to look beyond language to better understand how people communicate and interact in digital environments’ (Jewitt. Multimodal analysis. In: Georgakopoulou A, Spilioti T (eds) The Routledge handbook of language and digital communication. Routledge, London, 2016). Recent approaches, such as Multimodal Critical Discourse Studies (Machin. Crit Discourse Stud 10:347, 2013), move in this direction.

On the other hand, the limits of discourse units have become fuzzy. Interactions on the Internet work in new ways, even when we call them conversations or chats (Alcántara-Plá. Estudios de Lingüística del Español 35(1):214–233, 2014). If we study them with our current units of analysis, these “conversations” appear fragmentary and unstructured.

In this chapter, we describe these new challenges and the solutions that have been adopted so far, drawing attention to the major problems that remain unsolved.


Keywords: Discourse, Internet, Corpus, Interaction, Multimodality

References

  1. Alcántara-Plá M (2014) Las unidades discursivas en los mensajes instantáneos de wasap. Estudios de Lingüística del Español 35(1):214–233
  2. Alcántara-Plá M (2017) Palabras invasoras. El español de las nuevas tecnologías. Los libros de la catarata, Madrid
  3. Alcántara-Plá M, Ruiz-Sánchez A (2017) Not for twitter: migration as a silenced topic in 2015 Spain general election. In: Schröter M, Taylor C (eds) Exploring silence and absence in discourse: empirical approaches. Palgrave Macmillan, London
  4. Androutsopoulos J (2011) From variation to heteroglossia in the study of computer-mediated discourse. In: Digital discourse: language in the new media. Oxford University Press, Oxford, pp 277–298
  5. Baker P, McEnery T (2015) Corpora and discourse studies: integrating discourse and corpora. Springer, Netherlands, Amsterdam
  6. Baker P, Gabrielatos C, Khosravinik M, Krzyzanowski M, McEnery T, Wodak R (2008) A useful methodological synergy? Combining critical discourse analysis and corpus linguistics to examine discourses of refugees and asylum seekers in the UK press. Discourse Soc 19(3):273–305
  7. Baron NS (2009) Are instant messages speech? In: International handbook of internet research. Springer, Netherlands, Amsterdam
  8. Baron A, Rayson P, Archer D (2009) Word frequency and key word statistics in corpus linguistics. Anglistik 20(1):41–67
  9. Bauman R, Briggs CL (1990) Poetics and performance as critical perspectives on language and social life. Annu Rev Anthropol 19:59–88
  10. Beesley KR, Karttunen L (2003) Finite-state morphology: xerox tools and techniques. CSLI, Stanford
  11. Biber D, Conrad S, Reppen R (1998) Corpus linguistics: investigating language structure and use. Cambridge University Press, Cambridge
  12. Bolter JD, Grusin R (2000) Remediation: understanding new media. MIT Press, Cambridge
  13. Bybee J, Hopper P (eds) (2001) Frequency and the emergence of linguistic structure. John Benjamins, Amsterdam
  14. Croft W, Cruse DA (2004) Cognitive linguistics. Cambridge University Press, Cambridge
  15. Crystal D (2008) Txtng: the gr8 db8. OUP Oxford, Oxford
  16. Elm MS (2009) Language deterioration revisited: the extent and function of English content in a Swedish chat room. In: International handbook of internet research. Springer, Netherlands, Amsterdam, pp 437–453
  17. Fillmore CJ (1985) Frames and the semantics of understanding. Quaderni di semantica 6(2):222–254
  18. Gee JP (2004) Situated language and learning: a critique of traditional schooling. Routledge, London
  19. Gee JP (2015) Discourse analysis of games. In: Jones RH, Chik A, Hafner CA (eds) Discourse and digital practices: doing discourse analysis in the digital age. Routledge, London
  20. Genosko G (2016) Critical semiotics. Theory, from information to affect. Bloomsbury, London
  21. Georgakopoulou A, Spilioti T (eds) (2016) The Routledge handbook of language and digital communication. Routledge, London
  22. Gibson J (1979) The ecological approach to visual perception. Houghton Mifflin, Boston
  23. Givón T (2005) Context as other minds. John Benjamins Publishing Company, Amsterdam
  24. Hafner CA (2015) Co-constructing identity in virtual worlds for children. In: Jones, Chik and Hafner (2015)
  25. Halliday MAK (1978) Language as social semiotic: the social interpretation of language and meaning. Edward Arnold, London
  26. Halliday MAK, Matthiessen CMIM (2004) An introduction to functional grammar. Arnold, London
  27. Heyd T (2016) Digital genres and processes of remediation. In: The Routledge handbook of language and digital communication. Routledge, London
  28. Hunston S (2010) Corpus approaches to evaluation: phraseology and evaluative language. Routledge, London
  29. Jaworski A, Coupland N (2014) The discourse reader. Routledge, London
  30. Jewitt C (2016) Multimodal analysis. In: Georgakopoulou A, Spilioti T (eds) The Routledge handbook of language and digital communication. Routledge, London
  31. Jones RH, Chik A, Hafner CA (eds) (2015) Discourse and digital practices. Doing discourse analysis in the digital age. Routledge, London
  32. Koskenniemi K (1984) A general computational model for word-form recognition and production. In: Proceedings of the 10th international conference on computational linguistics. Association for Computational Linguistics, pp 178–181
  33. Kress G (2010) Multimodality: a social semiotic approach to contemporary communication. Routledge, London
  34. Kress G, van Leeuwen T (2006) Reading images: a visual grammar of design. Routledge, London
  35. Langacker RW (1987) Foundations of cognitive grammar: theoretical prerequisites, vol 1. Stanford University Press, California
  36. Machin D (2013) What is multimodal critical discourse studies? Crit Discourse Stud 10:347
  37. Manning CD, Schütze H (1999) Foundations of statistical natural language processing. MIT Press, Cambridge
  38. MODE (2012) Glossary of multimodal terms. Retrieved 10/10/2017
  39. Morley J, Bayley P (2009) Corpus-assisted discourse studies on the Iraq war: wording the war. Routledge, London
  40. O’Reilly T (2005) What is Web 2.0. Design patterns and business models for the next generation of software. Retrieved 10/10/2017
  41. Palmer DD (2000) Tokenisation and sentence segmentation. In: Handbook of natural language processing. Marcel Dekker, New York, pp 11–35
  42. Partington A, Duguid A, Taylor C (2013) Patterns and meanings in discourse. John Benjamins Publishing Company, Amsterdam
  43. Rafaeli S, Ariel Y (2007) Assessing interactivity in computer-mediated research. In: Joinson AN, McKenna KYA, Postmes T, Reips U-D (eds) The Oxford handbook of internet psychology. Oxford University Press, Oxford
  44. Schank RC, Abelson RP (1977) Scripts, plans, goals, and understanding: an inquiry into human knowledge structures. Lawrence Erlbaum Associates, Hillsdale
  45. Silverstein M (1992) The indeterminacy of contextualization: when is enough enough. In: Auer P, di Luzio A (eds) The contextualization of language. John Benjamins Publishing Company, Amsterdam, pp 55–75
  46. Stubbs M (2007) On texts, corpora and models of language. In: Hoey M (ed) Text, discourse and corpora: theory and analysis. A&C Black, London
  47. Szudarski P (2017) Corpus linguistics for vocabulary. Routledge, London
  48. Widdowson HG (2008) Text, context, pretext: critical issues in discourse analysis. Blackwell Publishing Ltd, Oxford
  49. Wiedemann G (2016) Text mining for qualitative data analysis in the social sciences. Springer Fachmedien, Wiesbaden
  50. Yates SJ (1996) Oral and written aspects of computer conferencing. In: Herring S (ed) Computer-mediated communication. Linguistic, social and cross-cultural perspectives. John Benjamins Publishing Company, Amsterdam, pp 29–46
  51. Zanchetta E, Baroni M, Bernardini S (2011) Corpora for the masses: the BootCaT front-end. In: Corpus Linguistics 2011. University of Birmingham, Birmingham

Authors and Affiliations

  1. Wor(l)ds Lab – Department of Linguistics, Universidad Autónoma de Madrid, Madrid, Spain