Computers and the Humanities, Volume 28, Issue 4–5, pp 243–252

ICAME-Quo Vadis? Reflections on the use of computer corpora in linguistics

  • Stig Johansson

Abstract

The focus of the paper is on the use of computer corpora in language research. The historical background is touched on, with special reference to work within the International Computer Archive of Modern English (ICAME). Developments in the use of corpora are surveyed. Issues taken up include the representativeness and structure of corpora. Special attention is paid to pitfalls in the use of corpora. Corpus compilers must provide adequate documentation of the texts. Corpus users must know the corpus well, both to judge whether it is appropriate for their research problem and to evaluate the results of their studies.

Key words

language corpora; corpus linguistics; representativeness of corpora; structure of corpora; uses of corpora; text encoding

Copyright information

© Kluwer Academic Publishers 1995

Authors and Affiliations

  • Stig Johansson, Department of English, University of Oslo, Oslo, Norway
