Searching for extended units of meaning—and what to do when you find them

Lexicography

Abstract

Two of the key outcomes of corpus-linguistic research over the past 30 years have been the development of the idea that meanings are mostly constructed through context (undermining traditional notions of the individual word as an autonomous bearer of meaning), and the discovery that recurrence and regularity (our tendency to employ a limited number of conventionalized ways of expressing ideas) are essential features of the language system. Both findings have had a major impact on our understanding of how language works, and both have influenced the content of dictionary entries, contributing, for example, to improved word sense disambiguation and to a greater emphasis on phraseology and collocation. However, there is still much to do. Ever-larger corpora and more powerful corpus-query tools reveal areas where we can further improve our description of languages, and thus provide better resources for users. In addition, the migration of dictionaries to digital media (removing space constraints) opens up new opportunities for doing this. In a characteristically far-sighted paper (Sinclair, Textus 9(1): 75–106, 1996), John Sinclair broadened the search for what he called “units of meaning” by investigating longer strings of words and identifying recurrent, and often quite extended, patterns of usage. Using this as a starting point, I will look at other examples in corpus data of the kinds of patterning Sinclair discussed, and show how current corpus-querying systems can help us identify these extended units of meaning. Finally, I will speculate about whether dictionaries should aim to describe these longer units, and if so, how this might work in practice.

Notes

  1. In this case, it is not possible to give a reliable figure for the number of occurrences of our target word in the corpus, since (as the following paragraph shows) instances of the phrasal verb sink in are interspersed arbitrarily with cases where the verb sink, in its usual meanings, is followed by the preposition in.

  2. The CQL query used was: [lempos="(sink)-v"] [lemma="in"] [tag!="CD" & tag!="JJ" & tag!="N.*" & word!="the" & lemma!="a"]. This finds all cases of the verb lemma sink followed immediately by in, but excludes cases where in is followed by a number (CD) (to eliminate cases like sank in 1815), an adjective (JJ), a noun (N.*), or the words a or the. It would no doubt be possible to find a more elegant solution, but this immediately removed well over 2000 non-relevant cases from the raw sample.
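
The logic of the query in Note 2 can be approximated outside a corpus-query system. The following Python sketch is illustrative only and is not part of the original study: the `Token` type, the function names, and the Penn-Treebank-style tags in the usage example are assumptions, and it assumes that CQL attribute values are regular expressions matched against the whole value. It checks the same three token slots as the query: a verb form of sink, then in, then a token that is not a number, adjective, noun, or article.

```python
import re
from typing import List, NamedTuple


class Token(NamedTuple):
    """One corpus token (hypothetical structure, for illustration only)."""
    word: str   # surface form, e.g. "sank"
    lemma: str  # base form, e.g. "sink"
    tag: str    # part-of-speech tag, e.g. "VBD" (Penn-style tagset assumed)


def matches_sink_in(tokens: List[Token], i: int) -> bool:
    """True if the three tokens starting at i fit the pattern in Note 2."""
    if i + 2 >= len(tokens):
        return False  # the query needs a third token after "in"
    verb, prep, nxt = tokens[i], tokens[i + 1], tokens[i + 2]
    # Slot 1: lempos "(sink)-v", approximated as lemma "sink" plus a verb tag.
    if not (verb.lemma == "sink" and verb.tag.startswith("V")):
        return False
    # Slot 2: lemma "in".
    if prep.lemma != "in":
        return False
    # Slot 3: reject numbers (CD), adjectives (JJ), any noun tag (N.*),
    # the word "the", and the lemma "a".
    excluded_tag = nxt.tag in ("CD", "JJ") or re.fullmatch(r"N.*", nxt.tag)
    excluded_word = nxt.word == "the" or nxt.lemma == "a"
    return not (excluded_tag or excluded_word)


def find_matches(tokens: List[Token]) -> List[int]:
    """Return the start index of every match in a token sequence."""
    return [i for i in range(len(tokens)) if matches_sink_in(tokens, i)]


# Usage: "It finally sank in ." is kept; "sank in 1815" is filtered out.
kept = [Token("It", "it", "PRP"), Token("finally", "finally", "RB"),
        Token("sank", "sink", "VBD"), Token("in", "in", "RP"),
        Token(".", ".", ".")]
dropped = [Token("ship", "ship", "NN"), Token("sank", "sink", "VBD"),
           Token("in", "in", "IN"), Token("1815", "1815", "CD")]
print(find_matches(kept), find_matches(dropped))
```

Like the query itself, this is a blunt filter: it discards literal uses such as sank in the sea, but it would also discard any genuine phrasal use that happens to be followed by a noun or adjective.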

References

  • Biber, D., S. Johansson, G. Leech, S. Conrad, and E. Finegan. 1999. Longman Grammar of Spoken and Written English. London: Pearson Education.

  • Convery, C., P. Ó Mianáin, M. Ó Raghallaigh, S. Atkins, A. Kilgarriff, and M. Rundell. 2010. The DANTE Database (Database of ANalysed Texts of English). In Proceedings of the XIV EURALEX Congress, ed. Anne Dykstra and Tanneke Schoonheim. Leeuwarden: Fryske Akademy.

  • Cowie, A.P. 1999. English Dictionaries for Foreign Learners: A History. Oxford: Oxford University Press.

  • Hanks, P.W. 2013. Lexical Analysis: Norms and Exploitations. Cambridge: MIT Press.

  • Halliday, M.A.K. 1966. Lexis as a Linguistic Level. In In Memory of J. R. Firth, ed. C.E. Bazell, J.C. Catford, M.A.K. Halliday, and R.H. Robins, 148–162. London: Longman.

  • Hoey, M. 2005. Lexical Priming: A New Theory of Words and Language. London: Routledge.

  • Johnson, S. 1755. Preface to A Dictionary of the English Language. Edited by Jack Lynch. http://andromeda.rutgers.edu/~jlynch/Texts/preface.html.

  • Kilgarriff, A., P. Rychly, P. Smrz, and D. Tugwell. 2004. The Sketch Engine. In Proceedings of the Eleventh Euralex Congress, ed. Geoffrey Williams and Sandra Vessier, 105–116. France: UBS Lorient.

  • Kilgarriff, A., V. Baisa, P. Rychlý, and M. Jakubíček. 2015. Longest–commonest Match. In Electronic Lexicography in the 21st Century: Linking Lexical Data in the Digital Age. Proceedings of the eLex 2015 Conference, ed. I. Kosem, M. Jakubíček, J. Kallas, and S. Krek, 397–404. Ljubljana/Brighton.

  • Rundell, M. 2015. From Print to Digital: Implications for Dictionary Policy and Lexicographic Conventions. Lexikos 25: 301–322.

  • Rundell, M., and A. Kilgarriff. 2011. Automating the Creation of Dictionaries: Where Will It All End? In A Taste for Corpora. A tribute to Professor Sylviane Granger, ed. F. Meunier, S. De Cock, G. Gilquin, and M. Paquot, 257–281. Amsterdam: Benjamins.

  • Sinclair, J.M. 1991. Corpus, Concordance, Collocation. Oxford: Oxford University Press.

  • Sinclair, J.M. 1996. The Search for Units of Meaning. Textus 9 (1): 75–106.

  • Sinclair, J.M. 1998. The Lexical Item. In Contrastive Lexical Semantics, ed. E. Weigand, 1–24. Amsterdam: Benjamins.

  • Sinclair, J.M. 2007/2010. Defining the Definiendum. In A Way with Words: Recent Advances in Lexical Theory and Analysis - A Festschrift for Patrick Hanks, ed. G-M. de Schryver, 37–47. Kampala: Menha Publishers.

  • Summers, D. (ed.). 1993. Longman Language Activator. London: Longman.

Acknowledgements

I am grateful to Vojtěch Kovář of the Sketch Engine team for his helpful comments on the functions discussed in Sect. 2.1.

Author information

Corresponding author

Correspondence to Michael Rundell.

About this article

Cite this article

Rundell, M. Searching for extended units of meaning—and what to do when you find them. Lexicography ASIALEX 5, 5–21 (2018). https://doi.org/10.1007/s40607-018-0042-1

  • DOI: https://doi.org/10.1007/s40607-018-0042-1
