Core Vocabulary: A Useful But Mystical Concept in Some Kinds of Linguistics

Chapter in: Shall We Play the Festschrift Game?

Abstract

This paper is a theoretical and empirical investigation into the use of the notion “core vocabulary” in some areas of linguistics and related disciplines, originally prompted by the concrete task of compiling core vocabularies in two research projects growing out of two quite different research traditions: (1) lexicostatistics, where “core vocabularies” are used to measure the linguistic distance among languages in order to establish genetic and typological language groupings; and (2) computer-assisted language learning—a long-standing research interest of Lauri Carlson—where the “core vocabulary” is the most central vocabulary, to which language learners should be exposed first. In linguistics we also find a more theoretically motivated notion of “core vocabulary”, as so-called “semantic primitives”. In the paper, I compare the three kinds of “core vocabulary” and discuss their relationship to the formal knowledge-representation systems called “ontologies” (currently among Lauri Carlson’s research interests)—especially “core” ontologies such as SUMO—and the notion of “concept” central to the latter work: What is the relationship—if any—between concepts in such ontologies and lexical items in languages?

mys·ti·cal

–adjective

1. mystic; occult.

2. of or pertaining to mystics or mysticism: mystical writings.

3. spiritually symbolic.

4. Rare. obscure in meaning; mysterious.

(http://dictionary.reference.com, s.v. mystical)

Notes

  1. http://spraakbanken.gu.se/eng/kelly.

  2. http://spraakbanken.gu.se/eng/research/digital-areal-linguistics.

  3. http://lingweb.eva.mpg.de/ids/.

  4. https://svn.spraakdata.gu.se/sb/fnplusplus/pub/lwt-meanings.html.

  5. Such ontologies and their most salient and high-flying application domain, the so-called Semantic Web, are among Lauri Carlson’s more recent research interests; see, e.g., the TermFactory Manual http://www.helsinki.fi/~lcarlson/CF/TF/doc/TFManual_en.html.

  6. A reviewer remarked on the absence of lexeme from this list. This term is ambiguous, meaning both ‘lexical unit’—thus making the lexeme equivalent to any and all of the listed units—and ‘lexical morpheme’, and is best avoided in our context.

  7. The three entries mean respectively ‘language; speech; legal process; cause; opinion’, ‘measure; time; need; meal’, and ‘paint’.

  8. Unfortunately, lemma is also used in other senses, e.g., roughly equivalent to lempos or lemgram as explained below.

  9. http://spraakbanken.gu.se/eng/saldo.

  10. http://wordnet.princeton.edu/.

  11. Although, as we will see below, we do not need to look long in the world’s languages in order to find lexical units with ‘funny’ meanings. Evans and Levinson (2009: 435) provide the example of the Mundari (an Austro-Asiatic language spoken in South Asia) ideophone rawa-dawa, which they gloss as ‘the sensation of suddenly realizing you can do something reprehensible, and no-one is there to witness it’. This kind of example can be repeated essentially ad infinitum.

  12. Since several of the vocabularies ultimately come out of the same research tradition, we may tentatively assume commensurability at least for the items in those lists.

  13. http://email.eva.mpg.de/~wichmann/ASJPHomePage.htm.

  14. http://wold.livingsources.org/; the LJ list has been somewhat edited for the purposes of this presentation, so that, e.g., “1sg pronoun” has been replaced by “I”, “arm/hand” by “hand”, “who?” by “who”, “child (kin term)” by “child”, etc., in order to make the word lists mechanically comparable with a computer program.

  15. http://expsy.ugent.be/subtlexus/.

  16. Here the thorny issue of translation equivalence rears its head. The KELLY translators were instructed to provide one translation in the normal case. How this methodological decision has influenced this investigation is a very interesting question which we will have to leave for future research.

  17. As we have seen above, this may be true only in an approximate sense.
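
The label editing described in note 14 can be illustrated with a minimal sketch. It is not the program used in the study: only the four label rewrites are taken from the note, while the function names and the two sample word lists are hypothetical and chosen purely for illustration.

```python
# Illustrative sketch only. The four rewrites below are the examples given in
# note 14; the sample lists and all function names are assumptions.
LABEL_REWRITES = {
    "1sg pronoun": "I",
    "arm/hand": "hand",
    "who?": "who",
    "child (kin term)": "child",
}


def normalize(label):
    """Map an edited-list label to its plain form; leave other labels as-is."""
    label = label.strip()
    return LABEL_REWRITES.get(label, label)


def shared_items(list_a, list_b):
    """Return the normalized items that occur in both word lists, sorted."""
    a = {normalize(item) for item in list_a}
    b = {normalize(item) for item in list_b}
    return sorted(a & b)


# Hypothetical sample data, not the actual lists compared in the chapter.
lj_sample = ["1sg pronoun", "arm/hand", "who?", "child (kin term)", "fire"]
other_sample = ["I", "hand", "who", "child", "water"]

print(shared_items(lj_sample, other_sample))  # ['I', 'child', 'hand', 'who']
```

Without the normalization step, an exact string comparison would find no shared items in these two samples, which is why such editing is needed before lists can be compared mechanically.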

References

  • Baroni, Marco, and Silvia Bernardini, eds. 2006. WaCky! Working papers on the Web as Corpus. Bologna: GEDIT. Online version: http://wackybook.sslmit.unibo.it.

  • Borin, Lars, Lauri Carlson, and Diana Santos. 2002. Corpus based language technology for computer-assisted learning of Nordic languages: Squirrel. Progress report September 2001. In Nordisk sprogteknologi. Nordic language technology. Årbog for Nordisk Sprogteknologisk Forskningsprogram 2000–2004. Årbog 2001, ed. Henrik Holmboe, 257–270. Copenhagen: Museum Tusculanums Forlag, Københavns Universitet.

  • Borin, Lars, Dana Dannélls, Markus Forsberg, Maria Toporowska Gronostaj, and Dimitrios Kokkinakis. 2010. The past meets the present in Swedish FrameNet++. In 14th EURALEX international congress, 269–281. Leeuwarden: EURALEX.

  • Borin, Lars, Markus Forsberg, and Lennart Lönngren. 2008. The hunting of the BLARK—SALDO, a freely available lexical database for Swedish language technology. In Resourceful language technology. Festschrift in honor of Anna Sågvall Hein, eds. Joakim Nivre, Mats Dahllöf, and Beata Megyesi. Vol. 7 of Acta Universitatis Upsaliensis: Studia linguistica Upsaliensia, 21–32. Uppsala: Uppsala University, Department of Linguistics and Philology.

  • Borin, Lars, Markus Forsberg, Leif-Jöran Olsson, and Jonatan Uppström. 2012. The open lexical infrastructure of Språkbanken. In Proceedings of LREC 2012, 3598–3602. Istanbul: ELRA.

  • Evans, Nicholas, and Stephen C. Levinson. 2009. The myth of language universals: Language diversity and its importance for cognitive science. Behavioral and Brain Sciences 32: 429–492.

  • Fellbaum, Christiane, ed. 1998. WordNet: An electronic lexical database. Cambridge: MIT Press.

  • Goddard, Cliff. 2001. Lexico-semantic universals: A critical overview. Linguistic Typology 5: 1–65.

  • Haspelmath, Martin. 2010. Comparative concepts and descriptive categories in crosslinguistic studies. Language 86: 663–687.

  • Haspelmath, Martin, and Uri Tadmor, eds. 2009. Loanwords in the world’s languages. A comparative handbook. Berlin: De Gruyter.

  • Holman, Eric W., Søren Wichmann, Cecil H. Brown, Viveka Velupillai, André Müller, and Dik Bakker. 2008. Explorations in automated language classification. Folia Linguistica 42: 331–354.

  • Levinson, Stephen C. 2003. Language and mind: Let’s get the issues straight. In Language in mind. Advances in the study of language and thought, eds. Dedre Gentner and Susan Goldin-Meadow, 25–46. Cambridge: MIT Press.

  • Lewis, M. Paul, ed. 2009. Ethnologue: Languages of the world, 16th edn. Dallas: SIL International. Online version: http://www.ethnologue.com/.

  • Ogden, Charles K. 1930. Basic English: A general introduction with rules and grammar. London: Kegan Paul, Trench, Trubner.

  • Pease, Adam, and Christiane Fellbaum. 2010. Formal ontology as interlingua: the SUMO and WordNet linking project and global WordNet. In Ontology and the lexicon. A natural language processing perspective, eds. Chu-ren Huang, Nicoletta Calzolari, Aldo Gangemi, Alessandro Lenci, Alessandro Oltramari, and Laurent Prevot, 25–35. Cambridge: Cambridge University Press.

  • Swadesh, Morris. 1950. Salish internal relationships. International Journal of American Linguistics 16: 157–167.

  • Swadesh, Morris. 1955. Towards greater accuracy in lexicostatistic dating. International Journal of American Linguistics 21: 121–137.

  • von Fintel, Kai, and Lisa Matthewson. 2008. Universals in semantics. The Linguistic Review 25: 139–201.

  • Wilks, Yorick. 2009. Ontotherapy, or how to stop worrying about what there is. In Recent advances in natural language processing V, eds. Nicolas Nicolov, Galia Angelova, and Ruslan Mitkov. Current issues in linguistic theory, 1–20. Amsterdam: John Benjamins.

Acknowledgements

The work described here has received financial support from the Swedish Research Council (the Digital areal linguistics project: VR dnr 429-2009-1448), from the European Commission (the KELLY project: Lifelong Learning Programme project no. 505630-LLP-2009-1-SE-KA2-KA2MP), and from the University of Gothenburg through its funding of the Centre for Language Technology. I would also like to thank the reviewers for their illuminating questions and insightful comments.

Author information

Corresponding author

Correspondence to Lars Borin.

Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Borin, L. (2012). Core Vocabulary: A Useful But Mystical Concept in Some Kinds of Linguistics. In: Santos, D., Lindén, K., Ng’ang’a, W. (eds) Shall We Play the Festschrift Game?. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-30773-7_6

  • DOI: https://doi.org/10.1007/978-3-642-30773-7_6

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-30772-0

  • Online ISBN: 978-3-642-30773-7

  • eBook Packages: Computer Science (R0)