Core Vocabulary: A Useful But Mystical Concept in Some Kinds of Linguistics

  • Lars Borin


This paper is a theoretical and empirical investigation into the use of the notion “core vocabulary” in some areas of linguistics and related disciplines, originally prompted by the concrete task of compiling core vocabularies in two research projects growing out of two quite different research traditions: (1) lexicostatistics, where “core vocabularies” are used to measure the linguistic distance among languages in order to establish genetic and typological language groupings; and (2) computer-assisted language learning—a long-standing research interest of Lauri Carlson—where the “core vocabulary” is the most central vocabulary, to which language learners should be exposed first. In linguistics we also find a more theoretically motivated notion of “core vocabulary”, as so-called “semantic primitives”. In the paper, I compare the three kinds of “core vocabulary” and discuss their relationship to the formal knowledge-representation systems called “ontologies” (currently among Lauri Carlson’s research interests)—especially “core” ontologies such as SUMO—and the notion of “concept” central to the latter work: What is the relationship—if any—between concepts in such ontologies and lexical items in languages?


Lexical Item; Formal Ontology; Language Pair; Learner Vocabulary; Lexical Unit
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.



The work described here has received financial support from the Swedish Research Council (the Digital areal linguistics project: VR dnr 429-2009-1448), from the European Commission (the KELLY project: Lifelong Learning Programme project no. 505630-LLP-2009-1-SE-KA2-KA2MP), and from the University of Gothenburg through its funding of the Centre for Language Technology. I would also like to thank the reviewers for their illuminating questions and insightful comments.


Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  1. Språkbanken, Department of Swedish, University of Gothenburg, Gothenburg, Sweden