Core Vocabulary: A Useful But Mystical Concept in Some Kinds of Linguistics
This paper is a theoretical and empirical investigation into the use of the notion “core vocabulary” in some areas of linguistics and related disciplines, originally prompted by the concrete task of compiling core vocabularies in two research projects growing out of two quite different research traditions: (1) lexicostatistics, where “core vocabularies” are used to measure the linguistic distance among languages in order to establish genetic and typological language groupings; and (2) computer-assisted language learning—a long-standing research interest of Lauri Carlson—where the “core vocabulary” is the most central vocabulary, to which language learners should be exposed first. In linguistics we also find a more theoretically motivated notion of “core vocabulary”, as so-called “semantic primitives”. In the paper, I compare the three kinds of “core vocabulary” and discuss their relationship to the formal knowledge-representation systems called “ontologies” (currently among Lauri Carlson’s research interests)—especially “core” ontologies such as SUMO—and the notion of “concept” central to the latter work: What is the relationship—if any—between concepts in such ontologies and lexical items in languages?
Keywords: Lexical Item, Formal Ontology, Language Pair, Learner Vocabulary, Lexical Unit
The work described here has received financial support from the Swedish Research Council (the Digital areal linguistics project: VR dnr 429-2009-1448), from the European Commission (the KELLY project: Lifelong Learning Programme project no. 505630-LLP-2009-1-SE-KA2-KA2MP), and from the University of Gothenburg through its funding of the Centre for Language Technology. I would also like to thank the reviewers for their illuminating questions and insightful comments.