Abstract
This paper is a theoretical and empirical investigation into the use of the notion “core vocabulary” in some areas of linguistics and related disciplines. It was originally prompted by the concrete task of compiling core vocabularies in two research projects growing out of two quite different research traditions: (1) lexicostatistics, where “core vocabularies” are used to measure the linguistic distance among languages in order to establish genetic and typological language groupings; and (2) computer-assisted language learning (a long-standing research interest of Lauri Carlson), where the “core vocabulary” is the most central vocabulary, to which language learners should be exposed first. In linguistics we also find a more theoretically motivated notion of “core vocabulary”, in the form of so-called “semantic primitives”. In the paper, I compare the three kinds of “core vocabulary” and discuss their relationship to the formal knowledge-representation systems called “ontologies” (currently among Lauri Carlson’s research interests), especially “core” ontologies such as SUMO, and to the notion of “concept” central to the latter work: What is the relationship, if any, between concepts in such ontologies and lexical items in languages?
mys·ti·cal
–adjective
1. mystic; occult.
2. of or pertaining to mystics or mysticism: mystical writings.
3. spiritually symbolic.
4. Rare. obscure in meaning; mysterious.
(http://dictionary.reference.com, s.v. mystical)
Notes
- 1.
- 2.
- 3.
- 4.
- 5.
Such ontologies and their most salient and high-flying application domain, the so-called Semantic Web, are among Lauri Carlson’s more recent research interests; see, e.g., the TermFactory Manual http://www.helsinki.fi/~lcarlson/CF/TF/doc/TFManual_en.html.
- 6.
A reviewer remarked on the absence of lexeme from this list. This term is ambiguous, meaning both ‘lexical unit’—thus making the lexeme equivalent to any and all of the listed units—and ‘lexical morpheme’, and is best avoided in our context.
- 7.
The three entries mean respectively ‘language; speech; legal process; cause; opinion’, ‘measure; time; need; meal’, and ‘paint’.
- 8.
Unfortunately, lemma is also used in other senses, e.g., roughly equivalent to lempos or lemgram as explained below.
- 9.
- 10.
- 11.
Although, as we will see below, we do not need to look long in the world’s languages in order to find lexical units with ‘funny’ meanings. Evans and Levinson (2009: 435) provide the example of the Mundari (an Austro-Asiatic language spoken in South Asia) ideophone rawa-dawa, which they gloss as ‘the sensation of suddenly realizing you can do something reprehensible, and no-one is there to witness it’. This kind of example can be repeated essentially ad infinitum.
- 12.
Since several of the vocabularies ultimately come out of the same research tradition, we may tentatively assume commensurability at least for the items in those lists.
- 13.
- 14.
http://wold.livingsources.org/; the LJ list has been somewhat edited for the purposes of this presentation, so that, e.g., “1sg pronoun” has been replaced by “I”, “arm/hand” by “hand”, “who?” by “who”, “child (kin term)” by “child”, etc., in order to make the word lists mechanically comparable with a computer program.
- 15.
- 16.
Here the thorny issue of translation equivalence rears its head. The KELLY translators were instructed to provide one translation in the normal case. How this methodological decision has influenced this investigation is a very interesting question which we will have to leave for future research.
- 17.
As we have seen above, this may be true only in an approximate sense.
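Note 14 describes editing the labels of the LJ list so that word lists can be compared mechanically by a computer program. The following is a minimal sketch of that kind of normalization and comparison, not the procedure actually used in the study. The override table, the two regular rules, and the function names `normalize` and `overlap` are assumptions made for illustration; only the four example mappings come from the note.

```python
import re

# Hypothetical manual overrides; the two entries are examples from note 14.
OVERRIDES = {
    "1sg pronoun": "I",
    "arm/hand": "hand",
}


def normalize(label: str) -> str:
    """Reduce a word-list label to a bare form suitable for comparison."""
    label = label.strip()
    if label in OVERRIDES:
        return OVERRIDES[label]
    # Drop parenthesized qualifiers such as "(kin term)".
    label = re.sub(r"\s*\([^)]*\)", "", label)
    # Drop trailing question marks, as in "who?".
    label = label.rstrip("?")
    return label.strip()


def overlap(list_a, list_b):
    """Return the set of normalized items that two word lists share."""
    return {normalize(x) for x in list_a} & {normalize(x) for x in list_b}
```

Labels that no general rule can handle, such as “1sg pronoun”, go into the explicit table, while regular patterns are handled by the two rules. Case is left untouched so that “I” remains distinct.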
References
Baroni, Marco, and Silvia Bernardini, eds. 2006. Wacky! Working papers on the Web as Corpus. Bologna: GEDIT. Online version: http://wackybook.sslmit.unibo.it.
Borin, Lars, Lauri Carlson, and Diana Santos. 2002. Corpus based language technology for computer-assisted learning of Nordic languages: Squirrel. Progress report September 2001. In Nordisk sprogteknologi. Nordic language technology. Årbog for Nordisk Sprogteknologisk Forskningsprogram 2000–2004. Årbog 2001, ed. Henrik Holmboe, 257–270. Copenhagen: Museum Tusculanums Forlag, Københavns Universitet.
Borin, Lars, Dana Dannélls, Markus Forsberg, Maria Toporowska Gronostaj, and Dimitrios Kokkinakis. 2010. The past meets the present in Swedish FrameNet++. In 14th EURALEX international congress, 269–281. Leeuwarden: EURALEX.
Borin, Lars, Markus Forsberg, and Lennart Lönngren. 2008. The hunting of the BLARK—SALDO, a freely available lexical database for Swedish language technology. In Resourceful language technology. Festschrift in honor of Anna Sågvall Hein, eds. Joakim Nivre, Mats Dahllöf, and Beata Megyesi. Vol. 7 of Acta Universitatis Upsaliensis: Studia linguistica Upsaliensia, 21–32. Uppsala: Uppsala University, Department of Linguistics and Philology.
Borin, Lars, Markus Forsberg, Leif-Jöran Olsson, and Jonatan Uppström. 2012. The open lexical infrastructure of Språkbanken. In Proceedings of LREC 2012, 3598–3602. Istanbul: ELRA.
Evans, Nicholas, and Stephen C. Levinson. 2009. The myth of language universals: Language diversity and its importance for cognitive science. Behavioral and Brain Sciences 32: 429–492.
Fellbaum, Christiane, ed. 1998. WordNet: An electronic lexical database. Cambridge: MIT Press.
Goddard, Cliff. 2001. Lexico-semantic universals: A critical overview. Linguistic Typology 5: 1–65.
Haspelmath, Martin. 2010. Comparative concepts and descriptive categories in crosslinguistic studies. Language 86: 663–687.
Haspelmath, Martin, and Uri Tadmor, eds. 2009. Loanwords in the world’s languages. A comparative handbook. Berlin: De Gruyter.
Holman, Eric W., Søren Wichmann, Cecil H. Brown, Viveka Velupillai, André Müller, and Dik Bakker. 2008. Explorations in automated language classification. Folia Linguistica 42: 331–354.
Levinson, Stephen C. 2003. Language and mind: Let’s get the issues straight. In Language in mind. Advances in the study of language and thought, eds. Dedre Gentner and Susan Goldin-Meadow, 25–46. Cambridge: MIT Press.
Lewis, M. Paul, ed. 2009. Ethnologue: Languages of the world, 16th edn. Dallas: SIL International. Online version: http://www.ethnologue.com/.
Ogden, Charles K. 1930. Basic English: A general introduction with rules and grammar. London: Paul Treber.
Pease, Adam, and Christiane Fellbaum. 2010. Formal ontology as interlingua: The SUMO and WordNet linking project and global WordNet. In Ontology and the lexicon. A natural language processing perspective, eds. Chu-ren Huang, Nicoletta Calzolari, Aldo Gangemi, Alessandro Lenci, Alessandro Oltramari, and Laurent Prevot, 25–35. Cambridge: Cambridge University Press.
Swadesh, Morris. 1950. Salish internal relationships. International Journal of American Linguistics 16: 157–167.
Swadesh, Morris. 1955. Towards greater accuracy in lexicostatistic dating. International Journal of American Linguistics 21: 121–137.
von Fintel, Kai, and Lisa Matthewson. 2008. Universals in semantics. The Linguistic Review 25: 139–201.
Wilks, Yorick. 2009. Ontotherapy, or how to stop worrying about what there is. In Recent advances in natural language processing V, eds. Nicolas Nicolov, Galia Angelova, and Ruslan Mitkov. Current issues in linguistic theory, 1–20. Amsterdam: John Benjamins.
Acknowledgements
The work described here has received financial support from the Swedish Research Council (the Digital areal linguistics project: VR dnr 429-2009-1448), from the European Commission (the KELLY project: Lifelong Learning Programme project no. 505630-LLP-2009-1-SE-KA2-KA2MP), and from the University of Gothenburg through its funding of the Centre for Language Technology. I would also like to thank the reviewers for their illuminating questions and insightful comments.
© 2012 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Borin, L. (2012). Core Vocabulary: A Useful But Mystical Concept in Some Kinds of Linguistics. In: Santos, D., Lindén, K., Ng’ang’a, W. (eds) Shall We Play the Festschrift Game?. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-30773-7_6
DOI: https://doi.org/10.1007/978-3-642-30773-7_6
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-30772-0
Online ISBN: 978-3-642-30773-7
eBook Packages: Computer Science (R0)