Quantitative Regularities of the Diversity of Lexical Meaning
A number of regularities in the use of natural language have been extensively tested and confirmed, such as the rank-frequency distribution (widely known as Zipf’s Law) and the type-token distribution of words. Most of these regularities are based on formal attributes of words (number of word occurrences, number of different lexicographical types of words, etc.). By contrast, very little has been done to investigate potential regularities involving semantic attributes of natural language. The study reported here focuses on the quantitative aspects of the diversity of referential meaning that linguistic elements acquire in the process of overall semantic attribution.
Specifically, the objective was to investigate, for selected languages, the distribution of words and morphemes by the number of their dictionary meanings. Truncated negative binomial, Waring, Yule, Borel, and zeta distributions were selected as the most likely theoretical candidates. Statistical methods were used to evaluate the goodness of fit of the empirical data to these theoretical distributions. Results on the distribution of words by the number of dictionary meanings and on the lexical frequency distribution of morphemes are presented. The empirical frequencies of words by number of meanings for English, Spanish, Russian, and Hungarian, and those for English morphemes, were fitted best by the negative binomial, Waring, and Yule distributions, both across and within the major grammatical categories of words (i.e., nouns, verbs, adjectives).
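As a rough illustration of the kind of fitting described above, the following sketch estimates the parameter of a Yule distribution by maximum likelihood and computes a Pearson chi-square statistic. It is not the procedure used in the study, whose estimation method and test are not stated here. The counts in `COUNTS` are synthetic (rounded expected frequencies from a Yule distribution with rho = 2, with the last cell pooling "8 or more meanings"), and all function names are hypothetical.

```python
import math

# Synthetic counts, NOT data from the study: COUNTS[i] is the number of words
# with i + 1 meanings; the last cell pools words with 8 or more meanings.
COUNTS = [667, 167, 67, 33, 19, 12, 8, 28]


def yule_pmf(x, rho):
    """Yule probability P(X = x) = rho * B(x, rho + 1) for x = 1, 2, ..."""
    log_beta = math.lgamma(x) + math.lgamma(rho + 1.0) - math.lgamma(x + rho + 1.0)
    return rho * math.exp(log_beta)


def cell_probs(rho, k_max):
    """Probabilities of the cells 1, 2, ..., k_max - 1 and the pooled tail 'k_max or more'."""
    probs = [yule_pmf(x, rho) for x in range(1, k_max)]
    probs.append(1.0 - sum(probs))
    return probs


def log_likelihood(counts, rho):
    """Multinomial log-likelihood of the grouped counts (up to an additive constant)."""
    probs = cell_probs(rho, len(counts))
    return sum(n * math.log(p) for n, p in zip(counts, probs))


def fit_rho(counts, lo=0.05, hi=20.0, steps=4000):
    """Maximum-likelihood estimate of rho by a plain grid search over [lo, hi]."""
    grid = [lo + i * (hi - lo) / steps for i in range(steps + 1)]
    return max(grid, key=lambda rho: log_likelihood(counts, rho))


def pearson_chi_square(counts, rho):
    """Pearson statistic: sum over cells of (observed - expected)^2 / expected."""
    total = sum(counts)
    probs = cell_probs(rho, len(counts))
    return sum((n - total * p) ** 2 / (total * p) for n, p in zip(counts, probs))


if __name__ == "__main__":
    rho_hat = fit_rho(COUNTS)
    print("estimated rho:", rho_hat)
    print("Pearson chi-square:", pearson_chi_square(COUNTS, rho_hat))
```

The same pattern (cell probabilities, grouped likelihood, chi-square) would carry over to the other candidate distributions by swapping out `yule_pmf`. A grid search is used only to keep the sketch dependency-free; a numerical optimizer would be the more usual choice.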
The results of fitting the frequencies of word associations to theoretical distributions are also described. The distributions of word associations for a sample of 67 stimulus words were fitted best by the truncated negative binomial law, with remarkable consistency. Potential generalizations and implications of these findings are discussed.
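For orientation, one common form of the truncated negative binomial law is the zero-truncated version shown below. The parameterization and the truncation point used in the study are not given here, so the symbols k and p and the choice of truncation at zero are assumptions made for illustration.

```latex
P(X = x) \;=\; \frac{\Gamma(k + x)}{x!\,\Gamma(k)} \cdot \frac{p^{k} q^{x}}{1 - p^{k}},
\qquad x = 1, 2, \ldots, \quad q = 1 - p, \quad k > 0, \quad 0 < p < 1
```

The denominator 1 - p^k removes the probability mass that the untruncated negative binomial places on x = 0, which is the natural adjustment when every observed item has at least one meaning or association.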
Keywords: Negative Binomial Distribution, Theoretical Distribution, Parameter Estimation Method, Spanish Word, Transitive Verb