Abstract
The distribution behavior described by the empirical Menzerath–Altmann law is frequently encountered in the self-organization of linguistic and non-linguistic natural organizations at various structural levels. This study presents a statistical mechanical derivation of the law, based on the analogy between the classical particles of a statistical mechanical organization and the distinct words of a textual organization. The derived model, a transformed (generalized) form of the Menzerath–Altmann model, is termed the statistical mechanical Menzerath–Altmann model. It allows the model parameters to be interpreted in terms of physical concepts. We also propose that many organizations exhibiting Menzerath–Altmann law behavior, whether linguistic or not, can be methodically examined with the transformed distribution model through a properly defined structure-dependent parameter and the associated energy states.
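For orientation, the classical Menzerath–Altmann relation that the abstract refers to is commonly written in the form below. This is only a sketch of the standard form from the quantitative linguistics literature, not the transformed model derived in this paper. The symbols y, x, A, b, and c are the conventional ones and are assumptions here, and sign conventions for b and c vary between authors.

```latex
y(x) = A\,x^{b}\,e^{-c x}
```

Here x is the size of a construct measured in its constituents, y is the mean size of those constituents, and A, b, and c are fitting parameters.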
Notes
Henceforth, we will refer to the vocabulary of a text as distinct words (DWs) for convenience. In general, the DWs of a text can be considered the set of dissimilar constituents of a construct at the level of organization under investigation.
Thus, we will use the terms word length and word energy interchangeably for the rest of the paper.
Note that we only consider the word length distribution of DWs in a text, so the frequency and the occurrence positions of the constituents (words) are insignificant.
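The quantity these notes describe, the length distribution of the DWs of a text, can be sketched as follows. This is a minimal illustration and not the paper's procedure: the function name `distinct_word_length_distribution`, the letter-run tokenization, and the case-insensitive comparison are all assumptions made for the example.

```python
import re
from collections import Counter


def distinct_word_length_distribution(text):
    """Count the distinct words (DWs) of a text by their length.

    Hypothetical helper for illustration only. Each DW is counted once,
    so word frequency and occurrence position have no effect.
    """
    # Assumption: a word is a maximal run of letters, compared case-insensitively.
    tokens = re.findall(r"[^\W\d_]+", text.lower())
    distinct_words = set(tokens)  # collapses repeated occurrences
    counts = Counter(len(word) for word in distinct_words)
    return dict(sorted(counts.items()))


sample = "The cat saw the other cat and the dog"
dist = distinct_word_length_distribution(sample)
print(dist)  # {3: 5, 5: 1}
```

Reordering the words of `sample` or repeating any of them leaves the result unchanged, which is the sense in which frequency and position are disregarded.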
Acknowledgments
We are grateful to A. Algin for helpful discussions and we appreciate the careful proofreading of the manuscript by H. Kreuzer. This work was partially supported by Eskisehir Osmangazi University’s Scientific Research Project Commission (Grant No. 2008-19019).
Cite this article
Eroglu, S. Menzerath–Altmann Law: Statistical Mechanical Interpretation as Applied to a Linguistic Organization. J Stat Phys 157, 392–405 (2014). https://doi.org/10.1007/s10955-014-1078-8
DOI: https://doi.org/10.1007/s10955-014-1078-8