Abstract
The previous chapter examined the return distributions of the words in a text. Another way to examine returns is in terms of how they succeed one another. As we will see here and in the following chapter, in a natural language text, a short return is likely to follow a series of short returns, and a long return is likely to follow a series of long returns. This causes a clustering phenomenon, meaning that at certain times, a word appears densely in a chunk of text, whereas at other times, the word hardly occurs. One source of such clustering phenomena in language lies in the context.
The original version of this chapter was revised. The correction to this chapter is available at https://doi.org/10.1007/978-3-030-59377-3_23
Change history
29 November 2022
The original version of the chapter “Language as a Complex System” was previously published without the missing reference in footnote 4 on page 22 being updated. This change has now been made, and the chapter and the book have been updated accordingly.
Notes
- 1.
- 2.
A logarithmic bin is a range whose width grows exponentially along s. This is a common technique for analyzing power-function decay (Clauset et al., 2009). Although Part II did not apply it, logarithmic binning is commonly used whenever points appear as a cloud, as in the figures in Sects. 6.1 and 7.1. For analysis with the autocorrelation function, this book uses integer bins in the range 1 ≤ s ≤ 10 and, beyond that, s = ⌈10 × 1.2^k⌉, k = 1, ….
- 3.
- 4.
ε = 0.00761.
- 5.
Cov[Q_i, Q_{i+s}] is defined as follows:
$$\mathrm{Cov}[Q_i,Q_{i+s}] \equiv \mathrm{E}[(Q_i - \mu)(Q_{i+s} - \mu)] \tag{8.4}$$
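As a rough illustration of notes 2 and 5, the following Python sketch builds the set of lags s described in note 2 and estimates the covariance of Eq. (8.4) from a finite sequence. The function names `lag_grid` and `autocovariance` are hypothetical, and two choices are assumptions not stated in the notes: μ is taken to be the sample mean of the whole sequence, and the expectation is replaced by an average over the n − s available pairs.

```python
import math


def lag_grid(s_max):
    """Lags 1..10, then ceil(10 * 1.2**k) for k = 1, 2, ..., up to s_max."""
    lags = [s for s in range(1, 11) if s <= s_max]
    k = 1
    while True:
        s = math.ceil(10 * 1.2 ** k)
        if s > s_max:
            break
        if not lags or s > lags[-1]:  # keep the grid strictly increasing
            lags.append(s)
        k += 1
    return lags


def autocovariance(q, s):
    """Sample estimate of Cov[Q_i, Q_{i+s}] = E[(Q_i - mu)(Q_{i+s} - mu)].

    Assumption: mu is the sample mean of q, and the expectation is
    approximated by the mean over the len(q) - s available pairs.
    """
    n = len(q)
    if not 0 <= s < n:
        raise ValueError("lag s must satisfy 0 <= s < len(q)")
    mu = sum(q) / n
    total = sum((q[i] - mu) * (q[i + s] - mu) for i in range(n - s))
    return total / (n - s)


if __name__ == "__main__":
    # Toy sequence with short and long values grouped together.
    q = [1, 1, 2, 1, 9, 8, 9, 7, 1, 2, 1, 1, 8, 9, 8, 9]
    for s in lag_grid(len(q) - 1):
        print(s, autocovariance(q, s))
```

Dividing by n − s rather than n is only one possible normalization; with long lags and short sequences the two can differ noticeably.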
References
Altmann, Eduardo G., Cristadoro, Giampaolo, and Esposti, Mirko D. (2012). On the origin of long-range correlations in texts. Proceedings of the National Academy of Sciences, 109(29), 11582–11587.
Blender, Richard, Raible, Christoph C., and Lunkeit, Frank (2014). Non-exponential return time distributions for vorticity extremes explained by fractional Poisson processes. Quarterly Journal of the Royal Meteorological Society, 141, 249–257.
Bogachev, Mikhail I., Eichner, Jan F., and Bunde, Armin (2007). Effect of nonlinear correlations on the statistics of return intervals in multifractal data sets. Physical Review Letters, 99(24):240601.
Bunde, Armin, Eichner, Jan F., Kantelhardt, Jan W., and Havlin, Shlomo (2005). Long-term memory: A natural mechanism for the clustering of extreme events and anomalous residual times in climate records. Physical Review Letters, 94(4):048701.
Clauset, Aaron, Shalizi, Cosma R., and Newman, Mark E. J. (2009). Power-law distributions in empirical data. SIAM Review, 51(4), 661–703.
Corral, Álvaro (2004). Long-term clustering, scaling, and universality in the temporal occurrence of earthquakes. Physical Review Letters, 92(10):108501.
Corral, Álvaro (2005). Renormalization-group transformations and correlations of seismicity. Physical Review Letters, 95:028501.
Ebeling, Werner and Neiman, Alexander (1995). Long-range correlations between letters and sentences in texts. Physica A, 215, 233–241.
Ebeling, Werner and Pöschel, Thorsten (1993). Entropy and long-range correlations in literary English. Europhysics Letters, 26(4), 241–246.
Eisler, Zoltán, Bartos, Imre, and Kertész, János (2008). Fluctuation scaling in complex systems: Taylor’s law and beyond. Advances in Physics, 57, 89–142.
Kosmidis, Kosmas, Kalampokis, Alkiviadis, and Argyrakis, Panos (2006). Language time series analysis. Physica A, 370, 808–816.
Lennartz, Sabine and Bunde, Armin (2009). Eliminating finite-size effects and detecting the amount of white noise in short records with long-term memory. Physical Review E, 79(6):066101.
Li, Wentian (1989). Mutual information functions of natural language texts. Santa Fe Institute Working Paper, 1989.
Li, Wentian, Marr, Thomas G., and Kaneko, Kunihiko (1994). Understanding long-range correlations in DNA sequences. Physica D: Nonlinear Phenomena, 75, 392–416.
Lin, Henry W. and Tegmark, Max (2017). Critical behavior in physics and probabilistic formal languages. Entropy, 19(7):299.
Montemurro, Marcelo A. and Pury, Pedro A. (2002). Long-range fractal correlations in literary corpora. Fractals, 10(4), 451–461.
Pipiras, Vladas and Taqqu, Murad S. (2017). Long-Range Dependence and Self-Similarity. Cambridge University Press.
Santhanam, M. S. and Kantz, Holger (2005). Long-range correlations and rare events in boundary layer wind fields. Physica A, 345, 713–721.
Shumway, Robert H. and Stoffer, David S. (2011). Time Series Analysis and Its Applications: With R Examples (3rd edition). Springer.
Takahashi, Shuntaro and Tanaka-Ishii, Kumiko (2017). Do neural nets learn statistical laws behind natural language? PLoS One, 12(12):e0189326.
Tanaka-Ishii, Kumiko (2018). Long-range correlation underlying childhood language and generative models. Frontiers in Psychology. Section Quantitative Psychology and Measurement, 9:01725.
Tanaka-Ishii, Kumiko and Bunde, Armin (2016). Long-range memory in literary texts: On the universal clustering of the rare words. PLoS One, 11(11), e0164658.
Turcotte, Donald L. (1997). Fractals and Chaos in Geology and Geophysics. Cambridge University Press.
Voss, Richard F. (1992). Evolution of long-range fractal correlations and 1/f noise in DNA base sequences. Physical Review Letters, 68(25), 3805–3808.
Yamasaki, Kazuko, Muchnik, Lev, Havlin, Shlomo, Bunde, Armin, and Stanley, H. Eugene (2005). Scaling and memory in volatility return intervals in financial markets. Proceedings of the National Academy of Sciences, 102(26), 9424–9428.
Copyright information
© 2021 The Author(s)
About this chapter
Cite this chapter
Tanaka-Ishii, K. (2021). Long-Range Correlation. In: Statistical Universals of Language. Mathematics in Mind. Springer, Cham. https://doi.org/10.1007/978-3-030-59377-3_8
DOI: https://doi.org/10.1007/978-3-030-59377-3_8
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-59376-6
Online ISBN: 978-3-030-59377-3
eBook Packages: Mathematics and Statistics, Mathematics and Statistics (R0)