Long-Range Correlation

Statistical Universals of Language

Part of the book series: Mathematics in Mind ((MATHMIN))

Abstract

The previous chapter examined the return distributions of the words in a text. Another way to examine returns is in terms of how they succeed one another. As we will see here and in the following chapter, in a natural language text, a short return is likely to follow a series of short returns, and a long return is likely to follow a series of long returns. This causes a clustering phenomenon, meaning that at certain times, a word appears densely in a chunk of text, whereas at other times, the word hardly occurs. One source of such clustering phenomena in language lies in the context.
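As a minimal sketch of the quantity discussed above, the following Python snippet lists the returns of one word in a token sequence. The function name `return_intervals`, the toy token list, and the convention that a return is the difference between successive positions are illustrative assumptions, not the chapter's own code or definition.

```python
from typing import List, Sequence


def return_intervals(tokens: Sequence[str], word: str) -> List[int]:
    """Gaps, counted in tokens, between successive occurrences of `word`.

    Assumed convention: a return is the difference between the positions
    of two consecutive occurrences.
    """
    positions = [i for i, t in enumerate(tokens) if t == word]
    return [b - a for a, b in zip(positions, positions[1:])]


# Toy sequence (hypothetical data): "a" occurs in two dense bursts
# separated by one long gap.
tokens = "a b a a c d e f g a a b a".split()
intervals = return_intervals(tokens, "a")
print(intervals)
```

In a clustered sequence, short values in `intervals` tend to sit next to other short values, which is the pattern the abstract describes.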

The original version of this chapter was revised. The correction to this chapter is available at https://doi.org/10.1007/978-3-030-59377-3_23

Change history

  • 29 November 2022

    The original version of the chapter “Language as a Complex System” was previously published with a missing reference in footnote 4, page 22. The reference has now been added, and the chapter and the book have been updated accordingly.

Notes

  1.

    Some examples were reported in Eisler et al. (2008); Corral (2004, 2005); Bunde et al. (2005); Santhanam and Kantz (2005); Blender et al. (2014); Turcotte (1997); Yamasaki et al. (2005); Bogachev et al. (2007).

  2.

    A logarithmic bin is a range whose width grows exponentially along s. This is a common technique for analyzing power-function decay (Clauset et al., 2009). Although Part II did not apply it, logarithmic binning is commonly used whenever points appear as a cloud, as in the figures in Sects. 6.1 and 7.1. For analysis with the autocorrelation function, this book uses integer bins in the range 1 ≤ s ≤ 10 and, beyond that, bins at s = ⌈10 × 1.2^k⌉, k = 1, ….

  3.

    For fitting Fig. 6.3, the least-squares method was applied (cf. Sect. 21.1). ε = 0.00884.

  4.

    ε = 0.00761.

  5.

    Cov[Q_i, Q_{i+s}] is defined as follows:

    $$\mathrm{Cov}[Q_i,Q_{i+s}] \equiv \mathrm{E}[(Q_i - \mu)(Q_{i+s} - \mu)] \tag{8.4}$$
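The lag grid in note 2 and the covariance in note 5 can be combined in a short Python sketch. The function names `lag_grid`, `autocovariance`, and `autocorrelation` are hypothetical. The sample-mean estimator and the normalization by the lag-0 covariance are the standard textbook choices, assumed here rather than taken from the book.

```python
from typing import List, Sequence


def lag_grid(max_lag: int) -> List[int]:
    """Lags 1..10, then ceil(10 * 1.2**k) for k = 1, 2, ... up to max_lag."""
    lags = list(range(1, min(10, max_lag) + 1))
    k = 1
    while True:
        # Exact integer ceiling of 10 * (6/5)**k, avoiding float rounding.
        s = -(-(10 * 6**k) // 5**k)
        if s > max_lag:
            break
        lags.append(s)
        k += 1
    return lags


def autocovariance(q: Sequence[float], s: int) -> float:
    """Sample estimate of E[(Q_i - mu)(Q_{i+s} - mu)] at lag s."""
    n = len(q)
    if not 0 <= s < n:
        raise ValueError("lag must satisfy 0 <= s < len(q)")
    mu = sum(q) / n
    return sum((q[i] - mu) * (q[i + s] - mu) for i in range(n - s)) / (n - s)


def autocorrelation(q: Sequence[float], s: int) -> float:
    """Autocovariance at lag s divided by the lag-0 value (assumed normalization)."""
    var = autocovariance(q, 0)
    if var == 0:
        raise ValueError("autocorrelation is undefined for a constant sequence")
    return autocovariance(q, s) / var


# Toy alternating sequence (hypothetical data), evaluated on the first lags.
q = [1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2]
for s in lag_grid(len(q) - 1):
    print(s, autocorrelation(q, s))
```

Spacing the lags geometrically beyond s = 10 keeps the points roughly evenly spread on a logarithmic axis, which is why such a grid suits the inspection of power-function decay.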

References

  • Altmann, Eduardo G., Cristadoro, Giampaolo, and Esposti, Mirko D. (2012). On the origin of long-range correlations in texts. Proceedings of the National Academy of Sciences, 109(29), 11582–11587.

  • Blender, Richard, Raible, Christoph C., and Lunkeit, Frank (2014). Non-exponential return time distributions for vorticity extremes explained by fractional Poisson processes. Quarterly Journal of the Royal Meteorological Society, 141, 249–257.

  • Bogachev, Mikhail I., Eichner, Jan F., and Bunde, Armin (2007). Effect of nonlinear correlations on the statistics of return intervals in multifractal data sets. Physical Review Letters, 99(24):240601.

  • Bunde, Armin, Eichner, Jan F., Kantelhardt, Jan W., and Havlin, Shlomo (2005). Long-term memory: A natural mechanism for the clustering of extreme events and anomalous residual times in climate records. Physical Review Letters, 94(4):048701.

  • Clauset, Aaron, Shalizi, Cosma R., and Newman, Mark E. J. (2009). Power-law distributions in empirical data. SIAM Review, 51(4), 661–703.

  • Corral, Álvaro (2004). Long-term clustering, scaling, and universality in the temporal occurrence of earthquakes. Physical Review Letters, 92(10):108501.

  • Corral, Álvaro (2005). Renormalization-group transformations and correlations of seismicity. Physical Review Letters, 95:028501.

  • Ebeling, Werner and Neiman, Alexander (1995). Long-range correlations between letters and sentences in texts. Physica A, 215, 233–241.

  • Ebeling, Werner and Pöschel, Thorsten (1993). Entropy and long-range correlations in literary English. Europhysics Letters, 26(4), 241–246.

  • Eisler, Zoltán, Bartos, Imre, and Kertész, János (2008). Fluctuation scaling in complex systems: Taylor’s law and beyond. Advances in Physics, 57, 89–142.

  • Kosmidis, Kosmas, Kalampokis, Alkiviadis, and Argyrakis, Panos (2006). Language time series analysis. Physica A, 370, 808–816.

  • Lennartz, Sabine and Bunde, Armin (2009). Eliminating finite-size effects and detecting the amount of white noise in short records with long-term memory. Physical Review E, 79(6):066101.

  • Li, Wentian (1989). Mutual information functions of natural language texts. Santa Fe Institute Working Paper, 1989.

  • Li, Wentian, Marr, Thomas G., and Kaneko, Kunihiko (1994). Understanding long-range correlations in DNA sequences. Physica D: Nonlinear Phenomena, 75, 392–416.

  • Lin, Henry W. and Tegmark, Max (2017). Critical behavior in physics and probabilistic formal languages. Entropy, 19(7):299.

  • Montemurro, Marcelo A. and Pury, Pedro A. (2002). Long-range fractal correlations in literary corpora. Fractals, 10(4), 451–461.

  • Pipiras, Vladas and Taqqu, Murad S. (2017). Long-Range Dependence and Self-Similarity. Cambridge University Press.

  • Santhanam, M. S. and Kantz, Holger (2005). Long-range correlations and rare events in boundary layer wind fields. Physica A, 345, 713–721.

  • Shumway, Robert H. and Stoffer, David S. (2011). Time Series Analysis and Its Applications: With R Examples (3rd edition). Springer.

  • Takahashi, Shuntaro and Tanaka-Ishii, Kumiko (2017). Do neural nets learn statistical laws behind natural language? PLoS One, 12(12):e0189326.

  • Tanaka-Ishii, Kumiko (2018). Long-range correlation underlying childhood language and generative models. Frontiers in Psychology. Section Quantitative Psychology and Measurement, 9:01725.

  • Tanaka-Ishii, Kumiko and Bunde, Armin (2016). Long-range memory in literary texts: On the universal clustering of the rare words. PLoS One, 11(11), e0164658.

  • Turcotte, Donald L. (1997). Fractals and Chaos in Geology and Geophysics. Cambridge University Press.

  • Voss, Richard F. (1992). Evolution of long-range fractal correlations and 1/f noise in DNA base sequences. Physical Review Letters, 68(25), 3805–3808.

  • Yamasaki, Kazuko, Muchnik, Lev, Havlin, Shlomo, Bunde, Armin, and Stanley, H. Eugene (2005). Scaling and memory in volatility return intervals in financial markets. Proceedings of the National Academy of Sciences, 102(26), 9424–9428.

Author information

Corresponding author

Correspondence to Kumiko Tanaka-Ishii.

Copyright information

© 2021 The Author(s)

About this chapter

Cite this chapter

Tanaka-Ishii, K. (2021). Long-Range Correlation. In: Statistical Universals of Language. Mathematics in Mind. Springer, Cham. https://doi.org/10.1007/978-3-030-59377-3_8
