Complexity

Chapter in Statistical Universals of Language

Part of the book series: Mathematics in Mind (MATHMIN)

Abstract

We now have a rough overview of the most important statistical universals underlying language. Taking language as a whole, is there any way to examine how complex it is? What characteristic underlies this complexity?


Notes

  1.

    Note that the conditional entropy is given in general as

    $$\displaystyle \begin{aligned} \mathrm{H}(X|Y) = -\sum_{x,y} P(X=x, Y=y)\log P(X=x \mid Y=y). \end{aligned} $$
    (10.3)
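
    As an illustrative sketch only (not code from the book), the following Python snippet estimates H(X|Y) from sampled pairs using plug-in probabilities; the "abracadabra" example is a toy, not data used in this chapter.

```python
import math
from collections import Counter

def conditional_entropy(pairs):
    """Estimate H(X|Y) in bits from a list of (x, y) samples,
    using plug-in estimates of P(X=x, Y=y) and P(X=x | Y=y)."""
    joint = Counter(pairs)                      # counts of (x, y)
    marginal_y = Counter(y for _, y in pairs)   # counts of y
    n = len(pairs)
    h = 0.0
    for (x, y), count in joint.items():
        p_xy = count / n                      # P(X=x, Y=y)
        p_x_given_y = count / marginal_y[y]   # P(X=x | Y=y)
        h -= p_xy * math.log2(p_x_given_y)
    return h

# Toy usage: X is a character, Y is the character preceding it.
text = "abracadabra"
pairs = list(zip(text[1:], text[:-1]))
print(conditional_entropy(pairs))  # H(next character | previous character)
```
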
  2.

    In this book, ⇒ indicates convergence.

  3.

    See Sect. 17.2 for the concepts of language models and their training.

  4.

    PPM is an n-gram-based language modeling method (Bell et al., 1990) that combines variable-length n-grams with arithmetic coding. The PPM code is guaranteed to be universal when the n-gram length is taken to infinity (Ryabko, 2010). Among state-of-the-art compressors, 7-zip PPMd was used as the PPM implementation here, because PPM follows the theory better than many other compression methods, such as zip, lzh, and tar.xz (Takahira et al., 2016).
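
    As a minimal sketch of the general compression-based estimate (an upper bound on the entropy rate: compressed size in bits divided by text length), the Python snippet below uses the standard lzma module as a stand-in compressor, not the 7-zip PPMd program used in the study; the file name corpus.txt is hypothetical.

```python
import lzma

def bits_per_character(text: str) -> float:
    """Upper-bound estimate of the entropy rate in bits per character:
    size of the compressed text (in bits) divided by its length."""
    data = text.encode("utf-8")
    compressed = lzma.compress(data, preset=9)
    return 8 * len(compressed) / len(text)

# Estimates for growing prefixes of a corpus; for natural language these
# typically decrease slowly with text size, which is what motivates the
# extrapolation discussed in this chapter.
with open("corpus.txt", encoding="utf-8") as f:  # hypothetical corpus file
    text = f.read()
for n in (10**4, 10**5, 10**6):
    if n <= len(text):
        print(n, bits_per_character(text[:n]))
```
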

  5.

    I hereby thank Ryosuke Takahira and Shuntaro Takahashi for generating this figure for the purpose of this book, by reusing code used to conduct the study reported in Takahira et al. (2016).

  6.

    For fitting Fig. 6.3, the least-squares method was applied (cf. Sect. 21.1). ε = 0.0175 for the New York Times, ε = 0.00606 for the shuffled text, and ε = 0.00295 for the monkey text.

  7.

    This figure was adapted from Takahira et al. (2016) by applying the extrapolation function of formula (10.5) to the data listed in the first block of Table 1 in that article, which consisted only of results obtained from clean newspaper data.
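
    Formula (10.5) is not reproduced in these notes. As an illustration of the least-squares fitting mentioned in notes 6 and 7, the sketch below assumes an ansatz of the form f(n) = A n^(β-1) + h, where h plays the role of the asymptotic entropy rate; both this functional form (a stand-in for formula (10.5)) and the data points are assumptions made for the example, not values from the book.

```python
import numpy as np
from scipy.optimize import curve_fit

def ansatz(n, A, beta, h):
    """Assumed extrapolation function f(n) = A * n**(beta - 1) + h."""
    return A * n ** (beta - 1) + h

# Hypothetical measurements: text sizes (in characters) and the
# corresponding compression rates in bits per character.
sizes = np.array([1e4, 1e5, 1e6, 1e7, 1e8])
rates = np.array([3.1, 2.6, 2.2, 1.9, 1.7])

# Least-squares fit of the three parameters A, beta, h.
params, _ = curve_fit(ansatz, sizes, rates, p0=[10.0, 0.9, 1.0])
A, beta, h = params
print(f"A = {A:.3f}, beta = {beta:.3f}, extrapolated entropy rate h = {h:.3f}")
```
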

  8.

    Whether this is true would require more fundamental research. Above all, it might not be the case that β characterizes natural language, given that the shuffled text's β here was quite close to that of natural language. One possible path to verifying the β value's universality would be to conduct a statistical test, as was done for the Taylor analysis, with many datasets such as those introduced in the previous section. The problem in doing so is that the texts must be very large to estimate a credible β and thus acquire the entropy rate. At the same time, larger texts run into the limitation of self-similarity discussed in Chap. 5. Therefore, clarifying whether β is universal would require a completely different approach.

  9.

    Section 21.8 explains perplexity in relation to the entropy rate and cross entropy.
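
    For reference, the standard relation is as follows (the notation here is generic and not necessarily that of Sect. 21.8): for a language model q evaluated on a text x_1, ..., x_N,

    $$\displaystyle \begin{aligned} H = -\frac{1}{N}\sum_{i=1}^{N} \log_2 q(x_i \mid x_1, \ldots, x_{i-1}), \qquad \mathrm{perplexity} = 2^{H}, \end{aligned} $$

    where H is the per-symbol cross entropy in bits. Since the cross entropy cannot fall below the entropy rate for a stationary ergodic source, the perplexity of a good model gives an upper-bound estimate of two to the power of the entropy rate.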

References

  • Bell, Timothy C., Cleary, John G., and Witten, Ian H. (1990). Text Compression. Prentice Hall.

  • Berger, Toby (1968). Rate distortion theory for sources with abstract alphabets and memory. Information and Control, 13, 254–273.

  • Brown, Peter F., Della-Pietra, Stephan A., Della-Pietra, Vincent J., Lai, Jennifer C., and Mercer, Robert L. (1992). An estimate of an upper bound for the entropy of English. Computational Linguistics, 18(1), 31–40.

  • Cover, Thomas M. and King, Roger C. (1978). A convergent gambling estimate of the entropy of English. IEEE Transactions on Information Theory, 24(4), 413–421.

  • Cover, Thomas M. and Thomas, Joy A. (1991). Elements of Information Theory. John Wiley & Sons, Inc.

  • Crutchfield, J. P. and Feldman, D. P. (2003). Regularities unseen, randomness observed: Levels of entropy convergence. Chaos, 13, 25–54.

  • Dai, Zihang, Yang, Zhilin, Yang, Yiming, Carbonell, Jaime, Le, Quoc V., and Salakhutdinov, Ruslan (2019). Transformer-XL: Attentive language models beyond a fixed-length context. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 2978–2988.

  • Dȩbowski, Łukasz (2015). The relaxed Hilberg conjecture: A review and new experimental support. Journal of Quantitative Linguistics, 22(4), 311–337.

  • Dȩbowski, Łukasz (2020). Information Theory Meets Power Laws: Stochastic Processes and Language Models. Wiley.

  • Ebeling, Werner and Nicolis, G. (1991). Entropy of symbolic sequences: The role of correlations. Europhysics Letters, 14(3), 191–196.

  • Ferrer-i-Cancho, Ramon, Dȩbowski, Łukasz, and Moscoso del Prado Martin, Fermin (2013). Constant conditional entropy and related hypotheses. Journal of Statistical Mechanics: Theory and Experiment, 2013, L07001.

  • Genzel, Dmitriy and Charniak, Eugene (2002). Entropy rate constancy in text. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics, pages 199–206.

  • Hilberg, Wolfgang (1990). Der bekannte Grenzwert der redundanzfreien Information in Texten — eine Fehlinterpretation der Shannonschen Experimente? [The well-known limit of redundancy-free information in texts: a misinterpretation of Shannon's experiments?] Frequenz, 44, 243–248.

  • Hockett, Charles F. (1958). A Course in Modern Linguistics. Macmillan Company.

  • Levy, Roger and Jaeger, T. Florian (2006). Speakers optimize information density through syntactic reduction. In Advances in Neural Information Processing Systems 19, pages 849–856. MIT Press.

  • Manning, Christopher D. and Schütze, Hinrich (1999). Foundations of Statistical Natural Language Processing. The MIT Press.

  • Moradi, Hamid, Grzymala-Busse, Jerzy W., and Roberts, James A. (1998). Entropy of English text: Experiments with humans and a machine learning system based on rough sets. Information Sciences, 104, 31–47.

  • Ren, Geng, Takahashi, Shuntaro, and Tanaka-Ishii, Kumiko (2019). Entropy rate estimation for English via a large cognitive experiment using Mechanical Turk. Entropy, 21(12):1201.

  • Ryabko, Boris (2010). Applications of universal source coding to statistical analysis of time series. In I. Woungang, S. Misra, and S. C. Misra, editors, Selected Topics in Information and Coding Theory, Series on Coding and Cryptology, pages 289–338. World Scientific Publishing.

  • Schürmann, Thomas and Grassberger, Peter (1996). Entropy estimation of symbol sequences. Chaos, 6(3), 414–427.

  • Shannon, Claude E. (1948). A mathematical theory of communication. The Bell System Technical Journal, 27, 379–423, 623–656.

  • Shannon, Claude E. (1951). Prediction and entropy of printed English. The Bell System Technical Journal, 30, 50–64.

  • Shannon, Claude E. (1959). Coding theorems for a discrete source with a fidelity criterion. IRE National Convention Record, 4, 142–163.

  • Takahira, Ryosuke, Tanaka-Ishii, Kumiko, and Dȩbowski, Łukasz (2016). Entropy rate estimates for natural language—a new extrapolation of compressed large-scale corpora. Entropy, 18(10):364.

Copyright information

© 2021 The Author(s)

Cite this chapter

Tanaka-Ishii, K. (2021). Complexity. In: Statistical Universals of Language. Mathematics in Mind. Springer, Cham. https://doi.org/10.1007/978-3-030-59377-3_10
