Articulation of Elements

Statistical Universals of Language

Part of the book series: Mathematics in Mind (MATHMIN)

Abstract

The previous two parts of this book considered statistical universals of language. Sequences were input to specific analysis methods to examine the behavior of words or characters. The resulting phenomena were studied from the two viewpoints of population and sequence. As shown by the thick rightward arrow in Fig. 1.1, Parts II and III studied language corpora to reveal the statistical universals.

Notes

  1.

    The description of pronunciation here follows that appearing in (Harris 1955).

  2.

    Note that there are other, far less common possibilities such as “The United States of Tara,” the name of a television series, and misspelled words (see Sect. 17.3).

  3.

    Section 21.9 explains how various measures of complexity, including the successor count and the Shannon entropy, are related through the notion of generalized entropy.

  4.

    \(g(c_i^{i+n-1})\) in this chapter is designed to follow Harris’s concept closely. An alternative is to use \({\mathrm{H}}(X_{i+n}|X_i^{i+n-1} = c_i^{i+n-1})\), whose relation with \({\mathrm{H}}(X_n|X_1^{n-1})\) is clearer. The overall experimental results should not differ.

  5.

    This maximum value was chosen after testing with some larger values in the original work (Tanaka-Ishii and Jin, 2008). Language has long memory, as reported in Part III, but when limited to this specific task of articulation using Harris’s scheme, n = 10 was deemed sufficient to obtain maximum performance.

  6.

    This figure appeared in Tanaka-Ishii and Jin (2008). For clarity, the figure only shows part of the experimental results. In the experiment, an entropy shift was verified starting from every phoneme for a length of 10, as mentioned in the main text.

  7.

    The denotation of phonemes by capital letters here follows the CMU Pronouncing Dictionary (The Speech Group at CMU, 1998).

  8.

    The experiment involved a threshold parameter k such that candidate points at j > i were regarded as borders when \(g(c_i^{j}) - g(c_i^{j-1}) \geq k\). This k value was then varied, and f-scores were acquired for all thresholds. The value k = 1.6 gave the best results.

  9.

    This figure also appeared in Tanaka-Ishii and Jin (2008). As detailed in the original paper, the rightmost point has slightly different precision and recall values from those given in the text. The reason is that the threshold value k was varied as mentioned in the previous footnote, and this graph shows the result for another slightly different k.
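
Notes 4, 5, and 8 together describe a border-detection rule that can be sketched in a few lines of Python. The sketch below is illustrative only and is not the implementation used in Tanaka-Ishii and Jin (2008). It assumes that g is the conditional Shannon entropy of the next symbol given the context (the alternative formulation mentioned in note 4), estimated by simple counting over a corpus. The function names successor_entropy and detect_borders are hypothetical. The defaults max_n = 10 and k = 1.6 are the values quoted in notes 5 and 8.

```python
from collections import Counter, defaultdict
from math import log2


def successor_entropy(corpus, max_n=10):
    """Estimate the entropy of the next symbol for every context of length 1..max_n.

    `corpus` is an iterable of sequences (strings of characters or lists of
    phonemes). Contexts are stored as tuples so both kinds of input work.
    Assumption: g is taken to be this conditional Shannon entropy.
    """
    successors = defaultdict(Counter)
    for seq in corpus:
        for i in range(len(seq)):
            for n in range(1, max_n + 1):
                if i + n >= len(seq):
                    break  # no symbol follows this context in this sequence
                successors[tuple(seq[i:i + n])][seq[i + n]] += 1

    entropy = {}
    for context, counts in successors.items():
        total = sum(counts.values())
        entropy[context] = -sum(
            (c / total) * log2(c / total) for c in counts.values()
        )
    return entropy


def detect_borders(seq, entropy, k=1.6, max_n=10):
    """Return sorted offsets where the entropy rises by at least k.

    For each start i, the context seq[i:end] is compared with the context
    one symbol shorter, seq[i:end-1]. If the difference in estimated entropy
    is >= k, offset `end` (the gap after seq[end-1]) is marked as a border.
    Contexts never seen in the corpus are skipped.
    """
    found = set()
    for i in range(len(seq)):
        for end in range(i + 2, min(i + max_n, len(seq)) + 1):
            prev = tuple(seq[i:end - 1])
            curr = tuple(seq[i:end])
            if prev in entropy and curr in entropy:
                if entropy[curr] - entropy[prev] >= k:
                    found.add(end)
    return sorted(found)
```

As a toy check, with the corpus ["abx", "aby", "abz", "abw"] the context "a" is always followed by "b" (entropy 0), while "ab" is followed by four equally likely symbols (entropy 2 bits), so a border is detected after "ab" for k = 1.6.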

References

  • Creutz, Mathias and Lagus, Krista (2002). Unsupervised discovery of morphemes. In Proceedings of the ACL-02 Workshop on Morphological and Phonological Learning, pages 21–30.

  • Frantzi, Katerina T. and Ananiadou, Sophia (1996). Extracting nested collocations. In Proceedings of the 16th International Conference on Computational Linguistics, pages 41–46.

  • Goldwater, Sharon, Griffiths, Thomas L., and Johnson, Mark (2009). A Bayesian framework for word segmentation: Exploring the effects of context. Cognition, 112, 21–54.

  • Hafer, Margaret A. and Weiss, Stefan F. (1974). Word segmentation by letter successor varieties. Information Storage and Retrieval, 10, 371–385.

  • Harris, Zellig S. (1955). From phoneme to morpheme. Language, 31(2), 190–222.

  • Harris, Zellig S. (1968). Mathematical Structures of Language. Interscience Publishers (John Wiley & Sons).

  • Harris, Zellig S. (1988). Language and Information. Columbia University Press.

  • Huang, Jin-Hu and Powers, David (2003). Chinese word segmentation based on contextual entropy. In Proceedings of the 17th Pacific Asia Conference on Language, Information and Computation, pages 152–158.

  • Kempe, André (1999). Experiments in unsupervised entropy-based corpus segmentation. In EACL 1999: CoNLL-99 in Computational Natural Language Learning, pages 7–13.

  • Martinet, André (1960). Eléments de linguistique générale. Armand Colin.

  • Nobesawa, Shiho, Tsutsumi, Junya, Jiang, Sun D., Sano, Tomohisa, Sato, Kengo, and Nakanishi, Masakazu (1996). Segmenting sentences into linky strings using d-bigram statistics. In Proceedings of the 16th International Conference on Computational Linguistics, pages 586–591.

  • Saffran, Jenny R. (2001). Words in a sea of sounds: the output of infant statistical learning. Cognition, 81, 149–169.

  • Tanaka-Ishii, Kumiko and Ishii, Yuichiro (2007). Multilingual phrase-based concordance generation in real-time. Information Retrieval, 10, 275–295.

  • Tanaka-Ishii, Kumiko and Jin, Zhihui (2008). From phoneme to morpheme: Another verification in English and Chinese using corpora. Studia Linguistica, 62(2), 224–248.

  • The Speech Group at CMU (1998). The CMU pronouncing dictionary version 0.6. http://www.speech.cs.cmu.edu/cgi-bin/cmudict.

Copyright information

© 2021 The Author(s)

About this chapter

Cite this chapter

Tanaka-Ishii, K. (2021). Articulation of Elements. In: Statistical Universals of Language. Mathematics in Mind. Springer, Cham. https://doi.org/10.1007/978-3-030-59377-3_11
