Natural Language & Linguistic Theory, Volume 30, Issue 3, pp 859–896

Information theoretic approaches to phonological structure: the case of Finnish vowel harmony

  • John Goldsmith
  • Jason Riggle


This paper offers a study of vowel harmony in Finnish as an example of how information-theoretic concepts can be employed to better understand the nature of phonological structure. The probability that a phonological model assigns to a corpus is used as a means of evaluating how good the model is, and information-theoretic methods allow us to determine the extent to which each addition to our grammar results in a better treatment of the data. We explore a natural implementation of autosegmental phonology within an information-theoretic perspective and find that it is empirically inadequate; that is, it performs more poorly than a simple bigram model. We extend the model by means of a Boltzmann distribution that takes into consideration both local (segment-to-segment) and distal (vowel-to-vowel) relations, and find a significant improvement. We conclude with some general observations on how we propose to revisit other phonological questions from this perspective.
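To make the evaluation criterion in the abstract concrete, here is a minimal Python sketch (not the authors' implementation) of scoring a corpus by the total log probability a model assigns to it. It compares a segment unigram model with a segment bigram model, both estimated by maximum likelihood from the same word list. The boundary symbol `#`, the function names, and the toy corpus are hypothetical choices for illustration. A higher (less negative) log probability, equivalently fewer bits needed to encode the corpus, indicates a better treatment of the data.

```python
import math
from collections import Counter

# Hypothetical word-boundary symbol (an assumption, not taken from the paper).
BOUNDARY = "#"


def unigram_logprob(corpus):
    """Total log2 probability of the corpus under a maximum-likelihood
    unigram model over segments. The word-final boundary is counted as a
    segment, so the model also assigns probability to where words end."""
    counts = Counter()
    for word in corpus:
        counts.update(list(word) + [BOUNDARY])
    total = sum(counts.values())
    # Each token contributes log2 of its relative frequency.
    return sum(n * math.log2(n / total) for n in counts.values())


def bigram_logprob(corpus):
    """Total log2 probability of the corpus under a maximum-likelihood
    bigram model: each segment (and the final boundary) is conditioned on
    the preceding symbol, with the word-initial boundary as first context."""
    pair_counts = Counter()
    context_counts = Counter()
    for word in corpus:
        padded = [BOUNDARY] + list(word) + [BOUNDARY]
        for prev, cur in zip(padded, padded[1:]):
            pair_counts[(prev, cur)] += 1
            context_counts[prev] += 1
    # P(cur | prev) = count(prev, cur) / count(prev as a context).
    return sum(
        n * math.log2(n / context_counts[prev])
        for (prev, _cur), n in pair_counts.items()
    )


if __name__ == "__main__":
    # Hypothetical toy corpus of Finnish-like forms, for illustration only.
    toy_corpus = ["talo", "talossa", "kylä", "kylässä", "pöytä", "pöydällä"]
    for name, score in [("unigram", unigram_logprob(toy_corpus)),
                        ("bigram", bigram_logprob(toy_corpus))]:
        print(f"{name}: {-score:.2f} bits")
```

Because both models are scored on the same data they were estimated from, the bigram total can never fall below the unigram total; the sketch illustrates how the measure is computed, not how the paper's richer models were compared.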


Keywords: Information theory, Learning, Vowel harmony



For helpful and insightful discussion, suggestions, and comments we would like to thank Max Bane, Ryan Bennett, Pierre Collet, Antonio Galves, Sharon Goldwater, Yu Hu, Junko Itô, Mark Johnson, Theano Starvinos, John Sylak, Colin Wilson, and Alan Yu.



Copyright information

© Springer Science+Business Media B.V. 2012

Authors and Affiliations

  1. University of Chicago, Chicago, USA
