Morphology, Volume 27, Issue 4, pp. 431–458

Abstractive morphological learning with a recurrent neural network

  • Robert Malouf


In traditional word-and-paradigm models of morphology, an inflectional system is represented via a set of exemplary paradigms. Novel wordforms are produced by analogy with previously encountered forms. This paper describes a recurrent neural network which can use this strategy to learn the paradigms of a morphologically complex language based on incomplete and randomized input. Results are given which show good performance for a range of typologically diverse languages.
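The abstract does not specify the network's architecture, but the general idea of a character-level recurrent model conditioned on a paradigm cell can be sketched as follows. This is a rough illustration under assumed details, not the paper's actual model: the symbol inventory, the cell tags (e.g. `<GEN;SG>`), and the Elman-style recurrence are all hypothetical stand-ins, and the weights here are random rather than trained.

```python
import numpy as np

# Toy sketch (not the paper's model): a character-level recurrent step
# that conditions wordform generation on a morphosyntactic paradigm cell.
np.random.seed(0)

# Hypothetical symbol inventory: a few letters plus cell tags.
symbols = list("talou") + ["<NOM;SG>", "<GEN;SG>", "<EOS>"]
idx = {s: i for i, s in enumerate(symbols)}
V = len(symbols)      # vocabulary size
H = 16                # hidden units

# Randomly initialised weights (training would adjust these).
Wxh = np.random.randn(H, V) * 0.1   # input-to-hidden
Whh = np.random.randn(H, H) * 0.1   # hidden-to-hidden (recurrence)
Why = np.random.randn(V, H) * 0.1   # hidden-to-output

def onehot(sym):
    v = np.zeros(V)
    v[idx[sym]] = 1.0
    return v

def step(h, sym):
    """One Elman-style recurrent step: consume a symbol, update the
    hidden state, and return a distribution over the next symbol."""
    h = np.tanh(Wxh @ onehot(sym) + Whh @ h)
    logits = Why @ h
    p = np.exp(logits - logits.max())   # stable softmax
    return h, p / p.sum()

# Feed the genitive-singular cell tag, then the lemma "talo", one symbol
# at a time; after each step the network predicts the next symbol.
h = np.zeros(H)
for sym in ["<GEN;SG>", "t", "a", "l", "o"]:
    h, p = step(h, sym)

print(p.shape)  # distribution over the V possible next symbols
```

Because the wordform is generated one character at a time after seeing the cell tag, forms for unseen lemmas can in principle be produced by analogy with the character patterns of previously encountered paradigms.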


Keywords: Morphology · Connectionist models · Word and paradigm · Analogy



Copyright information

© Springer Science+Business Media B.V. 2017

Authors and Affiliations

  1. Department of Linguistics and Asian/Middle Eastern Languages, San Diego State University, San Diego, USA