Abstract
In traditional word-and-paradigm models of morphology, an inflectional system is represented by a set of exemplary paradigms, and novel wordforms are produced by analogy with previously encountered forms. This paper describes a recurrent neural network that uses this strategy to learn the paradigms of a morphologically complex language from incomplete and randomized input. Results show good performance for a range of typologically diverse languages.
Notes
A localist representation is one in which each item is uniquely identified with a single node in the network. This is in contrast to a distributed representation, in which ensembles of nodes are used to represent multiple items.
Stump and Finkel (2013) describe French paradigms as having 49 cells. Flexique records a singular/plural distinction on past participles, which adds two additional cells to the paradigm.
All source code and data for the experiments described in this paper are available at http://github.com/rmalouf/abstractive. We gratefully acknowledge the support of NVIDIA Corporation with the donation of the Titan X Pascal GPU used for this research.
https://github.com/sigmorphon/conll2017, accessed 2 June 2017.
For an accessible description of the intuitions behind the LSTM, see, e.g., http://colah.github.io/posts/2015-08-Understanding-LSTMs/.
References
Ackerman, F., & Malouf, R. (2013). Morphological organization: the low conditional entropy conjecture. Language, 89, 429–464.
Ackerman, F., & Malouf, R. (2016). Implicative relations in word-based morphological systems. In A. Hippisley & G. Stump (Eds.), Cambridge handbook of morphology (pp. 272–296). Cambridge: Cambridge University Press.
Ackerman, F., Blevins, J. P., & Malouf, R. (2009). Parts and wholes: patterns of relatedness in complex morphological systems and why they matter. In J. P. Blevins & J. Blevins (Eds.), Analogy in grammar: form and acquisition (pp. 54–82). Oxford: Oxford University Press.
Ahlberg, M., Forsberg, M., & Hulden, M. (2014). Semi-supervised learning of morphological paradigms and lexicons. In Proceedings of the 14th conference of the European chapter of the association for computational linguistics (pp. 569–578).
Ahlberg, M., Forsberg, M., & Hulden, M. (2015). Paradigm classification in supervised learning of morphology. In Human language technologies: the 2015 annual conference of the North American chapter of the ACL (pp. 1024–1029).
Albright, A., & Hayes, B. (2003). Rules vs. analogy in English past tenses: a computational/experimental study. Cognition, 90(2), 119–161.
Anderson, A. E., & Merrifield, W. R. (2000). Chinantec project of the language of the scattered peoples of Ancient San Pedro Tlatepuzco, Oaxaca, Mexico. http://www.sil.org/resources/archives/51375.
Aronoff, M. (2012). Morphological stems: what William of Ockham really said. Word Structure, 5(1), 28–51.
Baerman, M. (2016). Seri verb classes: morphosyntactic motivation and morphological autonomy. Language, 92(4), 792–823.
Baerman, M., & Palancar, E. L. (2016). The organization of Chinantec tone paradigms. In S. Augendre, G. Couasnon-Torlois, D. Lebon, C. Michard, G. Boyé, & F. Montermini (Eds.), Proceedings of the 8th Décembrettes (pp. 46–59). Toulouse: CLLE-ERSS.
Bengio, Y., Frasconi, P., & Simard, P. (1993). The problem of learning long-term dependencies in recurrent networks. In IEEE international conference on neural networks (pp. 1183–1188).
Bengio, Y., Ducharme, R., Vincent, P., & Jauvin, C. (2003). A neural probabilistic language model. Journal of Machine Learning Research, 3, 1137–1155.
Bergstra, J., & Bengio, Y. (2012). Random search for hyper-parameter optimization. Journal of Machine Learning Research, 13, 281–305.
Blevins, J. P. (2006). Word-based morphology. Journal of Linguistics, 42(3), 531–573.
Blevins, J. P. (2016). Word and paradigm morphology. Oxford: Oxford University Press.
Blevins, J. P., Ackerman, F., Malouf, R., & Ramscar, M. (2016). Morphology as an adaptive discriminative system. In H. Harley & D. Siddiqi (Eds.), Morphological metatheory (pp. 269–300). Amsterdam: Benjamins.
Blevins, J. P., Milin, P., & Ramscar, M. (2017). The Zipfian paradigm cell filling problem. In F. Kiefer, J. P. Blevins, & H. Bartos (Eds.), Morphological paradigms and functions, Leiden: Brill.
Bonami, O. (2012). Discovering implicative morphology. In Les Décembrettes 8: colloque international de morphologie, Bordeaux. http://www.llf.cnrs.fr/sites/llf.cnrs.fr/files/biblio//Bordeaux-dec2012.pdf.
Bonami, O. (2013). Towards a robust assessment of implicative relations in inflectional systems. In Workshop on computational approaches to morphological complexity, Paris. http://www.llf.cnrs.fr/Gens/Bonami/presentations/Bonami-SMG-Paris-2013.pdf.
Bonami, O., & Beniamine, S. (2016). Joint predictiveness in inflectional paradigms. Word Structure, 9(2), 156–182.
Bonami, O., & Boyé, G. (2002). Suppletion and dependency in inflectional morphology. In F. V. Eynde, L. Hellan, & D. Beermann (Eds.), The proceedings of the 8th international conference on head-driven phrase structure grammar (pp. 51–70). Stanford: CSLI Publications.
Bonami, O., & Boyé, G. (2014). De formes en thèmes. In F. Villoing, S. Leroy, & S. David (Eds.), Foisonnements morphologiques. Etudes en hommage à Françoise Kerleroux (pp. 17–45). Paris: Presses Universitaires de Paris Ouest.
Bonami, O., & Luís, A. R. (2014). Sur la morphologie implicative dans la conjugaison du portugais : une étude quantitative. In J. L. Léonard (Ed.), Mémoires de la Société de Linguistique de Paris: Vol. 22. Morphologie flexionnelle et dialectologie romane. Typologie(s) et modélisation(s) (pp. 111–151). Leuven: Peeters.
Bonami, O., Caron, G., & Plancq, C. (2013). Flexique: an inflectional lexicon for spoken French.
Brown, D., & Hippisley, A. (2012). Network morphology. Cambridge: Cambridge University Press.
Brown, D., Corbett, G., Fraser, N., Hippisley, A., & Timberlake, A. (1996). Russian noun stress and network morphology. Linguistics, 34, 53–107.
Carnie, A. (2008). Irish nouns: a reference guide. Oxford: Oxford University Press.
Chan, E. (2008). Structures and distributions in morphology learning. PhD thesis, University of Pennsylvania.
Chollet, F. (2015). Keras. https://github.com/fchollet/keras.
Corbett, G. G., & Fraser, N. M. (1993). Network morphology: a DATR account of Russian nominal inflection. Journal of Linguistics, 29, 113–142.
Cotterell, R., Kirov, C., Sylak-Glassman, J., Yarowsky, D., Eisner, J., & Hulden, M. (2016). The SIGMORPHON 2016 shared task—morphological reinflection. In Proceedings of the 2016 meeting of SIGMORPHON. Berlin: Association for Computational Linguistics.
Courbariaux, M., Bengio, Y., & David, J. P. (2015). BinaryConnect: training deep neural networks with binary weights during propagations. In Proceedings of the 29th annual conference on neural information processing systems (NIPS).
CSC (2004). Suomen sanomalehtikielen taajuussanasto [Frequency dictionary of Finnish newspaper language]. https://korp.csc.fi/suomen-sanomalehtikielen-taajuussanasto-B9996.txt.
Dreyer, M., & Eisner, J. (2011). Discovering morphological paradigms from plain text using a Dirichlet process mixture model. In Proceedings of the 2011 conference on empirical methods in natural language processing (pp. 616–627). Edinburgh: Association for Computational Linguistics.
Durrett, G., & DeNero, J. (2013). Supervised learning of complete morphological paradigms. In HLT-NAACL (pp. 1185–1195).
Elman, J. L. (1989). Representation and structure in connectionist models (CRL Technical Report 8903). Center for Research in Learning.
Elman, J. L. (1990). Finding structure in time. Cognitive Science, 14, 179–211.
Elman, J. L. (1995). Language as a dynamical system. In R. F. Port & T. van Gelder (Eds.), Mind as motion: explorations in the dynamics of cognition (pp. 195–225). Cambridge: MIT Press.
Fagyal, Z., Kibbee, D., & Jenkins, F. (2006). French: a linguistic introduction. Cambridge: Cambridge University Press.
Féry, C. (2003). Markedness, faithfulness, vowel quality and syllable structure in French. Journal of French Language Studies, 13(2), 247–280.
Figueroa, R. L., Zeng-Treitler, Q., Kandula, S., & Ngo, L. H. (2012). Predicting sample size required for classification performance. BMC Medical Informatics and Decision Making, 12(1), 1–10.
Forcada, M. L., Ginestí-Rosell, M., Nordfalk, J., O’Regan, J., Ortiz-Rojas, S., Pérez-Ortiz, J. A., Sánchez-Martínez, F., Ramírez-Sánchez, G., & Tyers, F. M. (2011). Apertium: a free/open-source platform for rule-based machine translation. Machine Translation, 25, 127–144.
Goldsmith, J. A. (2006). An algorithm for the unsupervised learning of morphology. Natural Language Engineering, 12, 353–371.
Goldsmith, J., & O’Brien, J. (2006). Learning inflectional classes. Language Learning and Development, 2, 219–250.
Graves, A. (2014). Generating sequences with recurrent neural networks. arXiv:1308.0850v5 [cs.NE].
Hinton, G. (2012). Lecture 6e: rmsprop: Divide the gradient by a running average of its recent magnitude. http://www.cs.toronto.edu/~tijmen/csc321/slides/lecture_slides_lec6.pdf.
Hinton, G. E., Srivastava, N., Krizhevsky, A., Sutskever, I., & Salakhutdinov, R. R. (2012). Improving neural networks by preventing co-adaptation of feature detectors. arXiv:1207.0580v1.
Hoberman, R. D., & Aronoff, M. (2003). The verbal morphology of Maltese: from Semitic to Romance. In J. Shimron (Ed.), Language processing and acquisition in languages of Semitic, root-based, morphology (pp. 61–78). Amsterdam: Benjamins.
Hochreiter, S., & Schmidhuber, J. (1997). Long short-term memory. Neural Computation, 9, 1735–1780.
Hochreiter, S., Bengio, Y., Frasconi, P., & Schmidhuber, J. (2001). Gradient flow in recurrent nets: the difficulty of learning long-term dependencies. In J. F. Kolen & S. C. Kremer (Eds.), A field guide to dynamical recurrent neural networks (pp. 237–244). New York: IEEE Press.
Hockett, C. (1967). The Yawelmani basic verb. Language, 43(1), 208–222.
Jacques, G., Lahaussois, A., Michailovsky, B., & Rai, D. B. (2012). An overview of Khaling verbal morphology. Language and Linguistics, 13(6), 1095–1170.
Jordan, M. I. (1989). Serial order: a parallel distributed processing approach. In J. L. Elman & D. E. Rumelhart (Eds.), Advances in connectionist theory (pp. 214–249). New York: Erlbaum.
Jozefowicz, R., Zaremba, W., & Sutskever, I. (2015). An empirical exploration of recurrent network architectures. In Proceedings of the 32nd international conference on machine learning (ICML 2015) (pp. 2342–2350).
Karpathy, A., Johnson, J., & Li, F. (2015). Visualizing and understanding recurrent networks. arXiv:1506.02078v2 [cs.LG].
Kohonen, O., Virpioja, S., & Lagus, K. (2010). Semi-supervised learning of concatenative morphology. In Proceedings of the 11th meeting of the ACL special interest group on computational morphology and phonology (pp. 78–86).
Křen, M., Bartoň, T., Cvrček, V., Hnátková, M., Jelínek, T., Kocek, J., Novotná, R., Petkevič, V., Procházka, P., Schmiedtová, V., & Skoumalová, H. (2010). Syn2010: žánrově vyvážený korpus psané češtiny. Tech. rep., Ústav Českého národního korpusu, FF UK, Prague.
Lee, J. L., & Goldsmith, J. A. (2016). Linguistica 5: unsupervised learning of linguistic structure. In Proceedings of the 2016 conference of the North American chapter of the association for computational linguistics.
Li, J., Chen, X., Hovy, E., & Jurafsky, D. (2016). Visualizing and understanding neural models in NLP. In Proceedings of the 2016 conference of the North American chapter of the association for computational linguistics: human language technologies.
van der Maaten, L., & Hinton, G. (2008). Visualizing data using t-SNE. Journal of Machine Learning Research, 9, 2579–2605.
Marzi, C., Ferro, M., Cardillo, F. A., & Pirrelli, V. (2016). Effects of frequency and regularity in an integrative model of word storage and processing. Italian Journal of Linguistics, 28(1), 79–114.
Matthews, P. H. (1991). Morphology. Cambridge: Cambridge University Press.
Merrifield, W. R. (1968). Palantla Chinantec grammar. Mexico: Museo Nacional de Antropología.
Merrifield, W. R., & Anderson, A. E. (2006). Diccionario Chinanteco de la diáspora del pueblo antiguo de San Pedro Tlatepuzco, Oaxaca. Coyoacán, D.F., Mexico: Instituto Lingüístico de Verano.
Miestamo, M., Sinnemäki, K., & Karlsson, F. (Eds.) (2008). Language complexity: typology, contact, change. Amsterdam: Benjamins.
Mikolov, T., & Zweig, G. (2012). Context dependent recurrent neural network language model. In Proceedings of speech language technology (pp. 234–239).
Mikolov, T., Karafiát, M., Burget, L., Černocký, J., & Khudanpur, S. (2010). Recurrent neural network based language model. In Proceedings of interspeech (pp. 1045–1048).
Mikolov, T., Sutskever, I., Deoras, A., Le, H. S., Kombrink, S., & Černocký, J. (2012). Subword language modeling with neural networks. http://www.fit.vutbr.cz/imikolov/rnnlm/char.pdf.
Mikolov, T., Sutskever, I., Chen, K., Corrado, G., & Dean, J. (2013). Distributed representations of words and phrases and their compositionality. In Proceedings of NIPS.
Ní Chasaide, A., Wogan, J., Ó Raghallaigh, B., Ní Bhriain, Á., Zoerner, E., Berthelsen, H., & Gobl, C. (2006). Speech technology for minority languages: the case of Irish (Gaelic). In Proceedings of the 9th international conference on spoken language processing, INTERSPEECH 2006 (pp. 181–184).
Nicolai, G., Cherry, C., & Kondrak, G. (2015). Inflection generation as discriminative string transduction. In Human language technologies: the 2015 annual conference of the North American chapter of the ACL.
Pascanu, R., Mikolov, T., & Bengio, Y. (2013). On the difficulty of training recurrent neural networks. In Proceedings of the 30th international conference on machine learning (ICML 2013) (pp. 1310–1318).
Paul, H. (1891). Principles of the history of language (translated from the 2nd edition by H. A. Strong). London: Longmans, Green.
Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., Vanderplas, J., Passos, A., Cournapeau, D., Brucher, M., Perrot, M., & Duchesnay, E. (2011). Scikit-learn: machine learning in Python. Journal of Machine Learning Research, 12, 2825–2830.
Pihel, K., & Pikamäe, A. (1999). Soome-eesti sõnaraamat [Finnish-Estonian dictionary]. Tallinn: Valgus.
Pinker, S., & Prince, A. (1988). On language and connectionism: analysis of a parallel distributed processing model of language acquisition. Cognition, 28, 73–193.
Rumelhart, D. E., & McClelland, J. L. (1986). Parallel distributed processing (Vol. 1). Cambridge: MIT Press.
Rumelhart, D. E., Hinton, G. E., & Williams, R. J. (1985). Learning internal representations by error propagation. Tech. rep., DTIC Document.
Salminen, T. (1997). Tundra Nenets inflection. Mémoires de la Société Finno-Ougrienne 227. Helsinki.
Sampson, G. B., Gil, D., & Trudgill, P. (Eds.) (2010). Language complexity as an evolving variable. Oxford: Oxford University Press.
Servan-Schreiber, D., Cleeremans, A., & McClelland, J. L. (1989). Learning sequential structure in simple recurrent networks. In D. S. Touretzky (Ed.), Advances in neural information processing systems 1 (pp. 643–652). San Francisco: Morgan Kaufmann.
Silverman, D. (2006). Chinantec: phonology. In Concise encyclopedia of languages of the world (pp. 211–213). Oxford: Elsevier.
Sims, A. D., & Parker, J. (2016). How inflection class systems work: on the informativity of implicative structure. Word Structure, 9(2), 215–239.
Spencer, A. J. (2012). Identifying stems. Word Structure, 5, 88–108.
Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., & Salakhutdinov, R. (2014). Dropout: a simple way to prevent neural networks from overfitting. Journal of Machine Learning Research, 15(1), 1929–1958.
Stump, G. (2001). Inflectional morphology: a theory of paradigm structure. Cambridge: Cambridge University Press.
Stump, G., & Finkel, R. (2013). Morphological typology: from word to paradigm. Cambridge: Cambridge University Press.
Sundermeyer, M., Schlüter, R., & Ney, H. (2012). LSTM neural networks for language modeling. In Proceedings of Interspeech.
Sundermeyer, M., Schlüter, R., & Ney, H. (2015). From feedforward to recurrent LSTM neural networks for language modeling. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 23, 517–529.
Sutskever, I., Martens, J., & Hinton, G. (2011). Generating text with recurrent neural networks. In International conference on machine learning (ICML 2011).
Testolin, A., Stoianov, I., Sperduti, A., & Zorzi, M. (2016). Learning orthographic structure with sequential generative neural networks. Cognitive Science, 40(3), 579–606.
Theano Development Team (2016). Theano: A Python framework for fast computation of mathematical expressions. arXiv:1605.02688v1 [cs.SC].
Thymé, A. (1993). Connectionist approach to nominal inflection: Paradigm patterning and analogy in Finnish. PhD thesis, UC San Diego.
Thymé, A., Ackerman, F., & Elman, J. L. (1994). Finnish nominal inflection: paradigmatic patterns and token analogy. In S. D. Lima, R. Corrigan, & G. K. Iverson (Eds.), The reality of linguistic rules (pp. 445–466). Amsterdam: Benjamins.
Vinyals, O., Toshev, A., Bengio, S., & Erhan, D. (2015). Show and tell: a neural image caption generator. In IEEE conference on computer vision and pattern recognition (CVPR).
Walther, G., Jacques, G., & Sagot, B. (2013). Uncovering the inner architecture of Khaling verbal morphology. In 3rd workshop on Sino-Tibetan languages of Sichuan.
Wattenberg, M., Viégas, F., & Johnson, I. (2016). How to use t-SNE effectively. Distill. doi:10.23915/distill.00002. http://distill.pub/2016/misread-tsne.
Werbos, P. J. (1990). Backpropagation through time: what it does and how to do it. Proceedings of the IEEE, 78(10), 1550–1560.
Wurzel, W. U. (1989). Inflectional morphology and naturalness. Dordrecht: Springer.
Yang, C. (2016). The price of linguistic productivity. Cambridge: MIT Press.
Zaremba, W., Sutskever, I., & Vinyals, O. (2014). Recurrent neural network regularization. arXiv:1409.2329v5.
Appendix
The network is structured to take as input a lexeme, a morphosyntactic feature set, and a partial wordform, and to output a probability distribution over the next segment in the wordform. The input \(x_{t}\) is a binary vector with as many dimensions as there are segments in the segment inventory of the language to be generated (plus designated start and end characters). The value of \(x_{t}\) is one for the dimension corresponding to the previous character and zero for all other dimensions. This provides a localist, one-hot representation of the immediate phonological context. The input \(m_{t}\) is a localist input identifying a lexeme and a paradigm cell: one bit encodes the lexeme (say, walk) and another encodes the paradigm cell. Each combination of morphosyntactic features is given a unique representation: no generalizations are expressed at this level.
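To make the encoding concrete, a minimal NumPy sketch is given below. The segment inventory, lexemes, and cell labels here are hypothetical placeholders, not taken from the paper's implementation:

```python
import numpy as np

# Hypothetical inventories for illustration; the experiments use the
# segment inventory of each target language plus start/end symbols.
segments = ['<s>', '</s>', 'a', 'k', 'l', 's', 't', 'w']
lexemes = ['walk', 'talk']
cells = ['PRS.3SG', 'PST']

def one_hot(i, size):
    """Localist (one-hot) vector: 1.0 at index i, 0.0 elsewhere."""
    v = np.zeros(size)
    v[i] = 1.0
    return v

# x_t encodes the previous character in the partial wordform.
x_t = one_hot(segments.index('w'), len(segments))

# m_t encodes the lexeme and paradigm cell localistically: each
# lexeme and each cell gets its own unit, with no featural
# generalizations expressed at this level.
m_t = np.concatenate([one_hot(lexemes.index('walk'), len(lexemes)),
                      one_hot(cells.index('PST'), len(cells))])
```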
These inputs are mapped to a combined projection layer \(z_{t}\) (Bengio et al. 2003):

\[ z_{t} = W^{x}x_{t} \oplus W^{m}m_{t} \]

where ⊕ is vector concatenation. The projection layer \(z_{t}\) in turn is the input to the recurrent layer, implemented via Long Short-Term Memory (LSTM) blocks (Hochreiter and Schmidhuber 1997; Jozefowicz et al. 2015). LSTMs avoid the vanishing and exploding gradient problems exhibited by Elman-style simple recurrent networks and allow the model to more easily capture medium- and long-distance temporal dependencies in the data (Hochreiter et al. 2001). The output of the recurrent layer \(h_{t}\) is given by the standard LSTM equations:

\[ \begin{aligned} i_{t} &= \sigma(W^{i}z_{t} + U^{i}h_{t-1} + b^{i}) \\ f_{t} &= \sigma(W^{f}z_{t} + U^{f}h_{t-1} + b^{f}) \\ c_{t} &= f_{t} \odot c_{t-1} + i_{t} \odot \tanh(W^{c}z_{t} + U^{c}h_{t-1} + b^{c}) \\ o_{t} &= \sigma(W^{o}z_{t} + U^{o}h_{t-1} + b^{o}) \\ h_{t} &= o_{t} \odot \tanh(c_{t}) \end{aligned} \]
where ⊙ denotes element-wise multiplication. For implementation purposes, the sigmoid function σ is evaluated using the 'hard sigmoid', a piecewise-linear approximation of the true sigmoid (Courbariaux et al. 2015):

\[ \sigma(x) = \max\bigl(0, \min(1, 0.2x + 0.5)\bigr) \]
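A minimal NumPy sketch of this approximation, assuming the slope and offset used by the Keras/Theano hard sigmoid (the toolkit cited in the references):

```python
import numpy as np

def hard_sigmoid(x):
    # Piecewise-linear approximation of the logistic sigmoid:
    # 0 for x <= -2.5, 1 for x >= 2.5, and linear in between.
    return np.clip(0.2 * np.asarray(x) + 0.5, 0.0, 1.0)
```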
Finally, \(h_{t}\) is mapped to a vector \(y_{t}\) with the same dimensionality as the input \(x_{t}\), from which we can induce a probability distribution over output characters:

\[ y_{t} = W^{y}h_{t} + b^{y} \]
The probability that the next character in the output \(x_{t+1}\) is the j-th character in the character set is computed by applying the softmax function to the output layer:

\[ p(x_{t+1} = j \mid x_{1}\ldots x_{t}) = \frac{\exp(y_{t,j})}{\sum_{k}\exp(y_{t,k})} \]
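In NumPy terms, this step might look like the following sketch, where y is the output vector \(y_{t}\) from the previous equation (the max is subtracted purely for numerical stability):

```python
import numpy as np

def softmax(y):
    # Exponentiate and normalize so the scores over the character
    # set sum to one; subtracting the max avoids overflow.
    e = np.exp(y - np.max(y))
    return e / e.sum()
```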
The probability of a wordform \(p(x_{1}\ldots x_{n})\) given a lexeme and paradigm cell is the product of the probabilities of each character given the preceding context:

\[ p(x_{1}\ldots x_{n}) = \prod_{t=1}^{n} p(x_{t} \mid x_{1}\ldots x_{t-1}) \]
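In practice this product is computed as a sum of log probabilities. A sketch under that assumption, where probs is a hypothetical array of the network's per-step softmax outputs (one row per character position) and targets gives the index of the character actually observed at each position:

```python
import numpy as np

def wordform_log_prob(probs, targets):
    # log p(x_1...x_n) = sum over t of log p(x_t | x_1...x_{t-1});
    # probs[t] is the softmax distribution predicting position t.
    steps = np.arange(len(targets))
    return float(np.sum(np.log(probs[steps, targets])))
```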
During training, the weights W and U and biases b are selected to maximize the log likelihood of the training data.
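Since the cited implementation is built on Keras (Chollet 2015) and Theano (Theano Development Team 2016), the overall architecture might be assembled as in the sketch below. This is an illustrative reconstruction, not the published code: the layer sizes are placeholder assumptions, and the functional-API names reflect current Keras rather than the original source.

```python
from keras.models import Model
from keras.layers import Input, Dense, LSTM, concatenate

n_seg = 40      # placeholder: segment inventory size + start/end symbols
n_lex = 1000    # placeholder: number of lexemes
n_cell = 50     # placeholder: number of paradigm cells
n_proj = 64     # placeholder projection size per input
n_hidden = 256  # placeholder recurrent layer size

# x: sequence of one-hot previous-segment vectors x_1..x_T;
# m: the lexeme/cell vector m_t, repeated at every time step.
x = Input(shape=(None, n_seg))
m = Input(shape=(None, n_lex + n_cell))

# z_t = W^x x_t (+) W^m m_t: project each input, then concatenate.
z = concatenate([Dense(n_proj)(x), Dense(n_proj)(m)])

# Recurrent LSTM layer producing h_t at every time step.
h = LSTM(n_hidden, return_sequences=True)(z)

# y_t mapped back to segment dimensionality; the softmax activation
# gives p(x_{t+1} = j | x_1..x_t) directly.
y = Dense(n_seg, activation='softmax')(h)

model = Model(inputs=[x, m], outputs=y)
# Minimizing categorical cross-entropy on the observed next
# characters maximizes the log likelihood of the training forms.
model.compile(loss='categorical_crossentropy', optimizer='rmsprop')
```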
Cite this article
Malouf, R. Abstractive morphological learning with a recurrent neural network. Morphology 27, 431–458 (2017). https://doi.org/10.1007/s11525-017-9307-x