Abstractive morphological learning with a recurrent neural network

Abstract

In traditional word-and-paradigm models of morphology, an inflectional system is represented via a set of exemplary paradigms. Novel wordforms are produced by analogy with previously encountered forms. This paper describes a recurrent neural network which can use this strategy to learn the paradigms of a morphologically complex language based on incomplete and randomized input. Results are given which show good performance for a range of typologically diverse languages.

Notes

  1. https://github.com/rmalouf/morphology/.

  2. A localist representation is one in which each item is uniquely identified with a single node in the network. This is in contrast to a distributed representation, in which ensembles of nodes are used to represent multiple items.

  3. http://www.cs.utexas.edu/~gdurrett/wiktionary-morphology-1.1.tgz.

  4. https://korp.csc.fi/suomen-sanomalehtikielen-taajuussanasto-B9996.txt.

  5. http://www.llf.cnrs.fr/fr/flexique-fr.php.

  6. Stump and Finkel (2013) describe French paradigms as having 49 cells. Flexique records a singular/plural distinction on past participles, which adds two additional cells to the paradigm.

  7. http://www.abair.tcd.ie/.

  8. https://svn.code.sf.net/p/apertium/svn/trunk/apertium-mlt-ara/apertium-mlt-ara.mlt.dix.

  9. https://gforge.inria.fr/frs/download.php/file/35119/khalex-0.0.2.mlex.tgz.

  10. http://www.sil.org/resources/archives/51375.

  11. http://networkmorphology.as.uky.edu/sites/default/files/ch23_rusnoms.dmp.

  12. All source code and data for the experiments described in this paper are available at http://github.com/rmalouf/abstractive. We gratefully acknowledge the support of NVIDIA Corporation with the donation of the Titan X Pascal GPU used for this research.

  13. https://github.com/sigmorphon/conll2017, accessed 2 June 2017.

  14. For an accessible description of the intuitions behind the LSTM, see, e.g., http://colah.github.io/posts/2015-08-Understanding-LSTMs/.

References

  • Ackerman, F., & Malouf, R. (2013). Morphological organization: the low conditional entropy conjecture. Language, 89, 429–464.

  • Ackerman, F., & Malouf, R. (2016). Implicative relations in word-based morphological systems. In A. Hippisley & G. Stump (Eds.), Cambridge handbook of morphology (pp. 272–296). Cambridge: Cambridge University Press.

  • Ackerman, F., Blevins, J. P., & Malouf, R. (2009). Parts and wholes: patterns of relatedness in complex morphological systems and why they matter. In J. P. Blevins & J. Blevins (Eds.), Analogy in grammar: form and acquisition (pp. 54–82). Oxford: Oxford University Press.

  • Ahlberg, M., Forsberg, M., & Hulden, M. (2014). Semi-supervised learning of morphological paradigms and lexicons. In Proceedings of the 14th conference of the European chapter of the association for computational linguistics (pp. 569–578).

  • Ahlberg, M., Forsberg, M., & Hulden, M. (2015). Paradigm classification in supervised learning of morphology. In Human language technologies: the 2015 annual conference of the North American chapter of the ACL (pp. 1024–1029).

  • Albright, A., & Hayes, B. (2003). Rules vs. analogy in English past tenses: a computational/experimental study. Cognition, 90(2), 119–161.

  • Anderson, A. E., & Merrifield, W. R. (2000). Chinantec project of the language of the scattered peoples of Ancient San Pedro Tlatepuzco, Oaxaca, Mexico. http://www.sil.org/resources/archives/51375.

  • Aronoff, M. (2012). Morphological stems: what William of Ockham really said. Word Structure, 5(1), 28–51.

  • Baerman, M. (2016). Seri verb classes: morphosyntactic motivation and morphological autonomy. Language, 92(4), 792–823.

  • Baerman, M., & Palancar, E. L. (2016). The organization of Chinantec tone paradigms. In S. Augendre, G. Couasnon-Torlois, D. Lebon, C. Michard, G. Boyé, & F. Montermini (Eds.), Proceedings of the 8th Décembrettes (pp. 46–59). Toulouse: CLLE-ERSS.

  • Bengio, Y., Frasconi, P., & Simard, P. (1993). The problem of learning long-term dependencies in recurrent networks. In IEEE international conference on neural networks (pp. 1183–1188).

  • Bengio, Y., Ducharme, R., Vincent, P., & Jauvin, C. (2003). A neural probabilistic language model. Journal of Machine Learning Research, 3, 1137–1155.

  • Bergstra, J., & Bengio, Y. (2012). Random search for hyper-parameter optimization. Journal of Machine Learning Research, 13, 281–305.

  • Blevins, J. P. (2006). Word-based morphology. Journal of Linguistics, 42(3), 531–573.

  • Blevins, J. P. (2016). Word and paradigm morphology. Oxford: Oxford University Press.

  • Blevins, J. P., Ackerman, F., Malouf, R., & Ramscar, M. (2016). Morphology as an adaptive discriminative system. In H. Harley & D. Siddiqi (Eds.), Morphological metatheory (pp. 269–300). Amsterdam: Benjamins.

  • Blevins, J. P., Milin, P., & Ramscar, M. (2017). The Zipfian paradigm cell filling problem. In F. Kiefer, J. P. Blevins, & H. Bartos (Eds.), Morphological paradigms and functions, Leiden: Brill.

  • Bonami, O. (2012). Discovering implicative morphology. In Les Décembrettes 8: colloque international de morphologie, Bordeaux. http://www.llf.cnrs.fr/sites/llf.cnrs.fr/files/biblio//Bordeaux-dec2012.pdf.

  • Bonami, O. (2013). Towards a robust assessment of implicative relations in inflectional systems. In Workshop on computational approaches to morphological complexity, Paris. http://www.llf.cnrs.fr/Gens/Bonami/presentations/Bonami-SMG-Paris-2013.pdf.

  • Bonami, O., & Beniamine, S. (2016). Joint predictiveness in inflectional paradigms. Word Structure, 9(2), 156–182.

  • Bonami, O., & Boyé, G. (2002). Suppletion and dependency in inflectional morphology. In F. V. Eynde, L. Hellan, & D. Beermann (Eds.), The proceedings of the 8th international conference on head-driven phrase structure grammar (pp. 51–70). Stanford: CSLI Publications.

  • Bonami, O., & Boyé, G. (2014). De formes en thèmes. In F. Villoing, S. Leroy, & S. David (Eds.), Foisonnements morphologiques. Etudes en hommage à Françoise Kerleroux (pp. 17–45). Paris: Presses Universitaires de Paris Ouest.

  • Bonami, O., & Luís, A. R. (2014). Sur la morphologie implicative dans la conjugaison du portugais : une étude quantitative. In J. L. Léonard (Ed.), Mémoires de la Société de Linguistique de Paris: Vol. 22. Morphologie flexionnelle et dialectologie romane. Typologie(s) et modélisation(s) (pp. 111–151). Leuven: Peeters.

  • Bonami, O., Caron, G., & Plancq, C. (2013). Flexique: an inflectional lexicon for spoken French.

  • Brown, D., & Hippisley, A. (2012). Network morphology. Cambridge: Cambridge University Press.

  • Brown, D., Corbett, G., Fraser, N., Hippisley, A., & Timberlake, A. (1996). Russian noun stress and network morphology. Linguistics, 34, 53–107.

  • Carnie, A. (2008). Irish nouns: a reference guide. Oxford: Oxford University Press.

  • Chan, E. (2008). Structures and distributions in morphology learning. PhD thesis, University of Pennsylvania.

  • Chollet, F. (2015). Keras. https://github.com/fchollet/keras.

  • Corbett, G. G., & Fraser, N. M. (1993). Network morphology: a DATR account of Russian nominal inflection. Journal of Linguistics, 29, 113–142.

  • Cotterell, R., Kirov, C., Sylak-Glassman, J., Yarowsky, D., Eisner, J., & Hulden, M. (2016). The SIGMORPHON 2016 shared task—morphological reinflection. In Proceedings of the 2016 meeting of SIGMORPHON. Berlin: Association for Computational Linguistics.

  • Courbariaux, M., Bengio, Y., & David, J. P. (2015). BinaryConnect: training deep neural networks with binary weights during propagations. In Proceedings of the 29th annual conference on neural information processing systems (NIPS).

  • CSC (2004). Suomen sanomalehtikielen taajuussanasto [Frequency dictionary of Finnish newspaper language]. https://korp.csc.fi/suomen-sanomalehtikielen-taajuussanasto-B9996.txt.

  • Dreyer, M., & Eisner, J. (2011). Discovering morphological paradigms from plain text using a Dirichlet process mixture model. In Proceedings of the 2011 conference on empirical methods in natural language processing (pp. 616–627). Edinburgh: Association for Computational Linguistics.

  • Durrett, G., & DeNero, J. (2013). Supervised learning of complete morphological paradigms. In HLT-NAACL (pp. 1185–1195).

  • Elman, J. L. (1989). Representation and structure in connectionist models (CRL Technical Report 8903). Center for Research in Learning.

  • Elman, J. L. (1990). Finding structure in time. Cognitive Science, 14, 179–211.

  • Elman, J. L. (1995). Language as a dynamical system. In R. F. Port & T. van Gelder (Eds.), Mind as motion: explorations in the dynamics of cognition (pp. 195–225). Cambridge: MIT Press.

  • Fagyal, Z., Kibbee, D., & Jenkins, F. (2006). French: a linguistic introduction. Cambridge: Cambridge University Press.

  • Féry, C. (2003). Markedness, faithfulness, vowel quality and syllable structure in French. Journal of French Language Studies, 13(2), 247–280.

  • Figueroa, R. L., Zeng-Treitler, Q., Kandula, S., & Ngo, L. H. (2012). Predicting sample size required for classification performance. BMC Medical Informatics and Decision Making, 12(1), 1–10.

  • Forcada, M. L., Ginestí-Rosell, M., Nordfalk, J., O’Regan, J., Ortiz-Rojas, S., Pérez-Ortiz, J. A., Sánchez-Martínez, F., Ramírez-Sánchez, G., & Tyers, F. M. (2011). Apertium: a free/open-source platform for rule-based machine translation. Machine Translation, 25, 127–144.

  • Goldsmith, J. A. (2006). An algorithm for the unsupervised learning of morphology. Natural Language Engineering, 12, 353–371.

  • Goldsmith, J., & O’Brien, J. (2006). Learning inflectional classes. Language Learning and Development, 2, 219–250.

  • Graves, A. (2014). Generating sequences with recurrent neural networks. arXiv:1308.0850v5 [cs.NE].

  • Hinton, G. (2012). Lecture 6e: rmsprop: Divide the gradient by a running average of its recent magnitude. http://www.cs.toronto.edu/~tijmen/csc321/slides/lecture_slides_lec6.pdf.

  • Hinton, G. E., Srivastava, N., Krizhevsky, A., Sutskever, I., & Salakhutdinov, R. R. (2012). Improving neural networks by preventing co-adaptation of feature detectors. arXiv:1207.0580v1.

  • Hoberman, R. D., & Aronoff, M. (2003). The verbal morphology of Maltese: from Semitic to Romance. In J. Shimron (Ed.), Language processing and acquisition in languages of Semitic, root-based, morphology (pp. 61–78). Amsterdam: Benjamins.

  • Hochreiter, S., & Schmidhuber, J. (1997). Long short-term memory. Neural Computation, 9, 1735–1780.

  • Hochreiter, S., Bengio, Y., Frasconi, P., & Schmidhuber, J. (2001). Gradient flow in recurrent nets: the difficulty of learning long-term dependencies. In J. F. Kolen & S. C. Kremer (Eds.), A field guide to dynamical recurrent neural networks (pp. 237–244). New York: IEEE Press.

  • Hockett, C. (1967). The Yawelmani basic verb. Language, 43(1), 208–222.

  • Jacques, G., Lahaussois, A., Michailovsky, B., & Rai, D. B. (2012). An overview of Khaling verbal morphology. Language and Linguistics, 13(6), 1095–1170.

  • Jordan, M. I. (1989). Serial order: a parallel distributed processing approach. In J. L. Elman & D. E. Rumelhart (Eds.), Advances in connectionist theory (pp. 214–249). New York: Erlbaum.

  • Jozefowicz, R., Zaremba, W., & Sutskever, I. (2015). An empirical exploration of recurrent network architectures. In Proceedings of the 32nd international conference on machine learning (ICML 2015) (pp. 2342–2350).

  • Karpathy, A., Johnson, J., & Li, F. (2015). Visualizing and understanding recurrent networks. arXiv:1506.02078v2 [cs.LG].

  • Kohonen, O., Virpioja, S., & Lagus, K. (2010). Semi-supervised learning of concatenative morphology. In Proceedings of the 11th meeting of the ACL special interest group on computational morphology and phonology (pp. 78–86).

  • Křen, M., Bartoň, T., Cvrček, V., Hnátková, M., Jelínek, T., Kocek, J., Novotná, R., Petkevič, V., Procházka, P., Schmiedtová, V., & Skoumalová, H. (2010). Syn2010: žánrově vyvážený korpus psané češtiny. Tech. rep., Ústav Českého národního korpusu, FF UK, Prague.

  • Lee, J. L., & Goldsmith, J. A. (2016). Linguistica 5: unsupervised learning of linguistic structure. In Proceedings of the 2016 conference of the North American chapter of the association for computational linguistics.

  • Li, J., Chen, X., Hovy, E., & Jurafsky, D. (2016). Visualizing and understanding neural models in NLP. In Proceedings of the 2016 conference of the North American chapter of the association for computational linguistics: human language technologies.

  • van der Maaten, L., & Hinton, G. (2008). Visualizing data using t-SNE. Journal of Machine Learning Research, 9, 2579–2605.

  • Marzi, C., Ferro, M., Cardillo, F. A., & Pirrelli, V. (2016). Effects of frequency and regularity in an integrative model of word storage and processing. Italian Journal of Linguistics, 28(1), 79–114.

  • Matthews, P. H. (1991). Morphology. Cambridge: Cambridge University Press.

  • Merrifield, W. R. (1968). Palantla Chinantec grammar. Mexico: Museo Nacional de Antropología.

  • Merrifield, W. R., & Anderson, A. E. (2006). Diccionario Chinanteco de la diáspora del pueblo antiguo de San Pedro Tlatepuzco, Oaxaca. Coyoacán, D.F., Mexico: Instituto Lingüístico de Verano.

  • Miestamo, M., Sinnemäki, K., & Karlsson, F. (Eds.) (2008). Language complexity: typology, contact, change. Amsterdam: Benjamins.

  • Mikolov, T., & Zweig, G. (2012). Context dependent recurrent neural network language model. In Proceedings of speech language technology (pp. 234–239).

  • Mikolov, T., Karafiát, M., Burget, L., Černocký, J., & Khudanpur, S. (2010). Recurrent neural network based language model. In Proceedings of interspeech (pp. 1045–1048).

  • Mikolov, T., Sutskever, I., Deoras, A., Le, H. S., Kombrink, S., & Černocký, J. (2012). Subword language modeling with neural networks. http://www.fit.vutbr.cz/imikolov/rnnlm/char.pdf.

  • Mikolov, T. M., Sutskever, I., Chen, K., Corrado, G., & Dean, J. (2013). Distributed representations of words and phrases and their compositionality. In Proceedings of NIPS.

  • Ní Chasaide, A., Wogan, J., Ó Raghallaigh, B., Ní Bhriain, Á., Zoerner, E., Berthelsen, H., & Gobl, C. (2006). Speech technology for minority languages: the case of Irish (Gaelic). In Proceedings of the 9th international conference on spoken language processing, INTERSPEECH 2006 (pp. 181–184).

  • Nicolai, G., Cherry, C., & Kondrak, G. (2015). Inflection generation as discriminative string transduction. In Human language technologies: the 2015 annual conference of the North American chapter of the ACL.

  • Pascanu, R., Mikolov, T., & Bengio, Y. (2013). On the difficulty of training recurrent neural networks. In Proceedings of the 30th international conference on machine learning (ICML 2013) (pp. 1310–1318).

  • Paul, H. (1891). Principles of the history of language (translated from the 2nd edition into English by H. A. Strong). London: Longmans, Green.

  • Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., Vanderplas, J., Passos, A., Cournapeau, D., Brucher, M., Perrot, M., & Duchesnay, E. (2011). Scikit-learn: machine learning in Python. Journal of Machine Learning Research, 12, 2825–2830.

  • Pihel, K., & Pikamäe, A. (1999). Soome-eesti sõnaraamat. Valgus.

  • Pinker, S., & Prince, A. (1988). On language and connectionism: analysis of a parallel distributed processing model of language acquisition. Cognition, 28, 73–193.

  • Rumelhart, D. E., & McClelland, J. L. (1986). Parallel distributed processing (Vol. 1). Cambridge: MIT Press.

  • Rumelhart, D. E., Hinton, G. E., & Williams, R. J. (1985). Learning internal representations by error propagation. Tech. rep., DTIC Document.

  • Salminen, T. (1997). Tundra Nenets Inflection. Mémoires de la Société Finno-Ougrienne 227, Helsinki.

  • Sampson, G. B., Gil, D., & Trudgill, P. (Eds.) (2010). Language complexity as an evolving variable. Oxford: Oxford University Press.

  • Servan-Schreiber, D., Cleeremans, A., & McClelland, J. L. (1989). Learning sequential structure in simple recurrent networks. In D. S. Touretzky (Ed.), Advances in neural information processing systems 1 (pp. 643–652). San Francisco: Morgan Kaufmann

  • Silverman, D. (2006). Chinantec: phonology. In Concise encyclopedia of languages of the world (pp. 211–213). Oxford: Elsevier.

  • Sims, A. D., & Parker, J. (2016). How inflection class systems work: on the informativity of implicative structure. Word Structure, 9(2), 215–239.

  • Spencer, A. J. (2012). Identifying stems. Word Structure, 5, 88–108.

  • Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., & Salakhutdinov, R. (2014). Dropout: a simple way to prevent neural networks from overfitting. Journal of Machine Learning Research, 15(1), 1929–1958.

  • Stump, G. (2001). Inflectional morphology: a theory of paradigm structure. Cambridge: Cambridge University Press.

  • Stump, G., & Finkel, R. (2013). Morphological typology: from word to paradigm. Cambridge: Cambridge University Press.

  • Sundermeyer, M., Schlüter, R., & Ney, H. (2012). LSTM neural networks for language modeling. In Proc. of interspeech.

  • Sundermeyer, M., Schlüter, R., & Ney, H. (2015). From feedforward to recurrent LSTM neural networks for language modeling. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 23, 517–529.

  • Sutskever, I., Martens, J., & Hinton, G. (2011). Generating text with recurrent neural networks. In International conference on machine learning (ICML 2011).

  • Testolin, A., Stoianov, I., Sperduti, A., & Zorzi, M. (2016). Learning orthographic structure with sequential generative neural networks. Cognitive Science, 40(3), 579–606.

  • Theano Development Team (2016). Theano: A Python framework for fast computation of mathematical expressions. arXiv:1605.02688v1 [cs.SC].

  • Thymé, A. (1993). Connectionist approach to nominal inflection: Paradigm patterning and analogy in Finnish. PhD thesis, UC San Diego.

  • Thymé, A., Ackerman, F., & Elman, J. L. (1994). Finnish nominal inflection: paradigmatic patterns and token analogy. In S. D. Lima, R. Corrigan, & G. K. Iverson (Eds.), The reality of linguistic rules (pp. 445–466). Amsterdam: Benjamins.

  • Vinyals, O., Toshev, A., Bengio, S., & Erhan, D. (2015). Show and tell: a neural image caption generator. In Computer vision and pattern recognition.

  • Walther, G., Jacques, G., & Sagot, B. (2013). Uncovering the inner architecture of Khaling verbal morphology. In 3rd workshop on Sino-Tibetan languages of Sichuan.

  • Wattenberg, M., Viégas, F., & Johnson, I. (2016). How to use t-SNE effectively. Distill doi:10.23915/distill.00002. http://distill.pub/2016/misread-tsne.

  • Werbos, P. J. (1990). Backpropagation through time: what it does and how to do it. Proceedings of the IEEE, 78(10), 1550–1560.

  • Wurzel, W. U. (1989). Inflectional morphology and naturalness. Dordrecht: Springer.

  • Yang, C. (2016). The price of linguistic productivity. Cambridge: MIT Press.

  • Zaremba, W., Sutskever, I., & Vinyals, O. (2014). Recurrent neural network regularization. arXiv:1409.2329v5.

Author information

Corresponding author

Correspondence to Robert Malouf.

Appendix

The network is structured to take as input a lexeme, a morphosyntactic feature set, and a partial wordform and to output a probability distribution over the next segment in the wordform. The input \(x_{t}\) is a binary vector with as many dimensions as there are segments in the segment inventory of the language to be generated (plus designated start and end characters). The value of \(x_{t}\) is one for the dimension corresponding to the previous character and zero for all other dimensions. This provides a localist, one-hot representation of the immediate phonological context. The input \(m_{t}\) is a localist input identifying a lexeme and a paradigm cell: one bit encodes the lexeme (say, walk) and another encodes the paradigm cell. Each combination of morphosyntactic features is given a unique representation: no generalizations are expressed at this level.
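
As a concrete illustration of these encodings, here is a minimal NumPy sketch; the segment inventory, lexemes, and cell labels are toy examples rather than data from the experiments.

```python
import numpy as np

# Toy inventories (illustrative only, not the paper's data)
segments = ["<w>", "</w>", "a", "k", "l", "t", "w"]   # includes start/end symbols
lexemes = ["walk", "talk"]
cells = ["3SG.PRES", "PAST"]

def one_hot(i, n):
    """Binary vector of length n with a single 1 at position i."""
    v = np.zeros(n)
    v[i] = 1.0
    return v

# x_t: one-hot encoding of the previous segment in the wordform
x_t = one_hot(segments.index("w"), len(segments))

# m_t: localist encoding of the lexeme and the paradigm cell; each lexeme and
# each feature combination gets its own dedicated unit, so no structure is
# shared across cells at this level
m_t = np.concatenate([one_hot(lexemes.index("walk"), len(lexemes)),
                      one_hot(cells.index("3SG.PRES"), len(cells))])
```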

These inputs are mapped to a combined projection layer \(z_{t}\) (Bengio et al. 2003):

$$\begin{aligned} z_{t}=\bigl({W^{x}}x_{t}+{b^{x}}\bigr) \oplus\bigl({W^{m}}m_{t}+{b^{m}}\bigr) \end{aligned}$$

where ⊕ is vector concatenation. The projection layer \(z_{t}\) in turn is input for the recurrent layer, implemented via Long Short-Term Memory (LSTM) blocks (Hochreiter and Schmidhuber 1997; Jozefowicz et al. 2015). LSTMs avoid the problems with gradients exhibited by Elman-style simple recurrent networks and allow the model to more easily capture medium and long-distance temporal dependencies in the data (Hochreiter et al. 2001).Footnote 14 The output of the recurrent layer \(h_{t}\) is given by:

$$\begin{aligned} i =& \sigma\bigl({W^{i}}z_{t}+{U^{i}}h_{t-1}+{b^{i}}\bigr)\\ f =& \sigma\bigl({W^{f}}z_{t}+{U^{f}}h_{t-1}+{b^{f}}\bigr)\\ o =& \sigma\bigl({W^{o}}z_{t}+{U^{o}}h_{t-1}+{b^{o}}\bigr)\\ c_{t} =& f\odot c_{t-1}+i\odot\tanh\bigl({W^{c}}z_{t}+{U^{c}}h_{t-1}+{b^{c}}\bigr)\\ h_{t} =& o\odot\tanh(c_{t}) \end{aligned}$$

where ⊙ denotes element-wise multiplication. For implementation purposes, the sigmoid function σ is evaluated using the ‘hard sigmoid’, a piecewise-linear approximation of the true sigmoid (Courbariaux et al. 2015):

$$\begin{aligned} \sigma(x)=\max\biggl(0,\min\biggl(1,\frac{x}{5}+\frac{1}{2}\biggr)\biggr) \end{aligned}$$
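
Putting the projection layer and the LSTM update together, the forward computation for a single time step can be sketched in plain NumPy as follows. The dimensions, initialization, and helper names are illustrative assumptions, not the released implementation (see footnote 12).

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes (assumed, not the paper's hyperparameters)
n_seg, n_lexcell = 40, 500        # segment one-hot size; lexeme + cell units
d_x, d_m, d_h = 64, 64, 256       # projection sizes and recurrent layer size
d_z = d_x + d_m

def hard_sigmoid(x):
    # sigma(x) = max(0, min(1, x/5 + 1/2)), per Courbariaux et al. (2015)
    return np.clip(x / 5.0 + 0.5, 0.0, 1.0)

def init(*shape):
    return rng.normal(scale=0.1, size=shape)

# Projection weights: z_t = (W^x x_t + b^x) ⊕ (W^m m_t + b^m)
W_x, b_x = init(d_x, n_seg), np.zeros(d_x)
W_m, b_m = init(d_m, n_lexcell), np.zeros(d_m)

# LSTM weights W^g, U^g and biases b^g for each gate g in {i, f, o, c}
W = {g: init(d_h, d_z) for g in "ifoc"}
U = {g: init(d_h, d_h) for g in "ifoc"}
b = {g: np.zeros(d_h) for g in "ifoc"}

def step(x_t, m_t, h_prev, c_prev):
    """One time step: project the two inputs, then apply the LSTM update."""
    z_t = np.concatenate([W_x @ x_t + b_x, W_m @ m_t + b_m])
    i = hard_sigmoid(W["i"] @ z_t + U["i"] @ h_prev + b["i"])   # input gate
    f = hard_sigmoid(W["f"] @ z_t + U["f"] @ h_prev + b["f"])   # forget gate
    o = hard_sigmoid(W["o"] @ z_t + U["o"] @ h_prev + b["o"])   # output gate
    c_t = f * c_prev + i * np.tanh(W["c"] @ z_t + U["c"] @ h_prev + b["c"])
    h_t = o * np.tanh(c_t)                                      # layer output
    return h_t, c_t
```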

Finally, \(h_{t}\) is mapped to a vector with the same dimensionality as the input \(x_{t}\) from which we can induce a probability distribution over output characters:

$$\begin{aligned} y_{t}={W^{y}}h_{t} \end{aligned}$$

The probability that the next character in the output \(x_{t+1}\) is the j-th character in the character set is computed by applying the softmax function on the output layer:

$$\begin{aligned} p(x_{t+1}=j|x_{1}\ldots x_{t})=\frac{\exp(y^{j}_{t})}{\sum_{k}\exp(y^{k}_{t})} \end{aligned}$$
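
A short sketch of this output step, again with illustrative dimensions; here `h_t` stands in for the recurrent layer's output at the current time step.

```python
import numpy as np

def softmax(y):
    """Numerically stable softmax over the output scores."""
    e = np.exp(y - y.max())
    return e / e.sum()

rng = np.random.default_rng(0)
n_seg, d_h = 40, 256                      # illustrative sizes
W_y = rng.normal(scale=0.1, size=(n_seg, d_h))
h_t = rng.normal(size=d_h)                # stand-in for the LSTM output

y_t = W_y @ h_t                           # one score per character in the inventory
p_next = softmax(y_t)                     # p(x_{t+1} = j | x_1 ... x_t) for each j
```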

The probability of a wordform \(p(x_{1}\ldots x_{n})\) given a lexeme and paradigm cell is the product of the probabilities of each character given the preceding context:

$$\begin{aligned} p(x_{1},\ldots,x_{n})=\prod_{t} p(x_{t}|x_{1}\ldots x_{t-1}) \end{aligned}$$

During training, the weights W and U and biases b are selected to maximize the log likelihood of the training data.
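
For concreteness, a wordform can be scored by chaining these per-character distributions. In the sketch below, `next_char_distribution` is a hypothetical wrapper around the network (with the lexeme and paradigm cell held fixed), and summing log probabilities is simply a numerically safer form of the product given above.

```python
import numpy as np

def wordform_log_probability(segment_ids, next_char_distribution):
    """Log probability of a wordform given a lexeme and paradigm cell.

    segment_ids: the wordform as a sequence of segment indices, ending with
        the end-of-word symbol.
    next_char_distribution: assumed wrapper around the network that returns
        the softmax distribution over the next segment given the preceding
        segments (the lexeme and cell are fixed inside the wrapper).
    """
    log_p = 0.0
    for t, seg in enumerate(segment_ids):
        p = next_char_distribution(segment_ids[:t])
        log_p += np.log(p[seg])
    return log_p

# Training maximizes the sum of such log probabilities over the training data,
# i.e., the per-character log likelihood described in the text.
```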

Cite this article

Malouf, R. Abstractive morphological learning with a recurrent neural network. Morphology 27, 431–458 (2017). https://doi.org/10.1007/s11525-017-9307-x
