Abstract
In traditional word-and-paradigm models of morphology, an inflectional system is represented by a set of exemplary paradigms, and novel wordforms are produced by analogy with previously encountered forms. This paper describes a recurrent neural network that uses this strategy to learn the paradigms of a morphologically complex language from incomplete and randomized input. Results show good performance for a range of typologically diverse languages.
Notes
A localist representation is one in which each item is uniquely identified with a single node in the network. This is in contrast to a distributed representation, in which ensembles of nodes are used to represent multiple items.
Stump and Finkel (2013) describe French paradigms as having 49 cells. Flexique records a singular/plural distinction on past participles, which adds two additional cells to the paradigm.
All source code and data for the experiments described in this paper are available at http://github.com/rmalouf/abstractive. We gratefully acknowledge the support of NVIDIA Corporation with the donation of the Titan X Pascal GPU used for this research.
https://github.com/sigmorphon/conll2017, accessed 2 June 2017.
For an accessible description of the intuitions behind the LSTM, see, e.g., http://colah.github.io/posts/2015-08-Understanding-LSTMs/.
References
Ackerman, F., & Malouf, R. (2013). Morphological organization: the low conditional entropy conjecture. Language, 89, 429–464.
Ackerman, F., & Malouf, R. (2016). Implicative relations in word-based morphological systems. In A. Hippisley & G. Stump (Eds.), Cambridge handbook of morphology (pp. 272–296). Cambridge: Cambridge University Press.
Ackerman, F., Blevins, J. P., & Malouf, R. (2009). Parts and wholes: patterns of relatedness in complex morphological systems and why they matter. In J. P. Blevins & J. Blevins (Eds.), Analogy in grammar: form and acquisition (pp. 54–82). Oxford: Oxford University Press.
Ahlberg, M., Forsberg, M., & Hulden, M. (2014). Semi-supervised learning of morphological paradigms and lexicons. In Proceedings of the 14th conference of the European chapter of the association for computational linguistics (pp. 569–578).
Ahlberg, M., Forsberg, M., & Hulden, M. (2015). Paradigm classification in supervised learning of morphology. In Human language technologies: the 2015 annual conference of the North American chapter of the ACL (pp. 1024–1029).
Albright, A., & Hayes, B. (2003). Rules vs. analogy in English past tenses: a computational/experimental study. Cognition, 90(2), 119–161.
Anderson, A. E., & Merrifield, W. R. (2000). Chinantec project of the language of the scattered peoples of Ancient San Pedro Tlatepuzco, Oaxaca, Mexico. http://www.sil.org/resources/archives/51375.
Aronoff, M. (2012). Morphological stems: what William of Ockham really said. Word Structure, 5(1), 28–51.
Baerman, M. (2016). Seri verb classes: morphosyntactic motivation and morphological autonomy. Language, 92(4), 792–823.
Baerman, M., & Palancar, E. L. (2016). The organization of Chinantec tone paradigms. In S. Augendre, G. Couasnon-Torlois, D. Lebon, C. Michard, G. Boyé, & F. Montermini (Eds.), Proceedings of the 8th Décembrettes (pp. 46–59). Toulouse: CLLE-ERSS.
Bengio, Y., Frasconi, P., & Simard, P. (1993). The problem of learning long-term dependencies in recurrent networks. In IEEE international conference on neural networks (pp. 1183–1188).
Bengio, Y., Ducharme, R., Vincent, P., & Jauvin, C. (2003). A neural probabilistic language model. Journal of Machine Learning Research, 3, 1137–1155.
Bergstra, J., & Bengio, Y. (2012). Random search for hyper-parameter optimization. Journal of Machine Learning Research, 13, 281–305.
Blevins, J. P. (2006). Word-based morphology. Journal of Linguistics, 42(3), 531–573.
Blevins, J. P. (2016). Word and paradigm morphology. Oxford: Oxford University Press.
Blevins, J. P., Ackerman, F., Malouf, R., & Ramscar, M. (2016). Morphology as an adaptive discriminative system. In H. Harley & D. Siddiqi (Eds.), Morphological metatheory (pp. 269–300). Amsterdam: Benjamins.
Blevins, J. P., Milin, P., & Ramscar, M. (2017). The Zipfian paradigm cell filling problem. In F. Kiefer, J. P. Blevins, & H. Bartos (Eds.), Morphological paradigms and functions, Leiden: Brill.
Bonami, O. (2012). Discovering implicative morphology. In Les Décembrettes 8: colloque international de morphologie, Bordeaux. http://www.llf.cnrs.fr/sites/llf.cnrs.fr/files/biblio//Bordeaux-dec2012.pdf.
Bonami, O. (2013). Towards a robust assessment of implicative relations in inflectional systems. In Workshop on computational approaches to morphological complexity, Paris. http://www.llf.cnrs.fr/Gens/Bonami/presentations/Bonami-SMG-Paris-2013.pdf.
Bonami, O., & Beniamine, S. (2016). Joint predictiveness in inflectional paradigms. Word Structure, 9(2), 156–182.
Bonami, O., & Boyé, G. (2002). Suppletion and dependency in inflectional morphology. In F. V. Eynde, L. Hellan, & D. Beermann (Eds.), The proceedings of the 8th international conference on head-driven phrase structure grammar (pp. 51–70). Stanford: CSLI Publications.
Bonami, O., & Boyé, G. (2014). De formes en thèmes. In F. Villoing, S. Leroy, & S. David (Eds.), Foisonnements morphologiques. Etudes en hommage à Françoise Kerleroux (pp. 17–45). Paris: Presses Universitaires de Paris Ouest.
Bonami, O., & Luís, A. R. (2014). Sur la morphologie implicative dans la conjugaison du portugais : une étude quantitative. In J. L. Léonard (Ed.), Mémoires de la Société de Linguistique de Paris: Vol. 22. Morphologie flexionnelle et dialectologie romane. Typologie(s) et modélisation(s) (pp. 111–151). Leuven: Peeters.
Bonami, O., Caron, G., & Plancq, C. (2013). Flexique: an inflectional lexicon for spoken French.
Brown, D., & Hippisley, A. (2012). Network morphology. Cambridge: Cambridge University Press.
Brown, D., Corbett, G., Fraser, N., Hippisley, A., & Timberlake, A. (1996). Russian noun stress and network morphology. Linguistics, 34, 53–107.
Carnie, A. (2008). Irish nouns: a reference guide. Oxford: Oxford University Press.
Chan, E. (2008). Structures and distributions in morphology learning. PhD thesis, University of Pennsylvania.
Chollet, F. (2015). Keras. https://github.com/fchollet/keras.
Corbett, G. G., & Fraser, N. M. (1993). Network morphology: a DATR account of Russian nominal inflection. Journal of Linguistics, 29, 113–142.
Cotterell, R., Kirov, C., Sylak-Glassman, J., Yarowsky, D., Eisner, J., & Hulden, M. (2016). The SIGMORPHON 2016 shared task—morphological reinflection. In Proceedings of the 2016 meeting of SIGMORPHON. Berlin: Association for Computational Linguistics.
Courbariaux, M., Bengio, Y., & David, J. P. (2015). BinaryConnect: training deep neural networks with binary weights during propagations. In Proceedings of the 29th annual conference on neural information processing systems (NIPS).
CSC (2004). Suomen sanomalehtikielen taajuussanasto [Frequency dictionary of Finnish newspaper language]. https://korp.csc.fi/suomen-sanomalehtikielen-taajuussanasto-B9996.txt.
Dreyer, M., & Eisner, J. (2011). Discovering morphological paradigms from plain text using a Dirichlet process mixture model. In Proceedings of the 2011 conference on empirical methods in natural language processing (pp. 616–627). Edinburgh: Association for Computational Linguistics.
Durrett, G., & DeNero, J. (2013). Supervised learning of complete morphological paradigms. In HLT-NAACL (pp. 1185–1195).
Elman, J. L. (1989). Representation and structure in connectionist models (CRL Technical Report 8903). Center for Research in Learning.
Elman, J. L. (1990). Finding structure in time. Cognitive Science, 14, 179–211.
Elman, J. L. (1995). Language as a dynamical system. In R. F. Port & T. van Gelder (Eds.), Mind as motion: explorations in the dynamics of cognition (pp. 195–225). Cambridge: MIT Press.
Fagyal, Z., Kibbee, D., & Jenkins, F. (2006). French: a linguistic introduction. Cambridge: Cambridge University Press.
Féry, C. (2003). Markedness, faithfulness, vowel quality and syllable structure in French. Journal of French Language Studies, 13(2), 247–280.
Figueroa, R. L., Zeng-Treitler, Q., Kandula, S., & Ngo, L. H. (2012). Predicting sample size required for classification performance. BMC Medical Informatics and Decision Making, 12(1), 1–10.
Forcada, M. L., Ginestí-Rosell, M., Nordfalk, J., O’Regan, J., Ortiz-Rojas, S., Pérez-Ortiz, J. A., Sánchez-Martínez, F., Ramírez-Sánchez, G., & Tyers, F. M. (2011). Apertium: a free/open-source platform for rule-based machine translation. Machine Translation, 25, 127–144.
Goldsmith, J. A. (2006). An algorithm for the unsupervised learning of morphology. Natural Language Engineering, 12, 353–371.
Goldsmith, J., & O’Brien, J. (2006). Learning inflectional classes. Language Learning and Development, 2, 219–250.
Graves, A. (2014). Generating sequences with recurrent neural networks. arXiv:1308.0850v5 [cs.NE].
Hinton, G. (2012). Lecture 6e: rmsprop: Divide the gradient by a running average of its recent magnitude. http://www.cs.toronto.edu/~tijmen/csc321/slides/lecture_slides_lec6.pdf.
Hinton, G. E., Srivastava, N., Krizhevsky, A., Sutskever, I., & Salakhutdinov, R. R. (2012). Improving neural networks by preventing co-adaptation of feature detectors. arXiv:1207.0580v1.
Hoberman, R. D., & Aronoff, M. (2003). The verbal morphology of Maltese: from Semitic to Romance. In J. Shimron (Ed.), Language processing and acquisition in languages of Semitic, root-based, morphology (pp. 61–78). Amsterdam: Benjamins.
Hochreiter, S., & Schmidhuber, J. (1997). Long short-term memory. Neural Computation, 9, 1735–1780.
Hochreiter, S., Bengio, Y., Frasconi, P., & Schmidhuber, J. (2001). Gradient flow in recurrent nets: the difficulty of learning long-term dependencies. In J. F. Kolen & S. C. Kremer (Eds.), A field guide to dynamical recurrent neural networks (pp. 237–244). New York: IEEE Press.
Hockett, C. (1967). The Yawelmani basic verb. Language, 43(1), 208–222.
Jacques, G., Lahaussois, A., Michailovsky, B., & Rai, D. B. (2012). An overview of Khaling verbal morphology. Language and Linguistics, 13(6), 1095–1170.
Jordan, M. I. (1989). Serial order: a parallel distributed processing approach. In J. L. Elman & D. E. Rumelhart (Eds.), Advances in connectionist theory (pp. 214–249). New York: Erlbaum.
Jozefowicz, R., Zaremba, W., & Sutskever, I. (2015). An empirical exploration of recurrent network architectures. In Proceedings of the 32nd international conference on machine learning (ICML 2015) (pp. 2342–2350).
Karpathy, A., Johnson, J., & Li, F. (2015). Visualizing and understanding recurrent networks. arXiv:1506.02078v2 [cs.LG].
Kohonen, O., Virpioja, S., & Lagus, K. (2010). Semi-supervised learning of concatenative morphology. In Proceedings of the 11th meeting of the ACL special interest group on computational morphology and phonology (pp. 78–86).
Křen, M., Bartoň, T., Cvrček, V., Hnátková, M., Jelínek, T., Kocek, J., Novotná, R., Petkevič, V., Procházka, P., Schmiedtová, V., & Skoumalová, H. (2010). Syn2010: žánrově vyvážený korpus psané češtiny. Tech. rep., Ústav Českého národního korpusu, FF UK, Prague.
Lee, J. L., & Goldsmith, J. A. (2016). Linguistica 5: unsupervised learning of linguistic structure. In Proceedings of the 2016 conference of the North American chapter of the association for computational linguistics.
Li, J., Chen, X., Hovy, E., & Jurafsky, D. (2016). Visualizing and understanding neural models in NLP. In Proceedings of the 2016 conference of the North American chapter of the association for computational linguistics: human language technologies.
van der Maaten, L., & Hinton, G. (2008). Visualizing data using t-SNE. Journal of Machine Learning Research, 9, 2579–2605.
Marzi, C., Ferro, M., Cardillo, F. A., & Pirrelli, V. (2016). Effects of frequency and regularity in an integrative model of word storage and processing. Italian Journal of Linguistics, 28(1), 79–114.
Matthews, P. H. (1991). Morphology. Cambridge: Cambridge University Press.
Merrifield, W. R. (1968). Palantla Chinantec grammar. Mexico: Museo Nacional de Antropología.
Merrifield, W. R., & Anderson, A. E. (2006). Diccionario Chinanteco de la diáspora del pueblo antiguo de San Pedro Tlatepuzco, Oaxaca. Coyoacán, D.F., Mexico: Instituto Lingüístico de Verano.
Miestamo, M., Sinnemäki, K., & Karlsson, F. (Eds.) (2008). Language complexity: typology, contact, change. Amsterdam: Benjamins.
Mikolov, T., & Zweig, G. (2012). Context dependent recurrent neural network language model. In Proceedings of speech language technology (pp. 234–239).
Mikolov, T., Karafiát, M., Burget, L., Černocký, J., & Khudanpur, S. (2010). Recurrent neural network based language model. In Proceedings of interspeech (pp. 1045–1048).
Mikolov, T., Sutskever, I., Deoras, A., Le, H. S., Kombrink, S., & Černocký, J. (2012). Subword language modeling with neural networks. http://www.fit.vutbr.cz/imikolov/rnnlm/char.pdf.
Mikolov, T., Sutskever, I., Chen, K., Corrado, G., & Dean, J. (2013). Distributed representations of words and phrases and their compositionality. In Proceedings of NIPS.
Ní Chasaide, A., Wogan, J., Ó Raghallaigh, B., Ní Bhriain, Á., Zoerner, E., Berthelsen, H., & Gobl, C. (2006). Speech technology for minority languages: the case of Irish (Gaelic). In Proceedings of the 9th international conference on spoken language processing, INTERSPEECH 2006 (pp. 181–184).
Nicolai, G., Cherry, C., & Kondrak, G. (2015). Inflection generation as discriminative string transduction. In Human language technologies: the 2015 annual conference of the North American chapter of the ACL.
Pascanu, R., Mikolov, T., & Bengio, Y. (2013). On the difficulty of training recurrent neural networks. In Proceedings of the 30th international conference on machine learning (ICML 2013) (pp. 1310–1318).
Paul, H. (1891). Principles of the history of language (translated from the 2nd edition by H. A. Strong). London: Longmans, Green.
Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., Vanderplas, J., Passos, A., Cournapeau, D., Brucher, M., Perrot, M., & Duchesnay, E. (2011). Scikit-learn: machine learning in Python. Journal of Machine Learning Research, 12, 2825–2830.
Pihel, K., & Pikamäe, A. (1999). Soome-eesti sõnaraamat [Finnish-Estonian dictionary]. Tallinn: Valgus.
Pinker, S., & Prince, A. (1988). On language and connectionism: analysis of a parallel distributed processing model of language acquisition. Cognition, 28, 73–193.
Rumelhart, D. E., & McClelland, J. L. (1986). Parallel distributed processing (Vol. 1). Cambridge: MIT Press.
Rumelhart, D. E., Hinton, G. E., & Williams, R. J. (1985). Learning internal representations by error propagation. Tech. rep., DTIC Document.
Salminen, T. (1997). Tundra Nenets inflection. Mémoires de la Société Finno-Ougrienne 227. Helsinki.
Sampson, G. B., Gil, D., & Trudgill, P. (Eds.) (2010). Language complexity as an evolving variable. Oxford: Oxford University Press.
Servan-Schreiber, D., Cleeremans, A., & McClelland, J. L. (1989). Learning sequential structure in simple recurrent networks. In D. S. Touretzky (Ed.), Advances in neural information processing systems 1 (pp. 643–652). San Francisco: Morgan Kaufmann.
Silverman, D. (2006). Chinantec: phonology. In Concise encyclopedia of languages of the world (pp. 211–213). Oxford: Elsevier.
Sims, A. D., & Parker, J. (2016). How inflection class systems work: on the informativity of implicative structure. Word Structure, 9(2), 215–239.
Spencer, A. J. (2012). Identifying stems. Word Structure, 5, 88–108.
Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., & Salakhutdinov, R. (2014). Dropout: a simple way to prevent neural networks from overfitting. Journal of Machine Learning Research, 15(1), 1929–1958.
Stump, G. (2001). Inflectional morphology: a theory of paradigm structure. Cambridge: Cambridge University Press.
Stump, G., & Finkel, R. (2013). Morphological typology: from word to paradigm. Cambridge: Cambridge University Press.
Sundermeyer, M., Schlüter, R., & Ney, H. (2012). LSTM neural networks for language modeling. In Proceedings of Interspeech.
Sundermeyer, M., Schlüter, R., & Ney, H. (2015). From feedforward to recurrent LSTM neural networks for language modeling. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 23, 517–529.
Sutskever, I., Martens, J., & Hinton, G. (2011). Generating text with recurrent neural networks. In International conference on machine learning (ICML 2011).
Testolin, A., Stoianov, I., Sperduti, A., & Zorzi, M. (2016). Learning orthographic structure with sequential generative neural networks. Cognitive Science, 40(3), 579–606.
Theano Development Team (2016). Theano: A Python framework for fast computation of mathematical expressions. arXiv:1605.02688v1 [cs.SC].
Thymé, A. (1993). Connectionist approach to nominal inflection: Paradigm patterning and analogy in Finnish. PhD thesis, UC San Diego.
Thymé, A., Ackerman, F., & Elman, J. L. (1994). Finnish nominal inflection: paradigmatic patterns and token analogy. In S. D. Lima, R. Corrigan, & G. K. Iverson (Eds.), The reality of linguistic rules (pp. 445–466). Amsterdam: Benjamins.
Vinyals, O., Toshev, A., Bengio, S., & Erhan, D. (2015). Show and tell: a neural image caption generator. In IEEE conference on computer vision and pattern recognition (CVPR).
Walther, G., Jacques, G., & Sagot, B. (2013). Uncovering the inner architecture of Khaling verbal morphology. In 3rd workshop on Sino-Tibetan languages of Sichuan.
Wattenberg, M., Viégas, F., & Johnson, I. (2016). How to use t-SNE effectively. Distill. doi:10.23915/distill.00002. http://distill.pub/2016/misread-tsne.
Werbos, P. J. (1990). Backpropagation through time: what it does and how to do it. Proceedings of the IEEE, 78(10), 1550–1560.
Wurzel, W. U. (1989). Inflectional morphology and naturalness. Dordrecht: Springer.
Yang, C. (2016). The price of linguistic productivity. Cambridge: MIT Press.
Zaremba, W., Sutskever, I., & Vinyals, O. (2014). Recurrent neural network regularization. arXiv:1409.2329v5.
Appendix
The network is structured to take as input a lexeme, a morphosyntactic feature set, and a partial wordform, and to output a probability distribution over the next segment in the wordform. The input \(x_{t}\) is a binary vector with as many dimensions as there are segments in the segment inventory of the language to be generated (plus designated start and end characters). The value of \(x_{t}\) is one for the dimension corresponding to the previous character and zero for all other dimensions. This provides a localist, one-hot representation of the immediate phonological context. The input \(m_{t}\) is a localist input identifying a lexeme and a paradigm cell: one bit encodes the lexeme (say, walk) and another encodes the paradigm cell. Each combination of morphosyntactic features is given a unique representation: no generalizations are expressed at this level.
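To make the encoding concrete, a minimal NumPy sketch is given below. The segment inventory, lexemes, and cell labels here are hypothetical placeholders, not taken from the paper's implementation:

```python
import numpy as np

# Hypothetical inventories for illustration; the experiments use the
# segment inventory of each target language plus start/end symbols.
segments = ['<s>', '</s>', 'a', 'k', 'l', 's', 't', 'w']
lexemes = ['walk', 'talk']
cells = ['PRS.3SG', 'PST']

def one_hot(i, size):
    """Localist (one-hot) vector: 1.0 at index i, 0.0 elsewhere."""
    v = np.zeros(size)
    v[i] = 1.0
    return v

# x_t encodes the previous character in the partial wordform.
x_t = one_hot(segments.index('w'), len(segments))

# m_t encodes the lexeme and paradigm cell localistically: each
# lexeme and each cell gets its own unit, with no featural
# generalizations expressed at this level.
m_t = np.concatenate([one_hot(lexemes.index('walk'), len(lexemes)),
                      one_hot(cells.index('PST'), len(cells))])
```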
These inputs are mapped to a combined projection layer \(z_{t}\) (Bengio et al. 2003):

\[ z_{t} = W^{x}x_{t} \oplus W^{m}m_{t} \]

where ⊕ is vector concatenation. The projection layer \(z_{t}\) in turn is the input to the recurrent layer, implemented via Long Short-Term Memory (LSTM) blocks (Hochreiter and Schmidhuber 1997; Jozefowicz et al. 2015). LSTMs avoid the vanishing and exploding gradient problems exhibited by Elman-style simple recurrent networks and allow the model to more easily capture medium- and long-distance temporal dependencies in the data (Hochreiter et al. 2001). The output of the recurrent layer \(h_{t}\) is given by the standard LSTM equations:

\[ \begin{aligned} i_{t} &= \sigma(W^{i}z_{t} + U^{i}h_{t-1} + b^{i}) \\ f_{t} &= \sigma(W^{f}z_{t} + U^{f}h_{t-1} + b^{f}) \\ c_{t} &= f_{t} \odot c_{t-1} + i_{t} \odot \tanh(W^{c}z_{t} + U^{c}h_{t-1} + b^{c}) \\ o_{t} &= \sigma(W^{o}z_{t} + U^{o}h_{t-1} + b^{o}) \\ h_{t} &= o_{t} \odot \tanh(c_{t}) \end{aligned} \]
where ⊙ denotes element-wise multiplication. For implementation purposes, the sigmoid function σ is evaluated using the 'hard sigmoid', a piecewise-linear approximation of the true sigmoid (Courbariaux et al. 2015):

\[ \sigma(x) = \max\bigl(0, \min(1, 0.2x + 0.5)\bigr) \]
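A minimal NumPy sketch of this approximation, assuming the slope and offset used by the Keras/Theano hard sigmoid (the toolkit cited in the references):

```python
import numpy as np

def hard_sigmoid(x):
    # Piecewise-linear approximation of the logistic sigmoid:
    # 0 for x <= -2.5, 1 for x >= 2.5, and linear in between.
    return np.clip(0.2 * np.asarray(x) + 0.5, 0.0, 1.0)
```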
Finally, \(h_{t}\) is mapped to a vector \(y_{t}\) with the same dimensionality as the input \(x_{t}\), from which we can induce a probability distribution over output characters:

\[ y_{t} = W^{y}h_{t} + b^{y} \]
The probability that the next character in the output \(x_{t+1}\) is the j-th character in the character set is computed by applying the softmax function to the output layer:

\[ p(x_{t+1} = j \mid x_{1}\ldots x_{t}) = \frac{\exp(y_{t,j})}{\sum_{k}\exp(y_{t,k})} \]
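In NumPy terms, this step might look like the following sketch, where y is the output vector \(y_{t}\) from the previous equation (the max is subtracted purely for numerical stability):

```python
import numpy as np

def softmax(y):
    # Exponentiate and normalize so the scores over the character
    # set sum to one; subtracting the max avoids overflow.
    e = np.exp(y - np.max(y))
    return e / e.sum()
```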
The probability of a wordform \(p(x_{1}\ldots x_{n})\) given a lexeme and paradigm cell is the product of the probabilities of each character given the preceding context:

\[ p(x_{1}\ldots x_{n}) = \prod_{t=1}^{n} p(x_{t} \mid x_{1}\ldots x_{t-1}) \]
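In practice this product is computed as a sum of log probabilities. A sketch under that assumption, where probs is a hypothetical array of the network's per-step softmax outputs (one row per character position) and targets gives the index of the character actually observed at each position:

```python
import numpy as np

def wordform_log_prob(probs, targets):
    # log p(x_1...x_n) = sum over t of log p(x_t | x_1...x_{t-1});
    # probs[t] is the softmax distribution predicting position t.
    steps = np.arange(len(targets))
    return float(np.sum(np.log(probs[steps, targets])))
```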
During training, the weights W and U and biases b are selected to maximize the log likelihood of the training data.
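Since the cited implementation is built on Keras (Chollet 2015) and Theano (Theano Development Team 2016), the overall architecture might be assembled as in the sketch below. This is an illustrative reconstruction, not the published code: the layer sizes are placeholder assumptions, and the functional-API names reflect current Keras rather than the original source.

```python
from keras.models import Model
from keras.layers import Input, Dense, LSTM, concatenate

n_seg = 40      # placeholder: segment inventory size + start/end symbols
n_lex = 1000    # placeholder: number of lexemes
n_cell = 50     # placeholder: number of paradigm cells
n_proj = 64     # placeholder projection size per input
n_hidden = 256  # placeholder recurrent layer size

# x: sequence of one-hot previous-segment vectors x_1..x_T;
# m: the lexeme/cell vector m_t, repeated at every time step.
x = Input(shape=(None, n_seg))
m = Input(shape=(None, n_lex + n_cell))

# z_t = W^x x_t (+) W^m m_t: project each input, then concatenate.
z = concatenate([Dense(n_proj)(x), Dense(n_proj)(m)])

# Recurrent LSTM layer producing h_t at every time step.
h = LSTM(n_hidden, return_sequences=True)(z)

# y_t mapped back to segment dimensionality; the softmax activation
# gives p(x_{t+1} = j | x_1..x_t) directly.
y = Dense(n_seg, activation='softmax')(h)

model = Model(inputs=[x, m], outputs=y)
# Minimizing categorical cross-entropy on the observed next
# characters maximizes the log likelihood of the training forms.
model.compile(loss='categorical_crossentropy', optimizer='rmsprop')
```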
Cite this article
Malouf, R. Abstractive morphological learning with a recurrent neural network. Morphology 27, 431–458 (2017). https://doi.org/10.1007/s11525-017-9307-x