Abstract
Recurrent neural networks (RNNs) are a useful tool for sequence labelling tasks in natural language processing. Although in practice RNNs suffer from vanishing/exploding gradients, their compactness still offers efficiency and makes them less prone to overfitting. In this paper we show that by propagating the predictions of previous labels we can improve the performance of RNNs while keeping the number of parameters unchanged and adding only one more step for inference. As a result, the models remain more compact and efficient than models with complex memory gates. In the experiments, we evaluate the idea on optical character recognition and chunking, where it achieves promising results.
Keywords
- Natural language processing
- Recurrent neural networks
- Sequence labelling
Copyright information
© 2018 Springer Nature Switzerland AG
About this paper
Cite this paper
Tran, S.N., Zhang, Q., Nguyen, A., Vu, XS., Ngo, S. (2018). Improving Recurrent Neural Networks with Predictive Propagation for Sequence Labelling. In: Cheng, L., Leung, A., Ozawa, S. (eds) Neural Information Processing. ICONIP 2018. Lecture Notes in Computer Science, vol 11301. Springer, Cham. https://doi.org/10.1007/978-3-030-04167-0_41
DOI: https://doi.org/10.1007/978-3-030-04167-0_41
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-04166-3
Online ISBN: 978-3-030-04167-0
eBook Packages: Computer Science, Computer Science (R0)