Abstract
Deep neural networks (DNNs) are powerful models that have achieved excellent performance in many fields, especially in natural language processing (NLP). Convolutional neural networks (CNNs) and recurrent neural networks (RNNs), the two mainstream DNN architectures, have been widely explored for NLP tasks. However, the two model types work in fundamentally different ways: CNNs are good at capturing local features, while RNNs are able to summarize global information. In this paper, we combine the strengths of both architectures and propose a hybrid model, AHNN (Attention-based Hybrid Neural Network), and apply it to sentence modeling. AHNN uses an attention-based bidirectional dynamic LSTM to obtain a better representation of global sentence information, then uses a parallel convolutional layer with three filter sizes, followed by a max-pooling layer, to extract significant local information. Finally, the two representations are fed together into an expert layer to produce the output. Experiments show that AHNN is able to summarize the context of a sentence and to capture the significant local features that are important for sentence modeling. We evaluate AHNN on the NLPCC News Headline Categorization test set and achieve 0.8098 test accuracy, a competitive result compared with the other teams in this task.
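The two-branch design described above can be sketched in PyTorch. This is a minimal illustration, not the authors' implementation: the abstract does not give hyper-parameters or the exact form of the attention or expert layer, so the hidden sizes, filter counts, additive attention pooling, and the single fused linear "expert" layer below are all assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AHNNSketch(nn.Module):
    """Illustrative sketch of an attention-based hybrid network:
    a global branch (bidirectional LSTM + attention pooling) and a
    local branch (parallel convolutions with three filter sizes +
    max-over-time pooling), fused by a final classification layer."""

    def __init__(self, vocab_size=1000, embed_dim=64, hidden_dim=64,
                 num_filters=32, filter_sizes=(2, 3, 4), num_classes=18):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        # Global branch: bidirectional LSTM with additive attention pooling.
        self.bilstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True,
                              bidirectional=True)
        self.attn = nn.Linear(2 * hidden_dim, 1)
        # Local branch: three parallel 1-D convolutions, one per filter size.
        self.convs = nn.ModuleList(
            nn.Conv1d(embed_dim, num_filters, k) for k in filter_sizes)
        # "Expert" layer stand-in: a linear classifier over both branches.
        fused = 2 * hidden_dim + num_filters * len(filter_sizes)
        self.out = nn.Linear(fused, num_classes)

    def forward(self, tokens):                      # tokens: (B, T)
        x = self.embed(tokens)                      # (B, T, E)
        h, _ = self.bilstm(x)                       # (B, T, 2H)
        weights = F.softmax(self.attn(h), dim=1)    # (B, T, 1)
        global_repr = (weights * h).sum(dim=1)      # (B, 2H)
        c = x.transpose(1, 2)                       # (B, E, T)
        local = [F.relu(conv(c)).max(dim=2).values  # max-over-time pooling
                 for conv in self.convs]
        local_repr = torch.cat(local, dim=1)        # (B, 3 * num_filters)
        return self.out(torch.cat([global_repr, local_repr], dim=1))

model = AHNNSketch()
logits = model(torch.randint(0, 1000, (4, 20)))     # batch of 4, length 20
print(logits.shape)                                 # torch.Size([4, 18])
```

The 18 output classes match the NLPCC 2017 headline-categorization label count only by assumption; in practice `num_classes` would be set from the dataset.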
Acknowledgment
This work was supported in part by the National Science Foundation of China under Grant 61573081 and by the Fundamental Research Funds for Central Universities under Grant ZYGX2015J062.
Copyright information
© 2018 Springer International Publishing AG
Cite this paper
Zhang, X., Huang, L., Qu, H. (2018). AHNN: An Attention-Based Hybrid Neural Network for Sentence Modeling. In: Huang, X., Jiang, J., Zhao, D., Feng, Y., Hong, Y. (eds) Natural Language Processing and Chinese Computing. NLPCC 2017. Lecture Notes in Computer Science(), vol 10619. Springer, Cham. https://doi.org/10.1007/978-3-319-73618-1_63
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-73617-4
Online ISBN: 978-3-319-73618-1
eBook Packages: Computer Science (R0)