Abstract
Deep neural networks (DNNs) are powerful models that have achieved excellent performance in many fields, especially in natural language processing (NLP). Convolutional neural networks (CNNs) and recurrent neural networks (RNNs), the two mainstream DNN architectures, have been widely explored for NLP tasks. However, the two model types work in fundamentally different ways: CNNs are good at capturing local features, while RNNs are able to summarize global information. In this paper, we combine the strengths of both architectures and propose a hybrid model, AHNN (Attention-based Hybrid Neural Network), and apply it to sentence modeling. AHNN uses an attention-based bidirectional dynamic LSTM to obtain a better representation of global sentence information, then uses a parallel convolutional layer with three filter sizes, followed by a max-pooling layer, to extract significant local information. Finally, the two representations are fed together into an expert layer to produce the output. Experiments show that AHNN is able to summarize the context of a sentence and to capture the significant local features that are important for sentence modeling. We evaluate AHNN on the NLPCC News Headline Categorization test set and achieve 0.8098 test accuracy, a competitive result compared with the other teams in this task.
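The two-branch design described above can be sketched in PyTorch. This is a minimal illustration, not the authors' implementation: the abstract does not give hyper-parameters or the exact form of the attention or expert layer, so the hidden sizes, filter counts, additive attention pooling, and the single fused linear "expert" layer below are all assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AHNNSketch(nn.Module):
    """Illustrative sketch of an attention-based hybrid network:
    a global branch (bidirectional LSTM + attention pooling) and a
    local branch (parallel convolutions with three filter sizes +
    max-over-time pooling), fused by a final classification layer."""

    def __init__(self, vocab_size=1000, embed_dim=64, hidden_dim=64,
                 num_filters=32, filter_sizes=(2, 3, 4), num_classes=18):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        # Global branch: bidirectional LSTM with additive attention pooling.
        self.bilstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True,
                              bidirectional=True)
        self.attn = nn.Linear(2 * hidden_dim, 1)
        # Local branch: three parallel 1-D convolutions, one per filter size.
        self.convs = nn.ModuleList(
            nn.Conv1d(embed_dim, num_filters, k) for k in filter_sizes)
        # "Expert" layer stand-in: a linear classifier over both branches.
        fused = 2 * hidden_dim + num_filters * len(filter_sizes)
        self.out = nn.Linear(fused, num_classes)

    def forward(self, tokens):                      # tokens: (B, T)
        x = self.embed(tokens)                      # (B, T, E)
        h, _ = self.bilstm(x)                       # (B, T, 2H)
        weights = F.softmax(self.attn(h), dim=1)    # (B, T, 1)
        global_repr = (weights * h).sum(dim=1)      # (B, 2H)
        c = x.transpose(1, 2)                       # (B, E, T)
        local = [F.relu(conv(c)).max(dim=2).values  # max-over-time pooling
                 for conv in self.convs]
        local_repr = torch.cat(local, dim=1)        # (B, 3 * num_filters)
        return self.out(torch.cat([global_repr, local_repr], dim=1))

model = AHNNSketch()
logits = model(torch.randint(0, 1000, (4, 20)))     # batch of 4, length 20
print(logits.shape)                                 # torch.Size([4, 18])
```

The 18 output classes match the NLPCC 2017 headline-categorization label count only by assumption; in practice `num_classes` would be set from the dataset.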
Acknowledgment
This work was supported in part by the National Science Foundation of China under Grant 61573081 and by the Fundamental Research Funds for Central Universities under Grant ZYGX2015J062.
Copyright information
© 2018 Springer International Publishing AG
Cite this paper
Zhang, X., Huang, L., Qu, H. (2018). AHNN: An Attention-Based Hybrid Neural Network for Sentence Modeling. In: Huang, X., Jiang, J., Zhao, D., Feng, Y., Hong, Y. (eds) Natural Language Processing and Chinese Computing. NLPCC 2017. Lecture Notes in Computer Science(), vol 10619. Springer, Cham. https://doi.org/10.1007/978-3-319-73618-1_63
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-73617-4
Online ISBN: 978-3-319-73618-1
eBook Packages: Computer Science (R0)