Abstract
Automatic document classification is an important part of managing and processing the growing volume of documents in digital form. While many studies address English document classification, few deal with Vietnamese document classification. In this paper, we propose to employ a hierarchical attention network (HAN) for Vietnamese document classification. The HAN has a two-level architecture with attention mechanisms applied at the word level and the sentence level, which reflects the hierarchical structure of a document. Experiments are conducted on a Vietnamese news database collected from Vietnamese news websites. The results show that the proposed method is promising for Vietnamese document classification.
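The two-level attention idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes one common formulation of attention pooling (a tanh projection scored against a context vector, then a softmax), uses random values in place of trained parameters and word embeddings, and omits the encoders that a full HAN would place before each attention layer. All names (`attention_pool`, `encode_document`, `make_params`) are hypothetical.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(vectors, W, b, context):
    """Weight each vector by its similarity to a context vector and sum.

    u_i = tanh(W h_i + b); alpha = softmax(u_i . context); output = sum(alpha_i * h_i)
    """
    scores = []
    for h in vectors:
        u = [math.tanh(sum(w * x for w, x in zip(row, h)) + bi)
             for row, bi in zip(W, b)]
        scores.append(sum(ui * ci for ui, ci in zip(u, context)))
    alphas = softmax(scores)
    dim = len(vectors[0])
    pooled = [sum(a * h[k] for a, h in zip(alphas, vectors)) for k in range(dim)]
    return pooled, alphas


def make_params(dim, rng):
    """Random stand-ins for parameters that would normally be learned."""
    W = [[rng.uniform(-0.5, 0.5) for _ in range(dim)] for _ in range(dim)]
    b = [0.0] * dim
    context = [rng.uniform(-0.5, 0.5) for _ in range(dim)]
    return W, b, context


def encode_document(doc, word_params, sent_params):
    """Level 1: words -> sentence vectors. Level 2: sentences -> document vector."""
    sentence_vectors = []
    for sentence in doc:
        s_vec, _ = attention_pool(sentence, *word_params)
        sentence_vectors.append(s_vec)
    return attention_pool(sentence_vectors, *sent_params)


rng = random.Random(0)
dim = 4
# Toy document: two sentences with 3 and 5 words, each word a random 4-d vector.
doc = [[[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(n_words)]
       for n_words in (3, 5)]
word_params = make_params(dim, rng)
sent_params = make_params(dim, rng)
doc_vector, sentence_weights = encode_document(doc, word_params, sent_params)
```

In a classifier, `doc_vector` would feed a final softmax layer over the news categories, and `sentence_weights` indicates how much each sentence contributed to the document representation.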
Copyright information
© 2020 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Nguyen, K.D.T., Viet, A.P., Hoang, T.H. (2020). Vietnamese Document Classification Using Hierarchical Attention Networks. In: Satapathy, S., Bhateja, V., Nguyen, B., Nguyen, N., Le, DN. (eds) Frontiers in Intelligent Computing: Theory and Applications. Advances in Intelligent Systems and Computing, vol 1014. Springer, Singapore. https://doi.org/10.1007/978-981-13-9920-6_13
DOI: https://doi.org/10.1007/978-981-13-9920-6_13
Publisher Name: Springer, Singapore
Print ISBN: 978-981-13-9919-0
Online ISBN: 978-981-13-9920-6
eBook Packages: Intelligent Technologies and Robotics (R0)