Hierarchical Neural Representation for Document Classification
Text representation, which converts text spans into real-valued vectors or matrices, is a crucial tool for machines to understand the semantics of text. Most previous work employs either classic statistical methods or neural networks; the former suffer from data sparsity, while the latter are often insensitive to text structure. To address these drawbacks, we propose a general, structure-sensitive framework: the hierarchical architecture. Specifically, we incorporate the hierarchical architecture into three existing neural network models for document representation, producing three new representation models for document classification: TextHFT, TextHRNN, and TextHCNN. Comprehensive experiments on two public datasets demonstrate the effectiveness of the hierarchical architecture. With a comparable (or substantially lower) time expense, our proposals achieve significant accuracy improvements over the baselines, ranging from 4.65% to 35.08%. We conclude that the hierarchical architecture enhances classification performance. In addition, we find that the benefits provided by the hierarchical architecture strengthen as the document length increases.
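The hierarchical architecture described above composes a document representation in two stages: word vectors are first combined into sentence vectors, and sentence vectors are then combined into a document vector. A minimal sketch of this composition is shown below, using mean pooling as a stand-in for the learned word- and sentence-level encoders (the actual models use FastText-, RNN-, and CNN-based encoders); the embedding table and its values are hypothetical, for illustration only.

```python
import numpy as np


def encode_document(doc, embeddings, dim=4):
    """Hierarchical composition: words -> sentence vectors -> document vector.

    `doc` is a list of sentences, each a list of word tokens. Mean pooling
    stands in for the learned encoders used by TextHFT/TextHRNN/TextHCNN.
    """
    sent_vecs = []
    for sentence in doc:
        # Word level: pool word embeddings into one sentence vector.
        word_vecs = [embeddings.get(w, np.zeros(dim)) for w in sentence]
        sent_vecs.append(np.mean(word_vecs, axis=0))
    # Sentence level: pool sentence vectors into one document vector.
    return np.mean(sent_vecs, axis=0)


# Toy embedding table (random values, hypothetical vocabulary).
rng = np.random.default_rng(0)
emb = {w: rng.normal(size=4)
       for w in ["the", "film", "was", "great", "plot", "weak"]}

doc = [["the", "film", "was", "great"],
       ["the", "plot", "was", "weak"]]
vec = encode_document(doc, emb)
print(vec.shape)  # one fixed-size vector per document
```

A classifier (e.g., a softmax layer) would then map the document vector to class probabilities; the structure-sensitivity comes from pooling within sentence boundaries before pooling across them.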
Keywords: Document representation · Neural networks · Hierarchical architecture · Document classification
Compliance with Ethical Standards
Conflict of Interest
The authors declare that they have no conflict of interest.
Human and Animal Rights
This article does not contain any studies with human participants or animals performed by any of the authors.
Informed Consent
Informed consent was not required, as no humans or animals were involved.