Abstract
Hierarchical text classification has been receiving increasing attention due to its vast range of applications in real-world natural language processing tasks. While previous approaches have focused on effectively exploiting the label hierarchy for classification or capturing latent label relationships, few studies have integrated these concepts. In this work, we propose a graph attention capsule network for hierarchical text classification (GACaps-HTC), designed to capture both the explicit hierarchy and implicit relationships of labels. A graph attention network is employed to incorporate the information on the label hierarchy into a textual representation, whereas a capsule network infers classification probabilities by understanding the latent label relationships via iterative updates. The proposed approach is optimized using a loss term designed to address the innate label imbalance issue of the task. Experiments were conducted on two widely used text classification datasets, the WOS-46985 dataset and the RCV1 dataset. The results reveal that the proposed approach achieved a 0.6% gain and a 2.0% gain in micro-F1 and macro-F1 scores, respectively, on the WOS-46985 dataset and a 0.3% gain and a 2.2% gain in micro-F1 and macro-F1 scores, respectively, on the RCV1 dataset compared to the previous state-of-the-art approaches. Further ablation studies show that each component in GACaps-HTC played a part in enhancing the classification performance.
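The iterative update that lets a capsule network infer latent label relationships is the dynamic routing-by-agreement procedure of Sabour et al. The following is a minimal NumPy sketch of that procedure, not the authors' implementation; the function names, the 3-iteration default, and the toy dimensions are illustrative assumptions.

```python
import numpy as np

def squash(s, axis=-1, eps=1e-8):
    # Non-linearity that preserves vector orientation while
    # mapping the vector norm into [0, 1).
    sq_norm = np.sum(s ** 2, axis=axis, keepdims=True)
    return (sq_norm / (1.0 + sq_norm)) * s / np.sqrt(sq_norm + eps)

def dynamic_routing(u_hat, num_iters=3):
    """Route lower-level capsule predictions to higher-level capsules.

    u_hat: array of shape (num_in, num_out, dim), the prediction vectors
           each input capsule makes for each output capsule.
    Returns an array of shape (num_out, dim): the output capsule vectors,
    whose norms can be read as classification probabilities.
    """
    num_in, num_out, _ = u_hat.shape
    b = np.zeros((num_in, num_out))  # routing logits, start uniform
    for _ in range(num_iters):
        # Coupling coefficients: softmax over output capsules.
        c = np.exp(b) / np.exp(b).sum(axis=1, keepdims=True)
        # Weighted sum of predictions, then squash.
        s = (c[..., None] * u_hat).sum(axis=0)
        v = squash(s)
        # Increase logits where predictions agree with the output.
        b = b + np.einsum('iod,od->io', u_hat, v)
    return v
```

Because the agreement term `b` grows only where many input capsules point in the same direction, repeated iterations concentrate coupling on consistent label hypotheses, which is how latent label relationships emerge without explicit supervision.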
Data Availability
The datasets analysed during the current study are available in the Mendeley Data repository (https://data.mendeley.com/datasets/9rw3vkcfy4/6) and the Text Retrieval Conference repository (https://trec.nist.gov/data/reuters/reuters.html), as described in the manuscript.
Code Availability
The code for reproducing the results provided in the manuscript will be made public upon acceptance.
Acknowledgements
This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (No. NRF-2019R1F1A1053366). The Institute of Engineering Research at Seoul National University provided research facilities for this work.
Contributions
All authors contributed to the conceptualization. Methodology design, analysis, and visualization were performed by Jinhyun Bang. Funding was acquired by Jonghun Park. This study was supervised by Jonghun Park and Jonghyuk Park. The original draft of the manuscript was written by Jinhyun Bang and was reviewed, edited, and approved by all authors.
Ethics declarations
Competing interests
The authors have no relevant financial or non-financial interests to disclose other than the aforementioned funding.
About this article
Cite this article
Bang, J., Park, J. & Park, J. GACaps-HTC: graph attention capsule network for hierarchical text classification. Appl Intell 53, 20577–20594 (2023). https://doi.org/10.1007/s10489-023-04585-6