
BoW-based neural networks vs. cutting-edge models for single-label text classification

Original Article · Neural Computing and Applications

Abstract

Machine learning models must be continually improved to classify complicated "big" datasets reliably and accurately. Although graph neural networks (GNNs) have reignited interest in graph-based text classification, this research proposes straightforward yet competitive neural networks for the task: convolutional neural networks (CNN), artificial neural networks (ANN), and their fine-tuned variants (denoted FT-CNN and FT-ANN). We show that these simple models can outperform more complex GNN-based models such as SGC, SSGC, and TextGCN, and are comparable to others such as HyperGAT and BERT. Fine-tuning is also highly recommended, as it improves both the performance and the reliability of the models. The proposed models are evaluated on five benchmark datasets: Reuters (R8), R52, 20NewsGroup, Ohsumed, and MR. The experimental findings show that, on the majority of the target datasets, these models, especially the fine-tuned ones, perform surprisingly better than SOTA approaches, including GNN-based models.
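As a concrete illustration of the kind of BoW-based baseline described above, the sketch below feeds a TF-IDF bag-of-words representation into a small feed-forward network (the "ANN" family) and evaluates it on 20NewsGroup, one of the five benchmark datasets named in the abstract. The paper's actual architectures, hyperparameters, and fine-tuning procedure are not detailed in this excerpt, so the vocabulary size, hidden-layer width, and training settings below are illustrative assumptions rather than the authors' configuration.

    # Minimal sketch of a BoW (TF-IDF) + feed-forward ANN text classifier.
    # Assumption: layer sizes and training settings are illustrative only;
    # they are not the configuration reported in the paper.
    from sklearn.datasets import fetch_20newsgroups
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.neural_network import MLPClassifier
    from sklearn.metrics import accuracy_score

    # 20NewsGroup is one of the five benchmark datasets used in the paper.
    train = fetch_20newsgroups(subset="train", remove=("headers", "footers", "quotes"))
    test = fetch_20newsgroups(subset="test", remove=("headers", "footers", "quotes"))

    # Bag-of-words features weighted by TF-IDF (see the BoW/TFIDF abbreviations).
    vectorizer = TfidfVectorizer(max_features=30000, stop_words="english")
    X_train = vectorizer.fit_transform(train.data)
    X_test = vectorizer.transform(test.data)

    # A small multilayer perceptron stands in for the ANN baseline.
    model = MLPClassifier(hidden_layer_sizes=(512,), max_iter=50, early_stopping=True)
    model.fit(X_train, train.target)

    print("Test accuracy:", accuracy_score(test.target, model.predict(X_test)))

One plausible reading of the fine-tuning recommended in the abstract is a search over such hyperparameters (vocabulary size, hidden width, regularization, learning rate) on a validation split before reporting test accuracy.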


Data availability statement

The datasets used in this work are publicly available. Upon acceptance of the paper, the source code will be made available in our GitHub repository (https://github.com/aliamer).

Abbreviations

SOTA: State-of-the-art
BoW: Bag of words
TFIDF: Term frequency-inverse document frequency
VSM: Vector space model
GNN: Graph neural networks
CNN: Convolutional neural networks
ANN: Artificial neural networks
CNN-rand: CNN with randomly initialized word embeddings
CNN-pre: CNN with pretrained word embeddings
FT-CNN: Fine-tuned convolutional neural networks
FT-ANN: Fine-tuned artificial neural networks
SGC: Simplifying graph convolutional networks
SSGC: Simple spectral graph convolutional networks
TextGCN: Text graph convolutional network
HyperGAT: Hypergraph attention networks
BERT: Bidirectional encoder representations from transformers
DAN: Deep averaging networks
MLP: Multilayer perceptron
HeteGCN: Heterogeneous graph convolutional network
TensorGCN: Tensor graph convolutional networks
GAT: Graph attention network
LSTM: Long short-term memory
Bi-LSTM: Bidirectional LSTM
GloVe: Global vectors for word representation
ReLU: Rectified linear unit
SVM: Support vector machine
SWEM: Simple word-embedding-based model
NGNN: Network-in-graph neural network model
T-VGAE: Topic variational graph auto-encoder
CFE: Category-based feature engineering model
Han-LT: Heterogeneous attention network for semi-supervised long text classification
InducT-GCN: Inductive text classification model based on GCN
Syntax-AT-Capsule: Enhanced capsule network text classification model


Acknowledgements

The authors would like to express their gratitude to the Research Office at Zayed University for funding this work and providing the tools needed to complete it. This research was supported by the Research Incentive Fund (RIF), Grant Activity Code R22083, Zayed University, UAE.

Funding

This work was supported by the Research Incentive Fund (RIF), Grant Activity Code R22083, Zayed University, UAE.

Author information


Contributions

HIA was a key contributor to the conception, design, and analysis of all experiments, to drafting their results, and to revising the final version of the manuscript. AAA was a key contributor to the conception and design, implemented the approach, analyzed the results of all experiments, and prepared, wrote, and revised the final version of the manuscript. SDR contributed by analyzing the results, drafting the manuscript, and reviewing its final version.

Corresponding author

Correspondence to Ali A. Amer.

Ethics declarations

Conflict of interest

The authors declare that they have no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Abdalla, H.I., Amer, A.A. & Ravana, S.D. BoW-based neural networks vs. cutting-edge models for single-label text classification. Neural Comput & Applic 35, 20103–20116 (2023). https://doi.org/10.1007/s00521-023-08754-z

