Text categorization: past and present

Artificial Intelligence Review

Abstract

Automatic text categorization is the task of sorting text documents into pre-defined categories using machine learning algorithms. It is one of the most important approaches to organizing and making use of the large volume of information that exists in unstructured form. Text categorization has become an extensively researched field within text mining and natural language processing. Word sense and the semantic relationships among terms, text documents and categories are essential for enhancing categorization performance. Several surveys on text categorization are already available; they cover various text representation schemes to some extent but omit several approaches that have been explored beyond the standard techniques. Here, an exhaustive analysis of text categorization approaches beyond the conventional ones is undertaken. This survey explores a wide variety of algorithms used for categorizing text documents and groups the existing work into three basic fields: conventional methods, fuzzy logic-based methods and deep learning-based methods. Conventional methods are further divided into three fields: text categorization using handcrafted features, text categorization using nature-inspired algorithms and text categorization using graph-based methods. The survey also gives a clear picture of the libraries available for different algorithms, the availability of datasets, and the categorization technologies explored in various Indian and non-Indian languages.
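As a rough illustration of what sorting documents into pre-defined categories with a machine learning algorithm can look like, the following is a minimal sketch of a multinomial naive Bayes classifier over bag-of-words counts, one example of a conventional approach built on handcrafted features. It is not taken from the survey: the function names, the whitespace tokenizer and the toy training set are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace; a deliberately crude tokenizer."""
    return text.lower().split()


def train_naive_bayes(labeled_docs):
    """Collect word and document counts per category from (text, category) pairs."""
    word_counts = defaultdict(Counter)  # category -> word -> count
    doc_counts = Counter()              # category -> number of training documents
    vocabulary = set()
    for text, category in labeled_docs:
        tokens = tokenize(text)
        word_counts[category].update(tokens)
        doc_counts[category] += 1
        vocabulary.update(tokens)
    return word_counts, doc_counts, vocabulary


def categorize(text, model):
    """Return the category with the highest log-posterior score."""
    word_counts, doc_counts, vocabulary = model
    total_docs = sum(doc_counts.values())
    best_category, best_score = None, -math.inf
    for category in doc_counts:
        # Log prior: fraction of training documents in this category.
        score = math.log(doc_counts[category] / total_docs)
        total_words = sum(word_counts[category].values())
        for token in tokenize(text):
            if token not in vocabulary:
                continue  # ignore words never seen during training
            # Laplace (add-one) smoothing avoids zero probabilities.
            count = word_counts[category][token] + 1
            score += math.log(count / (total_words + len(vocabulary)))
        if score > best_score:
            best_category, best_score = category, score
    return best_category


# Hypothetical toy training set, invented for illustration only.
TRAINING_DOCS = [
    ("the team won the football match", "sports"),
    ("the striker scored a late goal", "sports"),
    ("the bank raised interest rates", "business"),
    ("shares fell as the market closed", "business"),
]

model = train_naive_bayes(TRAINING_DOCS)
print(categorize("a late goal decided the match", model))
print(categorize("interest rates moved the market", model))
```

On this toy data the first query is assigned to sports and the second to business. A realistic system would replace the raw counts with a weighting scheme such as TF-IDF and would need a far larger labeled corpus.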

Acknowledgements

One of the authors thanks DST for support in the form of an INSPIRE fellowship.

Author information

Corresponding author

Correspondence to Kaushik Roy.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Dhar, A., Mukherjee, H., Dash, N.S. et al. Text categorization: past and present. Artif Intell Rev 54, 3007–3054 (2021). https://doi.org/10.1007/s10462-020-09919-1

  • DOI: https://doi.org/10.1007/s10462-020-09919-1
