Survey on supervised machine learning techniques for automatic text classification

Kadhim, Ammar Ismael

doi:10.1007/s10462-018-09677-1

Published: 19 January 2019

Volume 52, pages 273–292, (2019)

Artificial Intelligence Review

Ammar Ismael Kadhim ORCID: orcid.org/0000-0002-9694-7471¹

9931 Accesses
209 Citations

Abstract

Supervised machine learning studies have recently gained significance because of the growing number of electronic documents available from different sources. Text classification can be defined as the task of automatically categorizing a group of documents into one or more predefined classes according to their subjects. The major objective of text classification is therefore to enable users to extract information from textual resources, and it brings together processes such as retrieval, classification, and machine learning techniques in order to classify different patterns. In text classification, term weighting methods assign suitable weights to specific terms to enhance classification performance. This paper surveys text classification, the process of different term weighting methods, and a comparison between different classification techniques.
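To illustrate what a term weighting method does, the following minimal sketch computes TF-IDF weights for a toy corpus. It assumes one common variant (relative term frequency multiplied by log(N/df)); the abstract does not say which weighting schemes the survey covers, and the function name `tf_idf` and the sample documents are hypothetical.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumed variant: weight = (count / doc length) * log(N / df).
    """
    n = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append(
            {t: (c / len(doc)) * math.log(n / df[t]) for t, c in tf.items()}
        )
    return weights


# Hypothetical three-document corpus, tokenized on whitespace.
docs = [
    "the match ended in a draw".split(),
    "the election ended in a recount".split(),
    "the team won the match".split(),
]
w = tf_idf(docs)
```

Under this variant a term that occurs in every document (here "the") receives weight zero, while a term confined to one document (such as "draw") is weighted higher than one shared by two documents (such as "match"). The resulting weight vectors would then be passed to a classifier.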

Author information

Authors and Affiliations

Department of Computer Science, College of Medicine, University of Baghdad, Baghdad, Iraq
Ammar Ismael Kadhim

Corresponding author

Correspondence to Ammar Ismael Kadhim.

About this article

Cite this article

Kadhim, A.I. Survey on supervised machine learning techniques for automatic text classification. Artif Intell Rev 52, 273–292 (2019). https://doi.org/10.1007/s10462-018-09677-1

Issue Date: 01 June 2019