Empirical Study to Evaluate the Performance of Classification Algorithms on Public Datasets

Bramesh, S. M.; Anil Kumar, K. M.

doi:10.1007/978-981-13-5802-9_41

S. M. Bramesh & K. M. Anil Kumar

Conference paper
First Online: 24 April 2019

Part of the book series: Lecture Notes in Electrical Engineering (LNEE, volume 545)

Abstract

A huge amount of data is stored as electronic documents on the World Wide Web. Text classification algorithms are widely used to assign these documents to a fixed number of predefined classes. These algorithms differ in their scope of applicability and in their performance, so finding an appropriate algorithm for a given dataset has become an important concern for researchers who need to solve practical problems quickly. This paper presents an experimental comparison of five text classification algorithms, namely decision tree (C5.0), support vector machine, K-nearest neighbor, Naïve Bayes, and neural network, each built with the TF and TF-IDF feature selection methods, on four public datasets: 20news-bydate, ohsumed-first-20000-docs, Reuters 21578-Apte-90 Cat, and 20 Newsgroup. The experimental results are examined from multiple perspectives and summarized to indicate how useful the different algorithms are on the different datasets.
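As a rough illustration of the two term-weighting schemes named in the abstract, the sketch below computes raw TF and TF-IDF vectors for a toy corpus and classifies a query with a 1-nearest-neighbor rule under cosine similarity. It is not the authors' pipeline: the abstract does not state the exact TF-IDF variant, so the classical tf × log(N/df) form is assumed, and the corpus, labels, and function names are all hypothetical.

```python
import math
from collections import Counter

def tokenize(text):
    # Assumption: lowercase whitespace tokenization, no stemming or stop words.
    return text.lower().split()

def idf_weights(tf_docs):
    # Classical inverse document frequency: log(N / df) per term.
    n = len(tf_docs)
    df = Counter()
    for tf in tf_docs:
        df.update(tf.keys())
    return {term: math.log(n / count) for term, count in df.items()}

def tfidf_vector(tf, idf):
    # Terms unseen in the training corpus get weight 0.
    return {term: count * idf.get(term, 0.0) for term, count in tf.items()}

def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def predict_1nn(query_vec, train_vecs, labels):
    # Return the label of the most similar training document.
    best = max(range(len(train_vecs)),
               key=lambda i: cosine(query_vec, train_vecs[i]))
    return labels[best]

# Hypothetical toy corpus standing in for a real dataset.
train_docs = [
    "the team won the hockey game",
    "the doctor treated the patient",
    "the goalie saved the game",
    "the patient took the medicine",
]
labels = ["sport", "medicine", "sport", "medicine"]

tf_train = [Counter(tokenize(d)) for d in train_docs]   # raw TF vectors
idf = idf_weights(tf_train)
tfidf_train = [tfidf_vector(tf, idf) for tf in tf_train]

query_tf = Counter(tokenize("the patient saw the doctor"))
pred_tf = predict_1nn(query_tf, tf_train, labels)
pred_tfidf = predict_1nn(tfidf_vector(query_tf, idf), tfidf_train, labels)
print(pred_tf, pred_tfidf)
```

Under this assumed weighting, a term that occurs in every document (here "the") receives an IDF of zero, so TF-IDF ignores it, whereas raw TF lets it contribute to the similarity score.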

Author information

Authors and Affiliations

Department of IS & E, PES College of Engineering, Mandya, India
S. M. Bramesh
Department of CS & E, Sri Jayachamarajendra College of Engineering, Mysuru, India
K. M. Anil Kumar

Corresponding author

Correspondence to S. M. Bramesh.

Editor information

Editors and Affiliations

Department of Electronics and Communication Engineering, PES College of Engineering, Mandya, Karnataka, India
V. Sridhar
Department of Computer Science and Engineering, PES College of Engineering, Mandya, Karnataka, India
M.C. Padma
Department of Electronics and Communication Engineering, PES College of Engineering, Mandya, Karnataka, India
K.A. Radhakrishna Rao

About this paper

Cite this paper

Bramesh, S.M., Anil Kumar, K.M. (2019). Empirical Study to Evaluate the Performance of Classification Algorithms on Public Datasets. In: Sridhar, V., Padma, M., Rao, K. (eds) Emerging Research in Electronics, Computer Science and Technology. Lecture Notes in Electrical Engineering, vol 545. Springer, Singapore. https://doi.org/10.1007/978-981-13-5802-9_41

DOI: https://doi.org/10.1007/978-981-13-5802-9_41
Published: 24 April 2019
Publisher Name: Springer, Singapore
Print ISBN: 978-981-13-5801-2
Online ISBN: 978-981-13-5802-9
eBook Packages: Engineering, Engineering (R0)
