Text Classification Using Machine Learning Methods - A Survey

  • Basant Agarwal
  • Namita Mittal
Conference paper
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 236)


Text classification is used to organize documents into a predefined set of classes. It is useful in Web content management, search engines, email filtering, and similar applications. Text classification is difficult because documents are represented by high-dimensional feature vectors that contain many noisy and irrelevant features. Various feature reduction methods have been proposed to eliminate irrelevant features and to reduce the dimension of the feature vector. A machine learning model then uses the reduced, relevant feature vector to obtain better classification results. This paper surveys text classification approaches based on machine learning techniques, as well as feature selection techniques for reducing the high-dimensional feature vector.
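The pipeline described above (reduce a noisy, high-dimensional term vector to its most relevant features, then train a classifier on what remains) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the method of any particular paper covered by the survey: it assumes binary bag-of-words features, uses the chi-square statistic as the feature selection metric, and uses multinomial Naive Bayes with add-one smoothing as the classifier. The function and class names (`tokenize`, `chi_square_scores`, `select_features`, `NaiveBayesText`) and the four-document corpus are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it on any non-alphanumeric character."""
    return "".join(ch.lower() if ch.isalnum() else " " for ch in text).split()


def chi_square_scores(docs, labels):
    """Score every term by its maximum chi-square statistic over the classes.

    A term is treated as present or absent in each document (binary features).
    For term t and class c:
      a  = docs containing t that belong to c
      b  = docs containing t that do not belong to c
      c_ = docs without t that belong to c (named c_ to avoid clashing with c)
      d  = docs without t that do not belong to c
    """
    n = len(docs)
    doc_sets = [set(tokenize(doc)) for doc in docs]
    class_sizes = Counter(labels)
    scores = {}
    for term in set().union(*doc_sets):
        best = 0.0
        for c, size in class_sizes.items():
            a = sum(1 for s, y in zip(doc_sets, labels) if term in s and y == c)
            b = sum(1 for s, y in zip(doc_sets, labels) if term in s and y != c)
            c_ = size - a
            d = (n - size) - b
            denom = (a + c_) * (b + d) * (a + b) * (c_ + d)
            if denom:
                best = max(best, n * (a * d - c_ * b) ** 2 / denom)
        scores[term] = best
    return scores


def select_features(docs, labels, k):
    """Keep the k highest-scoring terms (ties broken alphabetically)."""
    scores = chi_square_scores(docs, labels)
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return set(ranked[:k])


class NaiveBayesText:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing over a fixed vocabulary."""

    def fit(self, docs, labels, vocab):
        self.vocab = set(vocab)
        self.classes = sorted(set(labels))
        class_sizes = Counter(labels)
        term_counts = {c: Counter() for c in self.classes}
        for doc, y in zip(docs, labels):
            # Only terms that survived feature selection are counted.
            term_counts[y].update(t for t in tokenize(doc) if t in self.vocab)
        self.log_prior = {c: math.log(class_sizes[c] / len(labels)) for c in self.classes}
        self.log_likelihood = {}
        for c in self.classes:
            total = sum(term_counts[c].values())
            self.log_likelihood[c] = {
                t: math.log((term_counts[c][t] + 1) / (total + len(self.vocab)))
                for t in self.vocab
            }
        return self

    def predict(self, doc):
        tokens = [t for t in tokenize(doc) if t in self.vocab]
        # Pick the class with the largest log posterior (up to a constant).
        return max(
            self.classes,
            key=lambda c: self.log_prior[c] + sum(self.log_likelihood[c][t] for t in tokens),
        )


# Hypothetical toy corpus, for illustration only.
docs = [
    "cheap pills buy now",
    "buy cheap watches now",
    "meeting agenda for monday",
    "project meeting notes attached",
]
labels = ["spam", "spam", "ham", "ham"]

vocab = select_features(docs, labels, k=4)
model = NaiveBayesText().fit(docs, labels, vocab)
print(sorted(vocab))                     # ['buy', 'cheap', 'meeting', 'now']
print(model.predict("buy cheap pills"))  # spam
print(model.predict("monday meeting"))   # ham
```

Swapping the scoring function (for example, information gain or document frequency thresholding) or the classifier (for example, an SVM or k-NN) changes only one stage. The design point is that the classifier never sees the terms that feature selection dropped, so its parameters are estimated over a much smaller vocabulary.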


Keywords: Text classification; Feature selection; Machine learning; Algorithms


Copyright information

© Springer India 2014

Authors and Affiliations

  1. Malaviya National Institute of Technology, Jaipur, India
