The Journal of Supercomputing, Volume 72, Issue 8, pp 3210–3221

Feature selection based on an improved cat swarm optimization algorithm for big data classification

  • Kuan-Cheng Lin
  • Kai-Yuan Zhang
  • Yi-Hung Huang
  • Jason C. Hung
  • Neil Yen


Feature selection, which is a type of optimization problem, is generally achieved by combining an optimization algorithm with a classifier. Genetic algorithms and particle swarm optimization (PSO) are two commonly used optimization algorithms. Recently, cat swarm optimization (CSO) was proposed and shown to outperform PSO; however, CSO suffers from long computation times. In this paper, we modify CSO to present an improved algorithm, ICSO. We then apply ICSO to feature selection in a text classification experiment on big data. Results show that the proposed ICSO outperforms traditional CSO. For big data classification, the results also show that using term frequency-inverse document frequency (TF-IDF) together with ICSO for feature selection yields higher accuracy than using TF-IDF alone.
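For reference, the TF-IDF weighting mentioned in the abstract can be sketched as follows. This is a minimal, self-contained illustration of the standard textbook scheme (raw term frequency times log inverse document frequency), not the authors' implementation; the function name and the toy corpus are illustrative only.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Compute TF-IDF weights for each term in each tokenized document.

    tf  = raw count of the term in the document
    idf = log(N / df), where N is the number of documents and df is
          the number of documents containing the term.
    """
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))  # count each term once per document
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({t: c * math.log(n / df[t]) for t, c in tf.items()})
    return weights

docs = [["big", "data", "cat"],
        ["cat", "swarm", "cat"],
        ["big", "data", "swarm"]]
w = tf_idf(docs)
```

In a wrapper approach such as the one the paper describes, these weights form the feature vectors, and the swarm algorithm then searches over feature subsets, scoring each subset by the accuracy of a classifier (here, an SVM) trained on it.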


Keywords: Cat swarm optimization · Feature selection · Support vector machine · Big data classification



Copyright information

© Springer Science+Business Media New York 2016

Authors and Affiliations

  1. Department of Management Information Systems, National Chung Hsing University, Taichung, Taiwan, ROC
  2. Department of Mathematics Education, National Taichung University of Education, Taichung, Taiwan, ROC
  3. Department of Information Technology, Overseas Chinese University, Taichung, Taiwan, ROC
  4. School of Computer Science and Engineering, The University of Aizu, Aizu-Wakamatsu, Japan
