Bare-Bone Particle Swarm Optimisation for Simultaneously Discretising and Selecting Features for High-Dimensional Classification

Part of the Lecture Notes in Computer Science book series (LNTCS, volume 9597)

Abstract

Feature selection and discretisation are effective data preprocessing techniques, especially for high-dimensional data with many irrelevant features. While feature selection keeps only the relevant features, feature discretisation finds a discrete representation of the data that retains enough information while ignoring minor fluctuations. These techniques are usually applied in two stages, discretisation followed by selection, since many feature selection methods work only on discrete features. The most commonly used discretisation methods are univariate, meaning that each feature is discretised independently; as a result, the feature selection stage may not work efficiently, because information about feature interactions is not considered during discretisation. In this study, we propose a new method called PSO-DFS, which uses bare-bone particle swarm optimisation (BBPSO) to perform discretisation and feature selection in a single stage. The results on ten high-dimensional datasets show that PSO-DFS obtains a substantial dimensionality reduction on all datasets. Classification performance is significantly improved, or at least maintained, on nine of the ten datasets when using the transformed “small” data obtained from PSO-DFS. Compared with the two-stage approach, which applies PSO for feature selection to the discretised data, PSO-DFS achieves better performance on six datasets and similar performance on three datasets, with a much smaller number of features selected.
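
As a rough illustration of how a single-stage approach could work, the sketch below pairs the standard bare-bones PSO position update (each coordinate sampled from a Gaussian centred between the personal and global best) with a simple one-cut-point-per-feature encoding. The encoding, the function names `bbpso_step` and `discretise_and_select`, and the rule that a cut-point outside a feature's range drops that feature are illustrative assumptions; the abstract does not specify them, and this is not presented as the PSO-DFS implementation.

```python
import numpy as np


def bbpso_step(pbest, gbest, rng):
    """One bare-bones PSO position update.

    Each coordinate is drawn from a Gaussian whose mean is the midpoint of
    the particle's personal best and the swarm's global best, and whose
    standard deviation is their absolute difference.
    """
    mean = (pbest + gbest) / 2.0
    std = np.abs(pbest - gbest)
    return rng.normal(mean, std)


def discretise_and_select(X, cut_points):
    """Assumed encoding (hypothetical): one candidate cut-point per feature.

    A feature is kept only if its cut-point lies strictly inside the
    feature's observed range; each kept feature is binarised at its
    cut-point. Features whose cut-point falls outside the range are dropped.
    """
    lo, hi = X.min(axis=0), X.max(axis=0)
    selected = (cut_points > lo) & (cut_points < hi)
    X_disc = (X[:, selected] > cut_points[selected]).astype(int)
    return X_disc, selected
```

Under this assumed encoding, a particle's position is simply a vector of cut-points, so one position update changes both which features are selected and how they are discretised. A fitness function (for example, classification accuracy on the transformed data) would still be needed to update the personal and global bests.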

Keywords

  • Particle swarm optimisation
  • Feature discretisation
  • Feature selection
  • Classification
  • High-dimensional data

Corresponding author

Correspondence to Binh Tran.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Tran, B., Xue, B., Zhang, M. (2016). Bare-Bone Particle Swarm Optimisation for Simultaneously Discretising and Selecting Features for High-Dimensional Classification. In: Squillero, G., Burelli, P. (eds) Applications of Evolutionary Computation. EvoApplications 2016. Lecture Notes in Computer Science, vol 9597. Springer, Cham. https://doi.org/10.1007/978-3-319-31204-0_45

  • DOI: https://doi.org/10.1007/978-3-319-31204-0_45

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-31203-3

  • Online ISBN: 978-3-319-31204-0

  • eBook Packages: Computer Science, Computer Science (R0)