The Data Dimensionality Reduction in the Classification Process Through Greedy Backward Feature Elimination

  • Daniel Kostrzewa
  • Robert Brzeski
Conference paper
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 659)


This article presents the authors' algorithm for reducing the dimensionality of a data set, realized through Greedy Backward Feature Elimination. The results of the dimensionality reduction are verified in the classification process on two selected data sets, both of which pose multiclass classification problems. Besides a description of the algorithm, the article gives an example and reports the classification results obtained by a selected classifier before and after dimensionality reduction. The article closes with a summary and directions for further work.
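Since only the abstract is available here, the following is a minimal, generic sketch of greedy backward feature elimination, not the authors' specific algorithm. The function name `greedy_backward_elimination`, the `score` callback, and the toy scoring function are all assumptions made for illustration; in a setting like the one described above, `score` would presumably be a classification quality measure (for example accuracy or Kappa) computed by a classifier on the chosen feature subset.

```python
from typing import Callable, Tuple


def greedy_backward_elimination(
    n_features: int,
    score: Callable[[Tuple[int, ...]], float],
) -> Tuple[Tuple[int, ...], float]:
    """Generic greedy backward elimination (illustrative sketch only).

    Starts from the full feature set and repeatedly drops the single
    feature whose removal yields the highest score, as long as the
    score does not get worse.
    """
    selected = list(range(n_features))
    best_score = score(tuple(selected))
    while len(selected) > 1:
        # Evaluate every subset obtained by removing exactly one feature.
        candidates = [
            (score(tuple(f for f in selected if f != drop)), drop)
            for drop in selected
        ]
        cand_score, drop = max(candidates, key=lambda c: c[0])
        # Stop when every single removal lowers the score.
        if cand_score < best_score:
            break
        selected.remove(drop)
        best_score = cand_score
    return tuple(selected), best_score


# Hypothetical stand-in for "train a classifier and measure its quality":
# features 0 and 2 are informative, every extra feature costs a little.
def toy_score(subset: Tuple[int, ...]) -> float:
    weights = {0: 0.5, 2: 0.4}
    return sum(weights.get(f, 0.0) for f in subset) - 0.01 * len(subset)


subset, quality = greedy_backward_elimination(4, toy_score)
print(subset, round(quality, 2))
```

With the toy scoring function, the two uninformative features are removed and the search stops once dropping either remaining feature would lower the score. Each pass re-evaluates every remaining feature, so the number of evaluations grows quadratically with the number of features, which matters when each evaluation trains a classifier.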


Dimensionality reduction, Feature selection, Algorithm, Classification, Multiclass, Kappa, WEKA, UCI, URBAN, DIGITS



This work was partly supported by BKM16/RAU2/507 and BK-219/RAU2/2016 grants from the Institute of Informatics, Silesian University of Technology, Poland.



Copyright information

© Springer International Publishing AG 2018

Authors and Affiliations

  1. Silesian University of Technology, Gliwice, Poland
