New Method for Instance Feature Selection Using Redundant Features for Biological Data

  • Waad Bouaguel
  • Emna Mouelhi
  • Ghazi Bel Mufti
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9582)


Biological databases are characterized by a very large number of features and only a few instances, which makes classification more difficult and time-consuming. This problem can be addressed with feature selection. A filter feature selection method ranks features according to their significance level, then selects the most significant features and discards the rest. The discarded features may still carry useful information and merit further consideration. Hence, we propose a new feature selection method that uses these eliminated features to increase classification performance and avoid the curse of dimensionality. The new approach is based on the idea of transforming the values of similar features into new instances for the retained features. We aim to reduce the feature space by performing feature selection and to enlarge the learning space by creating new instances from the redundant features.
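
The idea can be illustrated with a minimal sketch. The abstract does not give the algorithmic details, so everything below is an assumption made for illustration: the filter score is the absolute Pearson correlation with the class label (a simple stand-in, not Relief), a discarded feature counts as "similar" to a retained one when their absolute correlation reaches a hypothetical `threshold`, and a new instance is formed by copying an original instance and substituting the discarded feature's value for the value of its most similar retained feature. The function names `filter_scores` and `augment_with_redundant` are likewise hypothetical.

```python
import numpy as np


def filter_scores(X, y):
    """Stand-in filter score: |Pearson correlation| of each feature with y."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    denom[denom == 0] = np.inf  # constant columns (or constant y) score 0
    return np.abs(Xc.T @ yc) / denom


def augment_with_redundant(X, y, k, threshold=0.9):
    """Keep the k top-ranked features; turn similar discarded features
    into extra instances (an assumed reading of the abstract)."""
    scores = filter_scores(X, y)
    order = np.argsort(scores)[::-1]          # best feature first
    kept, dropped = order[:k], order[k:]

    # Feature-to-feature correlation; NaN (constant columns) treated as 0.
    corr = np.nan_to_num(np.corrcoef(X, rowvar=False))

    X_blocks, y_blocks = [X[:, kept]], [y]
    for j in dropped:
        sims = np.abs(corr[j, kept])
        best = int(np.argmax(sims))           # most similar retained feature
        if sims[best] >= threshold:           # redundant: reuse as instances
            block = X[:, kept].copy()
            block[:, best] = X[:, j]          # substitute the discarded values
            X_blocks.append(block)
            y_blocks.append(y)                # new instances inherit the labels
    return np.vstack(X_blocks), np.concatenate(y_blocks), kept
```

Under these assumptions, calling `augment_with_redundant(X, y, k)` on an array `X` of shape `(n_instances, n_features)` returns a data set with `k` columns and `n_instances` additional rows for each discarded feature judged redundant. Because the substituted values keep their original scale in this sketch, features would normally need to be standardized beforehand.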


Keywords: Curse of dimensionality, Relief, Feature selection, Filter



Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  1. LARODEC, ISG, University of Tunis, Tunis, Tunisia
  2. ISG, University of Tunis, Tunis, Tunisia
  3. LARIME, ESSEC, University of Tunis, Tunis, Tunisia
