Evolutionary Search of Thresholds for Robust Feature Set Selection: Application to the Analysis of Microarray Data

  • Carlos Cotta
  • Christian Sloper
  • Pablo Moscato
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3005)

Abstract

We deal with two important problems in pattern recognition that arise in the analysis of large datasets. While most feature subset selection methods use statistical techniques to preprocess the labeled datasets, these methods are generally not linked with the combinatorial properties of the final solutions. We prove that it is NP-hard to obtain an appropriate set of thresholds that will transform a given dataset into a binary instance of a robust feature subset selection problem. We address this problem using an evolutionary algorithm that learns appropriate values for the thresholds. The empirical evaluation, carried out on real gene expression data from lymphomas, shows that robust subsets of genes can be obtained.
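The idea of searching for binarization thresholds with an evolutionary algorithm can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the function names (`binarize`, `separated_pairs`, `evolve_thresholds`), the fitness (the number of differently labeled sample pairs that differ in at least one binarized feature), and the simple (1+1) strategy with Gaussian mutation. The authors' actual objective, robustness criterion, and operators may differ.

```python
import random


def binarize(data, thresholds):
    """Map each value to 1 if it exceeds its feature's threshold, else 0."""
    return [[1 if v > t else 0 for v, t in zip(row, thresholds)] for row in data]


def separated_pairs(binary, labels):
    """Count sample pairs with different labels that differ in some binary feature."""
    count = 0
    for i in range(len(binary)):
        for j in range(i + 1, len(binary)):
            if labels[i] != labels[j] and binary[i] != binary[j]:
                count += 1
    return count


def evolve_thresholds(data, labels, generations=200, sigma=0.5, seed=0):
    """Hypothetical (1+1) evolutionary search over one threshold per feature."""
    rng = random.Random(seed)
    n_features = len(data[0])
    # Assumption: start each threshold at the mean value of its feature.
    parent = [sum(row[k] for row in data) / len(data) for k in range(n_features)]
    parent_fit = separated_pairs(binarize(data, parent), labels)
    for _ in range(generations):
        # Gaussian mutation of every threshold.
        child = [t + rng.gauss(0.0, sigma) for t in parent]
        child_fit = separated_pairs(binarize(data, child), labels)
        # Keep the child only if it is at least as good as the parent.
        if child_fit >= parent_fit:
            parent, parent_fit = child, child_fit
    return parent, parent_fit


if __name__ == "__main__":
    # Toy data (invented): 4 samples, 2 features, 2 classes.
    toy_data = [[0.1, 2.0], [0.2, 2.1], [1.9, 2.2], [2.0, 1.9]]
    toy_labels = [0, 0, 1, 1]
    thresholds, fitness = evolve_thresholds(toy_data, toy_labels)
    print(thresholds, fitness)
```

On real microarray data the rows would be samples and the columns genes. A robust variant would presumably require each pair to be separated by several features rather than one, which this sketch does not attempt.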

Copyright information

© Springer-Verlag Berlin Heidelberg 2004

Authors and Affiliations

  • Carlos Cotta (1)
  • Christian Sloper (2)
  • Pablo Moscato (3)
  1. Dept. Lenguajes y Ciencias de la Computación, University of Málaga, Málaga, Spain
  2. Department of Informatics, University of Bergen, HIB, Bergen, Norway
  3. Newcastle Bioinformatics Initiative, School of Electrical Engineering and Computer Science, The University of Newcastle, Callaghan, Australia
