Unsupervised Feature Selection for Biomarker Identification in Chromatography and Gene Expression Data

  • Marc Strickert
  • Nese Sreenivasulu
  • Silke Peterek
  • Winfriede Weschke
  • Hans-Peter Mock
  • Udo Seiffert
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4087)

Abstract

A novel approach to feature selection from unlabeled vector data is presented. It is based on the reconstruction of original data relationships in an auxiliary space with either weighted or omitted features. Feature weighting, on the one hand, is related to the return forces of factors in a parametric data similarity measure in response to disturbance of their optimum values. Feature omission, on the other hand, induces a measurable loss of reconstruction quality and is realized in an iterative greedy way. The proposed framework allows custom data similarity measures to be applied. Here, adaptive Euclidean distance and adaptive Pearson correlation are considered, the former serving as a standard reference, the latter being useful for intensity data. Results of the different strategies are given for chromatography and gene expression data.
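
The greedy feature omission idea can be illustrated with a minimal sketch. It is not the authors' implementation: the quality score (Pearson correlation between the pairwise distances of the full data and those of the reduced data), the plain squared Euclidean distance, and all function names are assumptions chosen for illustration only.

```python
import numpy as np

def pairwise_sq_euclidean(X):
    """Squared Euclidean distances between all rows of X."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sum(diff ** 2, axis=-1)

def reconstruction_quality(D_ref, D_new):
    """Assumed quality score: Pearson correlation of the two
    distance matrices, compared over their upper triangles."""
    iu = np.triu_indices_from(D_ref, k=1)
    return np.corrcoef(D_ref[iu], D_new[iu])[0, 1]

def greedy_feature_omission(X, n_keep):
    """Backward elimination: in each round, omit the feature whose
    removal costs the least reconstruction quality."""
    D_ref = pairwise_sq_euclidean(X)      # relationships in the original data
    active = list(range(X.shape[1]))      # features still in use
    removed = []                          # omitted features, least relevant first
    while len(active) > n_keep:
        best_feature, best_quality = None, -np.inf
        for f in active:
            trial = [g for g in active if g != f]
            q = reconstruction_quality(D_ref, pairwise_sq_euclidean(X[:, trial]))
            if q > best_quality:
                best_feature, best_quality = f, q
        active.remove(best_feature)
        removed.append(best_feature)
    return active, removed
```

The order in which features are removed gives a rough relevance ranking under these assumptions. A correlation-based dissimilarity could be substituted for `pairwise_sq_euclidean` to mimic the intensity-data setting mentioned in the abstract.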

Keywords

Feature selection, adaptive similarity measures

Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Marc Strickert (1)
  • Nese Sreenivasulu (2)
  • Silke Peterek (3)
  • Winfriede Weschke (2)
  • Hans-Peter Mock (3)
  • Udo Seiffert (1)

  1. Leibniz Institute of Plant Genetics and Crop Plant Research Gatersleben, Pattern Recognition Group
  2. Leibniz Institute of Plant Genetics and Crop Plant Research Gatersleben, Gene Expression Group
  3. Leibniz Institute of Plant Genetics and Crop Plant Research Gatersleben, Applied Biochemistry
