Unsupervised Feature Selection for Biomarker Identification in Chromatography and Gene Expression Data
A novel approach to feature selection from unlabeled vector data is presented. It is based on reconstructing the original data relationships in an auxiliary space in which features are either weighted or omitted. Feature weighting is related to the return forces of factors in a parametric data similarity measure in response to disturbance of their optimum values. Feature omission, which induces a measurable loss of reconstruction quality, is realized in an iterative greedy way. The proposed framework allows custom data similarity measures to be applied. Here, adaptive Euclidean distance and adaptive Pearson correlation are considered, the former serving as a standard reference, the latter being useful for intensity data. Results of the different strategies are given for chromatography and gene expression data.
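The greedy feature omission idea can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the loss or the optimization details, so the sketch assumes a weighted squared Euclidean distance with binary feature weights, and measures reconstruction loss as one minus the Pearson correlation between the original and the reduced pairwise distance matrices. All function names are hypothetical.

```python
import numpy as np

def pairwise_sq_euclidean(X, weights=None):
    """Weighted squared Euclidean distances between all rows of X."""
    if weights is None:
        weights = np.ones(X.shape[1])
    diff = X[:, None, :] - X[None, :, :]
    return np.einsum("ijk,k->ij", diff ** 2, weights)

def reconstruction_loss(D_ref, D_new):
    """Assumed loss: 1 - Pearson correlation of the upper-triangle distances."""
    iu = np.triu_indices_from(D_ref, k=1)
    a, b = D_ref[iu], D_new[iu]
    if a.std() == 0 or b.std() == 0:
        return 1.0  # degenerate case: no relationship can be recovered
    return 1.0 - np.corrcoef(a, b)[0, 1]

def greedy_feature_omission(X, n_keep=1):
    """Repeatedly drop the feature whose omission costs the least.

    Returns the indices of the kept features and a list of
    (removed_feature, loss_after_removal) in order of removal.
    """
    n_keep = max(1, n_keep)
    D_ref = pairwise_sq_euclidean(X)  # relationships using all features
    active = list(range(X.shape[1]))
    removed = []
    while len(active) > n_keep:
        losses = []
        for f in active:
            # Binary weights: keep every active feature except candidate f.
            w = np.zeros(X.shape[1])
            w[np.array([g for g in active if g != f], dtype=int)] = 1.0
            losses.append(reconstruction_loss(D_ref, pairwise_sq_euclidean(X, w)))
        best = int(np.argmin(losses))
        removed.append((active[best], losses[best]))
        del active[best]
    return active, removed
```

Features that survive longest are those whose omission would most degrade the reconstructed data relationships. Under these assumptions, swapping `pairwise_sq_euclidean` for a weighted correlation-based dissimilarity would give a variant closer in spirit to the adaptive Pearson correlation mentioned above.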
Keywords: Feature selection, adaptive similarity measures