Sparse Feature Learning Using Ensemble Model for Highly-Correlated High-Dimensional Data
High-dimensional, highly correlated data arise in several domains such as genomics. Many feature selection techniques treat correlated features as redundant and therefore remove them. Several studies investigate the interpretation of correlated features in domains such as genomics, but the classification capabilities of correlated feature groups remain a point of interest in many fields. In this paper, a novel method is proposed that integrates ensemble feature ranking with co-expression networks to identify the optimal features for classification. The main advantage of the proposed method is that it does not treat correlated features as redundant; instead, it exploits selected groups of correlated features to improve classification performance. A series of experiments on five high-dimensional, highly correlated datasets with different imbalance ratios shows that the proposed method outperforms state-of-the-art methods.
Keywords: Feature selection · High-dimensional data · Feature correlation
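The abstract's core idea, keeping whole groups of correlated features rather than discarding them, can be illustrated with a small sketch. This is a hypothetical illustration, not the authors' implementation: `ensemble_rank`, `coexpression_groups`, `select_features`, and all thresholds are assumed names and parameters chosen for the example.

```python
# Hypothetical sketch (not the paper's method): combine an ensemble
# feature ranking with a simple co-expression (correlation) grouping,
# and retain entire correlated groups instead of dropping them.
import numpy as np

def ensemble_rank(X, y, n_models=5, seed=0):
    """Average per-feature scores over bootstrapped univariate rankings."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    scores = np.zeros(d)
    for _ in range(n_models):
        idx = rng.integers(0, n, size=n)          # bootstrap resample
        Xb, yb = X[idx], y[idx]
        # score each feature by its absolute correlation with the label
        scores += np.abs([np.corrcoef(Xb[:, j], yb)[0, 1] for j in range(d)])
    return scores / n_models

def coexpression_groups(X, threshold=0.8):
    """Group features whose pairwise |correlation| exceeds a threshold."""
    C = np.abs(np.corrcoef(X, rowvar=False))
    d = C.shape[0]
    seen, groups = set(), []
    for j in range(d):
        if j in seen:
            continue
        group = [k for k in range(d) if C[j, k] >= threshold]
        seen.update(group)
        groups.append(group)
    return groups

def select_features(X, y, top_k=10, threshold=0.8):
    """Keep every feature in any group that contains a top-ranked feature."""
    scores = ensemble_rank(X, y)
    top = set(np.argsort(scores)[::-1][:top_k])
    selected = set()
    for g in coexpression_groups(X, threshold):
        if top & set(g):
            selected.update(g)       # retain the whole correlated group
    return sorted(selected)
```

On synthetic data where two features are both correlated with each other and predictive of the label, this selection keeps the full correlated pair even when only one of them ranks in the top-k, which is the behavior the abstract argues for.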