Fast, Accurate, and Stable Feature Selection Using Neural Networks
Multi-voxel pattern analysis often necessitates feature selection because neuroimaging data are high-dimensional. In this context, feature selection techniques serve a dual purpose: they can increase classification accuracy, and they can reveal the sets of features that best discriminate between classes. However, the feature selection techniques in widespread use in the literature suffer from several deficits, including long computation times, inconsistent selection of the features relevant to classification, and only marginal increases in classifier accuracy. In this paper we present a novel method for feature selection based on a single-layer neural network, which incorporates cross-validation during feature selection and stability selection through iterative subsampling. Comparing our approach to popular alternative feature selection methods, we find increased classifier accuracy, reduced computational cost, and greater consistency in the selection of relevant features. Furthermore, we demonstrate that importance mapping, a technique used to identify voxels relevant to classification, can lead to the selection of irrelevant voxels due to activation patterns shared across categories. Owing to its relatively simple architecture, flexibility, and speed, our method offers researchers a viable alternative for identifying the sets of features that best discriminate between classes.
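The general idea of combining a single-layer network with stability selection through iterative subsampling can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the architecture details, the ranking criterion, or the cross-validation scheme. The sketch assumes a logistic single-layer network trained by gradient descent, ranks features by absolute weight, uses a random half-split of the data as a simple stand-in for cross-validation, and keeps features chosen in at least a fixed fraction of subsamples. All function names, parameters, and the toy data are hypothetical.

```python
import math
import random


def make_data(n_samples, n_features, informative, rng):
    """Toy two-class data: only the `informative` columns carry class signal."""
    X, y = [], []
    for i in range(n_samples):
        label = i % 2
        shift = 1.5 if label == 1 else -1.5
        row = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
        for j in informative:
            row[j] += shift
        X.append(row)
        y.append(label)
    return X, y


def sigmoid(z):
    z = max(-30.0, min(30.0, z))  # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-z))


def train_single_layer(X, y, epochs=100, lr=0.1):
    """Single-layer (logistic) network trained with batch gradient descent."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for row, label in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, row)) + b) - label
            for j in range(d):
                grad_w[j] += err * row[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def accuracy(X, y, w, b):
    """Fraction of samples whose predicted class matches the label."""
    hits = 0
    for row, label in zip(X, y):
        predicted = 1 if sum(wj * xj for wj, xj in zip(w, row)) + b > 0 else 0
        hits += int(predicted == label)
    return hits / len(y)


def stability_select(X, y, n_iter=20, top_k=3, threshold=0.6, seed=0):
    """Count how often each feature ranks in the top_k across random subsamples."""
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    counts = [0] * d
    held_out = []
    for _ in range(n_iter):
        idx = list(range(n))
        rng.shuffle(idx)
        train, test = idx[: n // 2], idx[n // 2:]
        w, b = train_single_layer([X[i] for i in train], [y[i] for i in train])
        # Assumed stand-in for cross-validation: score on the unused half.
        held_out.append(accuracy([X[i] for i in test], [y[i] for i in test], w, b))
        ranked = sorted(range(d), key=lambda j: abs(w[j]), reverse=True)
        for j in ranked[:top_k]:
            counts[j] += 1
    freq = [c / n_iter for c in counts]
    selected = [j for j, f in enumerate(freq) if f >= threshold]
    return selected, freq, sum(held_out) / n_iter


X, y = make_data(60, 20, informative=[0, 1, 2], rng=random.Random(1))
selected, freq, mean_acc = stability_select(X, y)
print("selected features:", selected)
print("mean held-out accuracy:", round(mean_acc, 3))
```

The selection frequency in `freq` is what makes the procedure a stability measure: a feature that ranks highly in only a few subsamples is treated as unreliable, whatever its weight in any single fit.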
Keywords: Feature selection, fMRI, MVPA, Machine learning
This research was supported by FWO-Flanders Odysseus II Award #G.OC44.13 N to WHA.
Compliance with Ethical Standards
Conflict of Interest
We report no conflicts of interest.