Strengthening the Forward Variable Selection Stopping Criterion
Given any modeling problem, variable selection is a preprocessing step that selects the variables most relevant to the output variable. Forward selection is the most straightforward strategy for variable selection; its application using mutual information is simple, intuitive, and effective, and it is commonly used in the machine learning literature. However, the problem of when to stop the forward process has no direct, satisfactory solution, owing to the inaccuracies of mutual information estimation, especially as the number of variables considered increases. This work proposes a modified stopping criterion for this variable selection methodology based on the Markov blanket concept. As will be shown, this approach can increase the performance and applicability of the stopping criterion of a forward selection process using mutual information.
Keywords: Variable Selection · Mutual Information · Function Approximation
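To make the procedure the abstract discusses concrete, the following is a minimal sketch of forward variable selection driven by mutual information, with the naive stopping rule (stop when the MI gain of the best candidate drops below a fixed threshold) whose weaknesses motivate the paper. It uses a simple histogram plug-in MI estimator; the function names (`multivariate_mi`, `forward_select`), the bin count, and the `min_gain` threshold are illustrative assumptions, not the authors' actual estimator or Markov-blanket criterion.

```python
import numpy as np

def entropy(counts):
    """Plug-in entropy (in nats) from a histogram of counts."""
    p = counts[counts > 0] / counts.sum()
    return -np.sum(p * np.log(p))

def multivariate_mi(Xs, y, bins=6):
    """Histogram estimate of I(X_S; Y). Its accuracy degrades quickly as the
    number of selected variables grows, which is exactly the estimation
    problem the abstract refers to."""
    hx, _ = np.histogramdd(Xs, bins=bins)
    hy, _ = np.histogram(y, bins=bins)
    hxy, _ = np.histogramdd(np.column_stack([Xs, y]), bins=bins)
    return entropy(hx) + entropy(hy) - entropy(hxy)

def forward_select(X, y, min_gain=0.02, bins=6):
    """Greedy forward selection: at each step add the variable whose inclusion
    most increases I(X_S; Y); stop when the best gain falls below min_gain
    (the naive threshold-based stopping rule)."""
    selected, remaining, best_mi = [], list(range(X.shape[1])), 0.0
    while remaining:
        gains = {j: multivariate_mi(X[:, selected + [j]], y, bins) - best_mi
                 for j in remaining}
        j_star = max(gains, key=gains.get)
        if gains[j_star] < min_gain:  # stopping criterion under scrutiny
            break
        selected.append(j_star)
        remaining.remove(j_star)
        best_mi += gains[j_star]
    return selected
```

Because the plug-in estimate is biased upward in higher dimensions, the gain of an irrelevant variable rarely reaches exactly zero, so the choice of `min_gain` becomes increasingly arbitrary as selection proceeds; this is the fragility the proposed Markov-blanket-based criterion aims to address.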