# Data complexity meta-features for regression problems

## Abstract

In meta-learning, classification problems can be described by a variety of features, including complexity measures. These measures capture the complexity of the boundary that separates the classes. For regression problems, on the other hand, such measures are lacking. This paper presents and analyses measures devoted to estimating the complexity of the function that should be fitted to the data in regression problems. As case studies, they are employed as meta-features in three meta-learning setups: (i) the first predicts the regression function type of some synthetic datasets; (ii) the second is designed to tune the parameter values of support vector regressors; and (iii) the third aims to predict the performance of various regressors for a given dataset. The results show the suitability of the new measures for describing the regression datasets and their utility in the meta-learning tasks considered. In cases (ii) and (iii), the results achieved are similar to or better than those obtained using classical meta-features in meta-learning.

## Keywords

Meta-learning · Meta-features · Complexity measures

## Notes

### Acknowledgements

The authors thank the research agencies FAPESP (2012/22608-8), CNPq (482222/2013-1, 308858/2014-0 and 305611/2015-1), CAPES, DAAD and IZKF Aachen for their financial support.
