Abstract
When several variables contain missing values, the imputation must be carried out by means of an iterative procedure. The most commonly used imputation methods, however, rely on parametric assumptions. In this paper we propose a new non-parametric method for multiple imputation based on Ensemble Support Vector Regression. The procedure works under quite general assumptions and has been tested under different simulation schemes. We show that its results are better than those of other methods commonly used to obtain a complete data set.
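The idea summarized in the abstract can be illustrated with a minimal sketch. It is not the chapter's actual algorithm, which the abstract does not spell out. It assumes scikit-learn's `SVR` as the base learner, a bagging-style ensemble built from bootstrap resamples, mean filling as the starting point, and a fixed number of passes over the incomplete variables. The function name `ensemble_svr_impute` and all hyperparameter values are hypothetical choices made for illustration.

```python
import numpy as np
from sklearn.svm import SVR


def ensemble_svr_impute(X, n_imputations=5, n_members=10, n_iter=3, seed=0):
    """Return a list of completed copies of X (NaN marks a missing value).

    Illustrative sketch only: bootstrap-averaged SVR predictions, refreshed
    variable by variable, with one completed data set per imputation.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)
    col_means = np.nanmean(X, axis=0)
    incomplete_cols = np.flatnonzero(missing.any(axis=0))
    completed = []
    for _imp in range(n_imputations):
        # Start from a mean-filled copy, then refine it iteratively.
        filled = np.where(missing, col_means, X)
        for _pass in range(n_iter):
            for j in incomplete_cols:
                mis = missing[:, j]
                obs_idx = np.flatnonzero(~mis)
                # Predict variable j from all other (currently filled) variables.
                predictors = np.delete(filled, j, axis=1)
                mu = predictors.mean(axis=0)
                sd = predictors.std(axis=0) + 1e-12
                Z = (predictors - mu) / sd  # standardize for the RBF kernel
                member_preds = []
                for _member in range(n_members):
                    # Each ensemble member is trained on a bootstrap resample
                    # of the rows where variable j is observed.
                    boot = rng.choice(obs_idx, size=obs_idx.size, replace=True)
                    model = SVR(kernel="rbf", C=1.0, epsilon=0.1)
                    model.fit(Z[boot], filled[boot, j])
                    member_preds.append(model.predict(Z[mis]))
                # The ensemble prediction is the average over members.
                filled[mis, j] = np.mean(member_preds, axis=0)
        completed.append(filled)
    return completed


# Toy data (made up for this example): a nonlinear relation with missing cells.
rng = np.random.default_rng(1)
data = rng.normal(size=(60, 3))
data[:, 2] = np.sin(data[:, 0]) + 0.1 * rng.normal(size=60)
data[rng.random(60) < 0.2, 2] = np.nan
data[rng.random(60) < 0.2, 1] = np.nan

imputations = ensemble_svr_impute(data, n_imputations=3, n_members=5, n_iter=2)
```

Each element of `imputations` is one completed data set. Observed values are left untouched, and the completed sets differ from one another only through the bootstrap resampling. In a multiple-imputation analysis each set would be analyzed separately and the results then combined.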
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Scacciatelli, D. (2012). Ensemble Support Vector Regression: A New Non-parametric Approach for Multiple Imputation. In: Di Ciaccio, A., Coli, M., Angulo Ibanez, J. (eds) Advanced Statistical Methods for the Analysis of Large Data-Sets. Studies in Theoretical and Applied Statistics. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-21037-2_15
DOI: https://doi.org/10.1007/978-3-642-21037-2_15
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-21036-5
Online ISBN: 978-3-642-21037-2
eBook Packages: Mathematics and Statistics, Mathematics and Statistics (R0)