Abstract
Tree-based procedures have recently been considered as nonparametric tools for missing data imputation when dealing with large data structures and no probability assumptions. A previous work used an incremental algorithm based on cross-validated decision trees and a lexicographic ordering of the single data points to be imputed. This paper considers an ensemble method in which a tree-based model is used as the learner. Furthermore, the incremental imputation handles the missing data of each variable in turn. As a result, the proposed method yields more accurate imputations through a more efficient algorithm. A simulation case study shows the overall good performance of the proposed method against some competitors. A MATLAB implementation enriches the Tree Harvest Software for non-standard classification and regression trees.
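The variable-by-variable scheme sketched in the abstract can be illustrated with a minimal, self-contained Python example: columns with missing values are imputed in turn, and each completed column becomes available as a predictor for the next. This is only a sketch under stated assumptions, not the authors' implementation: a one-split regression stump stands in for the boosted tree learner of the paper, the ordering heuristic (fewest missing values first) is an illustrative choice, and all names are hypothetical.

```python
def fit_stump(X, y):
    # One-split regression tree ("stump"), a stand-in for the paper's
    # boosted tree learner: pick the (feature, threshold) pair that
    # minimizes the squared error around the left/right means.
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [y[i] for i, row in enumerate(X) if row[j] <= t]
            right = [y[i] for i, row in enumerate(X) if row[j] > t]
            if not left or not right:
                continue
            ml, mr = sum(left) / len(left), sum(right) / len(right)
            sse = (sum((v - ml) ** 2 for v in left)
                   + sum((v - mr) ** 2 for v in right))
            if best is None or sse < best[0]:
                best = (sse, j, t, ml, mr)
    if best is None:  # no valid split (e.g. no predictors): predict the mean
        m = sum(y) / len(y)
        return lambda row: m
    _, j, t, ml, mr = best
    return lambda row: ml if row[j] <= t else mr

def impute_incrementally(data):
    # data: list of rows; missing entries are None.
    # Variables are imputed in turn (here: fewest missing values first);
    # once a column is filled in, it can serve as a predictor later on.
    data = [row[:] for row in data]
    p = len(data[0])
    order = sorted(range(p), key=lambda j: sum(row[j] is None for row in data))
    for j in order:
        obs = [row for row in data if row[j] is not None]
        mis = [row for row in data if row[j] is None]
        if not mis or not obs:
            continue
        # use only columns that are complete at this stage as predictors
        preds = [k for k in range(p)
                 if k != j and all(row[k] is not None for row in data)]
        model = fit_stump([[row[k] for k in preds] for row in obs],
                          [row[j] for row in obs])
        for row in mis:
            row[j] = model([row[k] for k in preds])
    return data
```

Replacing `fit_stump` with a boosted ensemble of deeper trees recovers the spirit of the method, since the incremental loop is unchanged regardless of the learner plugged in.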
© 2006 Springer-Verlag Heidelberg
Cite this paper
Siciliano, R., Aria, M., D’Ambrosio, A. (2006). Boosted Incremental Tree-based Imputation of Missing Data. In: Zani, S., Cerioli, A., Riani, M., Vichi, M. (eds) Data Analysis, Classification and the Forward Search. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-35978-8_31
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-35977-7
Online ISBN: 978-3-540-35978-4
eBook Packages: Mathematics and Statistics (R0)