Boosted Incremental Tree-based Imputation of Missing Data

  • Conference paper
Data Analysis, Classification and the Forward Search

Abstract

Tree-based procedures have recently been considered as nonparametric tools for missing data imputation when dealing with large data structures and no probabilistic assumptions. A previous work used an incremental algorithm based on cross-validated decision trees and a lexicographic ordering of the single data values to be imputed. This paper considers an ensemble method in which a tree-based model is used as the learner. Furthermore, the incremental imputation addresses the missing data of each variable in turn. As a result, the proposed method allows more accurate imputations through a more efficient algorithm. A simulation case study shows the overall good performance of the proposed method against some competitors. A MATLAB implementation enriches the Tree Harvest Software for non-standard classification and regression trees.
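The variable-by-variable imputation scheme summarised above can be sketched roughly as follows. This is an illustrative reconstruction in Python, not the authors' MATLAB/Tree Harvest implementation: the initial mean fill, the ordering of variables by amount of missingness, and the use of scikit-learn's GradientBoostingRegressor as the boosted tree learner are all assumptions made for the sake of a runnable example.

```python
# Hypothetical sketch of boosted incremental tree-based imputation:
# each variable with missing entries is imputed in turn, using a
# boosted tree ensemble trained on the cases observed for that variable.
# NOT the authors' method; ordering and initial fill are assumptions.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

def incremental_impute(X, max_iter=1):
    """Impute missing entries (NaN) column by column with boosted trees."""
    X = X.astype(float).copy()
    missing = np.isnan(X)
    # Start from a simple mean fill so every column is usable as a predictor.
    col_means = np.nanmean(X, axis=0)
    X[missing] = np.take(col_means, np.where(missing)[1])
    # Impute columns in order of increasing missingness (an assumption).
    order = np.argsort(missing.sum(axis=0))
    for _ in range(max_iter):
        for j in order:
            rows = missing[:, j]
            if not rows.any():
                continue
            other = np.delete(np.arange(X.shape[1]), j)
            # Fit the boosted learner on cases where variable j is observed.
            model = GradientBoostingRegressor(n_estimators=100, max_depth=3)
            model.fit(X[~rows][:, other], X[~rows, j])
            # Replace only the missing entries of variable j.
            X[rows, j] = model.predict(X[rows][:, other])
    return X
```

Observed entries are never overwritten; only the positions flagged as missing at the start are replaced by ensemble predictions.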




Copyright information

© 2006 Springer-Verlag Heidelberg

About this paper

Cite this paper

Siciliano, R., Aria, M., D’Ambrosio, A. (2006). Boosted Incremental Tree-based Imputation of Missing Data. In: Zani, S., Cerioli, A., Riani, M., Vichi, M. (eds) Data Analysis, Classification and the Forward Search. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-35978-8_31
