Statistical Methods & Applications, Volume 24, Issue 1, pp 159–175

Exploring copulas for the imputation of complex dependent data

  • F. Marta L. Di Lascio
  • Simone Giannerini
  • Alessandra Reale

Abstract

In this work we introduce a copula-based method for imputing missing data by using conditional density functions of the missing variables given the observed ones. In theory, such functions can be derived from the multivariate distribution of the variables of interest. In practice, it is very difficult to model joint distributions and derive conditional distributions, especially when the margins are different. We propose a natural solution to the problem by exploiting copulas so that we derive conditional density functions through the corresponding conditional copulas. The approach is appealing since copula functions enable us (1) to fit any combination of marginal distribution functions, (2) to take into account complex multivariate dependence relationships and (3) to model the marginal distributions and the dependence structure separately. We describe the method and perform a Monte Carlo study in order to compare it with two well-known imputation techniques: the nearest neighbour donor imputation and the regression imputation by EM algorithm. Our results indicate that the proposal compares favourably with classical methods in terms of preservation of microdata, margins and dependence structure.
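To make the idea of imputing through a conditional copula concrete, the following is a minimal sketch for two variables, not the authors' algorithm. For a bivariate model with copula density c and margins F1 and F2, the conditional density of the missing variable is f(x2 | x1) = c(F1(x1), F2(x2)) f2(x2). The sketch assumes a Gaussian copula, for which the conditional distribution is available in closed form, an exponential margin for the observed variable and a normal margin for the missing one. The function names, the parameter values and the choice of copula and margins are all illustrative assumptions; in practice they would be estimated from the complete records.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, used for the Gaussian copula
EPS = 1e-12                # keeps probabilities strictly inside (0, 1)


def exp_cdf(x, rate):
    """CDF of the exponential margin assumed for the observed variable X1."""
    return 1.0 - math.exp(-rate * x)


def impute_x2_given_x1(x1, rate1, margin2, rho, rng):
    """Draw one imputed value of X2 given an observed X1.

    Assumptions (illustrative only): Gaussian copula with correlation rho,
    X1 ~ Exponential(rate1), X2 ~ margin2 (a statistics.NormalDist).
    """
    # Step 1: map the observed value to the copula scale, u1 = F1(x1).
    u1 = min(max(exp_cdf(x1, rate1), EPS), 1.0 - EPS)
    # Step 2: move to the normal-score scale of the Gaussian copula.
    z1 = STD_NORMAL.inv_cdf(u1)
    # Step 3: under a Gaussian copula, Z2 | Z1 = z1 ~ N(rho * z1, 1 - rho^2).
    z2 = rng.gauss(rho * z1, math.sqrt(1.0 - rho ** 2))
    # Step 4: back to the copula scale, then to the scale of X2 via F2^{-1}.
    u2 = min(max(STD_NORMAL.cdf(z2), EPS), 1.0 - EPS)
    return margin2.inv_cdf(u2)


if __name__ == "__main__":
    rng = random.Random(0)
    margin2 = NormalDist(mu=10.0, sigma=2.0)  # hypothetical fitted margin
    for x1 in (0.05, 1.0, 3.0):
        draw = impute_x2_given_x1(x1, rate1=1.0, margin2=margin2, rho=0.8, rng=rng)
        print(f"x1 = {x1:4.2f} -> imputed x2 = {draw:.3f}")
```

Because the margins enter only through F1 and the inverse of F2, either can be swapped for another distribution without touching the dependence step, which is the separation of margins and dependence structure that the abstract highlights. For copulas without a closed-form conditional distribution, step 3 would need a general-purpose sampler applied to the conditional density.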

Keywords

Imputation; Copula function; Multivariate dependence; Donor imputation; EM-based regression imputation

Notes

Acknowledgments

The authors wish to thank Paola Monari (University of Bologna, Italy) and Antonia Manzari (Italian Statistical Institute, ISTAT) for their support and useful discussions. The first author acknowledges the support of Free University of Bozen-Bolzano, School of Economics and Management via the project “Multivariate analysis techniques based on copula function”.

Copyright information

© Springer-Verlag Berlin Heidelberg 2014

Authors and Affiliations

  • F. Marta L. Di Lascio (1)
  • Simone Giannerini (2)
  • Alessandra Reale (3)

  1. Faculty of Economics and Management, Free University of Bozen-Bolzano, Bolzano, Italy
  2. Department of Statistical Sciences, University of Bologna, Bologna, Italy
  3. ISTAT, Italian Statistical Institute, Rome, Italy
