Exploring copulas for the imputation of complex dependent data
In this work we introduce a copula-based method for imputing missing data that uses the conditional density functions of the missing variables given the observed ones. In theory, such functions can be derived from the multivariate distribution of the variables of interest. In practice, modelling the joint distribution and deriving the conditional distributions is very difficult, especially when the margins differ. We propose a natural solution that exploits copulas, deriving the conditional density functions through the corresponding conditional copulas. The approach is appealing because copula functions enable us (1) to fit any combination of marginal distribution functions, (2) to take into account complex multivariate dependence relationships and (3) to model the marginal distributions and the dependence structure separately. We describe the method and perform a Monte Carlo study to compare it with two well-known imputation techniques: nearest neighbour donor imputation and regression imputation by the EM algorithm. Our results indicate that the proposal compares favourably with these classical methods in terms of preservation of microdata, margins and dependence structure.
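To make the idea concrete, the following is a minimal sketch of conditional imputation through a copula, not the authors' implementation. It assumes a bivariate Gaussian copula with known correlation `rho`, an exponential margin for the observed variable and a Weibull margin for the missing one; the copula family, the margins, the parameter values and all function names are illustrative assumptions that do not come from the paper. The observed value is mapped to the copula scale, a value is drawn from the conditional copula, and the draw is mapped back through the inverse margin of the missing variable.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, used for the Gaussian copula
EPS = 1e-12                # keeps probabilities strictly inside (0, 1)


def exp_cdf(x, rate):
    """CDF of the (assumed) exponential margin of the observed variable."""
    return 1.0 - math.exp(-rate * x)


def weibull_ppf(u, shape, scale):
    """Quantile function of the (assumed) Weibull margin of the missing variable."""
    return scale * (-math.log(1.0 - u)) ** (1.0 / shape)


def impute_conditional(x1, rho, rng, rate=1.0, shape=2.0, scale=3.0):
    """Draw one imputed x2 given an observed x1 under a Gaussian copula."""
    # Step 1: observed value -> uniform scale -> normal score.
    u1 = min(max(exp_cdf(x1, rate), EPS), 1.0 - EPS)
    z1 = STD_NORMAL.inv_cdf(u1)
    # Step 2: conditional Gaussian copula, z2 | z1 ~ N(rho * z1, 1 - rho^2).
    z2 = rng.gauss(rho * z1, math.sqrt(1.0 - rho * rho))
    # Step 3: normal score -> uniform scale -> margin of the missing variable.
    u2 = min(max(STD_NORMAL.cdf(z2), EPS), 1.0 - EPS)
    return weibull_ppf(u2, shape, scale)


if __name__ == "__main__":
    rng = random.Random(0)
    for x1 in (0.1, 1.0, 3.0):
        print(x1, impute_conditional(x1, rho=0.9, rng=rng))
```

Because the margins enter only through `exp_cdf` and `weibull_ppf`, either could be swapped for another distribution without touching the dependence step, which is the separation of margins and dependence structure that the abstract refers to. In a real application the margins and the copula parameter would have to be estimated from the complete cases.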
Keywords: Imputation, Copula function, Multivariate dependence, Donor imputation, EM-based regression imputation
The authors wish to thank Paola Monari (University of Bologna, Italy) and Antonia Manzari (Italian Statistical Institute, ISTAT) for their support and useful discussions. The first author acknowledges the support of the Free University of Bozen-Bolzano, School of Economics and Management, via the project “Multivariate analysis techniques based on copula function”.