Multilevel and time-series missing value imputation for combined survey and longitudinal context data

Wutchiett, David; Durand, Claire

doi:10.1007/s11135-021-01186-8

Published: 14 October 2021

Volume 56, pages 1799–1828, (2022)
Quality & Quantity

Abstract

Comparative research examining relationships between individual-level survey response data and time-varying country context variables for political or socioeconomic characteristics is often complicated by missing values. Surveys and longitudinal context measures may be produced in different years and at differing frequencies. Observations may be intermittent or may cover only a few consecutive years of a full longitudinal sequence. Statistical evaluations that do not impute values with consideration of the data’s missingness characteristics may produce biased estimates. Model-based approaches to missing value imputation, such as multiple imputation and time series imputation, offer means of producing imputed values given complex hierarchical and longitudinal relations. Using incomplete survey data for institutional trust measures from 554,104 respondents from twenty-seven Eastern European and Central Asian countries between 1993 and 2016, and corresponding longitudinal context descriptors of demographic, socioeconomic and political conditions, multilevel multiple imputation and time-series imputation methods were compared and evaluated. Where missingness is intermittent across the breadth of a longitudinal sequence, time series imputation may produce convincing estimates for national-level variables’ values while understating the uncertainty associated with imputation. When missing values are numerous and span the tail ends of a sequence, multivariate multilevel multiple imputation with time variable fixed effects may produce better estimates for country-level variables through incorporation of information derived from additional covariates and other countries’ concurrent trajectories. Multilevel multiple imputation models with random slopes for time variables were found to have beneficial qualities in that countries’ unique longitudinal trends are emphasized and fit while the effects of pooled observations and additional covariates contribute to estimation.

Availability of data and material

Available upon request.

Code availability

Available upon request.

Notes

For example, within the regions of Eastern Europe and West Asia, measurement and public release of several economic and demographic characteristics largely began in the 1990s. Only since the 2000s has the reporting of context characteristic data increased substantially in frequency and regularity within the countries of Central and Southwest Asia and Southeast Europe.
These countries are Albania, Armenia, Belarus, Bosnia-Herzegovina, Bulgaria, Croatia, Czech Republic, Estonia, Georgia, Greece, Hungary, Kazakhstan, Kyrgyzstan, Latvia, Lithuania, Macedonia, Moldova, Poland, Romania, Russia, Serbia, Slovakia, Slovenia, Tajikistan, Turkey, Ukraine and Uzbekistan.
The programs were the World Values Survey, the Caucasus Barometer, Consolidation of Democracy in Central and Eastern Europe, European Quality of Life, European Social Survey, European Values Study, Life in Transition, New Baltic Barometer, the New Europe Barometer, the New Russia Barometer, Values and Change in Post-communist Europe and the Asian Barometer.
Approaches’ prediction matrices used within the mice imputation function are further defined in Appendix B.
When conducting univariate time series imputation, Poland and Serbia’s longitudinal sequences for the poverty rate had zero observations. In such a case, the univariate time series method has no data to draw upon and no imputations could be performed. As such, yearly mean values of the complete imputed poverty rate variable across the other twenty-five countries were used to generate estimates for the two countries’ yearly poverty rate values.
Countries with no observations for a variable were not considered.
The missing data pattern for the combined two-level data is described in Appendix A.
A preliminary complete case analysis of associations between poverty rate and all other considered longitudinal context and survey respondent covariates through use of a multilevel model with random effects found all independent variables except respondent age to be significantly associated with a country’s yearly poverty rate.
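
The fallback described in the note on Poland and Serbia's poverty rate, where a country with zero observations receives the yearly mean of the other countries' completed series, can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the authors' code (which the article lists as available upon request): the function name `fill_unobserved_countries`, the country labels and the toy values are all hypothetical, and the donor countries are assumed to have been fully imputed beforehand.

```python
from statistics import mean


def fill_unobserved_countries(series_by_country, years):
    """Fill countries with no observations using the yearly mean of the others.

    series_by_country maps country -> {year: value or None}.
    Countries with at least one observed value are treated as donors and are
    assumed (hypothetically) to be complete after time series imputation.
    """
    # A country is "empty" when every year in its sequence is missing.
    empty = [c for c, s in series_by_country.items()
             if all(s.get(y) is None for y in years)]
    donors = [c for c in series_by_country if c not in empty]

    # Copy the input so the original series are left untouched.
    filled = {c: dict(s) for c, s in series_by_country.items()}
    for y in years:
        yearly_mean = mean(series_by_country[c][y] for c in donors)
        for c in empty:
            filled[c][y] = yearly_mean
    return filled


# Toy data: countries "A" and "B" are complete, "C" has no observations.
years = [2000, 2001]
toy = {
    "A": {2000: 10.0, 2001: 12.0},
    "B": {2000: 20.0, 2001: 16.0},
    "C": {2000: None, 2001: None},
}
result = fill_unobserved_countries(toy, years)
```

Because every filled value is a cross-country average, the filled series carries no country-specific information, so it is best read as a placeholder rather than an estimate of that country's own trajectory.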

Funding

This work was supported in part by funding from a Fonds de recherche du Quebec – Société et culture (FRQSC) doctoral research scholarship.

Author information

Authors and Affiliations

Office of Applied Research, Evaluation, and Data Analytics, City University of New York, New York, NY, USA
David Wutchiett
Department of Sociology, University of Montreal, Montreal, QC, Canada
David Wutchiett & Claire Durand

Corresponding author

Correspondence to David Wutchiett.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix A

See Appendix Fig. 6.

Appendix B Multiple imputation model prediction matrices in mice R package format

See Appendix Tables 4, 5 and 6.

Table 4 Multiple imputation prediction matrix for multilevel model with country random effects (ML RE) approach

Table 5 Multiple imputation prediction matrix for multilevel models with longitudinal variable random slopes (ML RS) approach

Table 6 Multiple imputation prediction matrix for second step multilevel model with random effects for survey respondent variables of the two-step imputation (TS + ML RE) approach

Within the prediction matrices, rows containing only zero values correspond to variables that are not imputed, whereas rows containing other numeric values describe the model specification with respect to the columns’ variables. Context variables are defined with fixed effects consistently across approaches. Individual survey respondent variables are defined with fixed effects and an additional fixed effect for the country cluster-specific mean value. As populations randomly sampled for public opinion surveys within a country tend to persist across years and retain similar distributional characteristics, generating a mean value variable for individuals’ demographics for use during imputation enables the modeling approach to take into account the accumulated information about a country’s population. Values for context measures, by contrast, are expected to vary more considerably over time, departing from sequentially adjacent observed values. Adding country means for these measures would tend to reduce the effect sizes of the variables included to fit context variables’ variation across longitudinal sequences. Since increasing sensitivity to time trends was a central goal, these additional country means were not included.
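
To make the country cluster-specific mean concrete, the sketch below computes it for one respondent-level variable and attaches it to each row. It is only an illustration in Python: the function `add_country_means`, the field names `country` and `age`, and the toy rows are hypothetical, and pooling over all survey years is an assumption drawn from the paragraph's reference to accumulated information about a country's population. In the study itself this term is specified through the prediction matrix rather than computed by hand.

```python
from collections import defaultdict


def add_country_means(rows, variable):
    """Attach the country-level mean of `variable` to every respondent row.

    Missing values (None) are skipped when computing the mean, and the mean is
    pooled over all years available for the country (an assumption here).
    """
    totals = defaultdict(lambda: [0.0, 0])  # country -> [sum, count]
    for row in rows:
        value = row[variable]
        if value is not None:
            totals[row["country"]][0] += value
            totals[row["country"]][1] += 1

    means = {c: s / n for c, (s, n) in totals.items() if n > 0}
    key = f"{variable}_country_mean"
    # Return new row dicts so the input rows are not modified.
    return [{**row, key: means.get(row["country"])} for row in rows]


# Hypothetical respondents from two countries; one age value is missing.
respondents = [
    {"country": "A", "year": 2005, "age": 30},
    {"country": "A", "year": 2010, "age": 50},
    {"country": "B", "year": 2005, "age": 20},
    {"country": "B", "year": 2010, "age": None},
]
with_means = add_country_means(respondents, "age")
```

Every respondent in a country receives the same mean, including those whose own value is missing, which is how the country-level information reaches the imputation model as a predictor.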

For approach two (ML RS), the year components are set to random slopes, whereas they are defined as fixed effects for approaches one (ML RE) and three (TS + ML RE). These prediction matrices, defined as R data frame objects, were used within the ‘mice’ function during the multiple imputation process.
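
The difference between the fixed-effect and random-slope specifications of the year component can be illustrated with a small prediction-matrix builder. The sketch is in Python rather than R and does not reproduce the authors' matrices, which appear in Tables 4 to 6. The numeric codes are an assumption based on the convention documented for mice's two-level methods such as `2l.pan` (-2 cluster identifier, 0 unused, 1 fixed effect, 2 random effect, 3 fixed effect plus cluster mean). The variable names and the function `build_prediction_matrix` are hypothetical.

```python
# Codes assumed from the mice two-level convention (e.g. method "2l.pan").
CLUSTER, UNUSED, FIXED, RANDOM, FIXED_PLUS_MEAN = -2, 0, 1, 2, 3


def build_prediction_matrix(targets, context_vars, respondent_vars, year_code):
    """Return {row variable: {column variable: code}} for a two-level model.

    Rows for variables not in `targets` are all zero, i.e. not imputed.
    `year_code` is FIXED for a fixed time effect or RANDOM for random slopes.
    """
    columns = ["country", "year"] + context_vars + respondent_vars
    matrix = {}
    for row in columns:
        if row not in targets:
            matrix[row] = {col: UNUSED for col in columns}
            continue
        codes = {}
        for col in columns:
            if col == row:
                codes[col] = UNUSED           # a variable never predicts itself
            elif col == "country":
                codes[col] = CLUSTER          # country is the cluster identifier
            elif col == "year":
                codes[col] = year_code        # fixed effect or random slope
            elif col in context_vars:
                codes[col] = FIXED            # context variables: fixed effects
            else:
                codes[col] = FIXED_PLUS_MEAN  # respondent variables: plus country mean
        matrix[row] = codes
    return matrix


# Hypothetical variable names, not the article's actual variable list.
context = ["poverty_rate", "gdp_per_capita"]
respondent = ["age", "trust"]
fixed_year = build_prediction_matrix(["poverty_rate"], context, respondent, FIXED)
random_year = build_prediction_matrix(["poverty_rate"], context, respondent, RANDOM)
```

In this sketch the two matrices differ only in the `year` entry of the `poverty_rate` row, so switching between the fixed-effect and random-slope treatment of time is a one-cell change.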

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Wutchiett, D., Durand, C. Multilevel and time-series missing value imputation for combined survey and longitudinal context data. Qual Quant 56, 1799–1828 (2022). https://doi.org/10.1007/s11135-021-01186-8

Accepted: 15 June 2021
Published: 14 October 2021
Issue Date: June 2022
DOI: https://doi.org/10.1007/s11135-021-01186-8
