
Multilevel and time-series missing value imputation for combined survey and longitudinal context data

Quality & Quantity

Abstract

Comparative research on the relationships between individual-level survey responses and time-varying country context variables describing political or socioeconomic characteristics is often complicated by missing values. Surveys and longitudinal context measures may be produced in different years and at differing frequencies. Observations may be intermittent or may cover only a few consecutive years of a full longitudinal sequence. Statistical evaluations that do not impute values with consideration of the data’s missingness characteristics may produce biased estimates. Model-based approaches to missing value imputation, such as multiple imputation and time series imputation, offer means of producing imputed values given complex hierarchical and longitudinal relations. Using incomplete survey data for institutional trust measures from 554,104 respondents in twenty-seven Eastern European and Central Asian countries between 1993 and 2016, together with corresponding longitudinal context descriptors of demographic, socioeconomic and political conditions, multilevel multiple imputation and time series imputation methods were compared and evaluated. Where missingness is intermittent across the breadth of a longitudinal sequence, time series imputation may produce convincing estimates of national-level variables’ values while understating the uncertainty associated with imputation. When missing values are numerous and span the tail ends of a sequence, multivariate multilevel multiple imputation with time variable fixed effects may produce better estimates for country-level variables by incorporating information derived from additional covariates and other countries’ concurrent trajectories. Multilevel multiple imputation models with random slopes for time variables were found to be beneficial in that countries’ unique longitudinal trends are emphasized and fit while the effects of pooled observations and additional covariates still contribute to estimation.


Availability of data and material

Available upon request.

Code availability

Available upon request.

Notes

  1. For example, within the regions of Eastern Europe and West Asia, measurement and public release of several economic and demographic characteristics began in large part in the 1990s. Only since the 2000s has the reporting of context characteristic data substantially increased in frequency and regularity within the countries of Central and Southwest Asia and Southeast Europe.

  2. These countries are Albania, Armenia, Belarus, Bosnia-Herzegovina, Bulgaria, Croatia, Czech Republic, Estonia, Georgia, Greece, Hungary, Kazakhstan, Kyrgyzstan, Latvia, Lithuania, Macedonia, Moldova, Poland, Romania, Russia, Serbia, Slovakia, Slovenia, Tajikistan, Turkey, Ukraine and Uzbekistan.

  3. The programs were the World Values Survey, the Caucasus Barometer, Consolidation of Democracy in Central and Eastern Europe, European Quality of Life, European Social Survey, European Values Study, Life in Transition, New Baltic Barometer, the New Europe Barometer, the New Russia Barometer, Values and Change in Post-communist Europe and the Asian Barometer.

  4. Approaches’ prediction matrices used within the mice imputation function are further defined in Appendix B.

  5. In the univariate time series imputation, the poverty rate sequences for Poland and Serbia had zero observations. In such a case, the univariate time series method has no data to draw upon and no imputations can be performed. Yearly mean values of the completed (imputed) poverty rate variable across the other twenty-five countries were therefore used as estimates of the two countries’ yearly poverty rates.

  6. Countries with no observations for a variable were not considered.

  7. The missing data pattern for the combined two-level data is described in Appendix A.

  8. A preliminary complete case analysis, using a multilevel model with random effects, examined associations between the poverty rate and all other longitudinal context and survey respondent covariates considered. All independent variables except respondent age were significantly associated with a country’s yearly poverty rate.
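
The fallback described in Note 5 can be illustrated with a minimal Python sketch. It is not the authors' code (the study worked in R), and the function name, country labels and poverty rates below are hypothetical. It assumes that every country with observations has already been imputed to a complete yearly series, and that a country with zero observations is represented by a list of `None` values.

```python
from statistics import mean


def fill_from_yearly_means(series_by_country):
    """Fill fully unobserved countries with yearly means of the complete series.

    series_by_country maps a country label to a list of yearly values.
    Complete lists act as donors; lists made only of None are filled.
    Partially observed lists are returned unchanged (they are assumed to
    have been handled by an earlier imputation step).
    """
    donors = [s for s in series_by_country.values()
              if all(v is not None for v in s)]
    if not donors:
        raise ValueError("no complete series available to average")
    n_years = len(donors[0])
    # One mean per year, taken across all donor countries.
    yearly_means = [mean(s[t] for s in donors) for t in range(n_years)]

    filled = {}
    for country, series in series_by_country.items():
        if all(v is None for v in series):
            filled[country] = list(yearly_means)
        else:
            filled[country] = list(series)
    return filled


# Hypothetical poverty rates (percent) over three years; not study data.
poverty = {
    "Country A": [10.0, 12.0, 14.0],
    "Country B": [20.0, 16.0, 18.0],
    "No observations": [None, None, None],
}
result = fill_from_yearly_means(poverty)
print(result["No observations"])
```

A series filled this way follows the average trajectory of the other countries and carries no country-specific information, so it is best read as a placeholder of last resort.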


Funding

This work was supported in part by funding from a Fonds de recherche du Quebec – Société et culture (FRQSC) doctoral research scholarship.

Author information

Corresponding author

Correspondence to David Wutchiett.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix A

See Appendix Fig. 6.

Fig. 6 Missingness pattern within combined country-level and individual-level data. Variables for “Country”, “Year”, “Year^2”, “Partial Democracy”, “Autocracy” and “Regime in Transition” were not missing for any observation and were not included in missingness pattern plots. Variables “Trust Parl” for trust in parliament/congress, “Female”, “Age” and “Education” are survey respondent variables. The other variables described socioeconomic and political context characteristics. Missingness is measured on a scale of 0 to 1.

Appendix B

Multiple imputation model prediction matrices in mice R package format

See Appendix Tables 4, 5 and 6.

Table 4 Multiple imputation prediction matrix for multilevel model with country random effects (ML RE) approach

Table 5 Multiple imputation prediction matrix for multilevel models with longitudinal variable random slopes (ML RS) approach

Table 6 Multiple imputation prediction matrix for second step multilevel model with random effects for survey respondent variables of the two-step imputation (TS + ML RE) approach

Within the prediction matrices, rows containing only zeros belong to variables that are not imputed. The nonzero values in the remaining rows specify how each column’s variable enters that row’s imputation model. Context variables are defined with fixed effects consistently across approaches. Individual survey respondent variables are defined with fixed effects plus an additional fixed effect for the country cluster-specific mean. Because the populations randomly sampled for public opinion surveys within a country largely persist across years and retain similar distributions of characteristics, including a country mean of individuals’ demographics during imputation allows the model to take into account the accumulated information about a country’s population. Values of context measures, by contrast, are expected to vary more considerably over time and to depart from sequentially adjacent observed values. Adding country means for these variables would tend to reduce the effect sizes of the variables included to fit context variables’ variation across longitudinal sequences. Since increasing sensitivity to time trends was a central goal, these additional country means were not included.

For approach two (ML RS), the year components are set to random slopes, whereas they are defined as fixed effects for approaches one (ML RE) and three (TS + ML RE). The prediction matrices, defined as R data frame objects, were passed to the ‘mice’ function during the multiple imputation process.
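
As an illustration of how such matrices are laid out, the following Python sketch builds a small prediction matrix for a fixed-effect specification and for a random-slope specification of the year terms. It does not reproduce Tables 4 to 6. The variable selection, the label “Poverty Rate” and the helper function are hypothetical, and the numeric codes are an assumption based on the convention used by mice's two-level imputation methods (-2 for the cluster identifier, 1 for a fixed effect, 2 for a fixed effect with a random slope, 3 for a fixed effect with an added cluster mean).

```python
# Assumed type codes, in the style of mice's two-level methods.
CLUSTER = -2          # cluster identifier (country)
UNUSED = 0            # not used as a predictor
FIXED = 1             # fixed effect
RANDOM_SLOPE = 2      # fixed effect plus random slope
FIXED_PLUS_MEAN = 3   # fixed effect plus cluster-mean fixed effect


def build_prediction_matrix(variables, targets, cluster, time_vars,
                            respondent_vars, random_time_slopes=False):
    """Return {row variable: {column variable: type code}}."""
    matrix = {}
    for row in variables:
        if row not in targets:
            # Complete variables are not imputed: their row is all zeros.
            matrix[row] = {col: UNUSED for col in variables}
            continue
        codes = {}
        for col in variables:
            if col == row:
                codes[col] = UNUSED
            elif col == cluster:
                codes[col] = CLUSTER
            elif col in time_vars:
                codes[col] = RANDOM_SLOPE if random_time_slopes else FIXED
            elif col in respondent_vars:
                codes[col] = FIXED_PLUS_MEAN
            else:
                codes[col] = FIXED  # context variables: fixed effects only
        matrix[row] = codes
    return matrix


# Hypothetical subset of variables; "Poverty Rate" stands in for a context measure.
variables = ["Country", "Year", "Year^2", "Poverty Rate", "Trust Parl", "Age"]
common = dict(
    variables=variables,
    targets=["Poverty Rate", "Trust Parl", "Age"],
    cluster="Country",
    time_vars=["Year", "Year^2"],
    respondent_vars=["Trust Parl", "Age"],
)
ml_re = build_prediction_matrix(**common)                           # year as fixed effects
ml_rs = build_prediction_matrix(**common, random_time_slopes=True)  # year as random slopes

for name, matrix in (("ML RE", ml_re), ("ML RS", ml_rs)):
    print(name)
    for row in variables:
        print(f"  {row:<13}", [matrix[row][col] for col in variables])
```

The two matrices differ only in the codes of the “Year” and “Year^2” columns, which mirrors the contrast between the fixed-effect and random-slope approaches described above.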

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Wutchiett, D., Durand, C. Multilevel and time-series missing value imputation for combined survey and longitudinal context data. Qual Quant 56, 1799–1828 (2022). https://doi.org/10.1007/s11135-021-01186-8

  • DOI: https://doi.org/10.1007/s11135-021-01186-8
