Abstract
Comparative research examining relationships between individual-level survey response data and time-varying country context variables describing political or socioeconomic characteristics is often complicated by missing values. Surveys and longitudinal context measures may be produced in different years and at differing frequencies. Observations may be intermittent or may cover only a few consecutive years of a full longitudinal sequence. Statistical evaluations that do not impute values with consideration of the data’s missingness characteristics may produce biased estimates. Model-based approaches to missing value imputation, such as multiple imputation and time series imputation, offer means of producing imputed values given complex hierarchical and longitudinal relations. Using incomplete survey data on institutional trust measures from 554,104 respondents in twenty-seven Eastern European and Central Asian countries between 1993 and 2016, together with corresponding longitudinal context descriptors of demographic, socioeconomic and political conditions, multilevel multiple imputation and time-series imputation methods were compared and evaluated. Where missingness is intermittent across the breadth of a longitudinal sequence, time series imputation may produce convincing estimates of national-level variables’ values while understating the uncertainty associated with imputation. When missing values are numerous and span the tail ends of a sequence, multivariate multilevel multiple imputation with time variable fixed effects may produce better estimates for country-level variables by incorporating information derived from additional covariates and other countries’ concurrent trajectories. Multilevel multiple imputation models with random slopes for time variables were found to have beneficial qualities in that countries’ unique longitudinal trends are emphasized and fit while pooled observations and additional covariates still contribute to estimation.
Availability of data and material
Available upon request.
Code availability
Available upon request.
Notes
For example, within the regions of Eastern Europe and West Asia, measurement and public release of several economic and demographic characteristics began in large part in the 1990s. Only since the 2000s has the reporting of context characteristic data become substantially more frequent and regular within the countries of Central and Southwest Asia and Southeast Europe.
These countries are Albania, Armenia, Belarus, Bosnia-Herzegovina, Bulgaria, Croatia, Czech Republic, Estonia, Georgia, Greece, Hungary, Kazakhstan, Kyrgyzstan, Latvia, Lithuania, Macedonia, Moldova, Poland, Romania, Russia, Serbia, Slovakia, Slovenia, Tajikistan, Turkey, Ukraine and Uzbekistan.
The programs were the World Values Survey, the Caucasus Barometer, Consolidation of Democracy in Central and Eastern Europe, European Quality of Life, European Social Survey, European Values Study, Life in Transition, New Baltic Barometer, the New Europe Barometer, the New Russia Barometer, Values and Change in Post-communist Europe and the Asian Barometer.
The prediction matrices that each approach used within the mice imputation function are further defined in Appendix B.
When conducting univariate time series imputation, Poland’s and Serbia’s longitudinal sequences for the poverty rate had zero observations. In such a case, the univariate time series method has no data to draw upon and no imputations can be performed. As such, yearly mean values of the completed (imputed) poverty rate variable across the other twenty-five countries were used to generate estimates of the two countries’ yearly poverty rate values.
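The fallback described in this note can be sketched as follows. This is a minimal Python illustration of the idea, not the authors' R code: the function name `fill_from_yearly_means`, the dictionary layout and the toy numbers are all hypothetical, and the sketch assumes the donor countries' sequences have already been completed by time series imputation.

```python
from statistics import mean


def fill_from_yearly_means(series_by_country):
    """Fill countries with no observations using the other countries' yearly means.

    series_by_country maps country -> {year: value or None}.
    A country whose values are all None has nothing for a univariate method
    to work with, so each of its years receives the mean of that year's
    values across the countries that do have data (the "donors").
    """
    donors = {
        country: series
        for country, series in series_by_country.items()
        if any(value is not None for value in series.values())
    }
    filled = {}
    for country, series in series_by_country.items():
        if country in donors:
            filled[country] = dict(series)  # leave donor sequences unchanged
            continue
        filled[country] = {
            year: mean(
                donor[year]
                for donor in donors.values()
                if donor.get(year) is not None
            )
            for year in series
        }
    return filled


# Hypothetical toy data: countries A and B are complete, C has no observations.
toy = {
    "A": {2000: 10.0, 2001: 12.0},
    "B": {2000: 20.0, 2001: 16.0},
    "C": {2000: None, 2001: None},
}
result = fill_from_yearly_means(toy)
```

Because every filled value is a cross-country average, the two countries' estimated sequences carry no country-specific information, which is worth keeping in mind when interpreting them.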
Countries with no observations for a variable were not considered.
The missing data pattern for the combined two-level data is described in Appendix A.
A preliminary complete-case analysis, using a multilevel model with random effects, examined associations between the poverty rate and all other considered longitudinal context and survey respondent covariates. All independent variables except respondent age were found to be significantly associated with a country’s yearly poverty rate.
Funding
This work was supported in part by funding from a Fonds de recherche du Québec – Société et culture (FRQSC) doctoral research scholarship.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Appendices
Appendix A
See Appendix Fig. 6.
Appendix B Multiple imputation model prediction matrices in mice R package format
See Appendix Table 4, 5 and
Within the prediction matrices, rows containing only zeros correspond to variables that are not imputed, whereas rows containing other numeric values describe the imputation model specification with respect to the variables in the columns. Context variables are consistently defined across approaches with fixed effects. Individual survey respondent variables are defined with fixed effects plus an additional fixed effect for the country cluster-specific mean value. Because the populations randomly sampled for public opinion surveys within a country largely persist across years and retain similar distributional characteristics, including a mean value variable for individuals’ demographics during imputation enables the modeling approach to take into account the accumulated information about a country’s population. Values of context measures, by contrast, are expected to vary more considerably over time, departing from sequentially adjacent observed values. Adding country means for them would tend to reduce the effect sizes of the variables included to fit context variables’ variation across longitudinal sequences. Since increasing sensitivity to time trends was a central goal, these additional country means were not included.
For approach two, the year components are set to random slopes, whereas they are defined as fixed effects for approaches one and three. These prediction matrices, defined as R data frame objects, were used within the ‘mice’ function during the multiple imputation process.
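The layout of such a prediction matrix can be sketched as below. This is a hedged Python illustration only; the actual matrices were R data frames passed to mice and are given in the appendix tables. The numeric codes (-2 for the cluster variable, 0 for unused, 1 for a fixed effect, 2 for a random effect, 3 for a fixed effect with an added cluster mean) are an assumption based on the convention used by mice's two-level imputation methods, and the variable names and the helper `build_prediction_matrix` are hypothetical.

```python
# Codes assumed from the mice two-level convention, not copied from the appendix.
CLUSTER, UNUSED, FIXED, RANDOM, FIXED_WITH_MEAN = -2, 0, 1, 2, 3


def build_prediction_matrix(variables, targets, cluster, time,
                            respondent_vars, random_time_slope):
    """Return {row variable: {column variable: code}} for one approach.

    Rows for variables that are not imputed are all zeros. In each imputed
    row, the cluster identifier is flagged, the time variable is a random
    slope or a fixed effect, respondent-level predictors get a fixed effect
    plus cluster mean, and remaining (context) predictors are fixed effects.
    """
    matrix = {}
    for row in variables:
        if row not in targets:
            matrix[row] = {col: UNUSED for col in variables}
            continue
        codes = {}
        for col in variables:
            if col == row:
                codes[col] = UNUSED  # a variable never predicts itself
            elif col == cluster:
                codes[col] = CLUSTER
            elif col == time:
                codes[col] = RANDOM if random_time_slope else FIXED
            elif col in respondent_vars:
                codes[col] = FIXED_WITH_MEAN
            else:
                codes[col] = FIXED
        matrix[row] = codes
    return matrix


# Hypothetical variable set with a single imputed context variable.
variables = ["country", "year", "poverty_rate", "gdp_per_capita", "respondent_age"]
common = dict(variables=variables, targets={"poverty_rate"}, cluster="country",
              time="year", respondent_vars={"respondent_age"})

fixed_time = build_prediction_matrix(random_time_slope=False, **common)
random_time = build_prediction_matrix(random_time_slope=True, **common)
```

In this sketch the two matrices differ only in the code assigned to the year column, mirroring the stated difference between approach two and the fixed-effect approaches. Any other differences among the three approaches are not represented here.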
About this article
Cite this article
Wutchiett, D., Durand, C. Multilevel and time-series missing value imputation for combined survey and longitudinal context data. Qual Quant 56, 1799–1828 (2022). https://doi.org/10.1007/s11135-021-01186-8
DOI: https://doi.org/10.1007/s11135-021-01186-8