Abstract
Hierarchically nested data structures are often analyzed by means of multilevel techniques. A common situation in cross-national comparative research is data on two levels, with information on individuals at level 1 and on countries at level 2. However, when dealing with few level-2 units (e.g. countries), results from multilevel models may be unreliable due to estimation bias (e.g. underestimated standard errors, unreliable country-level variance estimates). This chapter discusses the inaccuracies of multilevel modeling with a small level-2 sample size and lists the alternative analytic tools available for analyzing such data. However, as in practice many of these alternatives remain unfeasible for testing hypotheses central to cross-national comparative research, the aim of this chapter is to propose and illustrate a new technique, the 2-step meta-analytic approach, that is reliable in the analysis of nested data with few level-2 units. In addition, this method is highly infographic and accessible to the average social scientist (not skilled in advanced simulation techniques).
6.1 Introduction
Cross-national comparative research is often based on the analysis of hierarchically nested data structures containing information on multiple levels. A common situation integrates data at two levels, with micro-level (level-1) information about individuals and macro-level (level-2) information about countries. In the social sciences, the most popular way of analyzing such hierarchical cross-national data is by means of multilevel techniques (Hox et al. 2010; Rabe-Hesketh and Skrondal 2008; Snijders and Bosker 1999).
Multilevel analysis is very effective when dealing with data at multiple levels because it allows the estimation of effects occurring at all these levels (e.g. individual effects, country effects) simultaneously, as well as the estimation of interactions between variables at different levels (cross-level effects). A multilevel model may have the following structure:
y_{ic} = β_{0} + β_{1}X_{ic} + β_{2}Z_{c} + β_{3}(X_{ic}*Z_{c}) + u_{c} + e_{ic}
where the outcome y_{ic} (for person i in country c) depends on observed individual characteristics (X_{ic}), observed country-level characteristics (Z_{c}), cross-level interactions between observed characteristics at the individual and country level (X_{ic}*Z_{c}), unobserved individual effects (e_{ic}) and unobserved country effects (u_{c}), with β_{0} to β_{3} denoting the coefficients to be estimated, under the assumption that unobserved effects are normally distributed and uncorrelated with observed effects.
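To make the roles of these terms concrete, the following minimal sketch simulates data from a two-level model of the kind described above, with one individual-level predictor, one country-level predictor, their cross-level interaction, a country effect and an individual error. The function name, the coefficient names (b0 to b3), and all numeric values are illustrative assumptions, not quantities taken from this chapter.

```python
import random

def simulate_two_level(n_countries=15, n_per_country=200, seed=1,
                       b0=0.0, b1=0.5, b2=0.3, b3=0.2,
                       sd_u=0.5, sd_e=1.0):
    """Return rows (country, x, z, y) drawn from a random-intercept model."""
    rng = random.Random(seed)
    rows = []
    for c in range(n_countries):
        z = rng.gauss(0, 1)         # observed country-level characteristic Z_c
        u = rng.gauss(0, sd_u)      # unobserved country effect u_c
        for _ in range(n_per_country):
            x = rng.gauss(0, 1)     # observed individual characteristic X_ic
            e = rng.gauss(0, sd_e)  # unobserved individual effect e_ic
            y = b0 + b1 * x + b2 * z + b3 * x * z + u + e
            rows.append((c, x, z, y))
    return rows

rows = simulate_two_level()
print(len(rows), "individuals in", len({r[0] for r in rows}), "countries")
```

Note that the country effect is drawn only once per country: with 15 countries, the country-level variance rests on just 15 draws, however many individuals are sampled within each country.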
Nowadays, most software packages offer a broad suite of multilevel models that are easy for social scientists to use. However, one considerable problem in estimating multilevel models concerns a low number of level-2 observations in a sample. For instance, many multi-country datasets contain large numbers of individuals per country (often hundreds or thousands), but include only a small number of countries (often fewer than 30 or even fewer than 20). With few level-2 units, the use of multilevel models may result in unreliable inferences because of biased estimates (coefficients and variance components) and inaccurate (often underestimated) standard errors (Arend and Schafer 2019; Austin 2010; Bell et al. 2010; Bryan and Jenkins 2016; Hox 1998; Maas and Hox 2004; McNeish and Stapleton 2016; Van der Leeden et al. 2008). When an increase in level-2 units is not feasible, one could consider alternative analytical tools, such as completely different analysis techniques or simulation-based multilevel models able to surmount estimation bias and provide accurate statistical tests (Bryan and Jenkins 2016; Goldstein 2011; Hamaker and Klugkist 2010; Maas and Hox 2004; McNeish and Stapleton 2016). Yet, many of these techniques may not be feasible for various substantive or practical reasons.
This chapter discusses aspects related to the analysis of nested data structures with a small level-2 sample size in the context of cross-national research. Specifically, we will discuss several alternative analytical tools one can apply to overcome problems of standard multilevel modeling, as well as their limitations. However, our main goal is to propose and illustrate a viable alternative technique, which we term the 2-step meta-analytic approach, suited for the analysis of multi-country datasets with a small number of countries (although the method can easily be applied to any analysis of nested data with a small level-2 sample size). This method provides accurate estimators and standard errors (SEs) and allows for reliable inference when one is interested in modeling both individual and country effects. Besides providing accurate estimates, the method we propose is highly infographic, ensuring fast and clear communication of information, and is accessible to the average social scientist (not skilled in using more advanced simulation techniques).
6.2 Unreliability of Estimates in Multilevel Models with a Small Level-2 Sample Size
The reliability of multilevel estimates may be questioned when the number of level-2 units (e.g. countries) is low (Arend and Schafer 2019; Bell et al. 2010; Bryan and Jenkins 2016; Hox et al. 2010; Hox 1998; McNeish and Stapleton 2016). This warning has been voiced regularly in multilevel textbooks (Rabe-Hesketh and Skrondal 2008; Snijders and Bosker 1999), but in practice it has often been disregarded for several reasons. For one thing, many of these warnings were quite abstract and not accompanied by clear explanations and guidelines about which number of level-2 units is considered too low and which problems a researcher may encounter if model assumptions are violated. General rules of thumb regarding the minimum number of level-2 units required for accurate estimation in multilevel analyses vary considerably between authors, and range from 10 to 100 level-2 units (Hox 1998; Kreft and de Leeuw 1998; Rabe-Hesketh and Skrondal 2008; Snijders and Bosker 1999), with 30 units as the most common recommendation (Hox 1998; Maas and Hox 2004).
In essence, standard multilevel models rely on maximum likelihood estimation methods, which are based on the assumption that errors are normally distributed and variances across groups are homogeneous (Seco et al. 2013). When the level-2 distributional assumption is violated (which may be the case when dealing with few units), multilevel estimates and their standard errors (especially for the variance components) may not be accurate. Several Monte Carlo simulation studies have shown that the minimum sample size for obtaining unbiased estimates in multilevel analysis depends on the type of dependent variable (e.g. continuous, categorical), the type and number of predictor variables, the use of (un)balanced group sizes, the specific model parameters of interest (fixed, random or variance components), the potential interest in cross-level interactions, the specification of the random and fixed parts, or the choice of estimation method (Austin 2010; Bell et al. 2010; Bryan and Jenkins 2016; Maas and Hox 2004, 2005; McNeish and Stapleton 2016; Schmidt-Catran and Fairbrother 2015; Stegmueller 2013; Van der Leeden et al. 2008). For example, to obtain unbiased point estimates of coefficients of model predictors, Maas and Hox (2004) recommend a minimum of 10 level-2 units, for good variance estimates at least 30 units, and for accurate SEs a minimum of 50 units. In practice, it remains hard to draw general conclusions from the existing studies that are directly applicable to the many complex research designs used in multi-country studies.
The article by Bryan and Jenkins (2016) came as a real wake-up call for the multilevel community conducting cross-national research. Their Monte Carlo simulations showed the conditions under which multilevel estimates and their standard errors (SEs) may be unreliable or biased, and provided guidelines for what should be considered a minimum number of level-2 units when conducting multilevel analysis in multi-country studies. Table 6.1 presents a summary of their findings for both linear and logit models. In short, when analyzing continuous outcomes, individual-level estimates (fixed effects or variance components) are reliable regardless of the number of level-2 units. However, a minimum of 25 level-2 units should be available when analyzing country-level effects. Fitting multilevel logit models with a low number of countries raises even more problems than fitting linear models, and biased estimates can also be found for fixed effects. The general recommendation is to have at least 30 level-2 units when fitting logit models.
6.3 Common Solutions for Modeling Nested Data with Few Level-2 Units
When concluding that a level-2 sample is too small to apply standard multilevel models, the next step is to identify viable alternative methods to answer the same multilevel-like research questions. Several authors have discussed alternative modeling approaches, which include common frequentist techniques (e.g. regression models), correction estimators (e.g. Huber/White sandwich estimators or nonlinear transformations of the dependent variable) and more versatile resampling procedures for statistical inference such as bootstrapping and Bayesian approaches (Bryan and Jenkins 2016; Cheah 2009; Goldstein 2011; Hamaker and Klugkist 2010; McNeish and Stapleton 2016; Seco et al. 2013). However, drawbacks of many of these approaches are that they may not be suited for testing more complex cross-level hypotheses, that they are not easily available in commonly used software packages, or that they require advanced statistical skills and/or computational power which most applied researchers do not possess. As a result, empirical research has continued to use multilevel models, even when level-2 sample sizes were questionable. Below, we briefly discuss various suggested methods to analyze multi-country datasets with few level-2 units. We will focus on frequentist methods and sampling techniques and do not discuss correction estimators, as they have been shown to perform unsatisfactorily with small sample sizes (Diggle et al. 2002). For the sake of parsimony, in this discussion we restrict ourselves to solutions for models with continuous outcome variables. However, most solutions would also apply to models with other types of outcomes, such as binary ones.
Regression with (Country-Specific) Clustered Standard Errors on Pooled Data
If we analyze nested data with the most commonly used regression method, OLS, we may end up making inaccurate statistical inferences. Individual-level model errors within the same country may be correlated, and if we fail to control for the within-country error correlation we may obtain downwardly biased SEs, overly narrow confidence intervals, large t-statistics and small p-values (Cameron and Miller 2015; Hox 1998). In addition, if OLS regression models include a country-level predictor (continuous or dichotomous), the SEs of that predictor may be biased as well (Cameron and Miller 2015). Given this situation, regression with clustered SEs may be used instead, as it accounts for the dependence between individual observations. This method is now widely used and incorporated in most common statistical software packages (e.g. Stata). However, this method only controls for within-country correlation; it does not specifically model it (Bryan and Jenkins 2016). Moreover, estimation of SEs may be inaccurate with fewer than 20 level-2 units for balanced designs and fewer than 50 level-2 units for unbalanced designs (Cameron and Miller 2015). Additionally, because we are often specifically interested in cross-national variation of effects, a multitude of interaction terms between variables of substantive interest and country dummies are required to test specific hypotheses. The overload of interactions and the high incidence of multicollinearity among the resulting variables make many analyses of interest unfeasible when using this method.
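The downward bias of ordinary SEs can be illustrated with a small sketch. It assumes a single country-level predictor, simulated data, and one common small-sample correction factor; the function name and all numbers are hypothetical, and statistical packages may apply different corrections.

```python
import math
import random

def slope_with_ses(x, y, cluster):
    """OLS slope of y on x with naive and cluster-robust standard errors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    dx = [xi - mx for xi in x]
    sxx = sum(d * d for d in dx)
    slope = sum(d * (yi - my) for d, yi in zip(dx, y)) / sxx
    intercept = my - slope * mx
    resid = [yi - intercept - slope * xi for xi, yi in zip(x, y)]

    # Naive SE: treats all residuals as independent draws.
    naive = math.sqrt(sum(e * e for e in resid) / (n - 2) / sxx)

    # Cluster-robust (sandwich) SE: sum the scores within each country first,
    # so that within-country error correlation is retained.
    score = {}
    for d, e, c in zip(dx, resid, cluster):
        score[c] = score.get(c, 0.0) + d * e
    g = len(score)
    correction = g / (g - 1) * (n - 1) / (n - 2)  # assumed small-sample factor
    robust = math.sqrt(correction * sum(s * s for s in score.values())) / sxx
    return slope, naive, robust

# Simulated example: 15 countries, 100 respondents each (illustrative values).
rng = random.Random(42)
x, y, cluster = [], [], []
for c in range(15):
    z = rng.gauss(0, 1)  # country-level predictor
    u = rng.gauss(0, 1)  # error component shared by everyone in the country
    for _ in range(100):
        x.append(z)
        y.append(0.5 * z + u + rng.gauss(0, 1))
        cluster.append(c)

slope, naive_se, robust_se = slope_with_ses(x, y, cluster)
print(round(slope, 3), round(naive_se, 3), round(robust_se, 3))
```

In this setup the naive SE behaves as if there were 1,500 independent observations, whereas the clustered SE reflects that the information about a country-level predictor comes from only 15 countries.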
Regression with Country-Specific Fixed Effects on Pooled Data
Whereas the previous method controls for intra-country correlations, the effects estimated are not country-specific, but assumed to be equal across countries (i.e. level-2 units). This is problematic, as many analyses aim to specifically model country effects. An alternative is to use the pooled data and fit distinct country intercepts (as fixed parameters). With this technique, the unobserved factors of each country are not separately modeled but are absorbed in the intercept of each country (Bryan and Jenkins 2016). However, given that we model fixed parameters for each country, country-level factors cannot be included as additional predictors. As with the previous method, cross-national variation in certain effects can be analyzed only through interactions between country indicators and individual factors, and again we are confronted with the issue of estimating a large number of parameters and an overload of interactions that are difficult to interpret. Moreover, Cameron and Miller (2015) warn that by introducing country-specific fixed effects, our estimates lose precision and estimation bias may still occur when the number of countries is small.
Two-Step Approach
Bryan and Jenkins (2016) proposed a more exploratory approach in which regressions are fitted in two steps. The first step is performed at the individual level using country-specific fixed effects. Thus, regular regression models are fitted separately for each country. The second step is conducted at the country level, and country effects are analyzed by regressing the country intercepts on the country-level predictors. Although this technique is advantageous as it reveals the sources of variation, the small number of countries continues to be a problem in implementing a regular regression model in the second step (Combs 2010; Green 1991; Nunnally 1978).
Multilevel Bootstrapping
The three methods described above represent variations on classical regression models as alternatives to multilevel modeling. However, to obtain unbiased estimates and correct SEs in complex research designs in which distributional assumptions are not met, many authors recommend the use of resampling techniques, and one such technique is multilevel bootstrapping (Goldstein 2011; Goldstein et al. 2002; Seco et al. 2013). Three different bootstrap strategies have been used to correct for biased estimates and inaccurate SEs (Carpenter et al. 2003; Seco et al. 2013; Thai et al. 2013; Van der Leeden et al. 2008):

(a) parametric residual bootstrapping: new data are generated by keeping the predictors fixed and drawing new residuals at the two levels from a normal distribution;
(b) nonparametric residual bootstrapping: new data are generated by keeping the predictors fixed and resampling, with replacement, the residuals at both levels from the observed residuals;
(c) case bootstrap: new data are generated by resampling from the original sample before any modeling is performed (for an overview of different options for the case bootstrap see Roberts and Fan 2004; Van der Leeden et al. 2008).
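As a minimal sketch of strategy (c), the code below resamples whole countries with replacement and keeps each drawn country's respondents intact. This is only one of several possible case-bootstrap variants; the function name, the toy data and the choice of the pooled mean as the statistic are assumptions made for illustration.

```python
import random
import statistics

def country_bootstrap_se(groups, stat, n_boot=500, seed=0):
    """Bootstrap SE of `stat`, resampling whole countries with replacement."""
    rng = random.Random(seed)
    countries = list(groups)
    replicates = []
    for _ in range(n_boot):
        drawn = [rng.choice(countries) for _ in countries]  # resample level-2 units
        pooled = [v for c in drawn for v in groups[c]]       # keep their cases intact
        replicates.append(stat(pooled))
    return statistics.stdev(replicates)

# Toy data: 6 countries whose means differ, 50 respondents each.
rng = random.Random(1)
groups = {c: [c + rng.gauss(0, 0.5) for _ in range(50)] for c in range(6)}

all_values = [v for values in groups.values() for v in values]
naive_se = statistics.stdev(all_values) / len(all_values) ** 0.5
boot_se = country_bootstrap_se(groups, statistics.mean)
print(round(naive_se, 3), round(boot_se, 3))
```

Because the resampling happens at the country level, the bootstrap SE reflects between-country variation that the naive SE of the pooled mean ignores. With only a handful of countries, however, the set of distinct resamples is itself small.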
Among these three bootstrapping procedures, residual bootstrapping has been established as providing the most accurate estimates (Carpenter et al. 2003). Still, Seco and colleagues (2013) showed that residual bootstrapping does not perform very well for small group sizes. In other words, bootstrapping remains unable to solve the problems of regular multilevel modeling with few level-2 units. In addition, bootstrapping is procedurally quite difficult for most social researchers, as it is not typically integrated as an automated option in commonly used software packages and often requires advanced programming skills.
Bayesian Multilevel Models
Bayesian estimation for multilevel data is considered to be one of the best analytical approaches when dealing with small samples (Hamaker and Klugkist 2010). In essence, the Bayesian approach builds on the regular multilevel approach in specifying the models at each level, but it deviates by introducing an additional step in which prior distributions are defined for the model parameters (Hamaker and Klugkist 2010; Stegmueller 2013). In other words, the Bayesian estimation approach focuses on obtaining a posterior distribution for model parameters starting from a prior distribution and the observed data. Compared to classical frequentist methods, the Bayesian approach has the advantage that it is not based on the normality assumption or asymptotic results, which is important when dealing with small sample sizes (Hamaker and Klugkist 2010). However, with this approach, the specification of priors is crucial for obtaining unbiased estimates, especially with a small number of level-2 units (Austin 2010), and arriving at proper specifications of these priors remains challenging for any user.
In conclusion, the first two methods are good alternatives to multilevel modeling if modeling level-2 information is not explicitly the focus of the research. If it is, resampling multilevel techniques (bootstrapping and Bayesian) are recommended. Still, these methods are not widely implemented in research software packages, and their use requires advanced statistical and programming skills, specialist software and computational power (to reduce long computation times in exploratory analysis), elements which are often not available to most social researchers.
6.4 An Alternative Stepwise Approach for Testing Individual, Country and Cross-Level Effects
A general issue in cross-national research is that it has been centered primarily on individual or country-level effects, whereas cross-level effects have received rather little attention. This is unfortunate, as these types of effects are often very interesting from a substantive point of view. In many comparative projects, the main interest is in examining whether individual-level effects vary across countries and whether we can explain this type of variation with cross-level effects in which individual-level variables are interacted with country-level variables of interest. Multilevel models may answer such questions very well. However, as our overview in the previous section has made clear, they cannot be reliably implemented when the number of countries is low (often below 30). In addition, we listed several reasons which make the alternative methods recommended in the literature unfeasible for research. In this section, we present an alternative, the 2-step meta-analytical approach: a stepwise approach which uses meta-analysis and meta-regression to analyze variation in effects across countries as well as moderating country-level factors. Such a stepwise approach can replicate effects estimated in multilevel analysis, is reliable with few level-2 units and is easy and straightforward to apply without requiring very advanced analytical and programming skills. This method is described and illustrated below.
6.4.1 The 2-Step Meta-Analytical Approach
Meta-analysis and meta-regression are often applied in medical research to summarize or combine results on specific relationships that have been tested in multiple separate studies (Borenstein et al. 2009). In these instances, studies constitute the second level of analysis. In such approaches the aims are (1) to generate an overall estimate for the strength of the relationship under consideration, (2) to assess whether significant cross-study variation in the overall effect estimate exists, and (3) to determine which study-level factors could explain the variation (if cross-study heterogeneity is encountered). Cross-study meta-analytical research in the medical field usually includes few studies (often 10 or fewer), and much attention has been paid to developing methods that provide reliable and unbiased estimates and correct confidence intervals (Friede et al. 2017; Rover et al. 2015; Wiksten et al. 2016). However, results of the meta-analytical approach should be interpreted with caution when very few studies are available, which in medical research is considered to be fewer than 5, or even 3, studies (for specific information see Seide et al. 2019; Rover et al. 2015).
If one replaces studies by countries as the level-2 units, it is relatively straightforward to see how this procedure could be used in analyzing cross-national differences in the strength of particular individual-level relationships. It basically entails two steps.
Step 1. Separate Regression Models for Each Country
In the first step, separate regression models are fitted for each country. With information on 15 countries, for example, this leads to 15 country-specific estimates of the relationship of interest. Compared to the common use in the medical field, the advantage in this particular case is that the study design and methodology are very similar across countries, thus reducing the extent to which variation (or heterogeneity, as it is usually called in the meta-analytical literature) in the estimates could be due to differences in initial approach (Friede et al. 2017).
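Step 1 can be sketched as follows, under the simplifying assumption of a single predictor per country model (an applied analysis would typically include further covariates). The helper names `ols_slope` and `step1` and the simulated data are hypothetical.

```python
import math
import random

def ols_slope(xs, ys):
    """Return (slope, standard error) from a simple OLS regression of y on x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    ssr = sum((y - intercept - slope * x) ** 2 for x, y in zip(xs, ys))
    se = math.sqrt(ssr / (n - 2) / sxx)
    return slope, se

def step1(data):
    """data maps country -> list of (x, y); returns country -> (slope, se)."""
    return {c: ols_slope([x for x, _ in rows], [y for _, y in rows])
            for c, rows in data.items()}

# Simulated input: three countries with different true slopes (illustrative).
rng = random.Random(7)
true_slopes = {"A": -0.05, "B": -0.02, "C": 0.0}
data = {}
for country, b in true_slopes.items():
    xs = [rng.uniform(0, 10) for _ in range(500)]
    data[country] = [(x, 0.3 + b * x + rng.gauss(0, 0.3)) for x in xs]

estimates = step1(data)
for country, (b, se) in estimates.items():
    print(country, round(b, 4), round(se, 4))
```

The resulting table of one estimate and one SE per country is what the second step takes as input.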
Step 2. Meta-Analysis and Meta-Regression
In the second step, a meta-analysis is performed on the set of country-specific estimates of the relationship of interest. Two different types of meta-analyses have been developed: fixed-effects and random-effects. Fixed-effects meta-analysis assumes a common effect of a risk factor for a certain outcome and provides an average estimate (Borenstein et al. 2009; Friede et al. 2017; Palmer and Sterne 2016). Random-effects meta-analysis assumes that the ‘true’ effect of interest may vary across level-2 units (Harbord and Higgins 2008; Palmer and Sterne 2016), and this seems a much more reasonable assumption in most studies on country effects. Random-effects meta-analysis separates real differences in the effect of the predictor on the outcome from sampling variability/chance. In the meta-analysis community, much attention has been paid to developing and testing methods that estimate confidence intervals that are reliable and unbiased, even with very small numbers of level-2 units (Rover et al. 2015; Seide et al. 2019; Wiksten et al. 2016). Simulation studies showed that certain estimation methods such as Knapp-Hartung, although more conservative, may be implemented with few level-2 units (Friede et al. 2017). The random-effects meta-analysis approach also offers a test of whether the estimate of interest shows significant variation across countries. If the level of variation is low (and not statistically significant), the conclusion is that the relationship of interest is country-invariant. If the level of variation is substantial, one could proceed and use meta-regression to try to explain this variation. In addition to providing reliable estimates of the overall strength of an effect of interest and its cross-country variability, this method provides powerful opportunities for visualization of the variation in the strength of effects across countries (information that is much more difficult to obtain when using multilevel analysis or other methods).
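The pooling logic can be sketched as follows. The code computes the inverse-variance fixed-effect estimate, the heterogeneity statistics Q and I², the DerSimonian-Laird between-country variance, the random-effects estimate and a Knapp-Hartung type SE. The function name and the input values are hypothetical, and dedicated meta-analysis software may differ in details such as truncation rules.

```python
import math

def random_effects_meta(estimates, ses):
    """DerSimonian-Laird random-effects pooling of country-specific estimates."""
    k = len(estimates)
    w = [1 / s ** 2 for s in ses]  # inverse-variance weights
    fixed = sum(wi * b for wi, b in zip(w, estimates)) / sum(w)
    q = sum(wi * (b - fixed) ** 2 for wi, b in zip(w, estimates))
    df = k - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)                  # between-country variance
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0  # share of variation beyond chance
    w_re = [1 / (s ** 2 + tau2) for s in ses]      # random-effects weights
    pooled = sum(wi * b for wi, b in zip(w_re, estimates)) / sum(w_re)
    se_dl = math.sqrt(1 / sum(w_re))
    # Knapp-Hartung type SE; intended for use with a t distribution on k - 1 df.
    se_kh = math.sqrt(sum(wi * (b - pooled) ** 2 for wi, b in zip(w_re, estimates))
                      / (df * sum(w_re)))
    return {"fixed": fixed, "pooled": pooled, "tau2": tau2, "Q": q,
            "I2": i2, "se_dl": se_dl, "se_kh": se_kh}

# Hypothetical country estimates and standard errors.
result = random_effects_meta([-0.05, -0.03, -0.02, -0.01, 0.00],
                             [0.004, 0.005, 0.004, 0.006, 0.005])
print({name: round(value, 5) for name, value in result.items()})
```

When the between-country variance is estimated as zero, the random-effects weights reduce to the fixed-effect weights and the two pooled estimates coincide.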
As mentioned above, if the results of the meta-analysis suggest variability in country effects, meta-regression can be used to identify factors that may explain this heterogeneity. Meta-regression (Harbord and Higgins 2008; Thompson and Higgins 2002; Thompson and Sharp 1999) can be used to analyze the moderating role of a factor by regressing the country effects on country-level predictors. The advantage of using meta-regression instead of OLS regression is twofold (Palmer and Sterne 2016). First, when using multi-country data, we need to ensure that the data are properly weighted. By assigning weights to studies, we ensure that large studies are less likely to dominate the analysis and small studies are not seen as unimportant. Second, in situations with few units of analysis (countries), meta-regression applications offer solutions for accurately establishing the statistical significance of an effect, such as the Knapp-Hartung modification (Knapp and Hartung 2003) or permutation-based resampling (Harbord and Higgins 2008; Gagnier et al. 2012). These characteristics make the method eminently suited for studying which country-level variables could explain cross-national differences in relationships of interest.
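A meta-regression with one country-level moderator can be sketched as a weighted least squares regression of the country estimates on that moderator. In this sketch the between-country variance `tau2` is assumed to be supplied by the user rather than estimated, the Knapp-Hartung style adjustment is applied without truncation, and the function name and inputs are hypothetical.

```python
import math

def meta_regression(estimates, ses, z, tau2=0.0):
    """Weighted regression of country estimates on one country-level moderator.

    Returns (slope, intercept, model-based SE, Knapp-Hartung adjusted SE).
    """
    k = len(estimates)
    w = [1 / (s ** 2 + tau2) for s in ses]  # precise estimates weigh more
    sw = sum(w)
    z_bar = sum(wi * zi for wi, zi in zip(w, z)) / sw
    b_bar = sum(wi * bi for wi, bi in zip(w, estimates)) / sw
    szz = sum(wi * (zi - z_bar) ** 2 for wi, zi in zip(w, z))
    slope = sum(wi * (zi - z_bar) * (bi - b_bar)
                for wi, zi, bi in zip(w, z, estimates)) / szz
    intercept = b_bar - slope * z_bar
    se_model = math.sqrt(1 / szz)
    # Adjustment: rescale by the weighted residual mean square (k - 2 df);
    # the adjusted SE is meant to be used with a t distribution on k - 2 df.
    q_res = sum(wi * (bi - intercept - slope * zi) ** 2
                for wi, zi, bi in zip(w, z, estimates))
    se_kh = se_model * math.sqrt(q_res / (k - 2))
    return slope, intercept, se_model, se_kh

# Hypothetical country estimates, SEs and moderator scores.
b = [-0.045, -0.030, -0.020, -0.012, -0.002]
s = [0.005, 0.004, 0.005, 0.006, 0.004]
hdi = [0.70, 0.75, 0.80, 0.85, 0.90]
slope, intercept, se_model, se_kh = meta_regression(b, s, hdi, tau2=0.0001)
print(round(slope, 4), round(se_model, 4), round(se_kh, 4))
```

A positive slope in such a setup would mean that a negative individual-level association moves toward zero as the moderator increases.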
In the next section, we will illustrate this method with an empirical example and compare the results with those from a ‘classic’ multilevel analysis.
6.4.2 Example: The Relationship Between Parental Education and Teenage Parenthood Across 15 European Countries
To illustrate the 2-step meta-analytical approach, we examine the relationship between parental education and teenage parenthood across 15 European countries. It is well known that children from a lower social-class background run a higher risk of teenage pregnancy and thus of teenage parenthood than children from a higher social-class background (Pirog et al. 2018). What is less known is whether this risk varies across countries. We expect that it does and, more specifically, that the association is weaker in countries that offer better opportunities for individual agency and development. In such countries, institutional, cultural and economic factors are thought to buffer the potentially negative consequences of family disadvantage.
As continuous dependent variables are most common in social science applications, we will first use OLS regression to derive parameter estimates of the country-specific effects of social-class background on the risk of teenage parenthood. In this way, we will illustrate both the ‘traditional’ multilevel approach and the 2-step meta-analytic approach. However, this method can also be applied if logistic regression is used for the within-country regressions (although the specifics of the method are a bit more complicated). In a second example we will briefly illustrate how our method can be used in the latter case.
Data
We use data on 15 countries from the Generations and Gender Programme (see Fokkema et al. 2016 for details). These data were collected between 2004 and 2009. To make results as comparable as possible across countries, we select men and women born between 1966 and 1975, leaving us with between 1000 and 2000 respondents per country. The following countries are included: Austria, Australia, Bulgaria, Belgium, Czech Republic, France, Georgia, Germany, Lithuania, the Netherlands, Norway, Poland, Romania, Russia, and Sweden. Our final sample consists of 29,022 individuals.
Variables
The key dependent variable of interest (Teenage parenthood) is whether the respondent had a first birth before the age of 20 (0 = no, 1 = yes). The key individual-level independent variable is the level of education of the parents. Information on the educational attainment of both parents was available, scored according to ISCED. To facilitate comparison across countries, these scores were converted into the newly developed continuous ISLED scaling (Schröder and Ganzeboom 2014; Brons and Mooyaart 2018). The mean of the ISLED scores of both parents was used as the indicator of Parental education. If information on only one parent was available, the ISLED score of that parent was used. ISLED scores vary between 0 and 100. To facilitate interpretation, we divided scores by ten. A number of additional individual-level variables were included in the analyses: Gender, Age, Number of siblings, Without BIOparents < 15 (whether or not respondents spent most of their youth before age 15 with both biological parents), and Unknown parental status (unknown whether they grew up with both biological parents).
The country-level variable of interest is the Human Development Index (HDI), developed by the UN. This is a composite measure based on life expectancy (indicating people’s ability to live a long and healthy life), educational attainment (indicating people’s ability to acquire knowledge) and living standards (indicating people’s ability to achieve a decent standard of living). We use the HDI score of the 15 countries in the year 2000, as this is the earliest date for which HDI scores are available for all countries included (ideally, we would have wanted scores for the period 1990–1995, as this comes closer to the period in which our respondents made fertility decisions). Fig. 6.1 shows the HDI scores of the countries in our sample.
6.4.2.1 Example for Continuous Outcomes
The ‘Classic’ Multilevel Approach
In the first example we analyze data by estimating linear probability models, effectively treating our binary outcome variable as a continuous one. We do so to facilitate the comparison of model estimates across models and across countries. The more complicated logit model estimations and comparisons (see also Mood 2010) will be presented in Sect. 3.2.2. A further advantage of the linear probability model is that we can interpret the effect estimate of our parental education variable as the shift in the percentage of respondents experiencing a teenage birth resulting from a ten-point difference in the ISLED score of a respondent’s parents. We ran two multilevel models. The first is a random-slope model, in which both the intercept and the slope of ISLED are allowed to vary across countries. The second is another random-slope model, in which HDI is added as a country-level indicator and the interaction between parental education and HDI as a cross-level indicator. The results from both models are presented below as Stata output.
Output 1 shows that, across all 15 countries, there is a negative effect of parental education on the risk of experiencing teenage parenthood. A ten-point increase in ISLED is associated with a 1.6 percentage point decrease in the risk of teenage parenthood. In addition, Output 1 shows that there is considerable cross-country variation in the effect of parental education. The random-slope estimate for parental education is .0001896, with an estimated standard error of .0000747, so the estimate is more than 2.5 times its standard error. In Output 2, HDI and the interaction between parental education and HDI are added. HDI has a statistically significant negative effect, suggesting that teenage parenthood is less common the higher a country’s HDI score is. In addition, the interaction between parental education and HDI is also statistically significant. The negative parental education gradient becomes weaker the higher a country’s HDI score is. This is in line with our expectations. Furthermore, Output 2 shows that the estimate for the random slope of parental education has dropped by almost half (from .0001896 to .0001037), suggesting that HDI can explain almost half of the country-level variation in the effect of parental education.
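The two ratios mentioned in this paragraph can be checked with simple arithmetic, using only the figures reported in the text (the variable names are ours):

```python
slope_var_model1 = 0.0001896  # random-slope estimate, Output 1
slope_var_se = 0.0000747      # its estimated standard error
slope_var_model2 = 0.0001037  # random-slope estimate after adding HDI, Output 2

ratio = slope_var_model1 / slope_var_se
reduction = (slope_var_model1 - slope_var_model2) / slope_var_model1

print(round(ratio, 2))      # estimate relative to its standard error
print(round(reduction, 3))  # proportional drop in the random-slope estimate
```

The ratio is about 2.54, and the proportional reduction is about 0.453, which is the basis for the statement that HDI accounts for almost half of the cross-country variation in the slope.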
The 2-Step Meta-Analytical Approach
The alternative meta-analysis approach we propose starts with estimating a separate linear probability model per country, leading to 15 identically specified models overall. Output 3 shows the example for the Czech Republic. For this country, the estimate of the association between parental education and the risk of teenage parenthood is −.030, suggesting that a ten-point increase in parental ISLED scores is associated with a 3.0 percentage point decrease in teenage parenthood. Estimates for all countries can be found in Table 6.2.
In the second step, the country-specific estimates of interest (in this particular case, the estimates of the relationship between parental education and the risk of teenage parenthood) are collected into one dataset that is used as input for the meta-analysis. Table 6.2 shows an example of such a dataset, which includes additional parameters of potential interest as well as HDI as a country-level indicator. Using this dataset, we performed a meta-analysis (using the metan command in Stata 16). The results of this analysis are presented in Output 4 and graphically in Fig. 6.2.
Output 4 shows the estimates of the association for all countries, as well as their confidence intervals. The largest (negative) association is found in Bulgaria (−.049), whereas the smallest is found in Sweden (.001). At the bottom of Output 4, information on the heterogeneity of the country-specific estimates is provided. Higgins and colleagues (2003) suggest that heterogeneity, as measured by the I^{2} indicator, is low if I^{2} is between .25 and .50, moderate if it is between .50 and .75, and high if it is above .75. In our example, I^{2} is high (91.3%) and the tests of heterogeneity are statistically significant, suggesting that a high level of variation in the association between parental education and teenage parenthood exists across countries. Above the information on heterogeneity, two estimates of the pooled overall association are presented. The IV (Inverse-Variance) estimate assumes a fixed-effect model, whereas the DL (DerSimonian-Laird) estimate assumes a random-effect model (DerSimonian and Kacker 2007). Theoretically, we assumed heterogeneity in the association between parental education and teenage parenthood, and this assumption was confirmed by the heterogeneity analysis. Thus, the DL estimate of the pooled effect is our preferred estimate of the association in the pooled sample. Overall, a ten-point increase in parental education leads to a 1.6% decrease in teenage parenthood. Two things should be noted. First, the random-effect estimate is larger and has a wider confidence interval than the fixed-effect estimate. Second, the random-effect estimate of the association between parental education and teenage parenthood is exactly the same as the estimate that we derived from the ‘classic’ multilevel model (see Output 1). Figure 6.2 shows a graphical representation of these same findings. One nice aspect of such a graphical representation is that it is very easy to evaluate the position of individual countries. In addition, it allows the researcher to get a first, intuitive grasp of the type of countries with high and low scores and thus whether a pattern is visible at first sight.
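The pooling logic behind the IV and DL estimates and the I^{2} indicator can be sketched in a few lines of Python. This is only an illustrative sketch of the standard formulas, not the metan implementation; the function name pool_estimates and the four input values are hypothetical and are not taken from Table 6.2.

```python
import math

def pool_estimates(estimates, std_errors):
    """Pool country-specific estimates with inverse-variance (fixed-effect)
    and DerSimonian-Laird (random-effect) weights."""
    k = len(estimates)
    variances = [se ** 2 for se in std_errors]

    # Fixed-effect (IV) pooling: weight each estimate by 1 / variance.
    w = [1.0 / v for v in variances]
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, estimates)) / sum_w
    fixed_se = math.sqrt(1.0 / sum_w)

    # Cochran's Q and I^2 (share of total variation due to heterogeneity).
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, estimates))
    df = k - 1
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0

    # DerSimonian-Laird moment estimator of the between-country variance.
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - df) / c)

    # Random-effect (DL) pooling: add tau2 to every country's variance.
    w_star = [1.0 / (v + tau2) for v in variances]
    sum_w_star = sum(w_star)
    random_eff = sum(wi * yi for wi, yi in zip(w_star, estimates)) / sum_w_star
    random_se = math.sqrt(1.0 / sum_w_star)

    return {"fixed": fixed, "fixed_se": fixed_se,
            "random": random_eff, "random_se": random_se,
            "Q": q, "I2": i2, "tau2": tau2}

# Hypothetical country-specific slopes and standard errors (made up).
result = pool_estimates([-0.05, -0.02, -0.01, 0.00],
                        [0.005, 0.005, 0.005, 0.005])
print(result)
```

When the estimated between-country variance is positive, the random-effect weights are smaller than the fixed-effect weights, which is why the random-effect confidence interval is the wider of the two.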
Given that our meta-analysis has shown significant variation in the association between parental education and teenage parenthood across countries, we perform a meta-regression to examine which country-level factor(s) are related to this association (Harbord and Higgins 2008). In our particular example, we performed a meta-regression in which the country-level estimates of the association between parental education and teenage parenthood are regressed on the country-specific HDI scores. Results are presented in Output 5 and Fig. 6.3.
Output 5 shows that the association between parental education and teenage parenthood varies significantly with the HDI level of a country. The effect estimate for HDI is statistically significant (.1211, with an SE of .0346). Note that this effect estimate is very similar to the cross-level effect estimated in our ‘classic’ multilevel model (.1192, with an SE of .0370). The effect estimate suggests that the association between parental education and teenage parenthood is weaker in countries with a higher HDI score. To allow a better assessment of this finding, the regression line linking the association between parental education and teenage parenthood to HDI is plotted in Fig. 6.3. To facilitate interpretation, we limited the HDI scores (X-axis) to the range that is observed in our dataset. In addition to the regression line, the 15 separate country data points are also depicted in Fig. 6.3. This figure shows that in countries with low HDI scores (around .70), the association between parental education and teenage parenthood is quite strong (effect of around −.03), suggesting that a ten-point increase in ISLED scores leads to a decrease in the percentage of people experiencing teenage parenthood by about 3%. In countries with high HDI scores (around .90), the association between parental education and teenage parenthood is negligible. Thus, these findings are in line with our expectations.
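The core of such a meta-regression is a weighted least-squares regression of the country-specific estimates on the country-level covariate. The sketch below is a simplified illustration, not the Stata routine used for Output 5: the function name meta_regression is hypothetical, the between-country variance tau2 is assumed to be supplied by the user (it defaults to 0, i.e. a fixed-effect meta-regression), and the input values are made up.

```python
import math

def meta_regression(estimates, std_errors, covariate, tau2=0.0):
    """Weighted least squares of country estimates on one covariate.

    Weights are 1 / (se^2 + tau2); tau2 is an assumed, user-supplied
    between-country variance (0 gives a fixed-effect meta-regression).
    """
    w = [1.0 / (se ** 2 + tau2) for se in std_errors]
    sw = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, covariate)) / sw
    y_bar = sum(wi * yi for wi, yi in zip(w, estimates)) / sw
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, covariate))
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar)
              for wi, xi, yi in zip(w, covariate, estimates))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    slope_se = math.sqrt(1.0 / sxx)  # valid when the weights are true precisions
    return intercept, slope, slope_se

# Hypothetical HDI scores and country-specific slopes (made up).
hdi = [0.70, 0.75, 0.80, 0.85, 0.90]
slopes = [-0.031, -0.022, -0.017, -0.008, -0.002]
ses = [0.006, 0.005, 0.004, 0.005, 0.006]

intercept, slope, slope_se = meta_regression(slopes, ses, hdi)
for h in (0.70, 0.90):
    # Predicted association at a low and a high HDI score.
    print(h, intercept + slope * h)
```

Evaluating the fitted line at the lowest and highest observed HDI scores mirrors the reading of Fig. 6.3, where the X-axis is restricted to the observed range.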
6.4.2.2 Example for Binary Outcomes
Our example treated the dependent variable as continuous, thus allowing the use of OLS regression. Clearly, this meta-analytic 2-step procedure can also be used with logistic regression as the first step in the analysis. In fact, the vast majority of the applications of meta-analysis in epidemiology use binary outcomes, and thus perform meta-analysis and meta-regression with odds ratios from multiple clinical trials (e.g. Sattar et al. 2010) or observational studies (e.g. Jones et al. 2015) as the dependent variable of interest.
Although it is common in epidemiology to use odds ratios as the dependent variables in meta-analysis, in modern social sciences such applications are regarded as problematic. Mood (2010) has shown that odds ratios resulting from logistic regressions of different samples (e.g. different population subgroups or different countries) cannot be compared to each other, as the unobserved heterogeneity in the model can vary across samples. However, the author suggests that average marginal effects (AME), which can be derived from the logistic regression results, can be meaningfully compared across samples. The AME gives the average effect of an independent variable on the probability that the dichotomous dependent variable equals 1. As this quantity does not depend on the unobserved heterogeneity in the model, it can be used to compare effects across countries, and the AMEs (and their standard errors) for different countries can be used as input for a meta-analysis and meta-regression, just as the B coefficients in an OLS regression can.
To illustrate this approach, we repeated the analysis presented above, but now used logistic regression rather than OLS regression, and calculated average marginal effects based on the logistic regression models. Next, we performed meta-analysis and meta-regression on these estimates. The results are presented in Outputs 6 and 7. The average effect of a ten-point increase in parental ISLED score is −.010, suggesting that on average, a ten-point increase in ISLED decreases the probability of a teenage birth by 1%. This effect hardly differs from the average effect (−.009) in the linear probability model. In addition, the pattern of country variation in scores also very strongly resembles the one in Output 4. The results of the meta-regression also correspond quite closely with the ones resulting from the linear probability model (.143 versus .121).
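For a continuous predictor in a logistic regression, the AME is the sample average of the coefficient multiplied by p(1 − p), where p is each respondent's predicted probability. The sketch below illustrates this for a model with a single predictor; the function name average_marginal_effect, the coefficients and the ISLED values are hypothetical, and the standard error of the AME (needed as input for the meta-analysis, typically obtained with the delta method) is not computed here.

```python
import math

def average_marginal_effect(intercept, slope, x_values):
    """AME of a continuous predictor in a one-predictor logit model.

    For each observation the marginal effect is slope * p * (1 - p),
    with p the predicted probability; the AME is their average.
    """
    total = 0.0
    for x in x_values:
        p = 1.0 / (1.0 + math.exp(-(intercept + slope * x)))
        total += slope * p * (1.0 - p)
    return total / len(x_values)

# Hypothetical logit coefficients and parental ISLED scores (made up).
isled = [20.0, 35.0, 50.0, 65.0, 80.0]
ame = average_marginal_effect(intercept=-1.0, slope=-0.03, x_values=isled)
print(ame)        # effect of a one-point increase on the probability
print(10 * ame)   # rescaled to a ten-point increase
```

Because the AME is expressed on the probability scale, it can be read in the same way as the coefficient of the linear probability model, which is what makes the two sets of results directly comparable.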
6.5 Conclusion
The use of multilevel analysis in comparative research has recently been criticized as the number of countries involved in cross-national analysis is often viewed as too limited to allow reliable inferences and unbiased estimation of parameters of interest. This chapter proposes the 2-step meta-analytic approach as an alternative to ‘classic’ multilevel analysis if one is interested in understanding cross-national variation in the link between individual-level variables as well as cross-level interactions. After a brief discussion of the main criticisms of the multilevel approach when using few level-2 units and an overview of existing modeling alternatives, the 2-step meta-analytical approach is outlined and illustrated using examples for both continuous and dichotomous outcomes. Although this method is discussed in the context of analyzing data for a small number of countries, it may be applicable to any type of research including a small number of level-2 units (e.g. schools, municipalities, hospitals).
The method we propose in this chapter as an alternative for multilevel analysis has several strengths. First, when using it, one can obtain reliable estimates and accurate SEs even when the number of countries is small (smaller than the 25–30 suggested as lower limit for multilevel analyses). Moreover, although a very small number of countries may lead to spurious findings even with meta-analytical techniques, one may still be able to draw accurate inferences through an appropriate choice of estimation methods (e.g. the Knapp-Hartung modification) and permutation-based resampling. For example, Zoutewelle-Terovan and Liefbroer (2018) included 12 countries in their analyses and used a permutation test with adjustment for multiplicity, which is suited for a small number of countries and multiple covariates (Harbord and Higgins 2008). Second, when one is interested in specifically modeling individual effects, country-level effects and cross-level interactions, the 2-step meta-analytic approach provides great opportunities for such modeling. This method is superior to multilevel modeling in that its graphical display allows a much more intuitive feel of what the findings mean in terms of the positioning of individual countries than is usually true for multilevel analysis. Also, whereas many of the alternative techniques to multilevel modeling presented in Sect. 6.2 encounter difficulties in explicitly modeling country-level effects and cross-level interactions, our method is capable of doing so comprehensively. Additional examples of this approach can be found in other publications within the Context of Opportunities (CONOPP) project (see Brons and Harkonen 2018; Brons et al. 2017; Koops 2020; Zoutewelle-Terovan and Liefbroer 2018). Third, whereas our discussion and examples focus on modeling one random slope, multiple random slopes (e.g. how teenage parenthood is linked to parental education, parental separation and number of siblings) can be modeled as well with this method by repeating the 2-step meta-analytical approach for multiple associations. Fourth, in this chapter we center on a two-level nested design. However, the 2-step meta-analytical approach could also be extended to situations where individuals are nested within more than one level (e.g. cohorts or regions within countries). Methods to analyze 3-level data have been developed within epidemiology and psychometry, e.g. when multiple instruments are used within studies to measure the same underlying concept. In this approach, the instruments are viewed as a second level within ‘trials’. A country design with an additional level consisting of regions (or cohorts) within countries can be viewed as a variation on this theme (Cheung 2014; Jackson et al. 2011; Van den Noortgate et al. 2015). Finally, our method is not only accessible to the average-skilled researcher (as it is easy to conduct and interpret and does not require any advanced simulation skills), but also requires only brief computational time and little computational power to run models, and can be performed with the most common software programs used in social sciences (e.g. Stata, R, SAS).
At the same time, the meta-analytic method proposed is not a panacea. First, it remains difficult to establish a quantitative minimum for the level-2 sample size when conducting meta-analyses; such limits are rarely recommended, and to date no consistent guidelines for minimum sample sizes exist. Some authors argue for a minimum of 3 level-2 units (Rover et al. 2015), others discuss a minimum of 8 units (Jenkins and Quintana-Ascencio 2020). However, such minima depend on aspects such as the amount of variance observed (a very small sample may be problematic with substantial statistical heterogeneity), the size of studies or the number of predictors used (Gagnier et al. 2012; Jenkins and Quintana-Ascencio 2020). Second, to deal with the small number of level-2 units, it has been recommended to use certain estimation techniques (e.g. Knapp-Hartung) or resampling options to establish significance (e.g. permutation test). However, such methods are recognized for being quite conservative and one may run the risk of obtaining false negatives. It is difficult for us to establish the circumstances in which such situations occur (it was also not the goal of this chapter), but this is clearly one aspect that future research needs to clarify (Gagnier et al. 2012). Still, whereas in our research we may have marginally missed the reporting of some significant effects, the conservativeness of the methods used increases our confidence in effects that reach the threshold for statistical significance. Third, the implementation of the method may become more difficult (although it remains feasible) if one of the individual-level variables has multiple categories. An important aspect of this approach is that the effects of additional variables are allowed to vary across countries (as separate analyses are performed for each country). On the one hand, this can be viewed as an advantage, as other variables might also have quite different effects across countries. On the other hand, this leads to the estimation of many parameters and one could view the multilevel model, with its assumption of fixed effects across countries, as a more parsimonious approach.
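To make the idea of permutation-based resampling concrete, the following sketch shuffles the country-level covariate across countries and counts how often the permuted slope statistic is at least as extreme as the observed one. It is a deliberately simplified illustration: it uses fixed-effect weights (no between-country variance), a single covariate and no adjustment for multiplicity, so it is not the procedure described by Harbord and Higgins (2008). The function names and the data are hypothetical.

```python
import math
import random

def slope_z(estimates, std_errors, covariate):
    """z statistic of the weighted least-squares slope (weights 1 / se^2)."""
    w = [1.0 / se ** 2 for se in std_errors]
    sw = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, covariate)) / sw
    y_bar = sum(wi * yi for wi, yi in zip(w, estimates)) / sw
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, covariate))
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar)
              for wi, xi, yi in zip(w, covariate, estimates))
    return sxy / math.sqrt(sxx)  # slope divided by its standard error

def permutation_p_value(estimates, std_errors, covariate,
                        n_perm=2000, seed=0):
    """Two-sided permutation p-value for the meta-regression slope."""
    rng = random.Random(seed)
    observed = abs(slope_z(estimates, std_errors, covariate))
    shuffled = list(covariate)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break the link between covariate and estimate
        if abs(slope_z(estimates, std_errors, shuffled)) >= observed:
            extreme += 1
    # Add-one convention so the p-value is never exactly zero.
    return (extreme + 1) / (n_perm + 1)

# Hypothetical data for eight countries (made up).
hdi = [0.70, 0.73, 0.76, 0.79, 0.82, 0.85, 0.88, 0.91]
slopes = [-0.031, -0.027, -0.020, -0.018, -0.012, -0.009, -0.004, 0.001]
ses = [0.005] * 8
print(permutation_p_value(slopes, ses, hdi))
```

With very few countries the number of distinct permutations is small, which bounds how small the p-value can become; this is one way to see why such tests tend to be conservative.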
The 2-step meta-analytic approach is proposed as an alternative to multilevel modeling when the number of level-2 units is small and one is interested in modeling individual, country and cross-level effects. However, it is not our intention to claim that the method is superior to multilevel modeling. In fact, in our example we observed that multilevel analysis would still have led to accurate inferences. This suggests that in some situations of few level-2 units multilevel modeling still performs well and it was beyond the purpose of this chapter to demonstrate under which conditions it no longer does. Our main goal is to present a viable alternative when multilevel inferences are questionable. Our method may also be used as a sensitivity analysis to support results obtained from multilevel modeling. Furthermore, the 2-step meta-analytic approach may be preferable when one is interested in graphically displaying heterogeneity and making inferences based on the positions and characteristics of individual countries.
References
Arend, M.G., and T. Schafer. 2019. Statistical power in two-level models: A tutorial based on Monte Carlo simulation. Psychological Methods 24 (1): 1–19.
Austin, P.C. 2010. Estimating multilevel logistic regression models when the number of clusters is low: A comparison of different statistical software procedures. The International Journal of Biostatistics 6 (1): 1–18.
Bell, B.A., G.B. Morgan, J.D. Kromrey, and J.M. Ferron. 2010. The impact of small cluster size on multilevel models: A Monte Carlo examination of two-level models with binary and continuous predictors. JSM Proceedings, Survey Research Methods Section 1 (1): 4057–4067.
Borenstein, M., L.V. Hedges, J.P. Higgins, and H.R. Rothstein. 2009. Introduction to meta-analysis. UK: Wiley.
Brons, M.D., and J. Harkonen. 2018. Parental education and family dissolution: A cross-national and cohort comparison. Journal of Marriage and Family 80 (2): 426–443.
Brons, M.A., and J.E. Mooyaart. 2018. The Generations & Gender Programme: Constructing harmonized, continuous socioeconomic variables for the GGS Wave. Technical Paper GGS Methodology: 1–22.
Brons, M.D., A.C. Liefbroer, and H.B.G. Ganzeboom. 2017. Parental socioeconomic status and first union formation: Can European variation be explained by the Second Demographic Transition theory? European Sociological Review 33 (6): 809–822.
Bryan, M.L., and S.P. Jenkins. 2016. Multilevel modelling of country effects: A cautionary tale. European Sociological Review 32 (1): 3–22.
Cameron, A.C., and D.L. Miller. 2015. A practitioner’s guide to cluster-robust inference. Journal of Human Resources 50 (2): 317–372.
Carpenter, J.R., H. Goldstein, and J. Rasbash. 2003. A novel bootstrap procedure for assessing the relationship between class size and achievement. Journal of the Royal Statistical Society: Series C (Applied Statistics) 52 (4): 431–443.
Cheah, B.C. 2009. Clustering standard errors or modeling multilevel data, 2–4. New York: Research Note University of Columbia.
Cheung, M.W.L. 2014. Modeling dependent effect sizes with three-level meta-analyses: A structural equation modeling approach. Psychological Methods 19 (2): 211–229.
Combs, J.G. 2010. Big samples and small effects: Let’s not trade relevance and rigor for power. Academy of Management Journal 53 (1): 9–13.
DerSimonian, R., and R. Kacker. 2007. Random-effects model for meta-analysis of clinical trials: An update. Contemporary Clinical Trials 28 (2): 105–114.
Diggle, P.J., P. Heagerty, K. Liang, and S.L. Zeger. 2002. Analysis of longitudinal data. Oxford University Press.
Fokkema, T., A. Kveder, N. Hiekel, T. Emery, and A.C. Liefbroer. 2016. Generations and Gender Programme Wave 1 data collection: An overview and assessment of sampling and fieldwork methods, weighting procedures, and cross-sectional representativeness. Demographic Research 34: 499–524.
Friede, T., C. Rover, S. Wandel, and B. Neuenschwander. 2017. Meta-analysis of few small studies in orphan diseases. Research Synthesis Methods 8 (1): 79–91.
Gagnier, J.J., D. Moher, H. Boon, C. Bombardier, and J. Beyene. 2012. An empirical study using permutation-based resampling in meta-regression. Systematic Reviews 1 (1): 1–9.
Goldstein, H. 2011. Bootstrapping in multilevel models. In Handbook of advanced multilevel analysis, ed. J.J. Hox and J.K. Roberts, 163–172. New York: Routledge.
Goldstein, H., W. Browne, and J. Rasbash. 2002. Multilevel modelling of medical data. Statistics in Medicine 21 (21): 3291–3315.
Green, S.B. 1991. How many subjects does it take to do a regression analysis. Multivariate Behavioral Research 26 (3): 499–510.
Hamaker, E.L., and I. Klugkist. 2010. Bayesian estimation of multilevel models. In Handbook of advanced multilevel analysis, ed. J.J. Hox and J.K. Roberts, 137–161. New York: Routledge.
Harbord, R.M., and J.P. Higgins. 2008. Meta-regression in Stata. In Meta-analysis in Stata: An updated collection from the Stata Journal, ed. T.M. Palmer, J.A.C. Sterne, H.J. Newton, and N.J. Cox, vol. 8(4), 2nd ed., 493–519. College Station: Stata Press.
Higgins, J.P.T., S.G. Thompson, J.J. Deeks, and D.G. Altman. 2003. Measuring inconsistency in meta-analyses. BMJ 327 (7414): 557–560.
Hox, J. 1998. Multilevel modeling: When and why. In Classification, data analysis, and data highways, ed. I. Balderjahn, R. Mathar, and M. Schader, 147–154. New York: Springer.
Hox, J.J., M. Moerbeek, and R. van de Schoot. 2010. Multilevel analysis: Techniques and applications. Routledge.
Jackson, D., R. Riley, and I.R. White. 2011. Multivariate meta-analysis: Potential and promise. Statistics in Medicine 30 (20): 2481–2498.
Jenkins, D.G., and P.F. Quintana-Ascencio. 2020. A solution to minimum sample size for regressions. PLoS One 15 (2): 1–15.
Jones, L., G. Bates, E. McCoy, and M.A. Bellis. 2015. Relationship between alcohol-attributable disease and socioeconomic status, and the role of alcohol consumption in this relationship: A systematic review and meta-analysis. BMC Public Health 15 (400): 1–14.
Knapp, G., and J. Hartung. 2003. Improved tests for a random effects meta-regression with a single covariate. Statistics in Medicine 22 (17): 2693–2710.
Koops, J.C. 2020. Understanding nonmarital childbearing in Europe and North-America. The role of socioeconomic background and ethnicity in different societal contexts. (PhD thesis). University of Groningen, Groningen.
Kreft, I., and J. de Leeuw. 1998. Introducing multilevel modeling. Sage.
Maas, C.J., and J.J. Hox. 2004. Robustness issues in multilevel regression analysis. Statistica Neerlandica 58 (2): 127–137.
———. 2005. Sufficient sample sizes for multilevel modeling. Methodology 1 (3): 86–92.
McNeish, D., and L.M. Stapleton. 2016. Modeling clustered data with very few clusters. Multivariate Behavioral Research 51 (4): 495–518.
Mood, C. 2010. Logistic regression: Why we cannot do what we think we can do, and what we can do about it. European Sociological Review 26 (1): 67–82.
Nunnally, J.C. 1978. Psychometric theory. 2nd ed. New York: McGraw-Hill.
Palmer, T.M., and J.A.C. Sterne. 2016. Meta-analysis in Stata: An updated collection from the Stata Journal. 2nd ed. College Station: Stata Press.
Pirog, M.A., H. Jung, and D. Lee. 2018. The changing face of teenage parenthood in the United States: Evidence from NLSY79 and NLSY97. Child & Youth Care Forum 47 (3): 317–342.
Rabe-Hesketh, S., and A. Skrondal. 2008. Multilevel and longitudinal modeling using Stata. Stata Press.
Roberts, J.K., and X. Fan. 2004. Bootstrapping within the multilevel/hierarchical linear modeling framework: A primer for use with SAS and S-PLUS. Multiple Linear Regression Viewpoints 30 (1): 23–34.
Rover, C., G. Knapp, and T. Friede. 2015. Hartung-Knapp-Sidik-Jonkman approach and its modification for random-effects meta-analysis with few studies. BMC Medical Research Methodology 15 (1): 1–7.
Sattar, N., D. Preiss, H.M. Murray, P. Welsh, B.M. Buckley, A.J. de Craen, S.R.K. Seshasai, J.J. McMurray, D.J. Freeman, J.W. Jukema, and P.W. Macfarlane. 2010. Statins and risk of incident diabetes: A collaborative meta-analysis of randomised statin trials. The Lancet 375 (9716): 735–742.
Schmidt-Catran, A.W., and M. Fairbrother. 2015. The random effects in multilevel models: Getting them wrong and getting them right. European Sociological Review 32 (1): 23–38.
Schröder, H., and H.B. Ganzeboom. 2014. Measuring and modelling level of education in European societies. European Sociological Review 30(1): 119–136.
Seco, G.V., M.A. Garcia, M.P.F. Garcia, and P.E.L. Rojas. 2013. Multilevel bootstrap analysis with assumptions violated. Psicothema 25 (4): 520–528.
Seide, S.E., C. Rover, and T. Friede. 2019. Likelihood-based random-effects meta-analysis with few studies: Empirical and simulation studies. BMC Medical Research Methodology 19 (1): 16.
Snijders, T.A.B., and R.J. Bosker. 1999. Multilevel analysis: An introduction to basic and advanced multilevel modeling. Sage.
Stegmueller, D. 2013. How many countries for multilevel modeling? A comparison of frequentist and Bayesian approaches. American Journal of Political Science 57 (3): 748–761.
Thai, H.T., F. Mentre, N.H. Holford, C. Veyrat-Follet, and E. Comets. 2013. A comparison of bootstrap approaches for estimating uncertainty of parameters in linear mixed-effects models. Pharmaceutical Statistics 12 (3): 129–140.
Thompson, S.G., and J.P. Higgins. 2002. How should meta-regression analyses be undertaken and interpreted? Statistics in Medicine 21 (11): 1559–1573.
Thompson, S.G., and S.J. Sharp. 1999. Explaining heterogeneity in meta-analysis: A comparison of methods. Statistics in Medicine 18 (20): 2693–2708.
Van den Noortgate, W., J.A. Lopez-Lopez, F. Marin-Martinez, and J. Sanchez-Meca. 2015. Meta-analysis of multiple outcomes: A multilevel approach. Behavior Research Methods 47 (4): 1274–1294.
Van der Leeden, R., E. Meijer, and F.M. Busing. 2008. Resampling multilevel models. In Handbook of multilevel analysis, ed. R. Van der Leeden and E. Meijer, 401–433. Springer.
Wiksten, A., G. Rucker, and G. Schwarzer. 2016. Hartung–Knapp method is not always conservative compared with fixed-effect meta-analysis. Statistics in Medicine 35 (15): 2503–2515.
Zoutewelle-Terovan, M., and A.C. Liefbroer. 2018. Swimming against the stream: Non-normative family transitions and loneliness in later life across 12 nations. The Gerontologist 58 (6): 1096–1108.
Acknowledgements
The research leading to these results has received funding from the European Research Council under the European Union’s Seventh Framework Programme (FP/2007-2013)/ERC Grant Agreement n. 324178.
Rights and permissions
Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.
The images or other third party material in this chapter are included in the chapter’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.
Copyright information
© 2021 The Author(s)
About this chapter
Cite this chapter
Liefbroer, A.C., Zoutewelle-Terovan, M. (2021). Meta-Analysis and Meta-Regression: An Alternative to Multilevel Analysis When the Number of Countries Is Small. In: Liefbroer, A.C., Zoutewelle-Terovan, M. (eds) Social Background and the Demographic Life Course: Cross-National Comparisons. Springer, Cham. https://doi.org/10.1007/9783030673451_6
DOI: https://doi.org/10.1007/9783030673451_6
Publisher Name: Springer, Cham
Print ISBN: 9783030673444
Online ISBN: 9783030673451