New Approaches to Multilevel Analysis
- Cite this article as:
- Beard, J.R. J Urban Health (2008) 85: 805. doi:10.1007/s11524-008-9314-7
The paper by Yu-Sheng and Ying-Chih in this issue highlights some of the challenges confronting studies that examine neighborhood-level influences on health. Studies of this sort often involve data on individuals nested within neighborhoods. A key issue that arises in analyses of data structured in this way is the potential for nonindependence of observations (i.e., the possibility of within-neighborhood correlations between individual-level outcomes). Ignoring this can result in invalid standard errors, incorrect (typically anticonservative) inferences, and inefficient estimates.1 Multilevel models are particularly suited to the analysis of such data and are increasingly used in studies of neighborhood-level effects.2,3
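To give a sense of why ignoring within-neighborhood correlation tends to be anticonservative, the following minimal sketch computes the standard Kish design effect, 1 + (m - 1) x ICC, for equal-sized clusters. The figures used (50 neighborhoods of 30 residents, an intraclass correlation of 0.05) are hypothetical and are not taken from the paper under discussion; the function names are likewise illustrative.

```python
import math


def design_effect(cluster_size: float, icc: float) -> float:
    """Kish design effect for equal-sized clusters: 1 + (m - 1) * ICC."""
    if not 0.0 <= icc <= 1.0:
        raise ValueError("icc must lie between 0 and 1")
    return 1.0 + (cluster_size - 1.0) * icc


def effective_sample_size(n_total: float, cluster_size: float, icc: float) -> float:
    """Number of independent observations the clustered sample is 'worth'."""
    return n_total / design_effect(cluster_size, icc)


# Hypothetical study: 50 neighborhoods x 30 residents, ICC assumed to be 0.05.
deff = design_effect(cluster_size=30, icc=0.05)
n_eff = effective_sample_size(n_total=1500, cluster_size=30, icc=0.05)

# Standard errors that ignore clustering are too small by roughly this factor.
se_inflation = math.sqrt(deff)
```

Under these assumed values, even a modest intraclass correlation more than doubles the variance of a mean relative to simple random sampling, which is why naive standard errors can be misleading.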
Another issue that needs to be considered in studies of this type is the potential for collinearity between many of the environmental characteristics of interest. For example, neighborhoods with low education levels are also likely to be areas of high unemployment. To address this issue, a number of studies have used factor analysis to aggregate neighborhood-level characteristics into a smaller number of constructs that may better reflect the dimensions underlying these associations.4
Part of the analysis by Yu-Sheng and Ying-Chih uses factor analysis to identify three dimensions of neighborhood. They then test the influence of these dimensions using multilevel models. However, as the authors point out, while multilevel models address the potential for intraneighborhood clustering of individual-level characteristics, they treat neighborhood dimensions as independent characteristics and neither provide information on, nor take account of, any relationship between these dimensions. To address this issue, the authors applied cluster analysis to the neighborhood dimensions identified by factor analysis. This generated six types of neighborhood. For example, neighborhoods with a high concentration of inhabitants younger than 15 years, a moderate education level, and a moderate level of single-parent families were associated with poorer outcomes. This provides a more complex perspective on the mix of characteristics that may make a neighborhood supportive of good health.
A further important issue for studies investigating neighborhood effects is the spatial distribution of these data and the potential for spatial autocorrelation between neighborhood estimates. For example, a neighborhood immediately adjacent to one with high unemployment may itself be more likely to have a higher rate of unemployment than would be expected by chance. This is exacerbated by the boundaries of the administrative data often used in this type of research, which are arbitrary from a health perspective.
Spatial analyses can be useful in addressing these issues. They can be used first to assess whether areas sharing a common boundary have more similar area-level residuals than would be expected under spatial randomness.5 If there is significant spatial autocorrelation in a particular analysis, spatial models can then be used to take account of adjacency between neighborhoods.6 The conditional autoregressive (CAR) model, for example, models area-specific intercepts as random effects.7 This can be used to adjust relative risk estimates in neighborhoods toward the mean risk in neighboring areas, has the capacity to deal with temporal adjacency as well (for example, if data have been collected over a number of years), and can include multivariable analyses. CAR models are also useful in situations where some expected cell counts are low, as they draw information from neighboring cells in generating their estimates. By comparison, a rule of thumb for building multilevel models suggests these require >29 groups with >29 individuals per group.8
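As an illustration of the first step described above, the sketch below computes Moran's I, one commonly used index of spatial autocorrelation, on area-level residuals, together with a simple permutation test against spatial randomness. It is a minimal example under stated assumptions rather than the method of any study cited here: the six areas, their adjacency (a chain in which each area borders the next), the residual values, and the function names are all hypothetical, and binary shared-boundary weights are assumed.

```python
import random


def morans_i(values, neighbors):
    """Moran's I with binary contiguity weights.

    values: list of area-level residuals, indexed by area.
    neighbors: dict mapping each area index to the indices it borders
               (assumed symmetric: if j borders i, then i borders j).
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    # S0 is the sum of all weights (each shared boundary counted in both directions).
    s0 = sum(len(adjacent) for adjacent in neighbors.values())
    cross = sum(dev[i] * dev[j] for i, adjacent in neighbors.items() for j in adjacent)
    total = sum(d * d for d in dev)
    return (n / s0) * cross / total


def permutation_p_value(values, neighbors, n_perm=999, seed=0):
    """One-sided p-value: share of random relabelings with I >= observed I."""
    rng = random.Random(seed)
    observed = morans_i(values, neighbors)
    shuffled = list(values)
    at_least_as_large = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if morans_i(shuffled, neighbors) >= observed:
            at_least_as_large += 1
    return (at_least_as_large + 1) / (n_perm + 1)


# Hypothetical map: six areas in a row, each bordering the next.
chain = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}

clustered = [1.0, 1.0, 1.0, -1.0, -1.0, -1.0]    # similar residuals sit together
alternating = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]  # neighbors always differ

i_clustered = morans_i(clustered, chain)      # positive autocorrelation
i_alternating = morans_i(alternating, chain)  # negative autocorrelation
p_clustered = permutation_p_value(clustered, chain)
```

Values of I well above its expectation under randomness, -1/(n - 1), suggest that adjacent areas have more similar residuals than chance would produce. With only six areas, as in this toy example, the permutation test has little power; it is included only to show the mechanics.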
However, spatial models are Bayesian in their construction, which may present some analytic challenges to researchers trained in frequentist approaches. They also require extensive computing capacity and have limitations when dealing with large numbers of spatial units.
Perhaps we now need to explore how all these approaches can be brought together in a way that addresses interneighborhood and intraneighborhood autocorrelation, low cell numbers, collinearity of neighborhood measures, and clustering of these measures. One such approach may be to use spatial models to develop exposure estimates. This can help overcome low cell counts and take account of spatial and temporal adjacency. Factor analysis might then be applied to these neighborhood estimates to identify parsimonious dimensions of neighborhood, which could then be included in traditional multilevel models. If the approach suggested by Yu-Sheng and Ying-Chih becomes more widely accepted, clustering between these dimensions can also be explored.