The spatial distribution of known predictors of autism spectrum disorders impacts geographic variability in prevalence in central North Carolina
- First Online:
- Cite this article as:
- Hoffman, K., Kalkbrenner, A.E., Vieira, V.M. et al. Environ Health (2012) 11: 80. doi:10.1186/1476-069X-11-80
- 5.1k Downloads
The causes of autism spectrum disorders (ASD) remain largely unknown and widely debated; however, evidence increasingly points to the importance of environmental exposures. A growing number of studies use geographic variability in ASD prevalence or exposure patterns to investigate the association between environmental factors and ASD. However, differences in the geographic distribution of established risk and predictive factors for ASD, such as maternal education or age, can interfere with investigations of ASD etiology. We evaluated geographic variability in the prevalence of ASD in central North Carolina and the impact of spatial confounding by known risk and predictive factors.
Children meeting a standardized case definition for ASD at 8 years of age were identified through records-based surveillance for 8 counties biennially from 2002 to 2008 (n=532). Vital records were used to identify the underlying cohort (15% random sample of children born in the same years as children with an ASD, n=11,034), and to obtain birth addresses. We used generalized additive models (GAMs) to estimate the prevalence of ASD across the region by smoothing latitude and longitude. GAMs, unlike methods used in previous spatial analyses of ASD, allow for extensive adjustment of individual-level risk factors (e.g. maternal age and education) when evaluating spatial variability of disease prevalence.
Unadjusted maps revealed geographic variation in surveillance-recognized ASD. Children born in certain regions of the study area were up to 1.27 times as likely to be recognized as having ASD compared to children born in the study area as a whole (prevalence ratio (PR) range across the study area 0.57-1.27; global P=0.003). However, geographic gradients of ASD prevalence were attenuated after adjusting for spatial confounders (adjusted PR range 0.72-1.12 across the study area; global P=0.052).
In these data, spatial variation of ASD in central NC can be explained largely by factors impacting diagnosis, such as maternal education, emphasizing the importance of adjusting for differences in the geographic distribution of known individual-level predictors in spatial analyses of ASD. These results underscore the critical importance of accounting for such factors in studies of environmental exposures that vary across regions.
KeywordsAutism spectrum disorders (ASD)Intellectual disability (ID)Spatial analysisDisease mappingGeneralized additive models (GAMs)Geographic information systems (GIS)
Adjusted prevalence ratio
- ADDM Network
Autism and developmental disabilities monitoring network
Akaike’s information criterion
Autism spectrum disorders
Autism spectrum disorders with intellectual disability
Autism spectrum disorders without intellectual disability
Generalized additive model
Geographic information systems
Autism spectrum disorders (ASD) are complex neurodevelopmental disorders characterized by impaired social interaction and communication, and restrictive and repetitive behavior. Estimates in 2008 indicate that approximately 1 in 88 children have ASD and that the prevalence of documented ASD is on the rise. The causes for ASD remain largely unknown and widely debated[3, 4]. Environmental exposures are hypothesized to contribute to ASD etiology; however, identifying exposures of concern has been complicated by the relative rarity of ASD, the extensive number of candidate exposures, and the lack of exposure measurements during early life, the developmentally relevant time period for ASD. Thus, studies have explored the geographic distribution of ASD as a means for generating hypotheses about spatially distributed environmental exposures. Additionally, studies have used geographically-based exposure assignment to evaluate the impact of specific exposures such as air pollutants, pesticides, and hazardous wastes on ASD risk[6–9].
One problem with evaluating the geographically non-random prevalence of ASD for etiologic purposes lies in difficulties disentangling the geographic distribution of other factors associated with diagnosis[10, 11]. For example, higher maternal education is associated with increased ASD diagnosis in the United States[12, 13] but not in some European countries. These results suggest that maternal education is a factor in promoting recognition of ASD, but not necessarily the occurrence of ASD. Identifying patterns related to ASD diagnosis may be helpful for public health services allocation; however, to generate hypotheses about etiology, we must distinguish diagnostic patterns from patterns of ASD occurrence. For example, prioritizing investigations of a geographically-based environmental cause may be unwarranted if the observed spatial pattern is driven by maternal education (i.e. spatial confounding).
Two previous studies investigated geographic variability of ASD as a means of identifying environmental exposures related to ASD prevalence[15, 16]. Both reported that children born in certain regions of California were more likely to have a recognized ASD than children born in other parts of the state[15, 16]. The authors attributed their findings to regional differences in the underlying population (i.e. demographic and socioeconomic factors) or geographic variability of environmental exposures; but were unable to disentangle factors promoting diagnosis from environmental exposures because they did not adjust for potentially important individual-level spatial confounding[15, 16].
In order to determine the potential for environmentally distributed exposure to be associated with ASD in central North Carolina (such as contaminants in air or water), we first explored whether spatial differences in ASD prevalence existed and then whether differences remained after accounting for spatially distributed covariates associated with ASD risk and diagnosis. We used a method of spatial epidemiology that applies generalized additive models (GAMs) to assess the spatial variation of disease in a region while simultaneously adjusting for other geographically distributed individual-level factors[17, 18] such as maternal education, age, and smoking. Combining GAMs and geographic information systems allowed us to predict a continuous surface of ASD prevalence across our study area. Our research serves not only to expand the consideration of spatial patterns of ASD to geographic regions other than California but also to improve the utility of such studies by directly examining how adjustment for known risk and predictive factors influences geographic patterns. Patterns remaining after accounting for factors that influence ASD recognition may better reveal the distribution of novel geographically distributed etiologic factors impacting ASD prevalence.
The Autism and Developmental Disabilities Monitoring (ADDM) Network is an active, population-based surveillance program that monitors the prevalence of developmental disabilities among children aged 8 years, an age by which most children with ASD have been evaluated, in selected geographic regions across the United States. The ADDM Network conducts standardized review of medical and educational records and trained clinicians determine whether standardized case definitions for ASD and intellectual disability (ID) are met. Our analyses utilized data from the North Carolina ADDM site (NC-ADDM), which began biennial surveillance in 2002. Analyses were restricted to children born in the 8 counties that were under surveillance during all study years (2002–2008).
ASD and ID classification
To fully explore the impact of spatial confounding on the geographic distribution of developmental disability, we considered 4 different disability groups: all children with an ASD, 2 subgroups within all ASD classified by the co-occurrence of ID (ASD+ID and ASD-ID), and the independent group of all children with ID without regard to ASD status. Examining these independent and cross-classified groups provided a more nuanced disentanglement of factors which may act differently to promote ASD, versus ID, recognition, as follows. Diagnosis of ASD requires comprehensive evaluation of a constellation of behaviors that can be more involved than a singular evaluation assessment of Intelligence Quotient (IQ) to determine ID. Yet, ASD often occurs with ID, frequently presenting more severe disability that is recognized at an earlier age. In addition, Durkin et al. (2010) reported a differential socioeconomic (SES) gradient in ASD prevalence by the presence or absence of ID; SES acts as a stronger risk factor for ASD-ID compared to ASD+ID. Because we expect SES to vary spatially and be a spatial confounder, analyzing ASD without respect to ID may mask the true spatial patterning of disease.
Children met the standardized case definition for ASD if clinician reviewers determined their developmental evaluation records indicated behaviors consistent with ASD, based on the Diagnostic and Statistical Manual IVTM criteria for Autistic Disorder, Asperger Disorder, or Pervasive Developmental Disorder Not-Otherwise-Specified. Children met the standardized definition for ID if clinician review of developmental evaluations determined they had an IQ ≤ 70 on the most recently administered psychometrics test such as the Battelle–cognitive domain, Differential Ability Scales, Stanford-Binet–4th ed., Wechsler Preschool and Primary Scale of Intelligence, and the Wechsler Intelligence Scale for Children-III, or, in the absence of test scores, a written statement in the records indicated the presence of a severe or profound intellectual disability. Our analyses included 561 children with ASD, who were further classified into two subgroups: children with ASD without ID (ASD-ID, n=330), and children with ASD and ID (ASD+ID, n=231). As a comparison to the ASD analyses, we conducted additional analyses investigating the spatial variability in the prevalence of ID (n=1,028) regardless of ASD status.
Residential location and covariates
Surveillance data were linked to birth records to obtain the residential address and covariate information at the time of birth of children with ASD and ID. Birth data were chosen to reflect the most etiologically relevant time period for brain development.
We combined several methods to geocode (i.e. assign latitude and longitude coordinates) the residential birth addresses of study children to improve geocoding accuracy and reduce positional errors. First, we ran all addresses through the ZP4 software which cleans and updates addresses using U.S. Postal Service databases (e.g. converting rural routes to E-911 street names) (version expiring March 2011, Semphoria; Memphis Tennessee). We then cleaned all addresses individually and geocoded them using ArcGIS (version 9.3, Redlands, California) and U.S. Census Tigerline files. For unmatched addresses, we used Google Maps to locate the residence where possible. Using these methods, we successfully geocoded 12,299 (93.41%) of the 13,167 residential addresses. Of the addresses we were unable to geocode, 379 (2.88%) were post office boxes and 489 (3.71%) were addresses that were either incomplete or that we were not able to geocode to a specific location. Geocoding success was similar for the birth cohort and children with ASD and ID.
We estimated the log odds of ASD and ID using GAMs, an extension of linear regression models that can analyze binary outcome data and accommodate both parametric and non-parametric model components. For non-parametric model terms, GAMs replace the traditional exposure term in an ordinary logistic regression with a smooth term (i.e. a term of best fit after adjusting for other covariates). In these analyses we applied a bivariate smooth to latitude and longitude coordinates and included all other covariates as parametric terms[17, 18].
We used a locally weighted regression smoother (loess) which adapts to changes in the data density that are likely to occur in analyses of residential locations due to variability in population density. To predict prevalence the loess smoother utilizes information from nearby data points (weighting information based on its distance from the prediction point). The region or neighborhood from which data are drawn to predict prevalence is based on the percentage of data points in the neighborhood and is referred to as the span size. Choice of span size is a trade-off between bias and variability. A larger span size (more data included) results in a flatter surface with low variability but increased bias, while a small span size results in high variability and comparatively low bias. We determined the optimal amount of smoothing (optimal span size), by minimizing Akaike’s Information Criterion (AIC)[17, 18]. A large optimal span size indicates less spatial variability in prevalence compared to a small optimal span.
We created a grid covering the study area that extended across all latitude and longitude coordinates of all addresses. We excluded grid points in regions of the study area where no children in our sample were born. The resulting grid comprised approximately 4,300 points. We predicted the log odds for each of the 4 developmental outcome groups (e.g. ASD) at each point on the grid and calculated an odds ratio using the entire study area as the referent group; the odds at each point was divided by the odds from a reduced model which omitted the latitude and longitude smooth term. We did not remove children with developmental outcomes from the random selection of births drawn to represent the underlying distribution of births and, consequently, some cases were included in the denominator. As a result, the odds ratios from these models are mathematically equivalent to prevalence ratios. We mapped prevalence ratios using a continuous color scale (blue to red) and a constant scale range for all maps with the same outcome (i.e. ASD; ASD+ID; ASD-ID; and ID).
To provide information in the interpretation of spatial patterns that could be driven by sparse data, we preformed a 2-step statistical screening procedure, as follows. First, we tested the null hypothesis that developmental disability does not depend on geographic location, generally (e.g. the predicted map surface is a flat horizontal plane). Residential locations of individuals were permuted 999 times while preserving their case status and covariates [as described in. In each permutation, the GAM with the optimal span size from the original dataset was run and the global deviance statistic computed. We used a conservative p<0.025, which accounts for inflated type 1 error rates associated with using the optimal span size for the original dataset in permutations, to determine if location was a statistically significant predictor of disability [details in. If the global statistic indicated that location was a statistically significant predictor of disability generally, we next evaluated location-specific (point-wise) departures from the null hypothesis using the same set of permutations[18, 31]. Regions with significantly increased or decreased surveillance-recognized ASD or ID prevalence were defined as points ranking in the upper or lower 2.5% of the distribution of permuted prevalence ratios at each point, respectively.
Statistical analyses were performed in the R Package 2.12.02 (Vienna, Austria) using the gam library and a local scoring algorithm GAM estimation procedure and publically available statistical code[32, 33]. All maps were created using ArcGIS 9.3 (version 9.3, Redlands, California).
Selected Characteristics of the Birth Cohort and Children with ASD and ID in Eight North Carolina Counties in 2002, 2004, 2006 and 2008
Birth Cohort n (%)
All ASD n (%)
ASD-ID n (%)
ASD+ID n (%)
All ID n (%)
Year of Birth
Maternal Age at Birth
Maternal Educational Attainment
Less than High School
College or More
Maternal Tobacco Use During Pregnancy
Robustness of analyses
Our final dataset contained some siblings. In addition to being genetically more similar to each other, siblings typically share the same residence. Including siblings living at the same address in analyses could induce spatial clustering as a result of familial (i.e. genetic) similarities rather than geographically-linked factors. To assess the robustness of our results to including a small number of sibling groups, we conducted secondary analyses including only one randomly selected child per family. Families were defined as children for whom the mother had the same first and maiden name and date of birth (obtained from birth certificates). Because information for fathers was missing and incomplete on many birth records, we did not attempt to identify paternal siblings.
Selected characteristics for the birth cohort and children with ASD or ID are displayed in Table1. Children with ASD were more likely to be male, have older mothers, and mothers with higher educational attainment. As has been reported previously[2, 38], the age 8 prevalence of ASD in the region increased from 2002–2008, however the prevalence of ID remained relatively stable (Table1). Residential locations at birth for our study population are displayed in Figure1 with slight alteration to preserve confidentiality.
Summary of Spatial Analyses
All ASD (birth cohort n=11,034 and ASD n=532)
ASD-ID (birth cohort n=11,034 and ASD-ID n=318)
ASD+ID (birth cohort n=11,034 and ASD+ID n=214)
All ID (birth cohort n=11,034 and ID n=916)
Additionally adjusting for method of delivery, marital status, birthweight, and adequacy of prenatal care did not change the appearance of maps for any of the 4 developmental disability groups considered, or the range of prevalence ratios observed across the study area; these variables were dropped from adjusted analyses. Spatial confounding was driven primarily by the higher educational attainment in Alamance, Chatham, Durham, and Orange Counties, and to a lesser extent, to greater maternal age observed in the same counties; maps with adjustment for maternal age and education only had the same optimal span size (0.95), a similar range of prevalence ratios, and were visually very similar to the fully adjusted maps. When we examined the patterns of maternal educational attainment and age across the study area, both were similar to the unadjusted ASD pattern. For example, mothers living in the areas where the unadjusted ASD prevalence was highest were approximately 1.75 times as likely to have completed college than women in the study area as a whole (Additional file1: Figure S1).
Among the 532 children with ASD, 214 (40.22%) also had an ID. The spatial patterns of both subgroups (ASD+ID and ASD-ID) were similar to each other and to the map of the combined group (all ASD); the locations of increased prevalence and the intensity of the increases were similar across all maps (Table2; Additional file1: Figure S2). Adjusting for spatial confounding resulted in flatter maps of ASD+ID and ASD-ID; however, greater attenuation was observed in the analysis of ASD-ID in which the optimal span size increased from 0.70 to 0.95 after adjustment (indicating the surface was less variable/ more flat after adjustment; Table2; unadjusted figures not shown). Additionally, the range of PRs was more substantially attenuated by adjustment in the analysis of ASD-ID. In our sample, ASD-ID was 1.33 times as high in children who had mothers with college or more education (referent group=children of mothers with some post secondary education; P=0.059); but these same mothers were less likely than mothers with some postsecondary education to have a child with ASD + ID (aPR = 0.67, P = 0.060).
Only 269 sibling pairs were in our analyses and their impact on analyses of both ASD and ID was negligible. When we randomly selected one child from each family for analyses, the pattern of spatial variability was quite similar to analyses including all children (results not shown).
Although we observed spatial variability in ASD prevalence in unadjusted analyses, the pattern of ASD appeared to be largely explained by factors influencing diagnosis -- maternal age and education -- which differed across the study area. The larger optimal span size and global p-value in the adjusted analyses, as well as decreased variability in prevalence ratios across the study area, all indicated less geographic variability after adjustment for known risk and predictive factors for ASD. Our results corroborate previous reports[15, 16] demonstrating spatial variability in ASD prevalence before adjusting for spatial confounding. In secondary analyses Van Meter and colleagues reported the impact of living in neighborhoods with high ASD prevalence was diminished in analyses adjusting for parental education, a finding consistent with our results.
Our results highlight the importance of adjusting for the geographic distribution of known individual-level predictors in spatial analyses. Here, adjustment for socioeconomic factors associated with diagnosis greatly attenuated geographic variability in ASD prevalence. This has implications for studies of ASD etiology that assign environmental exposures based on geography, such as air pollutants and agricultural pesticides, for which we caution the causal interpretation of geographic patterns that are not controlled for individual-level factors. Using GAMs allowed us to carefully account for known risk and diagnostic factors, a major strength of these analyses. In addition, we considered both the statistical and qualitative aspects of the observed geographic patterns, tempering interpretation of areas of sparse data by considering tests of global deviance and changes in the optimal span size along-side visual inspection of the patterns in our interpretation of results.
NC-ADDM Network provided a large population based sample with information on co-occurring conditions, allowing us to consider differences in spatial confounding by the severity and type of disability. Patterns for separate disability groups, ASD+ID and ASD-ID, were visually similar to those for all ASD; however, ASD-ID appeared to be more impacted by spatial confounding as indicated by a greater attenuation in prevalence ratios and a increase of the optimal span size in the adjusted analysis. The associations between ASD (+/−ID) and maternal education and age that we observed in these analyses are consistent with previously reported associations in non-spatial analyses[3, 4] which indicate that ASD-ID is more strongly associated with higher SES than ASD+ID.
Although the global significance test did not indicate variability, our results suggest that ID prevalence varied geographically across the study area; even after adjusting for known spatial confounders the optimal span size remained small and results were visually suggestive of geographic differences in ID prevalence. There are several possible explanations for the observed variability in ID, including residual confounding by unmeasured variables and differences in the distribution of environmental factors across the study area.
Linking data from the NC-ADDM Network and vital records also strengthened our analyses. Vital records provided individual-level data on a number of covariates, allowing us to account for spatial confounding by these factors, and also provided residential location at the time of birth, which may be more relevant to ASD etiology than the address at diagnosis. Although the majority of children with ASD at age 8 in ADDM were also born in the study area, local changes in address were common (68.14% of children with ASD had a change in residential address between birth and age 8). If the spatial patterns we observed are due to difference in diagnosis, spatial variability may be more apparent using a later address corresponding to the time of diagnosis.
A final strength of our study was our ability to evaluate the influence of potential sibling clusters in analyses. While it is often suggested that including siblings in analyses will induce clustering, we did not observe this pattern in our data. One possible explanation is that the number of siblings in our analyses was relatively small (e.g. 269 sibling pairs in the ASD analysis of 11,566 children), due in part to the biennial study design and the selection of only part of the source population (15%).
There are also several important limitations of our methods. We chose the optimal tradeoff between bias and variance of the smooth (the optimal span size) by minimizing the AIC. Selecting the optimal span size for a dataset, however, may obscure important small scale variability in maps; it is possible that our analyses were conducted on a scale too large to identify small scale environmental exposures relevant in the etiology of ASD. Examining different span sizes may reveal important features of the data. Similarly, we assessed spatial confounding by visually comparing maps of prevalence before and after adjustment holding the span size and included observations constant and by investigating changes in the optimal span size before and after adjustment; however, more objective methods of assessing confounding are needed and are an important topic for future research. We also used p-values as a screening tool, to evaluate the global spatial variation as well as areas of increased or decreased prevalence. Our use of p-values here was to evaluate whether or not spatial variability existed (i.e. whether the prevalence of ASD was constant across the geographic area). Nonetheless, many epidemiologists prefer the use of confidence intervals, which provide information on the precision of the observed association. While it is possible to calculate confidence intervals in these analyses, it is visually difficult to display three surfaces simultaneously on maps. Additionally, we conducted permutation tests using the span size of the observed data to test the null hypothesis that the map is flat. Permuted datasets under this null hypothesis may have had a larger optimal span size, particularly for the ID analyses, which had a relatively small span size for the observed data (optimal span = 0.30). Consequently, using the span size of the observed data could result in a p-value that is too small.
Our results demonstrate the importance of adjusting for predictive and diagnostic factors that may spatially confound the search for novel environmentally-distributed risk factors for ASD. These adjustment methods can help the search for causes of developmental disabilities proceed more efficiently by aiding the interpretation of geographic patterns. Cumulatively, our results suggest that known predictive factors for ASD account for a large portion of the observed geographic variability in prevalence in central North Carolina. Although we did not identify spatial variability in ASD in NC after controlling for known predictive and risk factors, follow up of these results in other regions is may provide clues for ASD etiology.
The Autism and Developmental Disabilities Network is funded by the CDC (5-UR3-DD000089-03). Kate Hoffman was funded by NIEHS T32ES007018. We gratefully acknowledge Amy Herring for her statistical advice in conceptualizing this research. The findings and conclusions in this report are those of the authors and do not necessarily represent the official position of the Centers for Disease Control and Prevention or the National Institute of Environmental Health Sciences.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License(http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.