Maternal and Paternal Age are Jointly Associated with Childhood Autism in Jamaica
- Cite this article as:
- Rahbar, M.H., Samms-Vaughan, M., Loveland, K.A. et al. J Autism Dev Disord (2012) 42: 1928. doi:10.1007/s10803-011-1438-z
Several studies have reported maternal and paternal age as risk factors for having a child with Autism Spectrum Disorder (ASD), yet the results remain inconsistent. We used data for 68 age- and sex-matched case–control pairs collected from Jamaica. Using Multivariate General Linear Models (MGLM) and controlling for parity, gestational age, and parental education, we found a significant (p < 0.0001) joint effect of parental ages on having children with ASD indicating an adjusted mean paternal age difference between cases and controls of [5.9 years; 95% CI (2.6, 9.1)] and a difference for maternal age of [6.5 years; 95% CI (4.0, 8.9)]. To avoid multicollinearity in logistic regression, we recommend joint modeling of parental ages as a vector of outcome variables using MGLM.
Keywords: Autism spectrum disorders; Maternal age; Paternal age; Multivariate General Linear Models; Multicollinearity
Autism Spectrum Disorders (ASDs) are complex lifelong neurodevelopmental and behavioral disorders manifesting in infancy or early childhood, characterized by impairments in social interaction and communication, and repetitive, stereotyped behavior. ASD has become a serious public health concern with a major familial and societal economic impact. The number of children diagnosed with ASD is approximately 1% (Autism and Developmental Disabilities Monitoring Network Surveillance Year 2006 Principal Investigators & Centers for Disease Control and Prevention (CDC) 2009) and has been on the rise for over two decades (Blaxill 2004). While diagnostic changes and improvements in detection probably contribute to this increase (Chakrabarti and Fombonne 2005; Blaxill 2004), a true rise in incidence may also be occurring (Blaxill 2004).
The etiology of ASD is not fully understood but there is consensus that genetics and environmental factors interact to play a role in its development. Several studies conducted worldwide have reported advanced maternal and/or paternal age as risk factors for adverse developmental and behavioral outcomes (Saha et al. 2009a) and autism (Larsson et al. 2005; Gardener et al. 2009; Croen et al. 2007; Durkin et al. 2008; Tsuchiya et al. 2008). However, these studies have yielded contradictory results. Some reported only paternal age as a risk factor for ASD (Sasanfar et al. 2010; Gabis et al. 2010; Reichenberg et al. 2006; Tsuchiya et al. 2008; Zhang et al. 2010). Others reported only maternal age as a risk factor for ASD (Bilder et al. 2009; Croen et al. 2002; Glasson et al. 2004; Gillberg 1980; Tsai and Stewart 1983), and still others have reported both paternal and maternal age as risk factors for ASD (Larsson et al. 2005; Shelton et al. 2010; Croen et al. 2007; Durkin et al. 2008; King et al. 2009; Grether et al. 2009).
Given that most of these findings are based on case–control studies, the majority of these investigators utilized logistic regression to assess the independent contributions of maternal and paternal ages to ASD in children. Since the maternal and paternal age variables are highly correlated, it is not clear if the difference in the ASD risk for either maternal or paternal age reported by these studies is due to multicollinearity. Multicollinearity is a phenomenon attributed to placing two or more highly correlated independent variables in a regression model, which may produce unstable estimates of standard errors and p-values (Gordon 1968). For instance, due to a high correlation between maternal and paternal ages, the apparent lack of predictability of paternal age seen in previous studies that reported maternal age as the only ASD risk factor among the parental ages may have been simply due to multicollinearity. In other words, when both parental ages are entered into a logistic regression analysis, the model picks the maternal age as a significant predictor of ASD because it explains the highest amount of shared variance in the ASD status, as compared to the paternal age. In this situation, even though the paternal age was significantly associated with ASD in the univariable analysis, it will not be recognized as a significant predictor of ASD in the multivariable model unless it can explain a significant amount of shared variance in addition to what has already been explained by maternal age. Thus, even though paternal age might play an important role as a risk factor for having a child with ASD, the logistic regression methods previously used to address this issue may have obscured the true nature of this risk factor.
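As a rough illustration of the instability described above, the variance inflation factor (VIF) for a model with exactly two predictors depends only on their correlation r, with VIF = 1/(1 − r²). The sketch below is not part of the original analysis. It uses the linear-regression form of the VIF as an approximation for the logistic setting, and the function name is hypothetical. The correlations are the maternal–paternal age correlations quoted in this article (0.57 in the present data; 0.74 and 0.80 in earlier studies).

```python
import math


def vif_two_predictors(r: float) -> float:
    """Variance inflation factor when a model has exactly two predictors
    whose Pearson correlation is r (linear-regression approximation)."""
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    return 1.0 / (1.0 - r * r)


# Correlations between maternal and paternal age quoted in this article.
for r in (0.57, 0.74, 0.80):
    vif = vif_two_predictors(r)
    # sqrt(VIF) is the factor by which a coefficient's standard error grows
    # relative to the case of uncorrelated predictors.
    print(f"r = {r:.2f}: VIF = {vif:.2f}, SE inflation = {math.sqrt(vif):.2f}")
```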
While some of the aforementioned investigators have acknowledged the potential effects of multicollinearity on their findings, others have not even mentioned this possibility in their reports. Durkin et al. (2008) reported independent contributions of maternal and paternal age to ASD in children, but they did not acknowledge the possible effects of multicollinearity on the estimation of regression coefficients (Durkin et al. 2008). On the other hand, King et al. (2009) have recognized the need for avoiding problems generated by multicollinearity and used linear decomposition of maternal and paternal age by creating two new variables, U1 = sum of paternal and maternal ages and U2 = the difference between paternal and maternal ages. Using these two decomposed variables, King et al. (2009) were able to identify the individual effects of paternal and maternal ages and reported that both of these variables contribute to ASD risk in children (King et al. 2009). However, as highlighted by Durkin et al. (2010) and by King et al. (2010), the continuous parameterization proposed by King et al. (2009) assumes that a fixed incremental change in the two decomposed variables U1 and U2 at any point in the parental age range leads to the same percentage change in the log odds of ASD risk which may not be valid (Durkin et al. 2010; King et al. 2010). These findings highlight the possibility that the differential results may solely be attributable to the type of statistical method used.
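The sum and difference decomposition can be written out explicitly. If the log odds are modeled as γ1·U1 + γ2·U2 with U1 = paternal + maternal age and U2 = paternal − maternal age, regrouping terms gives a paternal coefficient of γ1 + γ2 and a maternal coefficient of γ1 − γ2. The sketch below illustrates that algebra only. The function names and the coefficient values are hypothetical and are not taken from King et al. (2009) or from this study.

```python
import math


def decompose(paternal_age: float, maternal_age: float) -> tuple[float, float]:
    """Return (U1, U2) = (sum, difference) of the parental ages."""
    return paternal_age + maternal_age, paternal_age - maternal_age


def recover_age_coefficients(gamma_u1: float, gamma_u2: float) -> tuple[float, float]:
    """Map log-odds coefficients for (U1, U2) back to (paternal, maternal) age."""
    return gamma_u1 + gamma_u2, gamma_u1 - gamma_u2


# Hypothetical coefficients on the log-odds scale (illustration only).
gamma_u1, gamma_u2 = 0.06, -0.02
beta_paternal, beta_maternal = recover_age_coefficients(gamma_u1, gamma_u2)

# Both parameterizations give the same linear predictor for any pair of ages.
p_age, m_age = 35.0, 31.0
u1, u2 = decompose(p_age, m_age)
assert math.isclose(gamma_u1 * u1 + gamma_u2 * u2,
                    beta_paternal * p_age + beta_maternal * m_age)
```

Because the two parameterizations describe the same linear predictor, the decomposition changes how the coefficients are estimated, not the functional form of the model.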
The purpose of this study is to introduce Multivariate General Linear Models (MGLM) for the assessment of the joint effects of maternal and paternal ages on having children with ASD by analyzing data from the Jamaican Autism study. We apply MGLM to model maternal and paternal ages as a vector of outcome variables. We also use two other statistical methods: 1) conditional logistic regression (CLR) using the case status as the dependent variable and maternal and paternal ages as independent variables; and 2) a CLR model using the case status as the dependent variable with the two decomposed variables, U1 and U2, introduced by King et al. (2009) as independent variables. We compare the findings from these three different methods and discuss the advantages and disadvantages of each.
The Jamaican Autism study is an NIH-supported age- and sex-matched case–control study that began enrollment in December 2009, investigating whether environmental exposures to mercury, lead, arsenic, manganese, and cadmium have a role in autism. Based on the available data, we investigated factors associated with ASD, including maternal and paternal age at the time of the children’s birth. Children listed in the University of the West Indies’ (UWI) Jamaica Autism Database, who were previously identified as having ASD based on the Diagnostic and Statistical Manual of Mental Disorders (DSM-IV-TR) (American Psychiatric Association 2000) criteria, and the Childhood Autism Rating Scale (CARS) (Schopler et al. 1980), were invited to participate for reassessment of their ASD status. The Autism Diagnostic Observation Schedule (ADOS) (Lord et al. 2002) and the Autism Diagnostic Interview-Revised (ADI-R) (Rutter et al. 2003b) were administered by a trained clinician to these children and their parents/guardians, respectively, to confirm the diagnosis of an ASD for the purposes of this research. The inclusion criteria for all children in the study were that each child must be born in Jamaica and be between 2–8 years of age at the time of enrollment. For ascertainment of ASD status, we used standard algorithms developed for scoring ADOS (Lord et al. 2000) and ADI-R and established cut-off points (Lord et al. 1997). Each ASD case was confirmed based on both ADI-R and all three domains in ADOS. For each case, an age- and sex-matched control was identified from schools and well child clinics. The criteria for matching required that the age of control children be within 6 months of their matched cases. The Lifetime form of the Social Communication Questionnaire (SCQ) (Rutter et al. 2003a) was administered to the parents/guardians of control children to rule out symptoms of ASD. We set the criteria for including children in the control groups as having a SCQ score of 0–6. 
This cut-off point of 6 is one standard deviation above the mean SCQ score of typically developing school children (Mulligan et al. 2009).
We also administered a pre-tested questionnaire to the parents/guardians of both cases and controls to collect demographic and socioeconomic information including parental levels of education at the time of the children’s birth, pregnancy history of the mothers (e.g., parity and gestational age of the child), and potential exposure to heavy metals through food or occupation of the parents. At the end of each interview, we collected 5 mL of whole blood, 2 mL of saliva (parents and children), and hair samples (only from children) to be analyzed for a variety of environmental and genetic exposure variables. Results related to these biological samples will be reported separately at a later time. All participating parents provided written informed consent. In addition, this study was approved by the Institutional Review Boards of the University of Texas Health Science Center at Houston and the University of the West Indies. The data presented here represent an interim analysis of 68 matched case–control pairs. As shown below, the available data provide sufficient power to detect meaningful effect sizes for continuous variables, such as parental age.
Prior to inferential statistical analysis, we describe the basic demographic and socioeconomic characteristics of the cases and their matched controls. To assess the association between maternal and paternal age and ASD in their children, we use two different statistical methods. First, we use conditional logistic regression, a standard method for the analysis of data from matched case–control studies (Breslow and Day 1980). In order to minimize the effect of multicollinearity due to a very high correlation between the maternal and paternal ages, we also analyze the data following the decomposition strategy proposed by King et al. (2009). Specifically, we adopt the two variables, U1 = sum of paternal and maternal ages and U2 = the difference between paternal and maternal ages, as stated by King et al. (2009). The purpose of this decomposition is to reduce the correlation (dependency) between the two decomposed variables (U1 and U2), as compared with the correlation between the paternal and maternal ages. These two decomposed variables are included as independent variables in a conditional logistic regression model instead of the paternal and maternal age. The regression coefficients of the decomposed variables are transformed to estimate the effects of paternal and maternal ages on ASD in children as described by King et al. (2009). Second, we implement a Multivariate General Linear Model (Muller and Stewart 2006; Johnson and Wichern 2007) to directly model the joint distribution of variables (paternal age, maternal age), which also takes into account the correlation between these two variables. In this analysis, we include case–control status as an independent variable, outcome variables (paternal age, maternal age) as a vector of dependent variables, a random effect term, and a set of 67 dummy variables representing the 68 pairs of cases and controls in the model to account for matched pairs. 
This multivariate approach allows for adjustments by other covariates including parity (Opara and Zaidi 2007; Beebe 2005; Creinin and Simhan 2009), and gestational age (MacKay et al. 2010), and parental levels of education (Croen et al. 2007). Since the levels of education obtained by parents are significantly correlated, we created a binary variable indicating whether both parents had education up to high school or at least one of the parents obtained education beyond high school to minimize the potential effects of multicollinearity due to the parental levels of education.
$E(Y_{1i} \mid X_{1i}, X_{2i}, X_{3i}, X_{4i}, X_{5ij})$ stands for the mean paternal age given $X_{1i}, X_{2i}, X_{3i}, X_{4i}, X_{5ij}$;
$E(Y_{2i} \mid X_{1i}, X_{2i}, X_{3i}, X_{4i}, X_{5ij})$ stands for the mean maternal age given $X_{1i}, X_{2i}, X_{3i}, X_{4i}, X_{5ij}$;
$\beta_{10}, \beta_{11}, \beta_{12}, \beta_{13}, \beta_{14}, \beta_{15j}$ are regression coefficients associated with paternal age;
$\beta_{20}, \beta_{21}, \beta_{22}, \beta_{23}, \beta_{24}, \beta_{25j}$ are regression coefficients associated with maternal age.
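Read together, these definitions are consistent with a pair of linear mean models of the following form. This is a reconstruction from the notation, not a quotation of the model as published. It assumes that i indexes children, that X1 through X4 are case status and the three adjustment covariates (the assignment of indices to covariates is not given here), and that X5ij are the 67 indicator variables for the matched pairs.

```latex
\begin{aligned}
E(Y_{1i} \mid X_{1i}, X_{2i}, X_{3i}, X_{4i}, X_{5ij})
  &= \beta_{10} + \beta_{11} X_{1i} + \beta_{12} X_{2i} + \beta_{13} X_{3i}
     + \beta_{14} X_{4i} + \sum_{j=1}^{67} \beta_{15j} X_{5ij}, \\
E(Y_{2i} \mid X_{1i}, X_{2i}, X_{3i}, X_{4i}, X_{5ij})
  &= \beta_{20} + \beta_{21} X_{1i} + \beta_{22} X_{2i} + \beta_{23} X_{3i}
     + \beta_{24} X_{4i} + \sum_{j=1}^{67} \beta_{25j} X_{5ij}.
\end{aligned}
```

The two equations share the same covariates and are estimated jointly, so the residual correlation between paternal and maternal age enters through the error covariance rather than through the design matrix.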
All statistical analyses were performed at 5% level of significance using SAS Version 9.2 (SAS Institute Inc. 2008). Based on the available 68 pairs of children, we have at least 80% power to detect moderate effect sizes (effect size greater than or equal to 0.35 standard deviations) between cases and controls at a 5% level of significance. The multivariate analysis will have even greater power because parental age will be analyzed jointly.
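The power statement can be checked approximately. The sketch below assumes a two-sided paired comparison in which the effect size is the mean within-pair difference divided by the standard deviation of those differences, and it uses a normal approximation rather than the exact noncentral t distribution. Neither assumption is stated in the text, and the function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def paired_power(n_pairs: int, effect_size: float, alpha: float = 0.05) -> float:
    """Approximate power of a two-sided paired test (normal approximation)."""
    z = NormalDist()
    z_crit = z.inv_cdf(1.0 - alpha / 2.0)
    shift = effect_size * sqrt(n_pairs)  # mean of the test statistic under H1
    # Probability of rejecting in either tail under the alternative.
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)


power = paired_power(n_pairs=68, effect_size=0.35)
print(f"approximate power: {power:.2f}")
```

Under these assumptions the approximate power for 68 pairs at an effect size of 0.35 is a little above 0.80, in line with the statement above.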
Sociodemographic characteristics of children and their parents by ASD case status
Columns: Case (n = 68) | Control (n = 68)
Rows: Age of child (months): < 48, 48–71, ≥ 72 | Number of children in the household (age ≤ 18 years) | Education of mother (at child’s birth) | Education of father (at child’s birth) | Gestational age (weeks) | Number of siblings | Number of half-siblings
Joint distribution of birth order and parity in children with ASD, n = 68 (table indexed by birth order of children with ASD)
Comparison of associations between parental age and ASD in children based on two different regression models, Univariate General Linear Model (GLM) and Conditional Logistic Regression (CLR) using 68 matched pairs
[Case (n = 68) and Control (n = 68) summary columns not shown]
Variable | Mean difference (95% CI), univariate GLM | Matched odds ratio (95% CI) | Matched odds ratio (95% CI), multivariable CLR
Maternal age (years) | 4.47 (2.13, 6.81) | 1.10 (1.04, 1.16) | 1.23 (1.09, 1.40)
Paternal age (years) | 4.06 (1.33, 6.80) | 1.07 (1.01, 1.12) | 1.06 (0.97, 1.16)
(row label missing) | 1.18 (0.72, 1.64) | 0.45 (0.29, 0.70) | 0.20 (0.07, 0.56)
Gestational age (weeks) | 0.17 (−0.94, 0.60) | 1.00 (0.90, 1.15) | 1.05 (0.76, 1.44)
(row label missing) | 2.67 (0.76, 4.60) | 1.12 (1.02, 1.24) |
(row label missing) | 2.85 (1.09, 4.61) | 1.13 (1.04, 1.22) |
Head circumference (cm) | 0.82 (0.19, 1.43) | 1.31 (1.05, 1.63) |
Comparison between MGLM method and Decomposition method (King et al. 2009) for assessment of associations between parental age and ASD in children based on 68 matched pairs
Variable | Unadjusted MGLM mean difference (95% CI) | Unadjusted decomposition matched odds ratio (95% CI) | Adjusted MGLM mean difference (95% CI) | Adjusted decomposition matched odds ratio (95% CI)
Maternal age (years) | 4.8 (2.5, 7.2) | 1.050 (1.010, 1.092) | 6.5 (4.0, 8.9) | 1.165 (1.039, 1.307)
Paternal age (years) | 4.0 (1.2, 6.8) | 1.010 (0.977, 1.041) | 5.9 (2.6, 9.1) | 1.090 (0.980, 1.212)
Associations between parental age and case status using MGLM based on 68 matched pairs
Model | Paternal mean age difference | Maternal mean age difference
Model 1: Case status, pairs | 4.0 (1.2, 6.8) | 4.8 (2.5, 7.2)
Model 2: Case status, pairs, and parity | 5.8 (2.6, 8.9) | 7.5 (5.1, 9.9)
Model 3: Case status, pairs, and gestational age | 4.0 (1.1, 6.8) | 4.7 (2.3, 7.0)
Model 4: Case status, pairs, parental levels of education | 4.6 (1.6, 7.6) | 4.2 (1.5, 6.8)
Model 5: Case status, pairs, parity, gestational age | 5.6 (2.3, 8.8) | 7.3 (4.9, 9.7)
Model 6: Case status, pairs, parity, parental levels of education | 6.1 (2.9, 9.3) | 6.6 (4.2, 9.0)
Model 7: Case status, pairs, gestational age, parental levels of education | 4.4 (1.4, 7.4) | 4.0 (1.4, 6.7)
Model FINAL: Case status, pairs, gestational age, parity, parental levels of education | 5.9 (2.6, 9.1) | 6.5 (4.0, 8.9)
In our Jamaican study, using the MGLM approach, we found that parental ages are jointly associated with having a child with ASD. Since the MGLM approach treats parental ages as a vector of outcome variables, the reported results include unadjusted and adjusted estimates for the mean paternal and maternal ages by case status. Compared with parents of controls, the mean paternal and maternal ages of the cases were 4.0 and 4.8 years higher, respectively. However, when adjusted for parity, parental levels of education, and gestational age, the differences in mean paternal and maternal ages between cases and controls increased to 5.9 and 6.5 years, respectively. Other case–control studies have reported significant mean age differences between cases and controls, ranging from 1–2 years for maternal and paternal age (Glasson et al. 2004; Durkin et al. 2008; Golding et al. 2010; Tsai and Stewart 1983; Mouridsen et al. 1993; Reichenberg et al. 2006; Sasanfar et al. 2010). Another study has reported a significant median age difference of 3 years between cases and controls for maternal and paternal age (Shelton et al. 2010). Our results confirm previous findings on the role of maternal and paternal age in the increased likelihood of having children with ASD.
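To make the "vector of outcome variables" idea concrete: in a model that contains only case status and an indicator for each matched pair, the least-squares estimate of the case effect on each parental age is the mean within-pair (case minus control) difference, and a joint test of both differences can be based on Hotelling's T². The sketch below is an illustration on made-up numbers, not the SAS analysis used in the study. The data and function names are hypothetical, and covariate adjustment is omitted.

```python
from statistics import mean


def paired_differences(cases, controls):
    """Case-minus-control differences in (paternal, maternal) age per pair."""
    return [(cp - kp, cm - km) for (cp, cm), (kp, km) in zip(cases, controls)]


def hotelling_t2(diffs):
    """Return (mean vector, T^2, F statistic) for H0: both mean differences are 0."""
    n = len(diffs)
    d1 = [d[0] for d in diffs]
    d2 = [d[1] for d in diffs]
    m1, m2 = mean(d1), mean(d2)
    # Sample covariance matrix of the differences (denominator n - 1).
    s11 = sum((x - m1) ** 2 for x in d1) / (n - 1)
    s22 = sum((y - m2) ** 2 for y in d2) / (n - 1)
    s12 = sum((x - m1) * (y - m2) for x, y in diffs) / (n - 1)
    det = s11 * s22 - s12 ** 2
    # T^2 = n * dbar' S^{-1} dbar, with the 2x2 inverse written out.
    t2 = n * (s22 * m1 ** 2 - 2 * s12 * m1 * m2 + s11 * m2 ** 2) / det
    p = 2  # number of outcomes (paternal age, maternal age)
    f_stat = (n - p) / (p * (n - 1)) * t2  # follows F(p, n - p) under H0
    return (m1, m2), t2, f_stat


# Hypothetical (paternal age, maternal age) for five matched pairs.
cases = [(38, 33), (41, 36), (29, 27), (45, 35), (36, 34)]
controls = [(33, 28), (35, 30), (30, 24), (38, 31), (31, 27)]

(mean_pat, mean_mat), t2, f_stat = hotelling_t2(paired_differences(cases, controls))
print(mean_pat, mean_mat, round(t2, 2), round(f_stat, 2))
```

The single T² statistic tests both mean differences at once, which is the sense in which the two parental ages are assessed jointly rather than one conditional on the other.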
Since this is a case–control study and we do not have all the biological data for parents, we are unable to comment regarding the causality and pathways that link ASD in children to older paternal and maternal age. However, other investigators (Fraga and Esteller 2007; Bilder et al. 2009; Shelton et al. 2010) have provided various thoughts related to this matter. For example, Shelton et al. (2010) stated, “Although poor birth outcomes have been associated with advanced maternal and paternal age, the specific mechanisms are not well understood. Genetic, epigenetic, immunologic, endocrine, environmental, and other factors may underlie the increased risks for ASD associated with maternal and paternal age” (Shelton et al. 2010). Similar to our study, almost all previous studies reported a significant correlation between the age of the parents, such as 0.74 (Croen et al. 2007) and 0.80 (Saha et al. 2009b). In our study, the observed correlation between paternal and maternal age was 0.57. Multiple studies have used regression techniques (e.g., logistic regression) and have acknowledged that the high correlation between the ages of the parents has the potential to cause multicollinearity, thereby giving inconsistent results (Reichenberg et al. 2006; Zhang et al. 2010; Shelton et al. 2010; Croen et al. 2007; King et al. 2009). Various techniques were therefore instituted to reduce the effect of multicollinearity. For example, categorizing the age variables for both parents may lead to a lower correlation between these variables. However, it is well documented that categorization of variables often results in loss of information and a different set of predictors in a final logistic regression model (Schellingerhout et al. 2009). In addition, categorization of variables in nonhomogeneous groups may result in reversal of an association (King et al. 2009). 
Furthermore, a categorization strategy usually raises concerns about arbitrary choices of the cut points for age categories (Durkin et al. 2008; Grether et al. 2009; Reichenberg et al. 2006). When assessing the role of parental ages in having a child with ASD, two important methods have been utilized to reduce the effect of multicollinearity caused by the significant correlation between maternal and paternal age. These include the decomposition strategy proposed by King et al. (2009) and the restriction method introduced by Shelton et al. (2010). Since the sample size in this study is relatively small, the restriction method may not be effective because it involves stratification by age categories. Therefore, we have replicated the decomposition strategy to compare our method with that of King et al. (2009), controlling for the same covariates as in the final model of the MGLM.
Our findings from the MGLM approach indicate a significant association between parental age (paternal and maternal ages jointly) and having a child with ASD. In addition, our results indicate that paternal age may have a weaker association with having a child with ASD than maternal age. The King et al. (2009) decomposition method, however, results in a significant association between maternal age and having a child with ASD, but not for paternal age. This discrepancy is due to the statistical method used, and we elaborate on these differences below.
Since the correlation between maternal and paternal ages in our data is considered high (r = 0.57), it is very likely that the results obtained from the logistic regression, in which maternal and paternal ages were entered as continuous independent variables, are affected by multicollinearity. In fact, in the univariable analyses we observed significant associations between ASD status in children and maternal and paternal ages when evaluated individually in the CLR. However, when both paternal and maternal ages were evaluated jointly as independent variables in the CLR, only maternal age maintained its statistical significance, and the adjusted p-value for paternal age was no longer significant. While it appears that our findings from the MGLM approach are in conflict with those of the CLR, a correct interpretation of this non-significant adjusted p-value is that in the presence of the information provided by the maternal age, paternal age does not add significant value to the prediction of the ASD status. On the other hand, the King et al. (2009) method does not fully eliminate the effect of multicollinearity potentially caused by the decomposed variables U1 and U2. Though the decomposition strategy is successful in reducing the correlation from 0.57 between maternal and paternal ages to 0.29 between the two decomposed variables U1 and U2, some residual effect of multicollinearity may still affect the estimates obtained using the King et al. (2009) method. Because our method is based on simultaneous modeling of the mean age of parents as a vector of dependent variables (outcome variables), its advantage is that the parameter estimates are not influenced by multicollinearity between the parental ages at all. However, even with our new approach, one needs to be careful with potential multicollinearity due to other highly correlated independent variables. 
For example, to control for parental levels of education in our final MGLM models we did not enter both maternal and paternal levels of education as separate independent variables. Instead, we created a binary variable indicating whether both parents had education up to high school or at least one of the parents had education beyond high school. In all final multivariable models, we have controlled for parental levels of education (binary variable) in addition to parity and gestational age.
The King et al. (2009) method includes the two decomposed variables U1 and U2 as continuous independent variables in the logistic regression model. As highlighted by Durkin et al. (2010) the continuous parameterization proposed by King et al. (2009) assumes that a fixed incremental change in U1 and U2 beginning anywhere in the parental age range leads to the same percent change in log odds of ASD risk but this assumption may not be satisfied. Also, the CLR with parental ages as continuous independent variables in the model makes similar assumptions. Because our MGLM method considers age of parents as a vector of dependent variables, it does not rely on meeting such restrictive incremental effect assumptions except those mentioned earlier. Both King et al. (2010) and Durkin et al. (2010) agree about the need to be sensitive to model specification and assumptions.
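The residual correlation between U1 and U2 discussed above can be expressed in closed form: Cov(U1, U2) equals the paternal age variance minus the maternal age variance, so the two decomposed variables are uncorrelated only when both parental ages have the same variance. The sketch below illustrates this relationship. The function name and the standard deviations are hypothetical, since the variances of the parental ages are not given in this text; only the correlation of 0.57 comes from the study.

```python
from math import sqrt


def corr_sum_difference(sd_paternal: float, sd_maternal: float, r: float) -> float:
    """Correlation between U1 = P + M and U2 = P - M, given the standard
    deviations of P and M and their correlation r."""
    var_p, var_m = sd_paternal ** 2, sd_maternal ** 2
    cov_pm = r * sd_paternal * sd_maternal
    cov_u = var_p - var_m                  # Cov(P + M, P - M)
    var_u1 = var_p + var_m + 2.0 * cov_pm  # Var(P + M)
    var_u2 = var_p + var_m - 2.0 * cov_pm  # Var(P - M)
    return cov_u / sqrt(var_u1 * var_u2)


# Equal spread in both ages: the decomposed variables are uncorrelated.
print(corr_sum_difference(5.0, 5.0, 0.57))

# Hypothetical unequal spread: some correlation remains after decomposition.
print(corr_sum_difference(8.0, 6.0, 0.57))
```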
While a high correlation between the maternal and paternal ages could cause multicollinearity in the CLR models, the MGLM models could benefit from a high correlation between maternal and paternal age. This is because the MGLM approach utilizes information about the joint distribution of maternal and paternal ages, which results in more stable estimates of the variance (or standard error) for the estimated regression parameters and hence a more reliable p-value. The MGLM capability of simultaneously testing the effects of parental ages not only helps to minimize the likelihood of making a Type I error, but also improves the statistical efficiency due to the high correlation between the ages of parents. As a result of this comparison, we believe our MGLM approach is more effective in reducing the effect of multicollinearity caused by the high correlation between parental ages, is not subject to the assumptions of continuous parameterization of parental ages as independent variables, which are unlikely to be satisfied, and is more efficient from a statistical perspective. Therefore, we believe that the results obtained from the MGLM approach are more reliable.
There are some limitations in our study. We have reported a significant association between parental age and ASD in children, but an important factor that could potentially be a significant confounder is birth order. One of the limitations of this study is that information about birth order was only collected for children with ASD. Since birth order is highly correlated with parity among ASD cases, we are confident that the effect of confounding by birth order is accounted for through the adjustment for parity. Some other studies have explicitly tried to control for the stoppage effect (Hultman et al. 2010) by studying only later-born children in families who have previously had similarly affected offspring. Stoppage arises because some parents may decide not to have any more children after their first child is diagnosed with ASD (Jones and Szatmari 1988; Selkirk et al. 2009). For example, Zhao et al. (2007) studied the third-born children with ASD for this reason. Considering the limited sample size and having only 16 ASD cases with parity of 3 or higher, it is not feasible to assess this effect in the current study.
One of the limitations of using MGLM to assess association between parental ages and ASD is that the MGLM cannot provide estimates for the relative risk of ASD in offspring due to a one unit increase in maternal and paternal ages. This is because the MGLM approach treats parental ages as a vector of outcome variables and produces adjusted estimates for the mean paternal and maternal ages by the case status. Therefore, the comparison between cases and controls can be made based on the adjusted mean age difference. By contrast, the CLR method provides estimates for the odds ratios that could be used to approximate the ASD relative risk due to one year increase in maternal or paternal ages. Another limitation is that despite our effort to match on age, we observed a slight difference (0.76 months) between the mean ages of cases and controls due to the matching criterion that specified that the age of the controls be within 6 months of their respective matched cases. However, we do not expect this small difference in age between cases and controls to have any major influence on parental age differences between cases and controls.
This study is also limited by the fact that our data cannot as yet provide information about the mechanisms by which older maternal and paternal age might be linked to the risk of having a child with an ASD. Since these findings are based on a case–control study and the available information is limited, we are unable to speculate about the specific causality and pathways that could be involved. However, we are collecting additional information regarding genetics and parental environmental exposures to heavy metals including lead, mercury, arsenic, manganese, and cadmium, and may have an opportunity to explore the nature of these associations in our future studies.
Further, since the Jamaican Autism study enrolls participants from a relatively homogeneous population of children and parents (not only ethnically but also genetically homogeneous), it is possible that our findings about the association of parental age with having children with ASD may not be generalizable to other populations. However, the existence of reported similar findings in other populations would argue against this possibility.
Our interim analysis of 68 matched pairs suggests that maternal and paternal age are jointly associated with ASD in offspring. In addition, the association between the maternal age and having an ASD child seems to be stronger than the association between the paternal age and ASD in offspring. Although the mechanisms underlying this association are unknown, we nonetheless believe that these findings represent an advance in our understanding of population risks for ASD. Because of the statistical approach used, our results help to clarify what is known about parental ages as risk factors for the birth of a child with an ASD. Further studies should examine other variables that may moderate the effect of parental age on the risk for ASD.
This research is co-funded by the Eunice Kennedy Shriver National Institute of Child Health and Human Development (NICHD) and the National Institutes of Health Fogarty International Center (NIH-FIC) by a grant [R21HD057808] awarded to the University of Texas Health Science Center at Houston (UTHealth). We also acknowledge the support provided by the Biostatistics/Epidemiology/Research Design (BERD) component of the Center for Clinical and Translational Sciences (CCTS) for this project. CCTS is mainly funded by the NIH Centers for Translational Science Award (NIH CTSA) grant (UL1 RR024148), awarded to the University of Texas Health Science Center at Houston in 2006 by the National Center for Research Resources (NCRR). The content is solely the responsibility of the authors and does not necessarily represent the official views of the NICHD or the NIH-FIC or the NCRR.