Background

Many traits in humans differ by sex and age, but analyses of gene expression typically do not include them as covariates [1]. Simulation studies have shown that incorporating appropriate covariates in linkage analysis improves power without compromising type I error [2]. We sought to determine whether there are loci influencing gene expression whose detection depends on the inclusion or exclusion of age and/or sex in the analysis. In addition, because the data were from three-generation families, we determined the extent to which age effects are accounted for by generation effects.

Methods

Association between age and sex, and gene expression

We used the 3554 transcripts that were reported to show greater variation between individuals than within individuals among 94 grandparents from CEPH (Centre d'Etude du Polymorphisme Humain) pedigrees [1]. Age data were obtained from http://www.coriell.org; they were not available for any member of pedigree 1454 or for three individuals from three other families. We used generalized estimating equations (GEE) [3] in the computer program R to test whether the expression of each gene differed by age and sex, using family as the clustering variable and an exchangeable correlation structure. Indicators for generation were then added to the model. We calculated q-values to estimate the false-discovery rate for covariate effects [4].
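As an illustration of how per-gene p-values can be converted into q-values, the following minimal sketch computes Benjamini-Hochberg adjusted p-values. This is an assumption made for illustration only: it corresponds to a conservative q-value in which the proportion of true null hypotheses is fixed at 1, and it is not necessarily the estimator of reference [4]. The function name `bh_qvalues` and the example p-values are hypothetical.

```python
def bh_qvalues(pvalues):
    """Benjamini-Hochberg adjusted p-values.

    Assumption: used here as conservative q-values (proportion of true
    nulls fixed at 1); reference [4] may estimate that proportion instead.
    """
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so the q-values stay monotone
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        qvalues[idx] = running_min
    return qvalues


# Hypothetical p-values for one covariate across five transcripts
p = [0.001, 0.008, 0.039, 0.041, 0.60]
q = bh_qvalues(p)

# Number of transcripts declared significant at a q-value threshold of 0.05
n_significant = sum(qi <= 0.05 for qi in q)
print(n_significant)  # 2
```

Counting the q-values below a range of thresholds, as in the last step, gives the kind of summary plotted in Figure 1.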

Expression quantitative trait linkage analysis

Expression quantitative trait linkage (eQTL) analysis was performed using MERLIN-REGRESS v 1.0.1 with a bug fix for missing covariates [5, 6]. Mendelian inconsistencies between grandparents and children were removed. Marker allele frequencies for 2871 single-nucleotide polymorphisms (SNPs) were estimated from the data, and single-point linkage analysis was used (10,203,534 tests). Variance-components linkage analysis in MERLIN was used for the X chromosome because MERLIN-REGRESS cannot analyze X chromosome data. We considered linkage results to differ between the analyses with and without covariates when a LOD score was >3 and there was a 3 LOD unit increase or decrease in linkage when sex or age was included as a covariate. To determine the effect of generation on the linkage results, we compared linkage analysis including age alone with linkage analysis including both age and generation.
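The criteria above can be sketched as a small classification rule. This is one possible reading of the criteria, not the authors' code: a signal is taken to "appear" when the LOD score with the covariate exceeds 3 and is at least 3 units higher than without it, and to "disappear" in the mirror-image case. The names `classify_change`, `LOD_THRESHOLD`, and `LOD_CHANGE` are hypothetical. The sketch also checks the reported number of single-point tests (3554 transcripts × 2871 SNPs).

```python
LOD_THRESHOLD = 3.0  # a linkage result must exceed this LOD score
LOD_CHANGE = 3.0     # required change in LOD units (assumed inclusive)


def classify_change(lod_without, lod_with):
    """Classify one transcript-SNP pair by how its LOD score changes
    when a covariate (sex or age) is added to the linkage analysis."""
    delta = lod_with - lod_without
    if lod_with > LOD_THRESHOLD and delta >= LOD_CHANGE:
        return "appeared"
    if lod_without > LOD_THRESHOLD and -delta >= LOD_CHANGE:
        return "disappeared"
    return "unchanged"


# Total number of single-point tests: transcripts x SNPs
n_tests = 3554 * 2871
print(n_tests)  # 10203534

# Hypothetical LOD score pairs (without covariate, with covariate)
examples = [(0.5, 4.2), (5.0, 1.0), (3.5, 4.0)]
labels = [classify_change(a, b) for a, b in examples]
print(labels)  # ['appeared', 'disappeared', 'unchanged']
```

Under this reading, the "appeared" and "disappeared" pairs are the points lying beyond the dashed ±3 LOD lines in Figure 2.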

Results

Association between age, sex, generation and gene expression

Descriptive information regarding age and sex is provided in Table 1. From the GEE analysis of gene expression data, Figure 1A shows the distribution of the number of significant tests at different q-value thresholds for the models with age, sex, and generation. After adjustment for familial correlation, 30 genes showed significant differences by sex, compared to 1950 genes that were significantly associated with age (an FDR threshold of 0.01 was used). The respective p-values for the tests of significance were 0.000097 for sex and 0.023 for age. When generation was included in the model (Fig. 1B), only 6 of the age effects remained significant, while generation effects were significant for 277 and 862 genes for the indicators for the grandparental (g1) and parental (g2) generations, respectively; the number of genes with sex effects (29) remained similar. The p-values for this model were: sex = 8.1 × 10⁻⁵, age = 1.6 × 10⁻⁵, g1 = 0.0014, and g2 = 0.0053.

Figure 1

Distribution of the number of gene expression traits that are significant in the GEE model over the range of q-values. A, Covariates are sex and age; B, covariates are sex, age, and grandparental (gen1) and parental (gen2) generation.

Table 1 Descriptive information about age and sex

Expression quantitative trait linkage analysis

Figure 2 shows the LOD scores with and without sex and age, respectively. Note the greater change in LOD scores when age (Fig. 2B) was included as a covariate than when sex (Fig. 2A) was included. Specifically, 37 linkage signals disappeared and 17 appeared when sex was included. As expected, all of the traits whose linkage results changed when sex was included map to the sex chromosomes, whereas the loci showing changes in linkage were on the autosomes (Table 2). Of particular interest were the four genes that are on the X chromosome and for which linkage signals appeared on the autosomes when sex was included. This suggests that autosomal loci influence the expression of some genes that escape X-inactivation. For the age analysis, 462 significant linkage results disappeared while 223 appeared when age was included in the analysis. Table 3 lists the top 10 SNPs and traits that show evidence for linkage when age is either included or excluded as a covariate. When linkage analysis was performed with both age and generation as covariates, the change in the linkage results (Fig. 2C) was not as marked as when only age was included as a covariate (Fig. 2B). According to our criteria, only four linkage results disappeared when generation was added to age, and none appeared.

Figure 2

Distribution of LOD scores when covariates are included in linkage analysis. A and B, LOD scores with sex (A) or age (B) included as a covariate are plotted on the vertical axis, against LOD scores with no covariates on the horizontal axis. C, LOD scores with age and generation included as covariates are plotted on the vertical axis, against LOD scores with age as the only covariate on the horizontal axis. The solid line indicates symmetry; the dashed lines are at ±3 LOD units from symmetry.

Table 2 Marked changes in linkage when sex was included as a covariate
Table 3 Marked changes in linkage when age was included as a covariate

Discussion

We found that the majority of genes show significant differences in expression by age, while only a small subset show significant sex differences. More linkage signals lost significance when sex or age was included as a covariate than appeared as a consequence of including these covariates. When generation was also included in the linkage analysis with age, few linkage results changed. Limitations of our analysis include the fact that age data were missing for all individuals in one family and for three individuals in other families.

In addition, we took a blanket approach to all traits because performing detailed diagnostics for numerous traits is not straightforward. To investigate the potential risks of this approach, we examined the trait distributions for the linkage results that changed dramatically when either sex or age was included as a covariate (Tables 2 and 3). As expected, many of the traits that had marked sex effects on linkage results were bimodally distributed, which may result in false-positive linkage results [7]. Interestingly, when we repeated linkage analyses for loci where age and sex had marked effects using the variance-components methodology (as opposed to MERLIN-REGRESS), there was little difference between inclusion and exclusion of the covariate, raising concern about the validity of the regression results.

Morley et al. [1] selected genes for linkage analysis based on greater variance between, compared to within, individuals. They performed this analysis on the grandparents: of those available in this data set, the mean age was 72 years (SD = 8, range = 61–92). However, the grandparents were excluded from the linkage analysis. Only the traits of the children, whose mean age was 18 years (SD = 8, range = 3–37), were used for the linkage analysis. Reasons for this may be related to limitations of available methods or to a decision to attempt to reduce age effects. However, if the variance is not the same for grandparents and children, then such an approach may result in genes that are falsely included in or excluded from the genetic linkage analysis. Furthermore, although we used age in our analysis, the age effects were mostly removed when generation was included. In such a situation, age will be highly correlated with birth order within a sibship, and we therefore cannot exclude the possibility that this has resulted in confounding.

We used a simple exchangeable covariance structure in our GEE analyses. This takes into account the family dependence to some extent but may not be the most appropriate covariance structure for the data. It would be interesting to investigate other covariance structures and assess the impact that they have on the overall findings.
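To make the assumption behind an exchangeable structure concrete, the sketch below builds the corresponding working correlation matrix for one family. Every pair of relatives shares the same correlation, whether they are siblings or a grandparent and grandchild, which is why it may be too simple for three-generation pedigrees. The function names and the value 0.3 are hypothetical and chosen only for illustration.

```python
def exchangeable_corr(n, rho):
    """Working correlation matrix for a cluster (family) of n members:
    1 on the diagonal and a single shared correlation rho elsewhere."""
    return [[1.0 if i == j else rho for j in range(n)] for i in range(n)]


def independence_corr(n):
    """Independence structure: the special case with rho = 0."""
    return exchangeable_corr(n, 0.0)


# Hypothetical family of four members with an assumed correlation of 0.3
R = exchangeable_corr(4, 0.3)
for row in R:
    print(row)
```

An alternative structure would let the off-diagonal entries depend on the degree of relationship between the two family members, rather than sharing one value.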

Conclusion

Age, and to a lesser degree sex, influence gene expression in transformed B lymphocytes. Although including sex as a covariate did not result in many changes in the linkage results, the results changed more markedly when age was included; specifically, there were fewer significant linkage results when age was included as a covariate. It appears that many of these age effects can be accounted for by generational differences in gene expression. Inclusion of covariates in quantitative trait linkage analysis may improve power and reduce false positives.