Introduction

Genome-wide association studies (GWAS) using single SNP analysis have been very successful in identifying loci for various quantitative traits and diseases (Manolio et al. 2008). It became apparent, however, that complex traits are usually determined by many genes with small effects and that results from single SNP analysis provide limited biological insight and only partly explain the genotypic variation of the studied trait. Instead of analyzing single SNPs, the combined effect of a SNP set, grouped per pathway or gene region, can be tested for association with the trait of interest. Such SNP set analysis could be used as an alternative approach for GWAS analysis and, since the composition of SNP sets is often based on pathways, should be able to provide additional biological insight into the studied trait.

Since the number of tests in SNP set analysis is low compared to single SNP analysis, it requires a lower penalty for multiple testing. Therefore, SNP set analysis is also well suited to studies with low power for GWAS analysis. In the last few years, several methods have been developed to perform SNP set analysis on GWAS data (Wang et al. 2010; Fridley and Biernacka 2011; Holmans 2010). There are two main types of methods: competitive and self-contained tests. Competitive tests compare the association between a SNP set and the trait to a standard defined by the genotyped SNPs outside the SNP set (the complement), while self-contained tests compare the SNP set to a fixed standard that does not depend on the complement (Goeman and Buhlmann 2007).

Human longevity is a complex trait that is assumed to be determined by variation in many genes with small effects. Previous GWA studies, in which single SNP analyses were performed (Newman et al. 2010; Deelen et al. 2011), have identified only one genome-wide significant locus contributing to survival into old age: APOE. However, the genetic contribution to human lifespan variation, determined in twin studies, is estimated at 25–30% (Gudmundsson et al. 2000; Hjelmborg et al. 2006; Skytthe et al. 2003) and, although the effect of genetic variation in APOE is relatively large, the heritability of longevity is only partially explained by this variation (Deelen et al. 2011). Part of the remaining heritability might be explained by functionally related SNPs with small effects, whose joint effect cannot be detected in a single SNP analysis. Testing SNP sets of candidate pathways for association with longevity would therefore be valuable.

The insulin/insulin-like growth factor (IGF-1) signaling (IIS) pathway is considered as a candidate pathway for studying human longevity. It is involved in the adaptation of the organism to its (changing) environment (Tatar et al. 2003). When experimentally induced in model organisms like worms, flies, and mice, mutations in genes that play a role in IIS, e.g., homologues of human IGF1R, INSR, IRS1, PI3K, and FOXO, were shown to have a considerable effect on lifespan (Kenyon et al. 1993; Kimura et al. 1997; Tatar et al. 2001; Holzenberger et al. 2003; Bluher et al. 2003; Morris et al. 1996; Friedman and Johnson 1988; Clancy et al. 2001; Hwangbo et al. 2004; Ogg et al. 1997; Lin et al. 1997; Giannakou et al. 2004; Selman et al. 2011). Although the IIS pathway is evolutionarily conserved, the complexity of the human IIS pathway (Fig. 1) is much larger compared to that of model organisms. Several studies have investigated associations between single SNPs in genes from the IIS pathway and human longevity. The most prominent results came from FOXO3 (Willcox et al. 2008; Flachsbart et al. 2009; Anselmi et al. 2009; Pawlikowska et al. 2009; Li et al. 2009; Soerensen et al. 2010) and AKT1 (Pawlikowska et al. 2009), which showed associations with longevity in several independent cohort studies.

Fig. 1

Insulin/IGF-1 signaling pathway. The insulin/IGF-1 signaling pathway consists of the core components IGF1R/IR/IRR, IRS, PI3K, AKT/SGK, FOXO and SIRT, and proteins that have a direct activating or inhibiting effect on these proteins. The small closed circles (containing Ac, P, or Ub) indicate an activating effect of the posttranslational modification on the protein, while the small dashed circles indicate an inhibiting effect. The straight arrows pointing to these small circles indicate an activating effect on the posttranslational modification, while the dashed arrows indicate an inhibiting effect. Ac acetylation, P phosphorylation, Ub ubiquitylation

Another candidate pathway for studying human longevity is the mechanism of telomere maintenance (TM). Telomeres are structures at the end of chromosomes, consisting of TTAGGG tandem repeats (Moyzis et al. 1988), which protect chromosomes from degradation or rearrangement (Blackburn 1991). In normal human cells, telomere length declines with every cell division (Harley et al. 1990), and when a critical length is reached, the cell will enter replicative senescence (Allsopp 1996). In human epidemiological studies in blood, increased telomere length has been associated with longevity (Atzmon et al. 2010), while decreased telomere length has been associated with increased mortality (Cawthon et al. 2003; Bakaysa et al. 2007; Kimura et al. 2008), although some studies showed contradictory results (Martin-Ruiz et al. 2005; Bischoff et al. 2006). Telomere integrity is essentially regulated by two protein networks, telomerase and its associated factors, which regulate telomere length, and the shelterin complex, which covers the telomeres (de Lange 2005; Collins and Mitchell 2002) (Fig. 2). Several studies have investigated associations between single SNPs in telomerase and shelterin genes and telomere length. The most promising results came from TERC and TERT (Atzmon et al. 2010; Codd et al. 2010; Levy et al. 2010; Mirabello et al. 2010; Rafnar et al. 2009), of which the latter has also been associated with human longevity (Atzmon et al. 2010).

Fig. 2

Telomere maintenance pathway. The telomere maintenance pathway consists of proteins belonging to telomerase and its associated factors or to the shelterin complex. Telomere elongation is performed by telomerase after binding to the telomere (a). However, binding of the shelterin protein POT1 to the telomere blocks this process (b)

In this study, we used four self-contained tests (PLINK set-based test, Purcell et al. 2007; GRASS, Chen et al. 2010; Global test, Goeman et al. 2004; and SNP ratio test, O'Dushlaine et al. 2009) and one competitive test (the comparative approach of Global test) to study the joint effect of genetic variation in the IIS and TM pathways on human longevity. For the analyses, we used genotyped GWAS data of nonagenarian siblings from the Leiden Longevity Study (LLS) and younger population controls from the Rotterdam Study (RS) (Deelen et al. 2011).

Materials and methods

Study populations

Leiden longevity study

For the LLS, long-lived siblings of European descent were recruited together with their offspring and the partners of the offspring. Families were included if at least two long-lived siblings were alive and fulfilled the age criterion of 89 years or older for males and 91 years or older for females, representing less than 0.5% of the Dutch population in 2001 (Schoenmaker et al. 2006). In total, 944 long-lived proband siblings were included with a mean age of 94 years (range, 89–104), 1,671 offspring (61 years, 39–81), and 744 partners (60 years, 36–79). DNA from the LLS was extracted from samples at baseline using conventional methods (Beekman et al. 2006). For the GWAS, 403 unrelated LLS siblings (one sibling from each sibling pair) were included (LLS GWAS cases) (Deelen et al. 2011).

Rotterdam study

The RS is a prospective population-based study of people aged 55 years and older, which was designed to study neurological, cardiovascular, locomotor, and ophthalmological diseases (Teichert et al. 2009). The study consists of 7,983 participants from the baseline cohort (RS-I) and 3,011 participants from an independent extended cohort formed in 1999 (RS-II) from which DNA was isolated between 1990 and 1993 (RS-I) or between 2000 and 2001 (RS-II). For the GWAS, 1,670 participants from the combined cohort who were below 60 years of age and for whom GWAS data were available were included as controls (RS GWAS controls) (Deelen et al. 2011).

Population substructure

Multidimensional scaling analysis in PLINK (http://pngu.mgh.harvard.edu/purcell/plink, Purcell et al. 2007) showed that there was no substructure in the GWAS data to an extent that would affect the observations (Deelen et al. 2011).

Genotyping and SNP selection

For the SNP set analyses, we used the genotype data from the GWAS described by Deelen et al. (2011). The LLS GWAS cases were genotyped using Illumina Infinium HD Human660W-Quad BeadChips (Illumina, San Diego, CA, USA). The RS GWAS controls were genotyped using Illumina Infinium II HumanHap 550K BeadChips and Illumina Infinium II HumanHap550-Duo BeadChips (Illumina), respectively (Teichert et al. 2009). Of the 551,606 SNPs measured in both the LLS GWAS cases and RS GWAS controls, 516,712 SNPs passed quality control using the following criteria: SNP call rate ≥0.95 and MAF ≥0.01 in RS GWAS controls and LLS GWAS cases, \( P_{HWE} \geq 10^{-4} \) and no between-chip effect in the RS GWAS controls, and good cluster plots in the LLS GWAS cases and RS GWAS controls if \( P < 1 \times 10^{-14} \) (Deelen et al. 2011).

We analyzed SNPs within a 10-kb window around genes encoding proteins that belonged to the IIS (Fig. 1) and TM (Fig. 2) pathways. A gene was defined as an NCBI Entrez Gene (mRNA or RNA) cluster, corresponding to a set of transcripts (RefSeq) for which the alignments can be obtained from the UCSC genome browser (http://genome.ucsc.edu/), in which all transcripts within a cluster agree on strand and overlap. Due to an overlap of the 10-kb windows around IGF2 and INS, two SNPs, rs4320932 and rs7924316, were assigned to both genes.
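The window-based assignment described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function name, the gene and SNP identifiers, and all coordinates are hypothetical, and the sketch assumes that the 10-kb window extends 10 kb on each side of the gene boundaries, which the text does not state explicitly. As with IGF2 and INS, a SNP that falls in overlapping windows is assigned to both genes.

```python
from typing import Dict, List, Tuple

WINDOW_BP = 10_000  # assumed flank on each side of the gene

# Hypothetical inputs: gene -> (chromosome, start, end); SNP -> (chromosome, position)
Gene = Tuple[str, int, int]
Snp = Tuple[str, int]


def assign_snps(genes: Dict[str, Gene], snps: Dict[str, Snp],
                window: int = WINDOW_BP) -> Dict[str, List[str]]:
    """Map each gene to the SNPs lying within `window` bp of its boundaries."""
    sets: Dict[str, List[str]] = {gene: [] for gene in genes}
    for snp_id, (chrom, pos) in snps.items():
        for gene, (g_chrom, start, end) in genes.items():
            if chrom == g_chrom and start - window <= pos <= end + window:
                sets[gene].append(snp_id)  # a SNP may be assigned to several genes
    return sets


# Made-up coordinates for two neighbouring genes with overlapping windows.
genes = {"GENE_A": ("11", 100_000, 120_000), "GENE_B": ("11", 125_000, 140_000)}
snps = {"snp1": ("11", 95_000), "snp2": ("11", 122_000), "snp3": ("11", 160_000)}
snp_sets = assign_snps(genes, snps)
```

In this toy input, `snp2` lies in both windows and is therefore listed under both genes, while `snp3` lies outside every window and is dropped.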

Statistical analysis

PLINK set-based test

In the PLINK set-based test (--set-test, http://pngu.mgh.harvard.edu/purcell/plink; Purcell et al. 2007), a single SNP analysis (in our case, a trend test) of the original pathway or gene SNP set is performed. For each SNP set, a mean SNP statistic is calculated from the single SNP statistics of a maximum number (--set-max) of independent SNPs below a certain P value threshold (--set-p). If SNPs are not independent, i.e., if their linkage disequilibrium (\( r^2 \)) is above a certain threshold (--set-r2), the SNP with the lowest P value in the single SNP analysis is selected. The same analysis is performed with a certain number (--mperm) of simulated SNP sets in which the phenotype status of the individuals is permuted. An empirical P value for the SNP set is computed by counting the number of times the test statistic of the simulated SNP sets exceeds that of the original SNP set. For the analysis in this study, the parameters were set to --set-p 0.05, --set-r2 0.5, --set-max 99999, and --mperm 10,000.
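The selection and averaging steps described above can be sketched in a few lines. This is an illustration of the logic only, not PLINK's implementation: the function names are hypothetical, the single SNP statistics, P values, and pairwise \( r^2 \) values are assumed to be precomputed, and the empirical P value uses the common (hits + 1)/(permutations + 1) convention with a greater-than-or-equal comparison, which is an assumption rather than something stated in the text.

```python
from typing import List, Sequence


def set_statistic(stats: Sequence[float], pvals: Sequence[float],
                  r2: Sequence[Sequence[float]], set_p: float = 0.05,
                  set_r2: float = 0.5, set_max: int = 99999) -> float:
    """Mean single-SNP statistic over independent SNPs with P < set_p."""
    # Visit SNPs from most to least significant.
    order = sorted(range(len(pvals)), key=lambda i: pvals[i])
    chosen: List[int] = []
    for i in order:
        if pvals[i] >= set_p or len(chosen) >= set_max:
            break
        # Skip a SNP that is in LD with an already chosen, more significant SNP.
        if all(r2[i][j] <= set_r2 for j in chosen):
            chosen.append(i)
    return sum(stats[i] for i in chosen) / len(chosen) if chosen else 0.0


def empirical_p(observed: float, permuted: Sequence[float]) -> float:
    """Share of permuted set statistics at least as large as the observed one."""
    hits = sum(1 for s in permuted if s >= observed)
    return (hits + 1) / (len(permuted) + 1)


# Made-up values: SNP 0 and SNP 1 are in strong LD, SNP 3 is not significant.
stats = [9.0, 8.0, 5.0, 1.0]
pvals = [0.003, 0.005, 0.03, 0.3]
r2 = [[1.0, 0.9, 0.1, 0.1],
      [0.9, 1.0, 0.1, 0.1],
      [0.1, 0.1, 1.0, 0.1],
      [0.1, 0.1, 0.1, 1.0]]
observed = set_statistic(stats, pvals, r2)           # mean of SNP 0 and SNP 2
p_value = empirical_p(observed, [1.0, 2.0, 7.5, 3.0])
```

In a real analysis the permuted statistics would come from recomputing `set_statistic` after shuffling the case-control labels, once per permutation.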

GRASS

GRASS (http://linchen.fhcrc.org/grass.html; Chen et al. 2010) calculates “eigenSNPs” for each gene in the pathway SNP set by summarizing the variation of a gene using principal component analysis. Subsequently, one or more of these “eigenSNPs” per gene are selected using regularized logistic regression to calculate a test statistic for each pathway SNP set. The same analysis is performed with simulated SNP sets in which the phenotype status of the individuals is permuted. The P value per pathway SNP set is calculated by comparing the test statistic of the original pathway SNP set with that of the combined simulated pathway SNP sets. For the analysis in this study, the number of simulated pathway SNP sets was 10,000.
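To make the notion of an “eigenSNP” concrete, the sketch below computes the scores of all individuals on the first principal component of one gene's genotype matrix. It covers only this first step and is not the GRASS software: the function name is hypothetical, only the leading component is extracted (by plain power iteration), and the regularized logistic regression and permutation steps are omitted.

```python
from math import sqrt
from typing import List


def first_eigensnp(genotypes: List[List[float]], iters: int = 200) -> List[float]:
    """Score every individual on the first principal component of one gene.

    `genotypes` is an individuals x SNPs matrix of minor-allele counts (0, 1, 2).
    """
    n, p = len(genotypes), len(genotypes[0])
    means = [sum(row[j] for row in genotypes) / n for j in range(p)]
    centered = [[row[j] - means[j] for j in range(p)] for row in genotypes]
    # Sample covariance matrix of the SNPs (p x p).
    cov = [[sum(r[a] * r[b] for r in centered) / (n - 1) for b in range(p)]
           for a in range(p)]
    # Power iteration; the non-uniform start vector is an arbitrary choice.
    v = [float(j + 1) for j in range(p)]
    for _ in range(iters):
        w = [sum(cov[a][b] * v[b] for b in range(p)) for a in range(p)]
        norm = sqrt(sum(x * x for x in w))
        if norm == 0.0:  # no variation in any SNP
            break
        v = [x / norm for x in w]
    # Project the centered genotypes onto the leading direction.
    return [sum(r[j] * v[j] for j in range(p)) for r in centered]


# Two perfectly correlated SNPs collapse into a single eigenSNP.
scores = first_eigensnp([[0, 0], [1, 1], [2, 2], [1, 1]])
```

For a gene with many correlated SNPs, a few such components typically capture most of the genotypic variation, which is what allows one or a few eigenSNPs to stand in for the whole gene.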

Global test

In this study, we used a modified version of the Global test (http://www.bioconductor.org/help/bioc-views/release/bioc/html/globaltest.html; Goeman et al. 2004), which has been shown to be suitable and powerful for analyzing GWAS data (Chapman and Whittaker 2008; Pan 2009). This test is based on a multiple logistic regression model that uses the phenotype as the response variable and the SNPs in the SNP set as covariates and that automatically takes the correlations between SNPs into account. The null hypothesis tested is that none of the SNPs in the SNP set is associated with the phenotype. P values are calculated using a permutation test based on 10,000 permutations.
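The idea behind this test can be illustrated with a simplified score-type statistic, \( Q = \frac{1}{m} \sum_{j=1}^{m} \left( \sum_{i} x_{ij} (y_i - \bar{y}) \right)^2 \), evaluated against permuted phenotypes. This is a sketch under stated assumptions, not the globaltest package or the modified version used in the study: the function names are hypothetical, no covariates are included, and the scaling constant of the published statistic is dropped because it does not affect a permutation P value.

```python
import random
from typing import List, Sequence


def global_stat(y: Sequence[int], X: Sequence[Sequence[float]]) -> float:
    """Q = sum_j (sum_i x_ij * (y_i - mean(y)))^2 / m for m SNPs."""
    n, m = len(y), len(X[0])
    mean_y = sum(y) / n
    resid = [yi - mean_y for yi in y]  # residuals under the null model
    return sum(sum(X[i][j] * resid[i] for i in range(n)) ** 2
               for j in range(m)) / m


def permutation_p(y: Sequence[int], X: Sequence[Sequence[float]],
                  n_perm: int = 10_000, seed: int = 0) -> float:
    """Empirical P value obtained by permuting the case-control labels."""
    rng = random.Random(seed)
    observed = global_stat(y, X)
    labels: List[int] = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)
        if global_stat(labels, X) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Toy data: one SNP whose allele count separates cases (1) from controls (0).
y = [1, 1, 0, 0]
X = [[2.0], [2.0], [0.0], [0.0]]
q_observed = global_stat(y, X)
p_perm = permutation_p(y, X, n_perm=200)
```

Because all SNPs in the set enter a single statistic, one test is performed per SNP set regardless of how many SNPs it contains.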

For the comparative approach, 10,000 random SNP sets per pathway SNP set were generated and tested to determine the chance to find a similar-sized SNP set with a comparable or lower P value as compared to the original pathway SNP set.
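The comparative approach described above amounts to drawing random SNP sets of the same size and asking how often they reach a P value at least as small as that of the pathway. The sketch below shows only this bookkeeping, with hypothetical function names; it assumes that random sets are matched on the number of SNPs and drawn without replacement from all genotyped SNPs, and it takes the per-set P values as given rather than recomputing them.

```python
import random
from typing import List, Sequence


def random_snp_sets(all_snps: Sequence[str], size: int, n_sets: int,
                    seed: int = 0) -> List[List[str]]:
    """Draw `n_sets` random SNP sets of `size` SNPs each, without replacement."""
    rng = random.Random(seed)
    return [rng.sample(list(all_snps), size) for _ in range(n_sets)]


def comparative_p(pathway_p: float, random_set_ps: Sequence[float]) -> float:
    """Fraction of random SNP sets whose P value is <= the pathway P value."""
    hits = sum(1 for p in random_set_ps if p <= pathway_p)
    return hits / len(random_set_ps)


# Toy example: five SNP identifiers, three random sets of two SNPs each.
example_sets = random_snp_sets(["s1", "s2", "s3", "s4", "s5"], size=2, n_sets=3)
# Made-up P values for four random sets, compared with a pathway P value of 0.01.
fraction = comparative_p(0.01, [0.5, 0.005, 0.2, 0.01])
```

Each random set would be passed through the same Global test as the pathway SNP set to obtain the P values that are compared here.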

SNP ratio test

The SNP ratio test (http://sourceforge.net/projects/snpratiotest/; O'Dushlaine et al. 2009) performs a single SNP analysis (in our case, a trend test) of the original pathway or gene SNP set and of similar-sized SNP sets in which the phenotype status of the individuals is permuted. An empirical P value of the SNP set is computed by calculating the ratio between the proportion of SNPs that show an association below a certain P value threshold (p) in the original GWAS dataset and in the simulated GWAS datasets. The significant SNPs in the simulated GWAS datasets are defined as the top n SNPs with the lowest P values, where n is the number of SNPs with an association below p in the original GWAS dataset. For the analysis in this study, we made use of the scripts described in “SRT_documentation_090310.pdf” (http://sourceforge.net/projects/snpratiotest/); p was set to 0.05, and the number of simulated datasets used was 10,000.
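A loose sketch of this procedure is given below. It is not the SRT scripts: the function names are hypothetical, the per-SNP P values of the original and permuted datasets are assumed to be precomputed, and the empirical P value uses the (hits + 1)/(permutations + 1) convention as an assumption. Because the SNP set has a fixed size, comparing the counts of significant SNPs in the set is equivalent to comparing the corresponding proportions.

```python
from typing import Dict, Sequence, Set


def significant_in_set(pvals: Dict[str, float], snp_set: Set[str],
                       n_top: int) -> int:
    """Count set members among the `n_top` SNPs with the lowest P values."""
    top = sorted(pvals, key=lambda snp: pvals[snp])[:n_top]
    return sum(1 for snp in top if snp in snp_set)


def snp_ratio_p(original: Dict[str, float],
                permuted: Sequence[Dict[str, float]],
                snp_set: Set[str], threshold: float = 0.05) -> float:
    """Empirical P value for the enrichment of significant SNPs in `snp_set`."""
    # n is fixed by the original dataset and reused for every permuted dataset.
    n_top = sum(1 for p in original.values() if p < threshold)
    observed = significant_in_set(original, snp_set, n_top)
    hits = sum(1 for perm in permuted
               if significant_in_set(perm, snp_set, n_top) >= observed)
    return (hits + 1) / (len(permuted) + 1)


# Toy data: four SNPs genome-wide, two of which belong to the SNP set.
original = {"a": 0.01, "b": 0.2, "c": 0.03, "d": 0.6}
permuted = [{"a": 0.5, "b": 0.4, "c": 0.1, "d": 0.2},
            {"a": 0.02, "b": 0.01, "c": 0.5, "d": 0.9}]
srt_p = snp_ratio_p(original, permuted, {"a", "b"})
```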

Statistical significance

To adjust for multiple testing, the significance level was set at the Bonferroni-corrected nominal P value (which is 0.05/(number of pathway or gene SNP sets tested)).
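As a small worked example of this rule, the sketch below computes the corrected significance level for a given number of SNP sets. The function name is hypothetical; the counts are those reported in the Results (2 pathway SNP sets, 68 IIS gene SNP sets, 13 TM gene SNP sets), and applying the correction per pathway at the gene level is an assumption made here for illustration.

```python
def bonferroni_threshold(n_tests: int, alpha: float = 0.05) -> float:
    """Significance level after Bonferroni correction for `n_tests` tests."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


pathway_level = bonferroni_threshold(2)     # two candidate pathways
iis_gene_level = bonferroni_threshold(68)   # IIS gene SNP sets (assumed per pathway)
tm_gene_level = bonferroni_threshold(13)    # TM gene SNP sets (assumed per pathway)
```

The pathway-level value is 0.025, which matches the threshold stated for the two tested pathways.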

Results

For the IIS pathway, we selected genes encoding proteins that belong to the well-described core of the pathway, consisting of IGF1R/IR/IRR, IRS, PI3K, AKT/SGK, FOXO, and SIRT, or that had a direct activating or inhibiting effect on these core components (van der Horst and Burgering 2007; Taniguchi et al. 2006). In addition, we selected several FOXO target genes that play a role in cell-cycle inhibition, oxidative-stress resistance, metabolism, and apoptosis (van der Horst and Burgering 2007) (Fig. 1). For the TM pathway, we selected genes encoding proteins that were specifically associated with telomeres and belonged to telomerase and its associated factors or to the shelterin complex (Vulliamy et al. 2008; Harrington et al. 1997; de Lange 2005) (Fig. 2). We analyzed SNPs within a 10-kb window around the selected genes (based on Pawlikowska et al. 2009) from genotyped GWAS data of 403 unrelated nonagenarian participants from the LLS and 1,670 middle-aged controls from the RS (Deelen et al. 2011). A description of the investigated samples is given in Table S1. In total, 1,021 SNPs in 68 IIS pathway genes and 88 SNPs in 13 TM pathway genes were analyzed (Tables 1, 2, S3A, and S3B).

Table 1 Characteristics of the insulin/IGF-1 signaling pathway proteins
Table 2 Characteristics of the telomere maintenance pathway proteins

Four methods, PLINK set-based test, Global test, GRASS, and SNP ratio test (Table S2), were used to investigate the association of the SNP sets from the IIS and TM pathways with longevity. As a biological negative control, we also analyzed a SNP set of 223 SNPs in 9 genes previously associated with eye and hair color (Eriksson et al. 2010) (Tables 3 and S3C). Both candidate pathways were consistently associated with longevity across all four tests (Table 4). We applied Bonferroni correction to adjust for the number of tested pathways (i.e., 2, so for significance P < 0.025). After Bonferroni correction, the IIS pathway SNP set remained significant in GRASS and Global test, while the TM pathway SNP set remained significant in the PLINK set-based test, GRASS, and Global test. Using the comparative approach in Global test as a competitive test, we also showed that the probability of finding a random SNP set with the same number of genes as the IIS or TM pathway and a comparable or lower P value is less than 5% (2.11% for the IIS and 2.95% for the TM pathway).

Table 3 Characteristics of the eye and hair color pathway proteins
Table 4 Results of gene set analysis of insulin/IGF-1 signaling, telomere maintenance, and eye and hair color pathway SNP sets

To determine which genes are mainly responsible for the observed association of the pathway SNP sets from the IIS and TM pathways with longevity, we also investigated the association of gene SNP sets from these pathways. Although the power to detect an association using gene SNP set analysis is lower than for pathway SNP set analysis, due to the larger number of tests, it provides a ranking of genes based on their contribution to the observed associations of the pathways. To analyze the gene SNP sets, we used the PLINK set-based test, Global test, and SNP ratio test. GRASS was not used, since the underlying statistical method of this test is less suitable for analysis of gene SNP sets. Nine of the 68 IIS pathway gene SNP sets (AKT1, AKT3, FOXO4, IGF2, INS, PIK3CA, SGK1, SGK2, and YWHAG) and 1 of the 13 TM pathway gene SNP sets (POT1) showed an association (P < 0.05) with longevity in at least two tests (Tables 5 and 6).

Table 5 Results of gene set analysis of insulin/IGF-1 signaling pathway gene SNP sets
Table 6 Results of gene set analysis of telomere maintenance pathway gene SNP sets

Discussion

To study the effect of the IIS and TM pathways on longevity, SNP set analysis on GWAS data of 403 nonagenarian cases and 1,670 population controls was performed. Both pathway SNP sets associated significantly with longevity. The gene SNP sets analysis showed that the association of the IIS pathway was scattered over several genes (AKT1, AKT3, FOXO4, IGF2, INS, PIK3CA, SGK1, SGK2, and YWHAG), while the association of the TM pathway seems to be mainly determined by one gene (POT1).

The proteins encoded by the IIS gene SNP sets that associate with longevity are involved in several parts of the IIS pathway (Fig. 1). Akt1, Akt3, Foxo4, Igf2, Ins2, Pik3ca, and Sgk1 knockout mice all show abnormalities in growth and/or increased mortality (www.informatics.jax.org; Blake et al. 2011), which supports a role for these genes in the growth- and lifespan-regulating effects of the IIS pathway. Previously, SNPs in several of the significant IIS pathway genes (AKT1, FOXO4, INS, and PIK3CA) were studied by single SNP analysis, and only one SNP, rs3803304 in AKT1, which was not measured in our study, showed an association with longevity (Pawlikowska et al. 2009). However, gene set testing, which could have detected association of additional genes containing many SNPs with small effects, was not applied in that study. Most signaling cascades require cooperation of several genes in multiple branches of the cascade. This suggests that, for signaling pathways, mutations in different genes could result in similar downstream effects, which would explain the scattered association in the IIS pathway.

Although SNPs in FOXO3A have previously been associated with longevity in several independent studies (Willcox et al. 2008; Flachsbart et al. 2009; Anselmi et al. 2009; Pawlikowska et al. 2009; Li et al. 2009; Soerensen et al. 2010), the gene SNP set showed no effect in our study in the PLINK set-based test, Global test, and SNP ratio test (P = 0.181, P = 0.138, and P = 0.180, respectively) (Table 5). This might be due to the fact that the effects of FOXO3A on longevity are most prominent in centenarians. As was previously reported by Flachsbart et al., centenarians represent a highly selected phenotype even among nonagenarians (Flachsbart et al. 2009). In addition, the genetic contribution to longevity in general is increased at higher ages (Hjelmborg et al. 2006), and the small effects of longevity-promoting gene variants, relative to other factors, may be larger in centenarians (Perls et al. 2002) and not detectable in nonagenarians. The cases in our study, which are from long-lived families, have a mean age of 94 years, yet we had only 11 individuals >100 years, which may explain the absence of significance of the FOXO3A association in our population.

POT1 is part of the shelterin complex and is responsible for the binding of this complex to the TTAGGG repeats of telomeres. Binding of POT1 to the telomere leads to decreased elongation by telomerase (de Lange 2005). Reduction of POT1 in human fibroblasts by RNAi leads to induction of apoptosis, chromosomal instability, and senescence (Yang et al. 2005). The same effects are observed in Pot1b knockout mice (He et al. 2009; Hockemeyer et al. 2008). In addition, telomerase-deficient Pot1b knockout mice show a reduction in lifespan compared to “normal” telomerase-deficient mice (Hockemeyer et al. 2008), which stresses the importance of TM in lifespan regulation. Most protein complexes contain one or several proteins essential for specific functions of the complex, e.g., binding, transport, or activation/repression activity. This indicates that, for pathways containing a protein complex, mutations in a single gene, encoding such an essential protein, could be sufficient to alter the function of the complex, which would explain the single-gene association in the TM pathway.

There are two main kinds of pathway analyses, explorative and candidate based. Since we wanted to focus on two pathways, the IIS and TM pathways, we performed candidate-based pathway analysis. The advantage of testing candidate pathways instead of explorative testing is the decreased penalty for multiple testing, due to the limited number of tests performed. For information about pathways, several databases are available, e.g., Gene Ontology (Ashburner et al. 2000) and Kyoto Encyclopedia of Genes and Genomes (KEGG) (Kanehisa and Goto 2000), which are particularly useful for explorative studies (Wang et al. 2010). However, to our knowledge, the IIS and TM pathways are not described in sufficient detail in these databases, and we therefore assembled these pathways based on literature. Although the IIS pathway is available in KEGG (hsa04910; insulin-signaling pathway), only four of the nine IIS pathway genes that were associated with longevity, AKT1, AKT3, INS, and PIK3CA, were part of this pathway, which indicates that the pathway definition used in this study could have had a large influence on the results of the analysis.

Different pathway tests could show contradictory results, even when analyzing the same GWAS data (Wang et al. 2010). These discrepancies are caused by differences in, for example, the underlying statistical methods of the tests. Therefore, we used several pathway tests in parallel for our analysis. Some of the available pathway tests require SNP P values as input data, while others require raw genotypes (Wang et al. 2010). Given that we have GWAS data available, we selected pathway tests that make use of raw genotypes. All four selected pathway tests are self-contained tests which deal with the complexity of SNP set testing by permuting the case-control status. While the PLINK set-based test, Global test, and SNP ratio test do not completely incorporate LD information, GRASS employs PCA to deal with correlations within each gene. A simulation study showed that, in general, GRASS was more powerful than the PLINK set-based test (Chen et al. 2010). Simulation studies for Global test or SNP ratio test are not yet available. However, despite the differences between the methods, they all showed similar results for the IIS and TM pathways in this study.

SNP set analysis could have power to detect significant association, even if the power to detect associations in single SNP analysis is low (Fridley and Biernacka 2011), as was previously shown in the Wellcome Trust Case Control Consortium (Torkamani et al. 2008). Our study has a power <1% to detect single SNP associations of the tested SNPs with an OR of 1.2 and a minor allele frequency of 0.25 (the mean frequency of the tested SNPs). However, because the small (non-significant) effects of the SNPs are jointly tested, the pathway SNP set analysis is able to detect a significant association of the IIS and TM pathways. This indicates that SNP set analysis could be a useful approach for studies which showed no significant associations in single SNP analysis.

There is still much debate about the optimal size of the window used in SNP set analysis (Holmans 2010; Fridley and Biernacka 2011; Wang et al. 2010), and we chose a fixed window of 10 kb to take into account effects of SNPs in regulatory regions surrounding the genes. The same window was also used in a previous study of the IIS pathway (Pawlikowska et al. 2009). Although there is a chance that we will miss some functional SNPs, increasing the window would increase the chance that SNPs are included with no functional relationship to the tested gene.

The number and diversity of SNPs measured per gene/pathway is highly variable between genotyping platforms used for GWAS. In addition, allele frequencies and the presence of SNPs vary widely between populations. For single SNP analysis, replication depends on association of the same SNP (or a SNP in high LD). However, when different SNPs associate in different populations because of varying allele frequencies, SNP set analysis, which determines the combined effect of SNPs within a gene, is able to overcome this problem. Therefore, results of SNP set analysis are assumed to be more reproducible between genotyping platforms and populations (Luo et al. 2010; Wang et al. 2010). To support these assumptions, our findings should be replicated in other cohorts.

In conclusion, we have shown that genetic variation in genes involved in the IIS and TM pathways is associated with human longevity. In addition, we provide evidence that different self-contained tests show similar results when applied to candidate-based pathway analysis.