Introduction

Major depressive disorder (MDD) is a common [1] complex [2] psychiatric disorder with major health concerns worldwide [3, 4]. Heritability estimates of ~37% [5] implicate a partially genetic causation. This has motivated numerous linkage studies [6], association studies of candidate genes [7] and recently even a few genome-wide association studies (GWAS) [8]. However, convincing replication across independent studies is lacking, and no major MDD risk locus has been identified leaving much of the estimated heritability unexplained. This implies that MDD is genetically far more complex than previously expected, with influences from both allelic and locus heterogeneities in parallel with exposures to adverse social events and behavior, hampering success in identifying genetic risk factors [6, 7, 9] within this heterogeneous syndrome.

A recent mega-analysis shows that it is unlikely that MDD is caused by a few genetic risk factors with large effect [8]. The explained genetic variance is still low which opens up for the possibility that interaction effects from both genetic and environmental risk factors are important source for explaining the remaining heritability [1014]. The role of interactions (also known as epistasis or synergy) as a versatile mechanism for regulating health and disease has been proposed [1518], and has recently been suggested to increase risk for schizophrenia [19, 20] and bipolar disorder [21]. Increasing the sample size alone and using a single gene model will not reveal the complete genetic architecture of MDD. Detailed characterization of the interacting landscape will thus be a critical factor in understanding why people are at risk for this debilitating disorder [22, 23].

Numerous methods are available for detecting gene-gene interactions [2426], each of which has been designed with different assumptions of how biological interactions can be detected with statistical methods.

Historically, two statistical principles have evolved: one represents genetic model-based methods and one model-free method [27, 28]. The model-based approaches are designed to calculate interactions based on linear regression [29]. The model-free systems use different applications of data mining and machine learning approaches [30, 31]. It is not yet clear which model is best at identifying interacting factors involved in disease susceptibility.

For this study we stated two hypotheses. Our first hypothesis was that common genetic risk factors in a set of candidate genes, act through a variety of interactions in the susceptibility for MDD. Our second hypothesis was that no single available statistical method would completely capture all possible forms of interactions. We therefore chose three different methods for modeling potential genetic interactions and examined to what extent these different statistical methods report synergy effects with relevance for susceptibility to MDD.

To address these questions we designed an incremental approach (analytic strategy in Additional file 1: Figure S1) with a reduction of genetic heterogeneity using two cohorts, to systematically examine SNP-interaction effects using both genetic model-based and model-free methods. The former consists of one additive and one multiplicative interaction method. The model-free method was a multifactor dimensionality reduction (MDR) approach.

The objective was not to compare the statistical methods for detecting interaction but we evaluated the methods in those situations where the combinations of pairs of tested SNPs were identical. Both study populations we used in this study have been used for genome-wide association analyses of MDD [32, 33]. The candidate genes we used in this study have not been tested in a separate analysis previously.

Methods and materials

Study populations

The Sequenced Treatment Alternatives to Relive Depression (STAR*D) study [34] and the Genetic Association Information Network, GAIN major depressive disorder study [35] provided the two study populations. STAR*D subjects were outpatients from primary care or psychiatric clinics who met DSM-IV criteria for single or recurrent non-psychotic MDD, age 18–65 years, with no history of bipolar disorder, schizophrenia, schizoaffective disorder, or any other psychotic disorder. The STAR*D sample comprised of 634 controls matched with 1,256 cases. GAIN cases were recruited from four different longitudinal studies of patients with MDD in the Netherlands [32, 3638]. Inclusion criteria were life-time diagnosis of MDD (DSM-IV), and an age between 18–65 years. A sample of 1,822 biologically unrelated controls was matched with 1,821 cases. Full details of both study populations are available in Additional file 2: Supplementary Materials.

Candidate gene selection SNP genotyping and quality control

The 68 candidate genes we used in this study were derived from in the STAR*D study and were prioritized on the basis of previous results of association to MDD, for association to response to antidepressant treatment [39] (Additional file 3: Table S1). Full details of genotyping, marker selection and QC analyses are provided in Additional file 2: Supplementary Materials.

We included markers that mapped within 10 kb of each candidate gene according to NCBI36/hg18. After QC analysis the STAR*D dataset consisted of 1,240 cases and 630 control and 654 SNP markers. The GAIN dataset consisted of 1,732 cases, 1,783 controls and 1,516 SNPs.

Statistical power

Power calculations are conditional on estimates of the correct specifications of the underlying genetic model. Since such a ‘true’ model is not known, we allowed several model parameters to vary, including misspecification of the diagnosis. The mean power estimates for the single marker analyses were 53% in GAIN and 32% in STAR*D samples (Additional file 4: Table S2). Detailed power calculations are provided in Additional file 2: Supplementary Materials.

As a method for power calculation in the additive interaction method or the MDR approach is not validated we have not generated any estimates of power in the interaction analyses.

Marker selection for SNP-SNP interaction analysis

For the logistic regression methods where it became unmanageable to test interactions between all markers, our approach was to prioritize markers according to predicted attributable dependencies [28] defined through main effect (P < 0.05, n = 43) in single marker analysis [40] and from markers with no-main effects (n = 49) using predicted dependencies based on a computational filtering approach implemented in the MDR software [41]. In total, for the two logistic regression methods, 92 markers were tested against the full dataset of 3,704 markers. For the genetic model-free MDR approach, we managed to include the full SNP-dataset using an exhaustive interaction method. We did not filter the markers for pair-wise linkage disequilibrium (LD) patterns. However, when presenting the top 10 interactions for each model LD pruning was performed with an r2 value > 0.2. Interaction analyses were only performed for autosomal markers because the current software versions do not handle chromosome X markers correctly.

Statistical analyses

Single marker association analysis and meta-analysis

For autosomal markers association was tested using a Cochrane Armitage test (SNPtest) assuming an additive model, using the top 4 principal components as covariates. For chromosome X markers the Unphased software (additive allele-wise model for the log odds ratio) was used. Fixed effects (Mantel-Haenzel) or random effects (DerSimonian-Laird) meta-analyses were performed after Woolf’s test for heterogeneity with an applied threshold of P < 0.05, performed in R (rmeta-package). The single-marker analysis and meta-analysis was performed in order to assess a shared genetic background to MDD between the two study populations. We used nominal P value as our cutoff for defining homogeneity in association between the two study populations. The threshold for reporting association to MDD in the single marker analyses was estimated by a family-wise error rate (FWER), using Bonferroni’s method.

SNP-SNP interaction analysis

We considered three different algorithms for detecting SNP-SNP interactions. Interaction effects in the additive and multiplicative methods were tested using logistic regression implemented in a JAVA coded software designed to handle large datasets (coded in our own lab and available upon request) which is a modified version of the GEIRA software [42]. Testing for interaction effects using a machine learning and data mining approach was conducted in a MDR software [43]. See Additional file 2: Supplementary Materials for detailed descriptions of these algorithms. To examine the ability of the identified interactions to predict disease status, we calculated the sensitivity and specificity scores. The population attributable fraction (PAF) was calculated to quantify the proportion of MDD affected individuals attributed to the genetic risk (described in Additional file 2: Supplementary Materials).

Test for correlation between identified pair-wise set of markers between different statistical interaction methods was estimated by calculating Pearson’s r (carried out in R).

Results

Assessment of genetic heterogeneity between STAR*D and GAIN

We considered a nominal threshold of association (P < 0.05) for evaluating a shared genetic background in the 68 candidate genes to MDD between the two study populations.

Of the 173 SNPs that were genotyped in both STAR*D and GAIN none were found to be nominally associated in both study populations. Of the 34 SNPs found at P < 0.05 in STAR*D 10 markers were genotyped in GAIN.

Of the 57 SNPs with P < 0.05 in GAIN 3 markers were genotyped in STAR*D (Additional file 5: Figure S2A-B and Additional file 6: Table S3A-B). We imputed markers in GAIN for the 15 genes that contained nominally associated markers in STAR*D. This analysis generated 28 additional SNPs at P < 0.05 of which none overlapped with nominally associated markers in STAR*D (Additional file 6: Table S3A and C). None of the nominally associated markers in the two study populations were in high LD (r2 ≥ 0.8).

As a final step to test for heterogeneity we included both study populations and performed a meta-analysis. We selected the nominally associated SNPs in STAR*D that were identical with any marker in GAIN, genotyped or imputed (n = 15). However, only 10 markers had the same risk allele and only one marker was nominally associated in the meta-analysis, rs2137330 in the NR3C2 gene, pooled OR = 1.13 and P = 0.039 (Additional file 5: Figure S2C).

A shared genetic risk between the two study-populations could not be confirmed. Therefore only the GAIN sample, which was the sample with more complete genotyping in the candidate genes, was used for the subsequent gene-gene interaction analyses.

Overall interaction findings

Genotypes from 3,704 imputed or genotyped markers in the 63 tested candidate genes (chromosome X genes excluded and two autosomal genes without markers in GAIN) were tested for interaction effects. None of the tests generated P values that survived correction for multiple testing with permutation in either the logistic regression or MDR methods. Additional file 7: Figure S3A-C illustrates the complete set of interactions generated from the different methods that were used for the assessment of interaction effects. For each one of the linear models, 340,768 interactions were evaluated (testing 92 against 3,704 markers), whereas for the MDR method 3,704 markers were tested in an exhaustive interaction analysis.

The large number of tests performed penalized our approach with regard to report a significant interaction, and none of the tests generated a P value that survived correction for multiple testing.

In spite of the non-significant results there are interesting observations worth noting. Among the model-based methods, a large number of interactions were observed with P values of interaction below a nominal significance level (Additional file 7: Figure S3A-B). We therefore hypothesized, based on the notion that our study sample was markedly underpowered, that the 0.5% most significant interactions would be enriched for interactions involved in MDD-susceptibility. In a very conservative approach, we therefore next considered the top 10 ranked interactions (LD filtered, r2 ≤ 0.2) in each one of the different interaction methods as a source of information for better understanding of the genetic interacting landscape in these candidate genes for MDD-susceptibility.

Our next observation concerns the unrealistically large fraction of negative AP estimates. In some instances the observed AP values exceeded the theoretical limit of −1.0 (i.e. AP < −1.0). This observation prompted us to address the question if the negative AP values are valid estimates for the reduction of risk (Additional file 8: Figure S4A-D). We observed a skewing of AP values towards the negative side of the distribution and a correlation between large negative AP values and few individuals in one of the exposure groups. There did not appear to be a correlation between the lowest observed OR and negative AP, as AP < −1.0 was observed for both OR close to 1.0 and large OR’s. However there was a negative correlation between the largest observed single exposure OR and AP when AP < 0. We concluded that although these negative AP values may indicate negative interactions, their magnitudes are far too large to warrant statistical analysis. No deflecting pattern was observed among positive AP values.

Top 10 interactions

Of the ten strongest interactions generated from the three different algorithms, none represents pairs of markers that have previous been reported to interact in the susceptibility to MDD. We also noted that within this group of interactions the markers did not represent those selected based on predictive interaction effects using main effect or an algorithm that predict involvements SNP-SNP interactions. Although the three different methods generated metrics of interaction effects with different assumptions and were thus not readily comparable, we further noted that the estimates of interaction effects were surprisingly strong for the regression models.

Among the ten most significant interactions in the dominant additive model, the effect measures of interaction (AP) were relatively strong (AP > 0.51, uncorrected P < 3.77E-5). The most significant interaction (uncorrected P = 9.10E-9, permuted P = 0.14) consisted of two markers located in the ARGHAP10 gene, rs9332471 and rs6845865 (AP = 0.58) (Table 1A). The rs9332471 marker also shows interaction with another marker, rs2306910, in the ARHGAP10 gene. These 3 markers are in a weak LD (r2 ≤ 0.2) with each other. This gene is involved in intracellular signaling and has a role in regulation of apoptosis. In the recessive model (Table 1B), the strongest signal was observed between markers located in the GRIN2B and RNF20 genes (AP = 0.69, uncorrected P = 7.65E-7, permuted P = 0.56).

Table 1 Result of the 10 most nominally significant interactions

For the additive model the strongest interaction effect, AP value of 0.72 (95% C.I., 0.42-1.02), was for markers in the genes ARHGAP10 and GRIK4.

In the multiplicative interaction method (Table 1C-D) the strongest effect measure, for the interaction term β3, in the recessive model (ORrec = 4.99, uncorrected P = 6.95E-5) was for markers in GRIN2A and GRIN2B genes and for the dominant model (ORdom = 4.16, uncorrected P = 4.41E-5) for markers in GRIK1 and GRIK4 genes. Additionally, there was a marked interaction (ORrec = 3.70) from the ARHGAP10 gene in combination with the SLC6A2 gene, and from the HTR2A gene in combination with the COMT gene (ORdom = 3.05).

Using the model-free approach, there were no interaction events with large effect, OR < 1.51 (Table 1E). The accuracy for how classification could be predicted into high- versus low risk groups was relatively modest (top value = 0.55, permuted P = 0.14). The marginal difference in ratio of affected to unaffected individuals in high versus low risk groups explains the weak cross-validation metric (CVC) value of 4/10, confirming that the study population was obviously too small to identify shared genetic interacting loci with small effects with the MDR approach. Full details of result metrics of the top 10 interactions are provided Additional file 9: Table S4.

Test for correlation between the different interaction methods

To find out to what extent the different algorithms identified pairs of markers of relevance to disease we performed a test for correlation between the three interactions methods. We computed the Pearson’s correlation coefficient (r) to test the covariance of interactions among the 0.5% region in one method against the entire dataset of another method. This test was used for combinations of all methods (Additional file 10: Figure S5A-L). Only weak correlations could be identified. The measures of correlation between the model-based methods and between the multiplicative dominant and MDR yielded r = 0.28 and r = 0.33 respectively. An even weaker correlation was found between the additive and MDR methods.

Model fitting and population attributable fraction

For the 10 most significant interactions attributable fraction (PAF) scores, sensitivity and specificity scores or AUC values were calculated (Table 1A-E). No pairs of markers performed substantially better than any others. None of the reported top interactions on their own explained the diagnosis of MDD to large extent, confirming that two interacting genetic risk factors with major implications to MDD in the general population were not identified in the present study. This observation further support that the interacting genetic landscape is likely to be highly evolved and many interactions are underlying a diagnosis of MDD.

Discussion

A key determinant to understand why people are at risk of complex disorders is the genetic architecture, i.e. number of risk loci, allele frequencies and interaction mechanism between genetic and non-genetic risk factors. In fact, the issue of missing heritability can most likely to a large extent be explained by interactions between risk factors.

Genetic risk for MDD is considered to partially be manifested from multiple genes which interact with each other [44]. We have performed an analysis looking at the genetic interacting landscape in candidate genes for MDD testing three different methods, each one designed to detect plausible statistical interaction effects. No significant interaction was identified. Without excluding the possibility that no genetic interaction with risk for MDD is present in these candidate genes, we speculate that our study sample is not large enough to permit identification of interacting genetic risk factors with small effect due to the high number of independent tests performed. We therefore hypothesize that the most significant interactions are enriched for interactions implicated in MDD-susceptibility from these candidate genes and thus provides indicative information for genetic risk for MDD (Table 1A-D).

Using the multiplicative interaction method, large effect sizes from the interaction term were reported from interactions in the glutamate system. Measured as the attributable proportion due to interaction (AP), the additive method identified pairs of markers with large effects. An important note is however that estimates of AP do not make any inference for how many exposures are present in the identified sufficient cause. Nor do they describe the number of sufficient causes that they are involved in. Hence, the AP value should not be interpreted as the proportion of risk to MDD that is explained by this interaction but the excess risk for MDD among individuals with both risk factors which are due to their interaction.

In the analysis using the model-free approach, the interaction with the best classifier value (balanced accuracy) was not repeatedly validated. The poor consistency of identified interactions is mirrored by the modest OR estimates and indicates the presence of many genetic factors that interact, each with only marginal risk contribution to MDD. Since we observed that the same markers were involved in several different 2-way models (Table 1) we employed a 3-way analysis for an additional subset of markers using the MDR approach. We found (Additional file 2: Supplementary Materials) that the top interaction was not consistently validated (3/10) and, without precluding the possibility that no synergy effects with relevance to MDD exist in these candidate genes, conclude that the curse of higher order of dimensions attenuates the power to predict an interaction in a sparse dataset.

From the results of the analysis of sensitivity and specificity scores, we conclude that none of the top interactions are on their own sufficient to explain why people develop MDD (Table 1A-E). Previous studies [45] indicate that the best disease classifiers are not necessarily found for interactions with low P values. Due to computational reasons, we have not searched our dataset for interactions with large classifier scores among less significant interaction events. Our interpretation is that the disease classifier scoring, along with the results from many separate interactions with small effect suggests that there is an extensive network of combined risk factors that underlies risk to MDD from these candidate genes. Detecting gene-gene interactions with relevance to risk of disorder can be more complicated and hard to interpret than expected, as theoretically, a certain gene could be protective in combination with one gene but risk conferring in combination with another gene. Following the same discussion, it is therefore unknown whether the expected effect size of interactions will be substantially larger than previously found in the single marker analyzes or whether the effect size will only be marginally larger. Despite the absence of significant interactions after correction for multiple testing, we advocate that the present study still holds crucial information that merits further attention. Firstly, the approach to include markers without LD filtering and with a limited restriction for pruning markers for predicted interaction effect provided insight into the interacting landscape in these 63 candidate genes. None of the top 10 interactions in any method and only a minimal proportion of the interactions in the top 0.5% region contained markers predicted to have interaction effects. However, in respect of the two logistic regression methods one of the pair-wise markers was in fact selected based on these two criteria and the observation in this case refers to interactions with both markers selected based on predicted effects. This is of significant interest since neither of the two a priori approaches we employed to select markers with predicted interactions effects seems to be optimal. This is a great challenge for future study designs if significantly interacting markers should be identified.

Secondly, to our knowledge none of the present top interactions have previously been reported to interact in risk for MDD. We performed a literature search on PubMed and the Genetic Association Database (http://geneticassociationdb.nih.gov/) and found relatively few gene-gene interaction analyses have been conducted for MDD. There are reports of interaction between BDNF and NTRK2 and the 5-HTTLRP region in the SLC6A4 gene [46, 47]. In the present study, no interaction involving these three genes (in total 98 markers including the BDNF Val66Met polymorphism) were among the 0.5% most significant interactions in any of the methods. Thirdly, for the model-based approaches we had expected to find the largest effects in groups with few individuals. However, our preliminary results indicate that the genetic risk, in these candidate genes, is shared by many and confers risk with small effect.

Given the limited number of genotypes in the STAR*D study that were available at the time when this study was initiated, in comparison to marker coverage in GAIN, greatly reduced the possibility to produce identical markers between the STAR*D and the GAIN samples. This situation has implications for our interpretation of genetic heterogeneity between the two study populations. We made the conclusion based on the nominal P value from single marker analysis and cannot exclude the possibility that our decision would have been different if a much larger study populations would have been available. Nevertheless, the study sample was predetermined and given the circumstances with low power we conclude that for the given marker coverage the two samples were heterogeneous in the genetic risk to MDD. We stress that even though we presumed that the two study populations are different in the genetic risk to MDD based on the single-marker analysis we cannot exclude that the interaction effects might be shared.

Considering the region harboring the top 0.5% interactions we observed correlation between P values from interactions for the model-based methods, and between the multiplicative dominant and MDR methods. These correlations were however weak (r value close to zero, although significant). Additionally, only a proportion of the top 0.5% interactions were found among the top 0.5% interactions in another method (Additional file 10: Figure S5A-L). We, therefore, conclude that the different methods for testing interactions will not select the same pairs of markers as potential candidates to explain the risk of MDD which is not surprising given the different assumption behind the methods. These results are consistent with those from experimental genetic interaction studies in model systems which show that interactions corresponds to definitions of both additive and multiplicative models [48] which support our study design to search for interaction using different definitions.

The observation of negative AP estimates below −1.0 in our dataset was unexpected (Table 1A-B). AP is defined as the proportion of disease that is attributed due to interaction among those who are exposed to both risk factors [49, 50]. It has been claimed that AP values can vary between +1 and −1 [51]. AP > 0 means excess risk, or positive interaction, and AP < 0 means negative interaction, or protection between the two risk factors [51, 52]. It is therefore surprising that we frequently observed values below −1.0 in our dataset. It has previously been shown that measures of additive interactions calculated by substituting OR with RR in case–control studies may not be reliable estimates when the incidence of the disease is high, as the case for MDD where it is 15%,with inflation of the AP estimates as a consequence [52]. This cannot however be the only explanation for AP values below −1.0 as we have observed the same distribution in a dataset (data not shown) where we tested interaction between susceptibility SNPs for multiple sclerosis which has a prevalence less than ~0.2% [53]. By correlating AP values to several other measures (Additional file 8: Figure S4A-D) we demonstrate that AP values are apparently not normally distributed [50] and moreover that negative AP values were unexpectedly inflated at a low number of individuals in any risk group, as well as at large OR values for single exposures. Our conclusion is that AP can attain large negative values per se. A closer look at the definition of AP reveals the reason; the denominator equals the OR of the full model, which sometimes is small for pairs of SNPs that exhibit negative interaction. Thus, our interpretation is that negative values have a meaning for estimating protection although it is hard to interpret the magnitude of that protective effect. Following the line of arguments and discussion above we decided to report on the complete dataset but only comment on interaction with positive or synergistic interactions, i.e. positive AP values.

Limitations

This study has several limitations. Firstly, the GAIN sample was underpowered for finding genetic interacting risk factors with small effect. To retain the best power of the remaining sample we did not test for interactions using any sub-categorization of MDD diagnosis or gender as this would have reduced power. This might reduce our chance to find genetic interacting risk factors for these separate categories. Because the present study was shown to have limited power for identifying genetic risk factors with small effects we designed the interaction analysis without partitioning the GAIN sample into a testing and replication datasets. Our study should thus only be considered as an explorative study which needs further confirmation. Secondly, due to the lack of prior evidence for superiority of any method to give priority to markers with a predicted major role in interactions effects [25, 54] we reduced our chances of reporting a significant interaction through our massive screening approach.

Conclusions

Our main objective of finding interacting genetic risk loci for MDD-susceptibility was not fruitful. Despite the absence of significant results we advocate that there are still interesting results in the present study. Firstly, with restriction to the 63 candidate genes our results further support the hypothesis that MDD has an ologogenetic background with an exceptionally evolved network of genetic interactions contributing small amount to the overall-risk of developing MDD. Secondly, our study design to test all markers without giving priority to a sub-sample of markers based on predicted interactions gave vital information. None of the most significant interactions contains markers based on main effect from single marker analysis or based on prediction using an a priori algorithm. Moreover, we also conclude that there is a little overlap in interacting markers identified in the different algorithms used. This might argue that none of the algorithms are adequate on their own. Thus, our results motivate that both model-based and model-free methods should be applied if interacting markers for complex traits shall be identified.

Identification of such interactions may help us to prioritize which biological processes that should be studied further in order to better understand MDD pathogenesis.