The MEDLINE search identified 385 articles, of which 10 [21–30] met our inclusion criteria and were subsequently included in the systematic review and meta-analysis (ESM Fig. 1). The search in EMBASE (total number of studies = 381) identified one additional study meeting the selection criteria, but no further studies were identified through the search in PsycInfo (total number of studies = 17). Thus, a total of 19 studies were retrieved for detailed evaluation, of which 11 studies were selected for inclusion in the meta-analysis. The extracted data of the 11 studies included are presented in ESM Table 2. It should be noted that two studies [23, 24] reported HRs rather than ORs.
The results of the quality assessment of the included studies can be found in ESM Table 3. Only one study provided an adequate description of the four groups (diabetes depressed, diabetes non-depressed, non-diabetes depressed, non-diabetes non-depressed) in terms of confounding variables such as age, sex, and number and type of complications. As such, it was, in most cases, impossible to determine whether the groups were comparable on important confounding variables. However, most studies (n = 8) controlled statistically for a number of potentially confounding variables.
Four studies used diagnostic criteria to determine depression status [24–26, 30], but none of these studies reported whether the assessors were blinded to diabetes status. However, as Brown et al. and O’Connor et al. were retrospective studies in which the diagnosis of depression was made by general practitioners independent of the study rather than by trained assessors, lack of blinding was deemed less important. Nevertheless, the reliability of the diagnosis of depression in these two studies was not reported.
The follow-up time varied between less than 2 years and 10 years. Two studies [22, 26] combined data from waves at different time points (2 and 5 years follow-up) to calculate depression incidence. Finally, while all but two studies [24, 31] reported the proportion of the cohort that was followed up, only three studies [21, 22, 25] reported whether dropout rates and reasons for dropout were similar across groups. It should be noted that the criteria that were used for establishing diabetes differed among the studies, with most relying on doctor’s diagnosis [24, 31] or self-report of doctor’s diagnosis [22, 23, 25–27]. Others also used medication use or blood tests such as OGTT or fasting plasma glucose [21, 27–30] or HbA1c levels.
Based on all studies, including 48,808 cases of type 2 diabetes, the pooled OR was 1.24 (95% CI 1.09–1.40). The forest plot of the OR and 95% CI of each study, and the pooled OR of both the FEM and the REM, are shown in Fig. 1.
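As a rough plausibility check on the pooled estimate, the standard error on the log scale can be recovered from the reported 95% CI. The sketch below assumes the interval is symmetric around log(OR) and uses a normal (Wald) approximation; the function names are illustrative and not taken from the analysis software used in this review.

```python
import math

Z_975 = 1.959964  # 97.5th percentile of the standard normal distribution


def log_se_from_ci(lower: float, upper: float) -> float:
    """Standard error of log(OR) implied by a 95% CI that is symmetric on the log scale."""
    return (math.log(upper) - math.log(lower)) / (2 * Z_975)


def wald_p_value(estimate: float, lower: float, upper: float) -> float:
    """Two-sided p value for H0: OR = 1 under a normal approximation."""
    z = math.log(estimate) / log_se_from_ci(lower, upper)
    # erfc(|z| / sqrt(2)) equals 2 * (1 - Phi(|z|)) for a standard normal Phi
    return math.erfc(abs(z) / math.sqrt(2))


# Values reported in the text for the pooled estimate
pooled_or, ci_low, ci_high = 1.24, 1.09, 1.40
se = log_se_from_ci(ci_low, ci_high)
p = wald_p_value(pooled_or, ci_low, ci_high)
print(f"SE of log(OR) = {se:.4f}, two-sided p = {p:.5f}")
```

Because the inputs are rounded to two decimals, the result is only approximate; it is intended to show that the reported interval excludes 1 by a comfortable margin, not to replace the original computation.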
The funnel plot did not suggest publication bias (Fig. 2). This was also supported by Begg’s adjusted rank correlation test (p = 0.70) and the Egger test (p = 0.81), neither of which was significant.
The test for heterogeneity was significant (Q = 30.84; df = 10; p = 0.001), with a moment-based estimate of between-study variance of 0.021, indicating heterogeneity. We therefore stratified the studies by method of defining depression. Among the six studies that relied on questionnaires to define depression, the pooled OR was 1.19 (95% CI 1.03–1.39) and the test for heterogeneity was not significant (Q = 8.03; df = 5; p = 0.16), with a moment-based estimate of between-study variance of 0.013. In the five studies that defined depression using diagnostic criteria, the risk of depression was higher than for studies using questionnaires, with a pooled OR of 1.29 (95% CI 1.05–1.59). The test for heterogeneity was significant (Q = 22.42; df = 4; p < 0.001), with a moment-based estimate of between-study variance of 0.032. However, as one study relying on diagnostic criteria to define depression included incident (new) cases of diabetes, the analysis was rerun without this study. The pooled OR for the four remaining studies was 1.47 (95% CI 1.34–1.60) and the test for heterogeneity was not significant (Q = 0.91; df = 3; p = 0.82), with a moment-based estimate of between-study variance of 0.00.
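The heterogeneity p values above follow from comparing each Q statistic with a chi-square distribution on the stated degrees of freedom. The following is a minimal sketch of that calculation using only the Python standard library; `chi2_sf` is a hypothetical helper written for this illustration, and it relies on the closed-form upper tail that exists for integer degrees of freedom.

```python
import math


def chi2_sf(q: float, df: int) -> float:
    """Upper-tail probability P(X >= q) for a chi-square variable with integer df."""
    if q < 0 or df < 1:
        raise ValueError("q must be non-negative and df a positive integer")
    half = q / 2.0
    if df % 2 == 0:
        # Even df: exp(-q/2) times a finite sum of (q/2)^k / k!
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= half / k
            total += term
        return math.exp(-half) * total
    # Odd df: complementary error function plus a finite sum
    total = 0.0
    for k in range(1, (df - 1) // 2 + 1):
        total += half ** (k - 0.5) / math.gamma(k + 0.5)
    return math.erfc(math.sqrt(half)) + math.exp(-half) * total


# Q statistics and degrees of freedom as reported in the text
reported = [
    ("all studies", 30.84, 10),
    ("questionnaire-based", 8.03, 5),
    ("diagnostic criteria", 22.42, 4),
    ("diagnostic criteria, one study excluded", 0.91, 3),
]
for label, q, df in reported:
    print(f"{label}: Q = {q}, df = {df}, p = {chi2_sf(q, df):.4f}")
```

The computed values should land close to the p values reported above; small differences in the last digit are possible because Q is itself rounded to two decimals.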
Because stratifying by (controlling for) method of depression measurement (diagnosis or questionnaire) reduced the significant overall study heterogeneity to non-significance, meta-regression analysis was performed to examine this further. The results showed that type of depression measurement (β = −0.25; p < 0.001, 95% CI −0.37, −0.14) was a significant predictor of depression incidence, with rates higher for studies using diagnostic criteria to define depression.
Additional meta-analyses were performed to determine whether other factors might account for the heterogeneity in study-specific incidence rates. For example, as can be seen in Fig. 1, the forest plot showed that the ORs increase over time, suggesting that the incidence of depression is increased in more recent studies. Therefore, a set of single-factor meta-regression analyses was performed with time since publication, follow-up time, number of follow-up depression measurements (1 vs >1 or continuous assessment), sample size, and number of people with diabetes as separate predictors of depression incidence OR. The results showed that year of publication (β = 0.09; p < 0.001, 95% CI 0.05, 0.12) was a significant predictor of depression incidence OR, but time of follow-up, sample size, the number of people with diabetes, and number of follow-up depression measurements were not (p > 0.09). Because type of depression measurement and year of publication may be confounded, we also examined these together in a single regression model. The results showed that year of publication remained significant (β = 0.09, p < 0.02, 95% CI 0.02, 0.16) but type of depression measurement was no longer significant (β = −0.07, p < 0.95, 95% CI −0.24, 0.22). The results did not change when repeated without Brown et al.