Associations between exploratory dietary patterns and incident type 2 diabetes: a federated meta-analysis of individual participant data from 25 cohort studies

Purpose In several studies, exploratory dietary patterns (DP), derived by principal component analysis, were inversely or positively associated with incident type 2 diabetes (T2D). However, findings remained study-specific, inconsistent and rarely replicated. This study aimed to investigate the associations between DPs and T2D in multiple cohorts across the world. Methods This federated meta-analysis of individual participant data was based on 25 prospective cohort studies from 5 continents including a total of 390,664 participants with a follow-up for T2D (3.8–25.0 years). After data harmonization across cohorts we evaluated 15 previously identified T2D-related DPs for association with incident T2D estimating pooled incidence rate ratios (IRR) and confidence intervals (CI) by Piecewise Poisson regression and random-effects meta-analysis. Results 29,386 participants developed T2D during follow-up. Five DPs, characterized by higher intake of red meat, processed meat, French fries and refined grains, were associated with higher incidence of T2D. The strongest association was observed for a DP comprising these food groups besides others (IRRpooled per 1 SD = 1.104, 95% CI 1.059–1.151). Although heterogeneity was present (I² = 85%), IRR exceeded 1 in 18 of the 20 meta-analyzed studies. Original DPs associated with lower T2D risk were not confirmed. Instead, a healthy DP (HDP1) was associated with higher T2D risk (IRRpooled per 1 SD = 1.057, 95% CI 1.027–1.088). Conclusion Our findings from various cohorts revealed positive associations for several DPs, characterized by higher intake of red meat, processed meat, French fries and refined grains, adding to the evidence base that links DPs to higher T2D risk. However, no inverse DP–T2D associations were confirmed. Supplementary Information The online version contains supplementary material available at 10.1007/s00394-022-02909-9.

A solution to overcome the limitation of study-specific findings is to replicate the association of DPs with T2D in independent populations. So far, only one study has investigated the generalizability of T2D associations with DPs derived by principal component analysis (PCA). However, this study was restricted to European populations participating in the EPIC-InterAct consortium and aimed to replicate only those T2D-associated DPs that were derived in country-specific analyses within this consortium [13]. In addition to PCA-derived patterns, patterns derived by reduced rank regression have also been replicated [14][15][16]. The main principle of these replication approaches is the reconstruction of pattern variables based on the reported pattern structure. In this context, it has been proposed to derive so-called simplified DP variables in order to construct less population-dependent DP variables with a content approximately similar to that of the original exploratory DPs. It has been shown that DP variables calculated with this method correlated highly with the original DPs and reflected variation in intake of individual components well [14,16,17]. Hence, this approach seems well suited to replicate study-specific associations of exploratory DPs in independent study populations. To date, however, this method has not been used to examine exploratory DPs in relation to T2D across populations from different continents.
To address this research gap regarding the generalizability of DP-T2D associations using the approach of simplified DPs, the present study aimed (1) to investigate the association of previously reported T2D-associated DPs [1] with incident T2D and (2) to evaluate whether two DPs of overlapping food groups (FGs) ("mainly healthy" and "mainly unhealthy"), also previously identified in the same systematic review [1], are associated with incident T2D. For this purpose, the InterConnect collaboration project offers a well-suited research platform for federated meta-analyses of harmonized individual-level study data from 25 cohorts [18][19][20][21][22][23][24][25][26][27][28][29][30][31][32][33] across different continents, adjusting for a common set of potential confounders across studies [34][35][36]. As a further advantage, this approach allowed the inclusion of cohorts that have relevant data but have never published on the topic before.

Dietary assessment and construction of dietary patterns
Dietary intake was assessed by food frequency questionnaires (FFQ) in most cohorts, by dietary history interview and a 24-h recall in one cohort each (Table S1). For the present study food intake encoded in g/day was used. Some cohorts provided only standard portion sizes and frequency of consumed food items, which were converted into g/day. For some US cohorts, where information on portion size was not available, variable-specific standard portion sizes sourced from the United States Department of Agriculture [39] were used.
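The conversion of reported consumption frequency and standard portion size into g/day can be illustrated with a minimal sketch. The frequency categories, their per-day multipliers and the function name below are hypothetical and chosen for illustration only; the actual cohort-specific coding is not described in this section.

```python
# Hypothetical FFQ frequency categories mapped to servings per day.
# These categories and multipliers are assumptions, not the cohorts' coding.
FREQ_PER_DAY = {
    "never": 0.0,
    "1_per_month": 1 / 30,
    "1_per_week": 1 / 7,
    "1_per_day": 1.0,
    "2_per_day": 2.0,
}


def grams_per_day(frequency: str, portion_size_g: float) -> float:
    """Convert a frequency category and a standard portion size (g) to g/day."""
    return FREQ_PER_DAY[frequency] * portion_size_g


# Example: one 140 g portion per week corresponds to about 20 g/day.
weekly_item = grams_per_day("1_per_week", 140.0)
```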
The dietary data of all cohorts were then harmonized to form a set of food groups. For this purpose, the FGs used in the published DPs associated with T2D risk were compared. Based on this, a set of FGs was defined to be used across all published DPs (Tables 1, S2 and S3). If no intake information was available in the other included studies for a specific food item used in the original DP, that item was omitted. Then the respective study-specific food items were summed in each InterConnect cohort to form the corresponding harmonized FG (Excel Table S6). Subsequently, DPs were constructed based on the harmonized FGs. The structure of DPs was defined based on the findings of our previous systematic review [1], thus reflecting (a) DPs found to be significantly associated with T2D risk in at least one cohort study (13 individual DPs) and (b) two DPs reflecting DPs with overlapping food composition: the DP reflecting the overlap of "mainly healthy" food groups was composed of fruits, vegetables, legumes, poultry and fish, while the DP of "mainly unhealthy" food groups was composed of refined grains, French fries, red meat, processed meat, high-fat dairy products and eggs. Thus, 15 DPs in total were constructed. To calculate individual DP scores for study participants, the approach of simplified DPs [17] was used. In PCA-derived DPs, all food groups contribute with a respective factor loading to the overall pattern structure. The simplification approach considers only those FGs with a strong contribution to the respective DP (factor loading (FL) ≥ 0.2) in the original DPs. Details of which FGs were combined to calculate the respective simplified DP scores are shown in Tables 1, S2 and S3. These FGs were standardized according to their distribution in each participating study.
Then, simplified DP scores were calculated by summing up the selected FGs without any weighting (in original DP the respective FL is the weighting) and by also considering negative algebraic signs for those FGs with negative FL from the original publication. Finally, study-specific simplified DP scores were also standardized to allow meta-analysis across cohorts [17].
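The score construction described above (standardize each selected FG within the study, sum the standardized FGs without weights while respecting the sign of the original factor loading, then standardize the resulting score) can be sketched as follows. This is a minimal illustration under stated assumptions: the function names and the toy intake data are hypothetical, and the sample standard deviation is used, which the text does not specify.

```python
from statistics import mean, stdev


def zscores(values):
    """Standardize values to mean 0 and SD 1 (sample SD; an assumption)."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


def simplified_dp_scores(intakes, signs):
    """Compute standardized simplified DP scores for one study.

    intakes: dict mapping food group -> list of intakes (g/day), one per participant
    signs:   dict mapping each selected food group -> +1 or -1, i.e. the sign
             of its factor loading in the original publication
    """
    z = {fg: zscores(intakes[fg]) for fg in signs}
    n = len(next(iter(z.values())))
    # Unweighted sum of standardized food groups, with negative signs applied.
    raw = [sum(signs[fg] * z[fg][i] for fg in signs) for i in range(n)]
    # Standardize the score itself so that estimates are expressed per 1 SD.
    return zscores(raw)


# Toy data for three participants (hypothetical values).
toy_intakes = {"red_meat": [50.0, 100.0, 150.0], "fruits": [300.0, 200.0, 100.0]}
toy_signs = {"red_meat": 1, "fruits": -1}
toy_scores = simplified_dp_scores(toy_intakes, toy_signs)
```

Because the final score is standardized within each study, a regression coefficient for it is interpreted per 1 SD of that study's own score distribution.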

Ascertainment of incident T2D
To minimize potential variations due to varying diagnosis criteria of T2D incidence across cohorts, two harmonized outcomes were defined [40]. As primary outcome, clinically incident T2D was defined when any one or more of the following criteria were fulfilled: (1) ascertained by linkage to a registry or medical record; (2) confirmed antidiabetic medication usage; (3) self-report of physician diagnosis or antidiabetic medication, verified by any of the following: (a) at least one additional source from 1 or 2 above, (b) biochemical measurement (glucose or HbA1c), (c) a validation study with high concordance. As secondary outcome with less strict criteria, we defined incident T2D, when any of the following criteria were fulfilled: (1) ascertained by linkage to a registry or medical record; (2) confirmed antidiabetic medication usage; (3) self-report of physician diagnosis or antidiabetic medication or (4) biochemical measurement (glucose or HbA1c).
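The two outcome definitions can be written as simple decision rules. The sketch below is an illustration only: the field names are hypothetical, and the "validation study with high concordance" criterion is represented as a per-record flag although in practice it is a cohort-level property.

```python
from dataclasses import dataclass


@dataclass
class Ascertainment:
    """Hypothetical per-participant ascertainment flags."""
    registry_or_record: bool = False     # criterion (1)
    confirmed_medication: bool = False   # criterion (2)
    self_report: bool = False            # self-reported diagnosis or medication
    biochemical: bool = False            # glucose or HbA1c measurement
    validated_self_report: bool = False  # cohort has a high-concordance validation study


def primary_outcome(a: Ascertainment) -> bool:
    """Clinically incident T2D: self-report counts only if verified."""
    verified_self_report = a.self_report and (
        a.registry_or_record
        or a.confirmed_medication
        or a.biochemical
        or a.validated_self_report
    )
    return a.registry_or_record or a.confirmed_medication or verified_self_report


def secondary_outcome(a: Ascertainment) -> bool:
    """Less strict definition: any single source is sufficient."""
    return (
        a.registry_or_record
        or a.confirmed_medication
        or a.self_report
        or a.biochemical
    )
```

Under these rules every primary-outcome case is also a secondary-outcome case, which is consistent with the larger case count reported for the secondary outcome.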

Assessment of covariates
We defined a set of potential confounders to be used in analyses based on: (1) frequent usage in the studies of the 13 published T2D-associated DPs and (2) availability across all participating InterConnect cohorts (Table S4). The final set of confounders included: age at baseline (years), sex, body mass index (BMI) (kg/m²), physical activity (PA; cohort-specific items were used), education (cohort-specific items were used), smoking (never, former, current smoker), alcohol consumption (g/day), hypertension (yes/no), and energy intake (kcal/day). The recorded confounder data of the respective InterConnect cohorts were used and harmonized across all cohorts where possible (Table S5). All cohorts provided age in years, BMI in kg/m² and hypertension as yes or no. Smoking was harmonized as never, former, and current smoker, energy intake as kcal/day and alcohol as g/day.
In the Golestan Cohort Study from Iran alcohol consumption was used as never or ever drinker. Study-specific coding was used for PA and education because harmonization was not feasible due to extensive differences in codes (Table S5).

Statistical analysis
All analyses were conducted using R within the DataSHIELD federated meta-analysis programming library [35]. For analysis, participants meeting any of the following criteria were excluded: T2D, myocardial infarction, stroke or cancer at baseline (to avoid reverse causation), extreme energy intake (men < 800 kcal or > 4200 kcal, women < 500 kcal or > 3500 kcal), missing follow-up time, missing confounders, or more than 10% missing food items. In total, 46.9% of the participants of the InterConnect cohorts were excluded (Table 2). Baseline characteristics were calculated stratified by cohort. Normally distributed variables were presented as mean and standard deviation (SD), non-normally distributed variables as median and interquartile range (IQR), and categorical variables as percentages. Incidence rate ratios (IRRs) and 95% confidence intervals (CI) were estimated to test for associations between a 1 SD increase in DP scores and incident T2D in each cohort separately, using piecewise Poisson regression adjusted for age, sex, BMI, PA, education, smoking, alcohol consumption, hypertension and energy intake. Piecewise Poisson regression is available in the DataSHIELD library and has been shown to closely approximate Cox proportional hazards regression [41]. For the European Prospective Investigation into Cancer and Nutrition (EPIC)-InterAct cohorts, a weighting analogous to Prentice weighting was applied (weights of 1 for all cases and weights of $\frac{\text{number of non-cases in the whole cohort}}{\text{number of non-cases in the subcohort}}$ for non-cases) to account for the case-cohort design in survival analyses when using the piecewise Poisson method [42].
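Two building blocks of this analysis lend themselves to a short sketch: the Prentice-like weights for the case-cohort design, and the splitting of each participant's follow-up into intervals, which is the data layout a piecewise Poisson model is fitted to. The code below is a hedged illustration in plain Python, not the DataSHIELD implementation; the function names and the cut points are hypothetical, since the intervals used in the study are not stated in this section.

```python
def prentice_like_weight(is_case, n_noncases_cohort, n_noncases_subcohort):
    """Weight of 1 for cases; non-cases are weighted up to the whole cohort."""
    if is_case:
        return 1.0
    return n_noncases_cohort / n_noncases_subcohort


def split_person_time(follow_up_years, event, cut_points):
    """Split one participant's follow-up at the given cut points.

    Returns a list of (interval_index, person_years, event_in_interval) rows.
    The event, if any, is assigned to the interval in which follow-up ends.
    """
    rows = []
    start = 0.0
    bounds = list(cut_points) + [float("inf")]
    for k, end in enumerate(bounds):
        if follow_up_years <= start:
            break
        exposure = min(follow_up_years, end) - start
        ends_here = follow_up_years <= end
        rows.append((k, exposure, int(event and ends_here)))
        start = end
    return rows


# Example with hypothetical cut points at 5 and 10 years:
# a case diagnosed after 7.5 years contributes two rows.
example_rows = split_person_time(7.5, True, [5.0, 10.0])
example_weight = prentice_like_weight(False, 9000, 450)
```

A Poisson model fitted to such rows, with the logarithm of person-years as offset and a separate baseline rate per interval, yields rate ratios that approximate Cox hazard ratios when the intervals are reasonably short.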
Pooled IRRs were estimated using random-effects meta-analysis models and were visualized with forest plots. Heterogeneity was assessed using I², the p value of the chi-square test and the tau² statistic. For each DP, a statistical model was calculated for the primary and for the secondary outcome. As a sensitivity analysis, we calculated a second set of the 13 DPs by considering only FGs with FL ≥ 0.4 in the original publication, thereby restricting each pattern to the FGs contributing most strongly to it. Moreover, a sensitivity analysis with exclusion of individual component FGs was conducted to assess whether a few FGs were mainly driving the association for UDP3, which showed the strongest association with T2D. To account for characteristics potentially explaining heterogeneity between the cohorts, meta-regressions were calculated with the pooled IRR as dependent variable and age, BMI, follow-up time and region as the independent variables. For this, the metareg function within the metafor package (version 3.02) in R was used.
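The pooling step can be sketched in a few lines. The text does not state which between-study variance estimator was used, so the DerSimonian-Laird estimator below is an assumption chosen because it has a closed form; the function names are hypothetical and the input values are made up for illustration.

```python
import math


def se_from_ci(lower, upper, z=1.96):
    """Standard error of log(IRR) recovered from a 95% confidence interval."""
    return (math.log(upper) - math.log(lower)) / (2 * z)


def random_effects_pool(irrs, ses):
    """Random-effects pooling of log IRRs (DerSimonian-Laird; an assumption)."""
    y = [math.log(r) for r in irrs]
    w = [1 / s ** 2 for s in ses]
    sw = sum(w)
    y_fixed = sum(wi * yi for wi, yi in zip(w, y)) / sw
    # Cochran's Q and the derived heterogeneity statistics.
    q = sum(wi * (yi - y_fixed) ** 2 for wi, yi in zip(w, y))
    df = len(y) - 1
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    # Re-weight each study by its within- plus between-study variance.
    w_re = [1 / (s ** 2 + tau2) for s in ses]
    y_re = sum(wi * yi for wi, yi in zip(w_re, y)) / sum(w_re)
    se_re = math.sqrt(1 / sum(w_re))
    return {
        "irr": math.exp(y_re),
        "ci_low": math.exp(y_re - 1.96 * se_re),
        "ci_high": math.exp(y_re + 1.96 * se_re),
        "tau2": tau2,
        "i2": i2,
    }


# Made-up cohort-specific estimates, for illustration only.
homogeneous = random_effects_pool([1.1, 1.1, 1.1], [0.05, 0.05, 0.05])
heterogeneous = random_effects_pool([0.9, 1.3], [0.05, 0.05])
```

When the cohort-specific estimates agree, tau² and I² are zero and the result coincides with a fixed-effect estimate; when they diverge, the between-study variance widens the pooled confidence interval.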

Results
In the present analysis, data from 390,664 participants across 25 cohorts with a median follow-up time ranging from 3.8 to 25.0 years were included (Table 2). Four cohorts included only women (EPIC-InterAct-France, Mexican Teachers' Cohort (MTC), Swedish Mammography Cohort (SMC), Women's Health Initiative Observational Study (WHI-OS)) and two only men (Cohort of Swedish Men (COSM), Puerto Rico Heart Health Program (PRPHH)). Participants from the Coronary Artery Risk Development in Young Adults (CARDIA) study, MTC and the Seguimiento University of Navarra (SUN) cohort were of younger age (24.9-41.8 years), whereas participants from other cohorts were older (49.5-63.1 years). The mean BMI ranged from 23.9 kg/m² in SUN to 29.3 kg/m² in EPIC-InterAct-Spain. During follow-up, 29,386 clinically incident cases of T2D were recorded for the primary outcome and 36,527 incident cases for the secondary outcome.
The dietary intake of harmonized FGs showed marked differences between the cohorts (Excel Supplemental Table). For example, reported median fruit intake was highest in MTC (321.7 g/day) and about three times higher than median intake in the cohorts with the lowest fruit intake, such as CARDIA (94.9 g/day) and EPIC-InterAct-Germany (91.4 g/day). Particularly high intakes compared to other cohorts were observed for vegetables in SUN Study (391 g/day), legumes and soy (but mostly beans) in Brazilian Longitudinal

Healthy dietary patterns and risk of T2D
None of the HDPs [2][3][4][5] were robustly associated with a reduced risk of T2D. This was the case for the two outcome definitions and for the two versions of each HDP constructed using different cut-offs of FL to define component FGs. HDP1 was significantly associated with a higher T2D risk (primary outcome: pooled IRR per SD = 1.057, 95% CI 1.027-1.088; secondary outcome: IRR per SD = 1.042, 95% CI 1.018-1.065, Table 3). This DP contains vegetables, fruits, margarine, nuts, poultry, eggs, fish, red meat, whole milk, high fat dairy and low-medium fat dairy. However, this association was absent in sensitivity analysis, when only FGs with published absolute FL ≥ 0.4 (vegetables and fruits, Table 2) were used to construct the HDP1 (Supplemental Table 6). HDP3, composed of fruits and dairy products, was also not significantly associated with T2D risk (pooled IRR per SD = 0.976, 95% CI 0.948-1.005, Table 3), when using the secondary outcome definition. For the remaining HDPs (2, 4-6) the pooled risk estimates did not indicate associations with T2D risk (Table 3). Overall, there was moderate to substantial heterogeneity (I² = 58-83%, Table 3) for the HDP-T2D associations. For HDP1, none of the characteristics (age, BMI, follow-up time and region) explained the observed heterogeneity (I² = 66%) in meta-regressions (data not shown).

Unhealthy dietary patterns and risk of T2D
Five of the seven UDPs (UDP3-7) were associated with a higher T2D risk in pooled analyses across all cohorts (Table 6). Most cohort-specific IRRs indicated that UDP3 was associated with a higher T2D risk or a trend towards an association (Figs. 1, 2). Similar findings, although weaker, were observed for UDPs 4-7, where heterogeneity ranged from moderate (I² = 49% for UDP4) to substantial (I² = 81% for UDP6). Here, region explained a considerable proportion of the heterogeneity for UDP6 (29%) and UDP7 (25%), while follow-up time explained 30% of the overall heterogeneity for UDP5 and 24% for UDP6. No association with T2D risk was found for UDP1 and UDP2, neither for the two outcome definitions nor for the two FL cut-offs (Table 3, Supplemental Table 6).

Dietary patterns with "mainly healthy" and "mainly unhealthy" food groups and T2D risk
We evaluated the two DPs reflecting previously published DPs with overlapping FG components irrespective of whether they have been described to be associated with T2D previously or not [1]. The DP consisting of "mainly healthy" FGs, i.e. fruits, vegetables, legumes, poultry and fish, was not associated with T2D risk across the included cohorts (primary outcome: pooled IRR per 1 SD = 1.033, 95% CI 0.998-1.071; secondary outcome: pooled IRR per 1 SD = 1.000, 95% CI 0.975-1.026) (Fig. 3, Supplemental Fig. 6). The heterogeneity across studies was substantial (primary outcome: I² = 84%, secondary outcome: I² = 76%). Hence, the forest plots show the cohorts arranged by region. In contrast, the DP consisting of "mainly unhealthy" FGs, i.e. refined grains, French fries, red meat, processed meat, high-fat dairy products and eggs, was significantly associated with a higher T2D risk (primary outcome: pooled IRR per 1 SD = 1.079, 95% CI 1.051-1.108; secondary outcome: pooled IRR per 1 SD = 1.067, 95% CI 1.037-1.098) (Fig. 3, Supplemental Fig. 6). The heterogeneity was moderate for the primary outcome (I² = 58%), but substantial for the secondary outcome (I² = 74%). Most study-specific IRRs indicated a higher risk for this DP, except for the Golestan Cohort Study, which pointed towards an inverse association.

Sensitivity analysis of UDP 3
UDP3 was composed of the FGs red meat, processed meat, poultry, eggs, fish, French fries, refined grain products, and rice. To assess the contribution of these individual FGs to the T2D risk of UDP3, a sensitivity analysis was carried out by excluding individual FGs (Supplemental Table 7). The exclusion of refined grains resulted in the largest reduction of the IRR estimate (from 1.094 to 1.047, −4.74%), followed by processed meat (−1.66%) and eggs (−1.10%).

Discussion
This study investigated associations between exploratory DPs and T2D risk in a large number of prospective cohort studies in a worldwide context, using harmonized data analyses across all studies and federated meta-analyses of individual studies. No robust inverse associations were observed between HDPs and risk of T2D. HDP1 was associated with a higher T2D risk in primary analysis, but this unexpected finding was not confirmed in sensitivity analyses. We observed more consistent findings for UDPs with five of the seven UDPs being associated with higher T2D risk in our meta-analysis of included studies. We investigated two DPs which reflect commonly shared FGs of exploratory DPs identified in previous studies on DP and T2D. The DP with "mainly healthy" FGs, characterized by higher intakes of vegetables, legumes, fruits, poultry and fish, was not associated with T2D risk, but the DP with "mainly unhealthy" FGs, characterized by red meat, processed meat, high-fat dairy products, eggs, refined grains and French fries, was associated with a higher T2D risk. The effect size for all the significant associations was relatively modest with IRRs being 1.10 per 1 SD increased DP score or less. Previous studies have shown differences in risk associations between DPs and T2D in U.S. cohorts and the European EPIC-InterAct study, although this was restricted to a priori DPs like the Dietary Approaches to Stop Hypertension (DASH) diet, the Alternative Healthy Eating Index (AHEI) or reduced rank regression-derived DPs [1,43]. Given the strong heterogeneity in the composition of exploratory DPs already in the European context, this underlines the importance of investigating if population-specific DP-T2D associations can be replicated across diverse populations, where even higher heterogeneity is expected. To our knowledge, this is the first study to investigate if associations of exploratory DPs with T2D risk can be replicated across cohorts from multiple regions across the world.
We have previously investigated the generalizability of exploratory DPs associations with T2D in EPIC-InterAct, a European-wide cohort study [13]. In this analysis, three DPs identified in country-specific analyses were associated with T2D. However, only one DP was consistently associated with T2D risk across the included European cohorts (pooled IRR per 1 SD: 1.12, 95% CI 1.04-1.20). This DP was characterized by high intakes of processed meat, potatoes (including French fries), vegetable oils, sugar, cake and cookies, and tea. Besides the EPIC-InterAct study, we are not aware of any further systematic replication of associations of exploratory DPs and T2D. Also, the EPIC-InterAct study did not attempt to replicate T2D-associated DPs identified in other cohorts than EPIC-InterAct, which has been our current major aim.
We were able to replicate associations with higher T2D risk for five of seven investigated UDPs. These five UDPs (UDP3-7) share red meat, processed meat, French fries and refined grains (comprising refined grain bread and refined grain breakfast cereals) as component FGs. Also eggs and high-fat dairy products were component FGs of three out of these five DPs. These FGs are identical to those which we used to construct one DP based on commonly shared "mainly unhealthy" FGs of published DPs [1]. Consequently, this pattern was also associated with a higher T2D risk in our meta-analysis: we observed a pooled IRR of 1.08 per 1 SD, 95% CI 1.05-1.11 for the primary outcome definition, being slightly stronger than the risk estimates for most of the UDPs, which ranged between pooled IRRs of 1.04 for the UDP5 by Yu et al. [7] and for UDP7 by Schoenaker et al. [9] to 1.07 for the UDP4 identified by Erber et al. [6]. However, an even higher risk estimate was found for UDP3 (IRR of 1.10 per 1 SD, 95% CI 1.06-1.15), which had been observed in the Melbourne Collaborative Cohort Study to be associated with higher risk of T2D [5]. This DP was not only characterized by red and processed meat, eggs, French fries, refined grains, but also by fish, poultry and rice. We noted that the DPs associated with higher risk in our meta-analyses had only potatoes (including French fries) and processed meat in common with the DP identified to be associated in the EPIC-InterAct study [13]. To gain insight into the role of individual FGs for pattern associations, we conducted a sensitivity analysis on the UDP3-T2D association by excluding individual FGs one at a time. Particularly the exclusion of refined grains led to an attenuation of the risk estimate from IRR of 1.10 to 1.05 for the primary outcome. Still, other components seemed to contribute to the associations and we interpret the synergy of these component FGs in this pattern as driving the association with T2D. 
The UDPs which were identified as being associated with a higher risk of T2D did not only show overlaps but also differences in component FGs. For example, butter (UDP4), sugar and confectionery and offal (UDP5) or pizza (UDP6, UDP7) were pattern-specific components besides the commonly shared FGs. Two of the UDPs (UDP5, UDP6) additionally shared the FG sugar-sweetened beverages. This food group was also a component in 4 out of 5 previously identified reduced rank regression patterns that were associated with higher T2D risk [14][44][45][46], and evidence from a systematic literature review suggests a 13% risk increase for T2D per one serving (250 mL/day), even after adjustment for BMI [47]. UDP6 was furthermore characterized by the negatively weighted FGs cakes & cookies, legumes, vegetables, fruits and whole grains. However, after exclusion of these FGs due to the use of the cut-off FL ≥ 0.4, the IRR changed only marginally.
None of the HDPs, either individual DPs described by single studies or the DP defined by commonly shared "mainly healthy" FGs of investigated patterns, were inversely associated with T2D risk in our meta-analyses. This is generally in line with evidence for single FGs being components of such DPs. For instance, vegetables, fruits, legumes, poultry and fish have not been clearly identified to relate to lower T2D risk in cohort studies [48]. In contrast to the original observation from the Finnish Mobile Clinic Health Examination Survey [4], we observed HDP1 to be associated with a higher risk of T2D. Red meat and eggs, which are frequent components of UDPs, were also contributing components of this pattern; thus, the direction of association in our analysis could potentially be driven by these two components. While a higher T2D risk with red meat is well documented [48], the role of egg consumption remains unclear [49]. Differences in how specific foods are prepared and/or consumed together across populations may explain their association with healthy or unhealthy patterns. Furthermore, if a food group like fish is the main animal protein source in a population, detrimental components like methylmercury could play a more important role in causing adverse health effects than in a population where intake of these components is lower [50].
Fig. 1 Incidence rate ratios and 95% confidence intervals for the association between replicated dietary pattern variables and incident type 2 diabetes. Shown are results for the primary outcome definition and harmonized food groups with published factor loadings > 0.2 by subgroups of region. Associations are adjusted for age, sex, BMI, physical activity, education, smoking, alcohol consumption, total energy intake and hypertension. CI confidence intervals, IRR incidence rate ratios, HDP healthy dietary pattern, UDP unhealthy dietary patterns
Fig. 2 Incidence rate ratios and 95% confidence intervals for the association between replicated dietary pattern variables and incident type 2 diabetes. Shown are results for the secondary outcome definition and harmonized food groups with published factor loadings > 0.2 by subgroups of region. Associations are adjusted for age, sex, BMI, physical activity, education, smoking, alcohol consumption, total energy intake and hypertension. CI confidence intervals, IRR incidence rate ratios, HDP healthy dietary pattern, UDP unhealthy dietary patterns
Fig. 3 Incidence rate ratios and 95% confidence intervals for the association between the dietary patterns of "mainly healthy" and "mainly unhealthy" food groups and incident type 2 diabetes using the primary outcome. Associations are shown by subgroups of region and adjusted for age, sex, BMI, physical activity, education, smoking, alcohol consumption, total energy intake and hypertension. CI confidence intervals, IRR incidence rate ratios
Besides the components of the investigated DPs, it is relevant to discuss overall methodological limitations. To enable the meta-analytical investigation of the DPs across so many different cohorts in the first place, we harmonized the cohort-specific food items into a number of food groups.
This entails the problem of summarizing different numbers of food items into one food group, depending on the original dietary assessment. Hence, the difference in median intake of certain food groups between the cohorts could be due to real dietary intake differences in the populations or due to a larger number of inquired food items. Furthermore, the condensing of food items into food groups led to a lack of granularity. Hence, potential differences in the association with T2D of specific food items, e.g. green leafy vegetables [51], could not be distinguished from other food items within this food group. Another methodological limitation could be the lack of detail about preparation methods, e.g. frying, in the dietary assessment of most of the participating cohorts. This may have led to an underestimation of the association for UDP3, which related to each of fried fish, poultry and rice in the original study by Hodge et al. [5], while we could only consider overall intake of fish, poultry and rice in our study. A distinction between French fries and potatoes (non-fried) was also not possible in all participating cohorts. However, a recent meta-analysis that investigated the association of potatoes with T2D risk distinguished between French fries and boiled/baked/mashed potatoes, and both types of potato preparation were associated with a higher T2D risk, although to a greater extent for 150 g/day intake of French fries (RR of 1.66, 95% CI 1.43-1.94) than for 150 g/day intake of boiled potatoes (RR of 1.09, 95% CI 1.01-1.18) [52]. Hence, we would still expect the risk estimates to point in a similar direction. Besides the food items, a common set of important and well-established confounders had to be harmonized across the cohorts. The set was selected based on the confounders reported in the original publications of DPs and on the availability of confounders in the participating InterConnect cohorts.
Clearly, due to the harmonization approach and the technical setup for federated data analysis, it was not possible to account for all potential confounders, either those generally important (e.g. family history of diabetes) or those relevant for some specific study populations (e.g. ethnicity). Still, the consideration of a harmonized confounder set could be seen as a strength of this study. Alongside the exposure and covariates, the outcome definitions also required harmonization. Due to different definitions of T2D as outcome in the participating cohorts, we applied two different outcome definitions (primary, secondary). To assess whether large differences in the number of T2D cases in some cohorts due to the definitions affected the associations, we conducted a sensitivity analysis. We compared the IRRs in subgroup analyses of cohorts with a large (> 40%) versus a small (≤ 40%) difference and observed slightly attenuated associations for all UDPs (data not shown). This indicated that a stricter outcome definition ("primary outcome") resulted in slightly stronger associations.
Furthermore, the DPs were replicated in the different cohorts by using a simplification process which restricts the DP score calculations to those FGs with high FL and ignores differences in FL between FGs [17]. However, many original DPs contained only very few FGs with relative high FL (≥ 0.4). So, for instance, the simplified UDP3 resulted in red meat as the only FG and hence lost the complex pattern structure. Therefore, we decided to use FGs in the simplified pattern with FL ≥ 0.2 as the main analysis. The simplification ignores relative differences in contributions of FGs to DPs (reflected by differences in FLs), however, it supports interpretation of DPs in terms of FG intake [17]. While the approach has been successfully applied to replicate other data-driven pattern associations [14,43], we cannot rule out that the relative loss in precision in DP score calculation has influenced the success of pattern-T2D association replications in our study.
We observed moderate to strong heterogeneity of associations across cohorts, with I 2 values ranging from 49% (UDP4) to 85% (UDP3). Heterogeneity between studies may have different explanations. The condensation of foods into harmonized FGs in the cohorts may have led to the inclusion of heterogeneous food items due to strong culinary differences between populations, but also due to different extent of inquired food items depending on the dietary assessment instrument. Another explanation for heterogeneity could have been the inclusion of cohorts with a short follow-up time, introducing the bias of reverse causation. Especially for HDPs, participants with a high risk at developing T2D could have changed their dietary habits by eating more health promoting food groups, but still developed the disease. However, this could not be confirmed by the results of our meta-regression on several characteristics of the cohorts (region, follow-up time, age, BMI). Here, the follow-up time explained only a considerable proportion of heterogeneity for two UDPs (UDP5, UDP6). Overall, the magnitude of the pooled risk estimates was much smaller compared to the original studies. However, comparability is constrained, since the risk estimates are given per 1 SD increase and SD is highly dependent on the population distribution of the respective DPs. Nevertheless, we were restricted to the calculation of analyses assuming a linear association between the DPs and T2D, due to the federated approach and the solutions, which could be realised with DataSH-IELD. Hence, generalizable conclusions based solely on the magnitude of risk estimates from the meta-analyses should be done with caution and no quantitative recommendations can be deduced for public health guidance. Therefore, we mainly base our conclusions on the consistency of direction of associations: in the meta-analyses with significant pooled risk estimates, the majority of included cohorts pointed also towards a higher risk. 
Another limitation was the standardization of FGs for DP score calculation based on the distribution of FG intake in the respective cohorts. This could be a problem if food intake distributions differ substantially between those cohorts and the study population in which a DP had originally been reported, and may hence jeopardize attempts to replicate associations of DPs with disease risk. However, two main reasons were pivotal for this approach. On the one hand, most original publications did not provide information on the intake distribution, but rather the correlation structure as a basis for the exploratory derivation of DPs. On the other hand, even if this information had been provided by the original publications, its use would have introduced further limitations: in most studies, non- or semi-quantitative dietary assessment instruments were applied and hence, the reported intake distributions did not provide a valid estimation of absolute intakes. Furthermore, dietary assessment instruments per se differed between the cohorts and nothing is known about their comparability in estimating food intake. Another limitation of this study was the high exclusion rate of 46.9%. Hence, a potential selection bias due to missing follow-up time, covariates or food intake data could not be ruled out.

Conclusion
To our knowledge, this is the first study replicating population-specific associations of exploratory DPs with T2D risk across a large number of cohort studies from different continents. Our meta-analyses of harmonized individual-level data from various cohorts revealed a higher T2D risk for several DPs characterized by higher intake of red meat, processed meat, French fries and refined grains (comprising refined grain bread and refined grain breakfast cereals). These results confirm earlier study-specific results in a generalizable context and therefore enrich the evidence for DPs related to higher T2D risk. However, none of the inverse associations of the investigated HDPs could be confirmed across different cohorts.
Author contributions The authors' responsibilities were as follows: MBS, NJW and NGF: designed the research; SD and AF: evaluated the meta-data; SD, AF, TRPB, MP and GOD: harmonized the InterConnect data; SD: analyzed data; SD, FJ and MBS: wrote the manuscript and have primary responsibility for final content; and all authors: interpreted the results and critically revised the article for important intellectual content, and read and approved the final manuscript. The corresponding authors attest that all listed authors meet authorship criteria and that no others meeting the criteria have been omitted.

Funding
MBR and MAMG acknowledge that the SUN Project has received funding from the Spanish Government-Instituto de Salud Carlos III, and the European Regional Development Fund (FEDER) (RD 06/0045, CIBER-OBN, Grants PI10/02658, PI10/02293, PI13/00615, PI14/01668, PI14/01798, PI14/01764, PI17/01795, PI20/00564 and G03/140), PNSD-2020/021, the Navarra Regional Government (2019) for epidemiological studies on dairy products and cardiometabolic diseases from the Dutch Dairy Association and the Danish Dairy Research Foundation. PMV and PV acknowledge funding from GlaxoSmithKline, the Faculty of Biology and Medicine of Lausanne, and the Swiss National Science Foundation (grants 33CSCO-122661, 33CS30-139468, 33CS30-148401 and 33CS30_177535/1). MK and the Whitehall II study were supported by the UK Medical Research Council (MRCMR/R024227/1), the Wellcome Trust (221854/Z/20/Z) and the US National Institutes of Health (NIH, RF1AG062553, R01AG056477), during the conduct of the study. The funding sources did not participate in the design or conduct of the study; collection, management, analysis, or interpretation of the data; or preparation, review, or approval of the manuscript.
Availability of data and material Due to the federated and collaborative design of this InterConnect study, data and material cannot be made accessible. Individual study meta-data may be available upon request from the individual study PIs.

Code availability
The analysis code can be provided on request.

Conflict of interest
The authors declare no conflict of interest.
Ethics approval All cohorts obtained ethical review board approval at the host institution and written informed consent from participants.

Consent to participate
All participants in the individual cohorts gave their signed informed consent at recruitment.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.