Background

With a global burden of 18.1 million new cases and 9.9 million deaths in 2020 [1], cancer is one of the leading non-communicable diseases. Despite extensive research in the field, a causal relationship with cancer has been established for only a limited number of risk factors. Identifying causal relationships with specific risk factors and separating them from spurious associations is key to cancer prevention. Although randomized controlled trials (RCTs) are considered the gold standard for identifying causal relationships, they are often impractical or even infeasible to perform due to time constraints and ethical issues. Conversely, the capacity of epidemiological observational studies to identify causal relationships is limited by confounding, reverse causation, and other biases [2].

Mendelian randomization (MR) is an analytic approach that uses genetic variation as a randomized instrument for the exposure of interest to provide insights into causality. As genetic variants are assumed to be randomly distributed at conception, MR can be considered akin to a “natural” RCT [3, 4]. By using genetic variants (single-nucleotide polymorphisms [SNPs]) as instrumental variables (IVs) to assess the association of a genetically predicted exposure with the outcome of interest, MR analyses can provide estimates less prone to some common epidemiological biases. Nevertheless, for an MR analysis to be valid, three assumptions for IVs must be met: (a) the genetic variants should be associated with the exposure; (b) the genetic variants must not be associated with measured or unmeasured confounders of the exposure-outcome association; (c) conditional on the exposure and the confounders, the genetic variants must be independent of the outcome. Given the growing availability of large-scale genomic information from published genome-wide association studies (GWAS), it is unsurprising that the number of MR analyses has increased substantially during the past decade, especially after the introduction of the “two-sample” summary-data MR approach that can improve feasibility and efficiency [5].
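To make the summary-data approach concrete, the following is a minimal sketch of the per-variant Wald ratio and the fixed-effect inverse-variance weighted (IVW) estimator, which combine SNP-exposure and SNP-outcome associations into a single estimate. The function names and the toy summary statistics are hypothetical and chosen for illustration only; the sketch uses first-order weights and ignores uncertainty in the SNP-exposure estimates.

```python
from math import sqrt


def wald_ratio(beta_exposure: float, beta_outcome: float) -> float:
    """Per-variant estimate: SNP-outcome effect divided by SNP-exposure effect."""
    return beta_outcome / beta_exposure


def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect IVW estimate and its standard error.

    Equivalent to a weighted regression of the SNP-outcome effects on the
    SNP-exposure effects through the origin, with weights 1 / se_outcome**2.
    """
    weights = [1.0 / se ** 2 for se in se_outcome]
    numerator = sum(w * bx * by
                    for w, bx, by in zip(weights, beta_exposure, beta_outcome))
    denominator = sum(w * bx ** 2 for w, bx in zip(weights, beta_exposure))
    return numerator / denominator, sqrt(1.0 / denominator)


# Toy summary statistics (made up): every variant implies a ratio of 0.5.
bx = [0.10, 0.20, 0.15]
by = [0.05, 0.10, 0.075]
se = [0.01, 0.02, 0.015]

ratios = [wald_ratio(x, y) for x, y in zip(bx, by)]
estimate, std_error = ivw_estimate(bx, by, se)
```

Because the IVW estimate is a weighted average of the Wald ratios, it is only as trustworthy as the instruments themselves, which is why the IV assumptions above matter.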

Researchers are faced with the challenge of evaluating the MR evidence, filtering this information and deriving valid inferences. The continuously increasing amount of new scientific information coupled with the fact that two of the three MR assumptions (b and c) cannot be confirmed empirically further complicates this cumbersome task. Furthermore, the field of evaluating MR associations is rapidly evolving [6, 7]. The investigation and assessment of the potential violations of the MR assumptions, especially in the case of multiple instruments, is a key step towards a valid inference and a robust interpretation of potential causal associations. Several sensitivity analyses have been proposed that address the validity of these assumptions, and the results from MR studies that do not use them should be viewed as incomplete [8].
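As an illustration of one such sensitivity analysis, the sketch below implements the point estimates of MR-Egger regression: a weighted regression of SNP-outcome on SNP-exposure effects that, unlike the inverse-variance weighted method, includes an intercept. A non-zero intercept is commonly read as a sign of directional pleiotropy. The function name and toy data are hypothetical, and standard errors and significance tests are deliberately omitted.

```python
def mr_egger(beta_exposure, beta_outcome, se_outcome):
    """Return (intercept, slope) of an MR-Egger regression.

    Variants are first oriented so that every SNP-exposure effect is positive,
    then a weighted least-squares line is fitted with weights 1 / se_outcome**2.
    """
    oriented = [
        (abs(bx), by if bx >= 0 else -by, 1.0 / se ** 2)
        for bx, by, se in zip(beta_exposure, beta_outcome, se_outcome)
    ]
    total_weight = sum(w for _, _, w in oriented)
    mean_x = sum(w * x for x, _, w in oriented) / total_weight
    mean_y = sum(w * y for _, y, w in oriented) / total_weight
    sxx = sum(w * (x - mean_x) ** 2 for x, _, w in oriented)
    sxy = sum(w * (x - mean_x) * (y - mean_y) for x, y, w in oriented)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Toy data (made up): after orientation, outcome effect = 0.02 + 0.5 * exposure effect.
bx = [0.10, 0.20, 0.30, -0.40]
by = [0.07, 0.12, 0.17, -0.22]
se = [0.01, 0.02, 0.02, 0.03]
intercept, slope = mr_egger(bx, by, se)
```

In this toy example the slope recovers the causal effect (0.5) while the intercept (0.02) captures the average pleiotropic effect that a regression through the origin would absorb into the estimate.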

In this paper, we systematically reviewed the literature investigating associations between genetically predicted risk factors and any type of cancer using MR approaches. Firstly, we aimed to map and describe the current state of MR literature on cancer risk, identify areas where research has focused, and identify possible gaps and emerging areas of interest. Furthermore, we aimed to evaluate these associations using a breadth of well-established MR methods and the most commonly applied sensitivity analyses to identify those presenting robust evidence for causality. We note that the word “robust” refers to evidence of causality for the studied associations, not the quality of the analysis.

Methods

This systematic review was conducted in accordance with the published protocol registered in the Open Science Framework registries (https://osf.io/2ruct) and is reported following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) checklist [9].

Search strategy

A detailed description of the search strategy and inclusion and exclusion criteria along with the data extraction process is provided in the Additional file 1: Supplementary methods [10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26]. Briefly, we searched the Medline (via PubMed) and Scopus databases from inception to 06/10/2020 using a combination of the terms “Mendelian randomization,” “genetic instrument,” and “cancer” and their synonyms for MR studies investigating the association of genetically predicted risk factors with risk of cancer development or mortality. We also screened the references of relevant reviews and the references of the included studies. We extracted information on the exposure and outcome of interest, the genetic instrument, the MR design (one-sample or two-sample, based on whether the gene-exposure and gene-outcome associations were estimated on the same or different populations), and main MR analysis results (as defined by the authors). We further extracted information on a number of sensitivity MR methods, namely MR-Egger, weighted median (WM), MRPRESSO, and also multivariable MR (MVMR).

Evaluation of robustness in the identified associations

The robustness of the evidence was categorized into four a priori defined levels of evidence for causality (robust, probable, suggestive, insufficient evidence) (Fig. 1) based on information from both the main MR analysis and at least one of MR-Egger, WM, MRPRESSO, and MVMR. These methods were chosen as they are the most commonly used in the MR literature to assess and adjust for potential assumption violations. Throughout the grading, sensitivity analyses were considered significant at the nominal level (p value < 0.05), whereas for the main analysis we used the multiple-testing-adjusted threshold set by the study when one was reported, and the nominal level otherwise. The evidence was graded even if some of the sensitivity analyses were not performed, but at least one was required for the evaluation. The grading was performed in the following manner: Robust evidence for causality was achieved when all the performed methods (i.e., the main analysis and whichever of MR-Egger, WM, MRPRESSO, and MVMR were reported) for the specific association presented a significant p value and the direction of the effect estimates was concordant across all methods. Probable evidence for causality was achieved when at least one method (main or sensitivity analysis) had a significant p value and the direction of the effect estimate was concordant for all the methods. Suggestive evidence for causality was achieved when at least one method had a significant p value, but the direction of the effect estimates differed between methods. Associations that presented a non-significant p value for all methods were classified as insufficient evidence for causality. 
This category may contain associations for which evidence for causality is unclear (due to low power and wide confidence intervals) but also associations for which MR analyses suggest that a moderate size of causal effect is unlikely. Finally, associations that did not present any of the sensitivity analyses were categorized as non-evaluable evidence. We also performed a separate analysis by removing the MR-Egger test from the criteria as it often provides different results from the other methods due to low power [27, 28]. Associations presenting MR-Egger as the sole sensitivity analysis were not graded in this separate evaluation.
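The grading rules described above can be sketched as a small decision function. This is an illustrative reading of the scheme, not the code used in the review: the function name, the input layout, and the assumption that effect estimates are on a log scale (so that their sign gives the direction of effect) are ours.

```python
from typing import Dict, Iterable, Tuple

Result = Tuple[float, float]  # (effect estimate on a log scale, p value)


def grade_association(
    main: Result,
    sensitivity: Dict[str, Result],
    main_threshold: float = 0.05,  # study-specific multiple-testing threshold, if any
    exclude: Iterable[str] = (),   # e.g. ["MR-Egger"] for the separate analysis
) -> str:
    """Assign an association to one of the evidence categories."""
    excluded = set(exclude)
    used = [res for name, res in sensitivity.items() if name not in excluded]
    if not used:
        return "non-evaluable"  # no sensitivity analysis available
    # Main analysis uses its own threshold; sensitivity analyses use 0.05.
    significant = [main[1] < main_threshold] + [p < 0.05 for _, p in used]
    # Assumption: the sign of the estimate encodes the direction of effect.
    directions = {estimate > 0 for estimate, _ in [main, *used]}
    if not any(significant):
        return "insufficient"
    if len(directions) > 1:
        return "suggestive"
    return "robust" if all(significant) else "probable"


# Made-up example: MR-Egger is concordant in direction but not significant.
example = {"MR-Egger": (0.30, 0.20), "WM": (0.18, 0.01)}
with_egger = grade_association((0.20, 0.001), example)
without_egger = grade_association((0.20, 0.001), example, exclude=["MR-Egger"])
```

In this example the association is graded as probable when MR-Egger is considered and as robust when it is excluded, which is the kind of shift the separate analysis without MR-Egger is designed to reveal.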

Fig. 1

Categorization of the evidence. * For the main analysis: statistically significant at the threshold set up by the study due to multiple testing or at 0.05 if no multiple testing threshold was defined. For the sensitivity analyses: statistically significant at 0.05

The structure of this evidence quality grading relates more to polygenic MR analyses than to MR analyses for gene products (e.g. proteins) that are conducted using variants from a cis-gene window and are more likely to use only one or a few SNPs as instruments. Therefore, we further assessed the associations in the non-evaluable evidence category by evaluating how many of them used biological relevance and cis IV definitions and, among them, how many conducted a colocalization analysis, which evaluates the shared, local genetic architecture and causality between two traits [29].

Patient and public involvement

No patients were involved in the development of the research question or the outcome measures, nor were they involved in the study design or the interpretation of the results.

Results

The search strategy yielded a total of 6074 original search results of which 305 were evaluated in full text and 115 records were excluded [12, 14, 15, 20,21,22, 30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,102,103,104,105,106,107,108,109,110,111,112,113,114,115,116,117,118,119,120,121,122,123,124,125,126,127,128,129,130,131,132,133,134,135,136,137,138] (specific reasons for exclusion are presented in Additional file 2: File S1) leading to 190 eligible MR publications [139,140,141,142,143,144,145,146,147,148,149,150,151,152,153,154,155,156,157,158,159,160,161,162,163,164,165,166,167,168,169,170,171,172,173,174,175,176,177,178,179,180,181,182,183,184,185,186,187,188,189,190,191,192,193,194,195,196,197,198,199,200,201,202,203,204,205,206,207,208,209,210,211,212,213,214,215,216,217,218,219,220,221,222,223,224,225,226,227,228,229,230,231,232,233,234,235,236,237,238,239,240,241,242,243,244,245,246,247,248,249,250,251,252,253,254,255,256,257,258,259,260,261,262,263,264,265,266,267,268,269,270,271,272,273,274,275,276,277,278,279,280,281,282,283,284,285,286,287,288,289,290,291,292,293,294,295,296,297,298,299,300,301,302,303,304,305,306,307,308,309,310,311,312,313,314,315,316,317,318,319,320,321,322,323,324,325,326,327,328] (Fig. 2). 
These 190 publications presented 4667 MR associations for 16 exposure categories, including 852 unique exposures, namely amino acids and derivatives (N = 81 unique exposures), anthropometrics (N = 47), circulating leukocyte telomere length (N = 1), diabetes and related biomarkers (N = 37), dietary intake and micronutrient concentrations (N = 42), fatty acids and derivatives (N = 59), growth factors (N = 12), inflammatory biomarkers (N = 82), lifestyle, education and behavior (N = 35), lipid metabolism biomarkers (N = 148), methylations (N = 14), reproductive factors (N = 8), steroids (N = 24), clinical measurements (N = 21), other diseases and traits (N = 47), and other metabolites/biomarkers (N = 194) (Additional file 2: File S2), and 21 cancer sites (i.e. head and neck, esophageal, stomach, small intestine, colorectal, liver and biliary tract, pancreatic, lung, skin/melanoma, sarcomas, breast, cervical, endometrial, ovarian, prostate, kidney, bladder and urinary tract, central nervous system, thyroid, leukemias and lymphomas, and any cancer/mixed) and their subsites. The vast majority of associations (N = 4532; 97%) investigated cancer risk with only 135 (3%) associations being on cancer mortality. The complete evidence base of the extracted information is provided in the Additional file 2: File S3.

Fig. 2

Study selection flowchart

Description of the evidence base

The 190 MR studies on cancer were published as early as 2009, but the majority (N = 135; 71%) were published after 2018. Most publications (N = 149; 78%) used a two-sample MR design, 30 publications (15.7%) used a one-sample design, and 11 publications (5.8%) presented both one- and two-sample MR analyses. The design of one publication was unclear (Fig. 3).

Fig. 3

Time trend of Mendelian randomization (MR) publications on cancer risk or mortality, by MR design

For most MR analyses, the variants used as instruments for the exposure were derived from populations of European ancestry (N = 3183; 68.2%), 31 (0.7%) from Asian, four (0.1%) Amish, three (0.1%) South American, and 56 (1.2%) mixed, while for 1390 (29.8%) associations, the exposure population ancestry was not reported. Regarding the outcome, in most comparisons (3221; 69%) population ancestry was European, 233 (5%) Asian, 12 (0.3%) South American, one African, and 101 (2.2%) mixed, while for 1099 (23.5%) outcome population ancestry was not reported.

Body mass index (BMI) was the most frequently studied exposure with 278 MR analyses across 40 publications, followed by vitamin D-related phenotypes with 149 MR analyses across 25 publications, and height with 109 MR analyses across 23 publications. The sample size for the exposure genetic analysis was reported in 3454 associations with a median of 17,649 participants (range, 231 for the metabolite X-12435 to 1,232,091 for smoking initiation).

The most frequently studied cancer was breast, which was investigated in 63 publications, followed by lung (N = 57), colorectal (N = 53), and prostate (N = 49). In contrast, pancreatic cancer had the highest number of MR analyses (N = 646; 13.8%), followed by lung (N = 634; 13.6%), breast (N = 586; 12.6%), and ovarian (N = 582; 12.5%). With regard to the number of cases, breast cancer had the highest number of cases (median N = 69,501 across 534 analyses), followed by prostate cancer (median N = 44,825 across 352 analyses), with small intestine cancer having the smallest median number of participants (N = 156; 36 analyses).

Description of the instrument selection

The median number of SNPs used as instruments was five, ranging from one to 3163, whereas for 141 (3%) MR analyses this information was not reported (Additional file 2: Table S1). In the majority of the analyses (4108; 88%), instrument selection was based on the genome-wide significance threshold of 5 × 10⁻⁸; 87 (1.9%) analyses used a stricter threshold of significance, 102 (2.2%) analyses used a more lenient threshold, and in 370 (7.9%) analyses the significance threshold for instrument selection was not reported. For 1241 (26.6%) associations, the authors reported that the choice of the instruments was based on their biological relevance to the exposure of interest. The most frequently used clumping thresholds for SNP inclusion were r² < 0.001 (N = 1203; 25.9%), r² < 0.01 (N = 1058; 22.7%), and r² < 0.1 (N = 1059; 22%). The percentage of variance explained (R²) was reported for 2162 (46.3%) associations and ranged from 0.01 to 100% (for chemokine [C-X-C motif] ligand 1 and chemokine [C-C motif] ligand 4) with a median of 2.9% (Additional file 2: Table S1). Only about one in four associations (N = 1135) reported a numerical estimation of the power of the MR analysis, with a median reported power of 76% (range 1 to 100%) (Additional file 2: Table S1). A total of 1326 (28%) associations reported on the adjustments used for the exposure GWAS. The majority (N = 1283; 96.8%) adjusted for population stratification, 907 (68.4%) adjusted for age, 720 (54.3%) for sex, and 271 (20.4%) used adjustments specific to genotyping methods. Other adjustments included study location or assessment center (N = 169; 12.8%), anthropometrics (N = 85; 6.4%), lifestyle factors (N = 73; 5.5%), and study year/time (N = 42; 3.1%), whereas in 81 (1.7%) analyses a number of additional adjustment factors were used.
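For readers less familiar with how a significance threshold and a clumping threshold jointly determine the instrument set, the following is a simplified sketch of greedy clumping. It assumes a precomputed matrix of pairwise squared correlations between variants; the function name and toy inputs are hypothetical, and real clumping tools typically also restrict comparisons to a genomic distance window, which is omitted here.

```python
def greedy_clump(p_values, r2, p_threshold=5e-8, r2_threshold=0.001):
    """Return the indices of variants retained as instruments.

    Variants passing the significance threshold are visited from the most to
    the least significant; a variant is kept only if its squared correlation
    with every variant already kept is below r2_threshold.
    """
    candidates = sorted(
        (i for i, p in enumerate(p_values) if p < p_threshold),
        key=lambda i: p_values[i],
    )
    kept = []
    for i in candidates:
        if all(r2[i][j] < r2_threshold for j in kept):
            kept.append(i)
    return kept


# Toy inputs (made up): variant 3 is not genome-wide significant, and
# variant 1 is weakly correlated with variants 0 and 2.
p_values = [1e-12, 3e-9, 2e-10, 1e-5]
r2 = [
    [1.0, 0.05, 0.0005, 0.0],
    [0.05, 1.0, 0.02, 0.0],
    [0.0005, 0.02, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
]
strict = greedy_clump(p_values, r2, r2_threshold=0.001)
lenient = greedy_clump(p_values, r2, r2_threshold=0.1)
```

With the strict threshold, variant 1 is dropped because of its correlation with variant 0; with the lenient threshold it is retained, so the resulting instruments are no longer approximately independent.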

Description of the results and robustness of the evidence

Most analyses were based on a two-sample design (N = 4304; 92.2%), and only 363 (7.8%) used a one-sample design. The statistical analysis method of preference as main analysis with 2974 (63.7%) associations was the inverse-variance weighted method (either fixed-effect or random-effects), whereas 734 (15.7%) associations were derived from likelihood-based analyses. Other statistical analysis approaches used for the main MR analysis included the Wald ratio, generalized models (generalized least squares and generalized summary-based MR), two-stage regression approaches (35% of the one-sample designs), WM, and MR using robust-adjusted profile scores. Forty-two publications (22.1%) performed an adjustment for multiple comparisons, and of the 4667 total associations only 523 (11.2%) were statistically significant in the main analysis at the threshold set up by the study due to multiple testing or at nominal significance (p value < 0.05) if no multiple testing threshold was defined. Sensitivity analyses were mostly performed in two-sample MR, and a limited number of these sensitivity analyses were performed in one-sample MR designs.

Across two-sample designs, MR-Egger was evaluated in 1293 (30%) analyses with 140 (10.8%) of those presenting a nominally statistically significant MR-Egger slope; a total of 1055 (24.5%) associations performed a WM analysis with 217 (20.6%) being statistically significant, while sensitivity analyses using MRPRESSO or multivariable MR were fairly limited with only 142 (3.3%; with N = 55; 38.7% statistically significant) and 171 (4%; with N = 53; 31% statistically significant) associations, respectively (Additional file 2: Table S2). Across the 363 analyses with one-sample design, 46 performed an MR-Egger analysis (N = 3; 6.5% significant), 27 a WM analysis (N = 5; 18.5% significant), no analysis performed MRPRESSO, and 27 performed an MVMR analysis (N = 9; 33.3% significant) (Additional file 2: Table S2).

A total of 1467 (31.4%) MR associations reported in 121 publications presented results on both the main and at least one sensitivity analysis and were further evaluated based on the aforementioned grading scheme. The rest of the MR associations (N = 3200; 68.6%) across 123 publications only presented results for the main analysis and therefore could not be graded. Of those 3200 associations, 293 (9.2%) had a one-sample and 2907 (90.8%) a two-sample design. For 36.6% (N = 1171) of analyses, the authors selected the IVs based on their biological relevance to the exposure, with 1106 (94.5%) of them having a two-sample design. A total of 238 (7.4%) associations with only a main analysis were statistically significant (or survived a multiple testing threshold) and for only 60 (25.2%) of those the selection of the instrument was based on biological relevance. Of those, 14 used a cis definition for the selected instruments, but none of those performed a colocalization analysis.

A graphical overview of the robustness of the evidence per exposure category and cancer group is presented in Fig. 4. Out of the 1467 graded associations, we observed 87 MR analyses that presented robust evidence (5.9%; 1.9% of total MR analyses), 275 with probable evidence (18.8%; 5.9% of total), 89 with suggestive evidence (6.1%; 1.9% of total), and 1016 with insufficient evidence (69.3%; 21.8% of total) based on the results of the main and sensitivity analyses. Across the 16 exposure categories, anthropometrics had the highest number of robust analyses (N = 16; 18.4%), followed by steroids (N = 13; 15%), circulating leukocyte telomere length (N = 13; 15%), the other diseases and traits category (N = 12; 13.8%), and lipids (N = 10; 11.5%), whereas no robust association was found among the amino acids and derivatives, fatty acids and derivatives, inflammatory biomarkers, methylations, and other metabolites and biomarkers categories (Table 1). Across cancers, the highest number of robust associations was observed for breast cancer with 29 (33.3%) of the 87 robust associations, followed by lung cancer (N = 14; 16.1%) and endometrial (N = 11; 12.6%). Head and neck, stomach, small intestine, pancreatic, cervical, and central nervous system cancers did not present any robust MR associations (Table 2). The network of the robust exposure–cancer associations is presented in Fig. 5.

Fig. 4

Evidence map

Table 1 Number and percent of Mendelian randomization analyses per grading category by exposure category
Table 2 Number and percent of Mendelian randomization analyses per grading category by cancer group
Fig. 5

Network of the exposure–cancer associations of the Mendelian randomization analyses presenting robust evidence. Note: For circulating telomere length, the red arrows refer to longer while the green arrows refer to shorter genetically predicted telomere length. For HMG-CoA reductase, the green arrow to ovarian cancer refers to decreased genetically predicted levels of the exposure. Abbreviations: AC: adenocarcinoma; BMI: body mass index; ER−: estrogen receptor negative; ER+: estrogen receptor positive; FEV1: forced expiratory volume in one second; HDL: high-density lipoprotein; HMG-CoA: 3-Hydroxy-3-methylglutaryl coenzyme A; IGF-1: insulin-like growth factor 1; LDL: low-density lipoprotein; SCC: squamous cell carcinoma; SHBG: sex-hormone-binding globulin

The 16 robust associations from the anthropometrics category pertained to BMI (including childhood BMI and early life body size) and waist-to-hip ratio (WHR) with decreased risk of total breast cancer [164, 250, 255, 299], estrogen receptor positive (ER+) [250, 299], and negative (ER−) disease [164, 250, 299]; BMI with increased risk of kidney/renal cell [240] and endometrial [293] cancer, and adult height with increased overall [204] and ovarian cancer risk [194]. Thirteen robust associations were observed in the steroids category, pertaining to the positive association of different measures of testosterone with breast (total, ER+) and endometrial cancer, and to the negative association of sex-hormone-binding globulin (SHBG) and endometrial cancer [301]. Thirteen robust associations were also found for leukocyte telomere length, with longer (shorter) telomeres pertaining to increased (decreased) risk of total cancer [244], lung (total, adenocarcinoma [AC], AC-never smokers) [241], kidney/renal cell [185], osteosarcoma [314], skin [288], thyroid [288], leukemia [288], and lymphoma and multiple myeloma [288]. The 10 robust associations from the lipid metabolism biomarkers category pertained to high-density lipoprotein cholesterol (HDL-C) with increased risk of breast (total, ER+, ER−) [279] but decreased risk of overall cancer [197]; triglycerides (TGL) with decreased risk of breast [207]; low-density lipoprotein cholesterol (LDL-C) with decreased risk of endometrial (total, non-endometrioid) [321] and lung squamous cell carcinoma (SCC) [178]; total cholesterol and lung SCC (decreased risk) [178]; and 3-hydroxy-3-methylglutaryl coenzyme A (HMG-CoA) reductase with ovarian cancer (decreased risk for decreased genetically predicted levels of the exposure) [309]. 
From the lifestyle, education, and behavior category, six associations were found with robust evidence, namely between smoking and increased risk of lung cancer (total [286, 328], SCC [328], small cell [328]), two between physical activity and decreased risk of colorectal cancer [296] and one between chronotype and decreased risk of breast cancer [254]. From the dietary intake and micronutrient concentrations category, we found eight robust associations pertaining to magnesium with breast (total and ER+, increased risk) [324], ferritin with liver (increased risk) [311], alcohol consumption with lung (increased risk) [286], and vitamin B12 with increased risk of ovarian cancer of low malignant potential [274]. Transferrin saturation showed increased risk of liver cancer, but transferrin levels presented a decreased risk [311]. The rest of the robust associations pertained to age at menarche with ovarian (total and serous; decreased risk) [260], alcohol use disorder diagnostic codes with ovarian serous (decreased risk) [317], endometriosis with ovarian [261] and with endometriosis-uterine leiomyoma [235] (both increased risk), gallstone disease with gallbladder (increased risk) [264], insulin-like growth factor 1 (IGF-1) with breast (increased risk) [295], obstructive sleep apnea syndrome with breast (increased risk) [271], polycystic ovary syndrome with ovarian endometrioid (decreased risk) [237], stem cell growth factor beta (SCGF-β) with prostate (decreased risk) [304], schizophrenia with breast (total, ER+, ER−; increased risk) [210], standardized forced expiratory volume in 1 s with lung SCC (increased risk) [281], thyroid-stimulating hormone with cancer overall (decreased risk) [313], type 2 diabetes with esophageal (decreased risk) [312], and vitiligo with non-melanoma skin, melanoma, and ovarian (decreased risk) [306].

When the MR-Egger test was removed from the grading scheme as a sensitivity analysis, a total of 70 associations with probable and four with suggestive evidence were upgraded to robust, while 35 associations were upgraded from suggestive to probable. In contrast, 23 MR analyses with probable and 32 with suggestive evidence were downgraded to insufficient evidence. Finally, 15 associations with robust evidence, 34 with probable, 17 with suggestive, and 242 with insufficient evidence now presented only a main analysis and were non-evaluable (Additional file 2: Table S3).

Discussion

In this large systematic overview, we searched and mapped current literature evaluating the association of 852 distinct genetically predicted risk factors across 16 broad exposure categories in relation to 21 cancer sites and their subtypes by evaluating the results of 190 publications and over 4600 MR associations. Using a set of clear, comprehensive and easily replicable criteria to evaluate the validity of the reported associations, we found that fewer than 90 of the reported MR analyses presented robust evidence for causality and that the vast majority of the analyses did not perform sensitivity analyses, at least with regard to MR-Egger, WM, MRPRESSO, and MVMR. Most of the MR analyses supported by robust evidence were observed for anthropometric indices, steroid hormones, telomere length, and lipids.

The median number of IVs across all analyses was relatively small (N = 5), despite most studies being conducted in an era of large GWASs across a wide breadth of phenotypes. This may partially be explained by the large number of infrequently used biomarkers that were assessed in some studies [245, 315]. The small number of IVs may have affected the implementation of sensitivity analyses such as MR-Egger in several cases that did not include enough IVs. However, only a limited number of analyses explored the association further using other approaches such as colocalization. Apart from sensitivity MR analyses not being frequently performed in the original studies (often but not always due to lack of a sufficient number of IVs), other valuable insights regarding the methodological approaches can be gained by examining this evidence base. We observed that several different clumping thresholds for pruning SNPs were applied. While most studies used thresholds ranging from r² < 0.001 to r² < 0.1, one in ten had an even more liberal threshold. Researchers should consider adjusting for the potential correlation between IVs when using less strict thresholds such as 0.1 or higher [329]. Of note is also that less than half of the analyses provided the percentage of variance explained by the IV and less than one quarter provided a power estimation; some studies presented the power estimations graphically, but we were not able to extract those. Both the R² and a priori power estimation are equally important for evaluating the capacity of an IV to provide valid and accurate estimates and can help to differentiate non-significant but underpowered associations from real null ones.

Across the MR analyses pertaining to anthropometric exposures, robust evidence was observed predominantly for BMI. BMI was inversely associated with risk of total, ER+, and ER− breast cancer (mostly post-menopausal), which was supported by robust evidence across several different MR analyses. In contrast, observational evidence supports a positive association of body fatness with post-menopausal breast cancer risk, and an inverse association for premenopausal disease [22, 330, 331]. These contradictory results between MR and observational evidence may be attributed to the fact that genetically predicted BMI reflects more closely early life body fatness [164, 332], and early life body fatness has been inversely associated in observational [333] and in MR studies [164, 299] with both pre- and post-menopausal breast cancer. Robust evidence was also observed for a positive association of BMI and endometrial cancer in Asian populations [293], which is in line with the observational evidence on body fatness and endometrial cancer in the general population [330, 334, 335]. The results were also consistent in the main analysis of the four MR publications on BMI and endometrial cancer among European populations; however, these publications did not perform any sensitivity analyses for endometrial cancer [149, 203, 236], so they could not be evaluated in our grading scheme. The positive association of body fatness with renal cell carcinoma from observational studies [330, 336, 337] was confirmed in our review based on robust evidence for BMI and probable evidence for WHR and body fat percentage, both of which were upgraded to robust in the sensitivity analysis excluding the MR-Egger analysis. Several well-acknowledged observational associations of adiposity and cancer risk, namely for ovarian [330, 334, 338] and colorectal [330, 339] cancer were only supported by probable evidence. 
The association for ovarian cancer from the largest MR study to date failed to reach robust evidence because the main analysis, despite being nominally significant, did not survive the multiple comparisons threshold set by the original publication, which investigated many risk factors [261]. Similarly, for colorectal cancer, the MR analyses, despite consistently indicating an increased risk [164, 167], did not reach robust evidence for several reasons, including not surviving the multiple testing correction thresholds and having non-significant sensitivity analyses. BMI also presented probable evidence of an increased risk of lung SCC. Observational data show inverse associations between BMI and risk of total lung cancer [330, 340], which are likely due to residual confounding by smoking [341]. With respect to other anthropometric exposures, namely adult height, WHR, and waist and hip circumference, the results were in line with those for BMI, although supported by lower levels of evidence in MR studies, with the exception of adult height with overall [204] and ovarian cancer [194], which reached robust evidence.

Robust and probable evidence was also found for the positive association of genetically predicted testosterone concentrations with risk of breast and endometrial cancer, and the negative association of SHBG with endometrial cancer. These results have been partially confirmed in observational evidence [342, 343]. Conversion of androgens into estrogens in the adipose tissue of post-menopausal women may partially explain these results, due to the role of estrogens in breast [344] and endometrial cancer cell proliferation [345]. On the other hand, excess weight, insulin resistance, and hyperinsulinemia have been associated with changes in total and bioavailable plasma sex steroid levels in women through a number of mechanisms that can lead to a decrease in plasma SHBG levels, and a rise in bioavailable testosterone [346].

A considerable fraction of the studies focused on circulating leukocyte telomere length, for which robust associations were observed with total cancer, and with lung, leukemia, lymphoma, osteosarcoma, skin, and thyroid cancers, where longer telomeres increased the risk (or shorter lengths decreased the risk) of these cancers. Furthermore, a positive association with increased telomere length was supported by probable evidence for a number of other cancer sites, such as glioma, bladder, kidney, melanoma, multiple myeloma, non-Hodgkin’s lymphoma, ovarian, and prostate cancer, several of which were upgraded to robust with the exclusion of the MR-Egger analysis. In contrast, negative associations of increased telomere length with cervical, head and neck, pancreatic, and skin basal cell cancers were supported by probable evidence. The observational evidence has created controversy in the literature about the direction of the associations [347, 348], while in a recent umbrella review the strength of the observational evidence was deemed relatively weak and inconsistent [349]. A recent review on the association of telomere length and cancer risk highlighted the importance of the pleiotropic effects of certain telomere-related loci such as TERT, TERC, and OBFC1 [20], while mediation MR analyses have indicated that a considerable proportion of the association between the TERT region and lung cancer risk is mediated by telomere length [241]. The current understanding is that telomeres may both promote and also limit cancer proliferation and neoplastic progression [350, 351], although the potential of proliferation from longer telomeres seemingly overshadows the risk stemming from genetically determined shorter telomeres [352].

Several associations were identified for lipids, especially TGL, total cholesterol, LDL-C, and HDL-C. Specifically, the negative association of TGL with total and ER+ breast cancer was supported by robust and probable evidence, which is in line with the observational evidence [353, 354]. For LDL-C and HDL-C, the MR results were consistent across several studies, indicating a positive association with total, ER+, and ER− breast cancer. These associations are further supported by consistent results from MVMR analyses adjusting for other lipid traits. However, the observational evidence is contradictory for LDL-C and HDL-C, as previous meta-analyses have shown a negative association for LDL-C and no association for HDL-C [354, 355]. With regard to endometrial cancer, we found robust evidence for a negative association with LDL-C and lower levels of evidence for associations with other lipids [321]. These results were concordant with MVMR analyses adjusting for BMI, but further MVMR analyses mutually adjusting for lipids were not performed. Limited observational evidence indicates a positive association with TGL [356, 357, 358] but no association with LDL-C or HDL-C [356, 359, 360]. An emerging robust association was observed between HMG-CoA reductase, the drug target of statins, and lower risk of ovarian cancer with consistent MVMR results accounting for BMI. Observational evidence for statin use suggests a decreased risk of ovarian cancer among statin users [361]. Only two associations presented robust evidence with lung SCC, pertaining to a negative association for total cholesterol and LDL-C, but MVMR analyses were not conducted, while for total lung cancer these associations were supported by probable evidence. Observational studies indicated a lower risk of lung cancer for circulating lipids [362].
For several other cancers such as colorectal, glioma, lymphomas, pancreatic, kidney, and multiple myeloma, the MR results were limited and inconsistent, without any robust evidence. The role of lipid metabolism in carcinogenesis and tumor growth has been acknowledged in the literature [363, 364] although the molecular mechanism is not yet fully understood and the associations are complicated by the potential role of different lipid subfractions and correlation between different lipids as well as with other traits and diseases such as BMI or metabolic syndrome [365, 366]. Regulating lipid metabolism has been identified as a promising target for anti-cancer interventions [363]. An overview of reviews on statin use has shown low levels of evidence in meta-analyses of observational studies for decreased risk of breast, colorectal, esophageal, gastric, hematological, liver, and prostate cancers, while the results from meta-analyses of RCTs were null [367].

Many of the included associations were non-evaluable because the original studies did not perform any of the sensitivity analyses required for our grading. The reasons may vary across studies and include an inability to perform such analyses because of a low number of instruments (especially for the MR-Egger analyses), prioritization of statistically significant associations for further evaluation with sensitivity analyses, or sensitivity analyses not being part of the authors’ analysis plan. These associations need to be studied more comprehensively, especially where instruments are defined polygenically, as such instruments are more prone to biases or pleiotropy that can drive associations both towards and away from the null. Regardless of the reason for omitting sensitivity MR analyses, and of whether that decision was appropriate, these associations are not sufficiently investigated and are all considered non-evaluable in our grading scheme, which focuses on evaluating the robustness for causality of the studied associations.

Other efforts to summarize the evidence of MR analyses on cancer risk have been made previously. However, they were either limited to specific exposures [12, 14, 18, 20] or cancer sites [15, 16], or used a more narrative approach to presenting and assessing the MR results [11, 13, 19], and none performed a formal evaluation of the evidence. In contrast, our review used predefined criteria for the categorization of the evidence for causality, which increases the transparency and reproducibility of our results. We did not evaluate the quality of reporting of the MR studies, as there are only some very recent efforts focusing on this topic [17], and comprehensive reporting guidelines were developed only very recently [7]. In addition, as guidelines for performing MR studies [6] have also been developed very recently and are not yet widely agreed upon, we refrained from using them to evaluate the quality of the identified studies. Although the grading scheme used in our review prevented us from evaluating a large proportion of the included MR analyses because they did not report any sensitivity MR analysis, most of the results that received robust evidence were in line with previous observational research and are further supported by mechanistic evidence.

Several limitations need to be acknowledged. Our search strategy may have missed some relevant studies, especially those in which the MR analysis was not the primary focus but only a supplementary analysis, which seems to be increasingly common in recent GWA studies. In these cases, however, we would not expect a comprehensive evaluation of the studied associations using sensitivity MR analyses, so including them would only have inflated the number of associations with non-evaluable evidence. The structure of the criteria for evaluation of the robustness of the MR evidence for causality was more geared towards the evaluation of two-sample MR approaches, but the percentage of one-sample designs that did not perform one of the pre-specified sensitivity analyses was only marginally higher than that of two-sample designs. Associations evaluated in earlier publications, especially those published before many of the sensitivity analyses were introduced, could also not be evaluated. However, the majority of the studies were published after 2018, and the earlier associations often relied on a limited number of cases or on instruments that included only a limited number of SNPs and explained a low percentage of variance. Information on the percentage of variance explained and the statistical power of the instrument was often not reported, and thus a complete assessment of weak instrument bias could not be performed. Therefore, the grading scheme did not allow us to distinguish MR analyses that presented robust evidence of lack of association from MR analyses that did not present an association due to being insufficiently powered. Future studies may benefit from reporting this information. The approach undertaken in this review for grading the associations did not allow us to evaluate MR analyses that only presented a main analysis without being supported by sensitivity analyses.
Since two of the three MR assumptions are not directly testable, it is imperative that a MR analysis be supported by a comprehensive set of complementary and sensitivity analyses to increase the credibility of the results, as such approaches can at least give some indication of large violations of the assumptions. Most MR analyses evaluating associations for gene products using cis instruments were non-evaluable using our current criteria, as most included only one or two SNPs as IVs and the sensitivity analyses could not be applied. However, only two of these studies performed colocalization analysis and neither presented statistically significant associations for these specific analyses. More recently introduced sensitivity MR analyses were not included in the current evaluation, as their use is very infrequent in the MR literature. Finally, there is a discrepancy in the availability of genetic data for different cancers, and hence in the MR studies that have been possible; thus, cancer consortia are encouraged to make their summary data more readily and widely available.
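
As an illustration of the instrument-strength information whose absence is noted above, the sketch below computes a commonly used approximation of the F-statistic from the proportion of exposure variance explained by the instruments. It is not the method of this review or of any included study; the function name and the input values are hypothetical, and the formula assumes that the variance explained (R²), the sample size, and the number of SNPs are all available from the exposure GWAS.

```python
def instrument_f_statistic(r_squared: float, n_samples: int, n_instruments: int) -> float:
    """Approximate F-statistic for a set of genetic instruments.

    Uses F = (R^2 / (1 - R^2)) * ((n - k - 1) / k), where R^2 is the
    proportion of exposure variance explained by the k instruments and
    n is the sample size of the exposure data set.
    """
    if not 0.0 <= r_squared < 1.0:
        raise ValueError("r_squared must lie in [0, 1)")
    if n_instruments < 1:
        raise ValueError("at least one instrument is required")
    if n_samples <= n_instruments + 1:
        raise ValueError("n_samples must exceed n_instruments + 1")
    return (r_squared / (1.0 - r_squared)) * (
        (n_samples - n_instruments - 1) / n_instruments
    )


if __name__ == "__main__":
    # Hypothetical inputs: 10 SNPs explaining 2% of exposure variance
    # in an exposure GWAS of 10,000 participants.
    f_stat = instrument_f_statistic(r_squared=0.02, n_samples=10_000, n_instruments=10)
    print(f"Approximate F-statistic: {f_stat:.1f}")
```

By convention, values below about 10 are often taken to suggest possible weak instrument bias, although this threshold is only a rule of thumb and does not replace a formal power calculation.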

Conclusions

The evidence base in cancer epidemiology is challenging to evaluate because of the sheer amount of available observational evidence, and this challenge is compounded by the increasing interest in MR methodologies that could complement findings from traditional observational research. Our work summarizes and evaluates the robustness of the evidence for causality from MR analyses in cancer prevention and etiology. Only a minority of the evaluated MR analyses were supported by robust evidence. In addition, we identified gaps in the conduct and reporting of MR studies; these findings will assist in developing stronger future reporting guidelines.