Genome-wide association study of endometrial cancer in E2C2

Endometrial cancer (EC), a neoplasm of the uterine epithelial lining, is the most common gynecological malignancy in developed countries and the fourth most common cancer among US women. Women with a family history of EC have an increased risk for the disease, suggesting that inherited genetic factors play a role. We conducted a two-stage genome-wide association study of Type I EC. Stage 1 included 5,472 women (2,695 cases and 2,777 controls) of European ancestry from seven studies. We selected independent single-nucleotide polymorphisms (SNPs) that displayed the most significant associations with EC in Stage 1 for replication among 17,948 women (4,382 cases and 13,566 controls) in a multiethnic population (African American, Asian, Latina, Hawaiian and European ancestry) from nine studies. Although no novel variants reached genome-wide significance, we replicated previously identified associations with genetic markers near the HNF1B locus. Our findings suggest that larger studies with specific tumor classification are necessary to identify novel genetic polymorphisms associated with EC susceptibility. Electronic supplementary material: The online version of this article (doi:10.1007/s00439-013-1369-1) contains supplementary material, which is available to authorized users.


Introduction
Endometrial cancer (EC), a neoplasm of the uterine epithelial lining, is the most common gynecological malignancy in developed countries and the fourth most common cancer among US women (www.cancer.org 2013). This disease primarily affects postmenopausal women and is more common in women of European ancestry. In the USA in 2013, an estimated 49,560 women may develop EC and 8,190 may die from the disease, a case fatality similar to that of breast cancer. The estimated lifetime risk of women developing the disease in the USA is 1 in 38 (www.cancer.org 2013). EC is categorized into two distinct subtypes based on histologic and clinical characteristics. Type I ECs, the most common in women of European ancestry (80-90 %), are mostly endometrioid adenocarcinomas (EA). The remaining 10-20 % of ECs are Type II, which predominantly consist of serous and clear cell carcinomas.
EC risk is strongly increased by a Western lifestyle, with up to tenfold higher incidence rates in Western, industrialized countries than in Asia or rural Africa (Pisani et al. 1993). Major risk factors include obesity and use of postmenopausal estrogen-only hormone therapy (ET). Excess body weight has been associated with a two- to fivefold increase in EC risk in both pre- and postmenopausal women, and has been estimated to account for about 40-50 % of EC incidence in affluent societies (Bergstrom et al. 2001). Epidemiological evidence also suggests increased risks in association with early age at menarche, late age at menopause, nulliparity and infertility. Furthermore, women with a family history of EC have a nearly twofold increased risk (Gruber and Thompson 1996; Lucenteforte et al. 2009), and the risk is even greater in rare family cancer syndromes such as Lynch syndrome (also termed hereditary nonpolyposis colorectal cancer, HNPCC) (Nicolaides et al. 1994; Risinger et al. 1993), suggesting that inherited genetic factors increase susceptibility to EC. Though these studies support an inherited genetic component to risk (Vasen et al. 1994; Schildkraut et al. 1989; Gruber and Thompson 1996; Seger et al. 2011), twin studies suggest that the familial aggregation in risk may be mostly due to shared environmental factors and not shared genetics (Lichtenstein et al. 2000).
The predominant mechanistic hypothesis describing Type I endometrial carcinogenesis is known as the "unopposed estrogen" hypothesis (Key and Pike 1988). This theory states that EC risk is increased among women who have high circulating levels of bioavailable estrogens and low levels of progesterone, so that the mitogenic effect of estrogens is insufficiently counterbalanced by the opposing effect of progesterone. The unopposed estrogen hypothesis is supported by observations that the use of ET (Herrinton and Weiss 1993; Persson et al. 1989) and of Oracon (a sequential oral contraceptive (OC) characterized by an unusually high ratio of estrogenic to progestogenic activity) (Weiss and Sayvetz 1980) greatly increases EC risk, while use of combined OCs (i.e., containing progestins as well as estrogen throughout the treatment period) is associated with a reduced risk (Henderson et al. 1983). A further observation that led to the unopposed estrogen hypothesis is that mitotic rates of endometrial tissue are higher during the follicular phase of the menstrual cycle, when progesterone levels are low and the uterine lining undergoes proliferation, than during the luteal phase (Ferenczy et al. 1979). Progesterone counteracts the growth-stimulatory effects of estrogen by inducing glandular and stromal differentiation (Clarke and Sutherland 1990; Ace and Okulicz 1995), and endometrial hyperplasia can be reversed by progestin therapy (Ehrlich et al. 1981). Many of the genes in the sex steroid hormone metabolism pathway have served as "candidates" in the search for polymorphic variants that predispose to EC. Although some studies suggest that SNPs in these genes, for example, the CYP19A1 (aromatase) gene, are associated with EC risk (Setiawan et al. 2009), very little of the genetic risk can be explained by these SNPs.
To this end, efforts have been undertaken to identify genes involved in EC causation. Recently, two genome-wide association studies (GWAS) of EC have been conducted (Spurdle et al. 2011; Long et al. 2012). However, only one study identified a novel genome-wide significant association (P = 7.1 × 10^-10) with a susceptibility marker located at 17q12 (rs4430796), near the HNF1 homeobox B (HNF1B) gene, in relation to EC. Though originally identified in women of European ancestry, this locus has been replicated in other ethnicities (Setiawan et al. 2012). This marker has also been associated with prostate cancer, diabetes (Winckler et al. 2007; Gudmundsson et al. 2007) and certain subtypes of ovarian cancer. In search of additional common genetic variants, we conducted a two-stage GWAS of EC among women participating in studies that are part of the Epidemiology of Endometrial Cancer Consortium (E2C2, details in Supplementary Table 1).

Results
We conducted a GWAS within the E2C2 to identify genetic loci that predispose to EC. Details on the 15 participating studies are provided in Supplementary Table 1. After quality control metrics were applied (see Methods), over 524K genotyped SNPs remained in each study for a combined total of 873K unique SNPs for analysis. The genomic control lambda for the study was 1.008, indicating little evidence of population substructure, relatedness or differential genotyping between cases and controls (Fig. 1). No SNP association reached genome-wide significance (P < 5 × 10^-8) (Fig. 2). In particular, we did not replicate rs1202524 (P = 0.39), a reported EC susceptibility locus in Asian women (Long et al. 2012), in our Stage 1 population of women of European ancestry.
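As an illustration of the genomic control lambda reported above, the following minimal sketch computes lambda as the median observed 1-df chi-square statistic divided by its expected median under the null. The function name and the uniform test P values are hypothetical; the software and exact procedure used to obtain the reported value of 1.008 are not stated in this section.

```python
from statistics import NormalDist, median

def genomic_control_lambda(p_values):
    """Median observed 1-df chi-square statistic over its null median.

    Assumes every P value is two-sided and lies strictly between 0 and 1
    (a P value of exactly 1 is also accepted).
    """
    std_normal = NormalDist()
    # A two-sided P value p corresponds to the 1-df chi-square z^2,
    # where z is the lower p/2 quantile of the standard normal.
    chi2_stats = [std_normal.inv_cdf(p / 2.0) ** 2 for p in p_values]
    # Null median of a 1-df chi-square distribution (about 0.455).
    null_median = std_normal.inv_cdf(0.75) ** 2
    return median(chi2_stats) / null_median

# Hypothetical input: evenly spaced P values, as expected with no inflation.
n = 1000
uniform_p = [(i + 0.5) / n for i in range(n)]
lam = genomic_control_lambda(uniform_p)
print(f"lambda = {lam:.3f}")
```

A lambda close to 1 indicates that the bulk of the test statistics follow the null distribution; values well above 1 would point to stratification, relatedness or differential genotyping.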
Among SNPs associated with the smallest ranked P values, rs9344 and rs1352075 at the 11q13.3 locus caught our attention because of significant associations between this locus and breast cancer (Turnbull et al. 2010) and renal cancer (Purdue et al. 2011) in prior GWAS. Thus, we initially pursued a fast-track replication for seven SNPs independently associated with EC (r^2 < 0.2) at P < 1 × 10^-5 from Stage 1, as well as two HNF1B SNPs (rs4430796 and rs11651755) identified by Spurdle et al. (2011) (Supplementary Table 2). The fast-track replication was conducted in a multiethnic sample of 2,294 cases and 3,395 controls from two cohorts [MEC and the Cancer Prevention Study II (CPSII) Nutrition cohort] and five case-control studies [the Alberta Health Services (AHS) study, FHCRC, Estrogen, Diet, Genetics, and Endometrial Cancer (EDGE) study, Turin, and Women's  Table 2b). We selected 2,129 SNPs with P < 0.0037 in Stage 1 for follow-up in a subset of the fast-track replication studies and two previously conducted GWAS (ANECS/SEARCH and SECGS) for Stage 2 (Supplementary Tables 3 and 4). DNA samples from a multiethnic sample of women in AHS, FHCRC, MEC and EDGE (Supplementary Table 5) were genotyped for 1,818 of these SNPs as custom content on Illumina's Human Exome 12v1 chip; the remaining SNPs failed design or quality control. After pooled analysis, no SNP association reached genome-wide significance in women of European ancestry or in women of multiple ethnicities combined, either among Type I EC cases (Table 2) or among those with endometrioid subtype (Table 3). Further adjustment for BMI did not change the results qualitatively (data not shown).

Discussion
Our present study reports results from a new independent GWAS of EC based on a total of 7,077 cases and 16,343 controls from the E2C2 (Table 1). We did not identify any novel loci associated with EC that reached genome-wide significance (P < 5 × 10^-8).
In a joint analysis of the GWAS and replication populations, the variant most significantly associated with EC was rs9459805 on chromosome 6 at the RNASET2 locus (OR = 1.19, 95 % CI 1.10-1.29; P = 1.11 × 10^-5, Table 2). Of potential interest, two variants suggestively associated with EC (rs12514742, joint P = 5.78 × 10^-5; rs12521272, joint P = 7.37 × 10^-5) are located at the prolactin receptor (PRLR) gene locus on chromosome 5. Circulating levels of prolactin, a polypeptide hormone involved in numerous physiological processes including reproduction, are higher among EC patients compared to healthy controls (Levina et al. 2009; Yurkovetsky et al. 2007; Kanat-Pektas et al. 2010), and increased PRLR expression has been noted for endometrial tumors compared to noncancerous endometrial tissue. Prolactin signaling via PRLR has also been shown to potentiate proliferation and inhibit chemotherapy-induced apoptosis of EC cell lines (Levina et al. 2009). Additional studies in independent populations are required to confirm whether variants at the PRLR locus influence EC risk.
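The relationship between an odds ratio, its confidence interval and its P value can be sketched as follows. This assumes a Wald-type interval that is symmetric on the log-odds scale, which is a common convention but is not confirmed here as the method behind Table 2; the function name is hypothetical. Because the published OR and CI are rounded to two decimals, the back-calculated P value only approximates the reported one.

```python
from math import log
from statistics import NormalDist

def wald_from_or_ci(odds_ratio, ci_low, ci_high, level=0.95):
    """Back-calculate log-OR, standard error, z and two-sided P.

    Assumes the interval is exp(beta +/- z_crit * se), i.e. a Wald
    interval on the log-odds scale.
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1.0 - (1.0 - level) / 2.0)
    beta = log(odds_ratio)
    # The CI width on the log scale equals 2 * z_crit * se.
    se = (log(ci_high) - log(ci_low)) / (2.0 * z_crit)
    z = beta / se
    p = 2.0 * std_normal.cdf(-abs(z))
    return beta, se, z, p

# Values reported for rs9459805 in the joint analysis.
beta, se, z, p = wald_from_or_ci(1.19, 1.10, 1.29)
print(f"beta = {beta:.4f}, se = {se:.4f}, z = {z:.2f}, P = {p:.2e}")
```

The result is of the same order of magnitude as the reported P = 1.11 × 10^-5, which is as close as rounded inputs allow.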
To date, only one locus associated with EC at the genome-wide significance level has been identified by GWAS (Spurdle et al. 2011). Located within the HNF1B gene on chromosome 17, the common variant most significantly associated with EC (rs4430796; OR per G allele = 0.84, 95 % CI 0.79-0.89; P = 7.1 × 10^-10) in the GWAS by Spurdle et al. (2011) was nominally associated with EC in our discovery (Stage 1) population in the expected direction (OR per G allele = 0.92, P = 0.03; Supplementary Table 2a). This effect estimate is consistent with a winner's-curse adjustment of the original GWAS effect estimate, which also yields a per G allele OR of 0.92 (Zhong and Prentice 2008). Further genotyping within fast-track replication studies confirmed the association of the rs4430796 G allele with reduced EC risk among women of European ancestry (joint OR = 0.90, 95 % CI 0.85-0.96; P = 5.2 × 10^-4) with no evidence of heterogeneity between studies (P = 0.50). In the earlier GWAS by Spurdle et al., the discovery phase was restricted to patients with the endometrioid histologic subtype of EC. Additionally restricting the replication stage to cases with endometrioid histology (~77 % of cases) slightly strengthened the association between rs4430796 and EC risk (joint OR = 0.82, 95 % CI 0.77-0.87; P = 4.3 × 10^-11) in the study by Spurdle et al. (2011).
Our GWAS included all EC cases diagnosed with Type I tumors, a group consisting of the following histologic subtypes: endometrioid adenocarcinoma (ICD-O-3 codes 8380, 8381, 8382, 8383), adenocarcinoma tubular (8210, 8211), papillary adenocarcinoma (8260, 8262, 8263), adenocarcinoma with squamous metaplasia (8570), mucinous adenocarcinoma (8480, 8481) and adenocarcinoma NOS (8140) (Kim et al. 2008). Even though the endometrioid adenocarcinoma subtypes represent the majority of Type I tumors (60 %) (Robboy et al. 2009), the inclusion of the less common Type I histologic subtypes may have introduced sufficient heterogeneity to reduce power to detect genome-wide significant associations. However, when we restricted our analysis to Stage 1 and Stage 2 cases with known endometrioid histology, the overall association of rs4430796 with EC risk remained the same, while the significance weakened, most likely due to a loss of power from the reduced sample size. This is consistent with results from the PAGE study, which found that HNF1B may be a general susceptibility locus for EC, as risk associated with rs4430796 [G] was similar for Type I and Type II tumors (Setiawan et al. 2012). Most of the suggestive SNP associations in our study (Table 2) were slightly weakened when the analysis was restricted to cases with known endometrioid histology (Table 3).
Endometrial cancer is part of Lynch syndrome, which is attributable to the inheritance of rare, highly penetrant mutations in DNA mismatch repair genes (Peltomaki et al. 1993; Aaltonen et al. 1993). The lifetime risk of EC among women with HNPCC is 50-60 %, whereas that of the general population is 2-3 % (Seger et al. 2011). Women with this inherited predisposition to endometrial neoplasm tend to develop the disease 15 years earlier than the general population (Vasen et al. 1994). Studies on estimates of heritability for EC suggested a high genetic component for younger women (Schildkraut et al. 1989; Gruber and Thompson 1996; Parslov et al. 2000). In addition, a record linkage study in Utah (Seger et al. 2011) indicated that there was considerable clustering of EC in families, even accounting for obesity. On the other hand, a twin study of sporadic cancers (i.e., not attributable to family cancer syndromes), which account for 98 % of EC cases, suggests a low genetic contribution (Lichtenstein et al. 2000).
Based on the results of this study and the previous GWAS in European ancestry women (Spurdle et al. 2011), it is unlikely that there exist any common variants with large effects on the risk of EC, although there may be many markers with smaller effects. For example, the probability that at least one of these GWAS would identify a genome-wide significant association with a marker that had a per-allele odds ratio of 1.2 and a risk allele frequency of 0.30 is over 80 %. Conversely, the power of this study to identify a marker like rs4430796 with a per-allele odds ratio of 1.08 and risk allele frequency of 0.52 is 5 %; the power of the Spurdle et al. GWAS was under 1 %. This suggests that circa 18 additional markers with HNF1B-like effects on EC exist, but have not yet been identified due to low power (Park et al. 2010). Consequently, a GWAS with 12,000 cases and 24,000 controls (triple the sample size of the two European ancestry GWAS conducted to date) should identify three or more markers with HNF1B-like effect sizes with 85 % probability, as well as other markers with smaller effects. We caution that these projections are based on only one known GWAS-identified risk marker; we cannot rule out a larger number of HNF1B-like risk markers and can say little about markers with subtler effects.
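The power figures in the preceding paragraph can be explored with a rough normal approximation for a 1-df log-additive test. This is a sketch under stated assumptions, not the calculation the authors performed: the variance formula is a standard large-sample approximation for a small effect, the function name is hypothetical, and using the combined sample of 7,077 cases and 16,343 controls as "this study" is an assumption.

```python
from math import log, sqrt
from statistics import NormalDist

def gwas_power(odds_ratio, risk_allele_freq, n_cases, n_controls, alpha=5e-8):
    """Approximate power of a two-sided 1-df test at significance alpha.

    Assumes a log-additive model, Hardy-Weinberg proportions and a small
    effect, so that Var(beta_hat) ~ 1 / (2 f (1 - f) N phi (1 - phi)),
    with f the allele frequency and phi the case fraction.
    """
    std_normal = NormalDist()
    n_total = n_cases + n_controls
    case_fraction = n_cases / n_total
    information = (2.0 * risk_allele_freq * (1.0 - risk_allele_freq)
                   * n_total * case_fraction * (1.0 - case_fraction))
    # Expected absolute z statistic under the alternative.
    expected_z = abs(log(odds_ratio)) * sqrt(information)
    z_crit = std_normal.inv_cdf(1.0 - alpha / 2.0)
    # Probability that |Z| exceeds the threshold (both tails).
    return (std_normal.cdf(expected_z - z_crit)
            + std_normal.cdf(-expected_z - z_crit))

# Scenarios taken from the paragraph above (sample sizes are an assumption).
power_hnf1b_like = gwas_power(1.08, 0.52, 7077, 16343)
power_larger_effect = gwas_power(1.20, 0.30, 7077, 16343)
print(f"OR 1.08, freq 0.52: {power_hnf1b_like:.3f}")
print(f"OR 1.20, freq 0.30: {power_larger_effect:.3f}")
```

Under these assumptions the approximation falls in the same range as the quoted figures: low single-digit percent power for an HNF1B-like marker and well above 80 % for a per-allele odds ratio of 1.2.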
In conclusion, we did not identify any novel loci associated with EC susceptibility. Taken together, a low inherited genetic component, tumor heterogeneity and the small expected effects of genetic variants could explain the apparent lack of association. Therefore, larger studies with specific tumor classification (Kandoth et al. 2013) are necessary to identify novel genetic polymorphisms associated with EC susceptibility.

Study participants
Participating studies are described in Table 1 and comprise a total of 7,077 EC cases and 16,343 controls from 15 studies (ten case-control and five cohort, which were analyzed as nested case-control). Cases in Stage 1 were diagnosed with Type I EC. In cohort studies, controls were cancer free at the time of case diagnosis. In case-control studies, controls had not had hysterectomies. Cases of European descent from CTS, CONN, FHCRC, MEC, NHS and PLCO were scanned using Illumina OmniExpress. PLCO controls were scanned using Illumina Omni 2.5 and the PECS cases and controls were scanned using Illumina Human 660W. With the exception of PLCO, all controls were matched to cases on age within each study site. Each participating study obtained informed consent from study participants and approval from its institutional review board (IRB) for this study and obtained IRB certification permitting data sharing in accordance with the NIH Policy for Sharing of Data Obtained in NIH Supported or Conducted Genome-Wide Association Studies (GWAS).
Participating studies in Stage 2 are described in Table 1. We did not restrict to European ancestry in this stage; a multiethnic population was included (Supplementary Table 5), although we also conducted sensitivity analyses restricted to women of European ancestry. We conducted two replications. In the first, a fast-track replication, nine SNPs were genotyped using the TaqMan assay in all studies except ANECS, SEARCH and SECGS. The second, Stage 2, was conducted using Illumina's Human Exome 12v1 chip with custom content in the following studies: AHS, FHCRC, MEC and EDGE.

GWAS Genotyping
DNA was isolated from peripheral blood following the manufacturer's recommended protocol. Genotyping was performed at two centers. At least 625 ng of each DNA sample from NHS, CONN, MEC, CTS and FHCRC was sent to USC for genotyping using the HumanOmniExpress BeadChips (Illumina Inc, San Diego, CA). The BeadChips were run on an Illumina iScan system using the Infinium HD Assay Super Automated Protocol. The GenomeStudio Genotyping (GT) Module (Illumina Inc, San Diego, CA) was used for data normalization and genotype calling. The following studies were genotyped at the Core Genotyping Facility (CGF) at the National Cancer Institute: PLCO cases were genotyped using the Illumina OmniExpress chip, PECS controls were previously genotyped on the Illumina Human 660W chip and PLCO controls were genotyped on the Omni 2.5M chip.

Replication genotyping
Fast-track replication was performed at the Dana-Farber/Harvard Cancer Center High-Throughput Genotyping Core on the ABI PRISM 7900HT Sequence Detection System (Applied Biosystems, Foster City, CA) according to the manufacturer's instructions. TaqMan® assays were ordered through either the Assays-on-Demand or the ABI Assays-by-Design service. All Stage 2 replication samples were genotyped using Illumina's Human Exome 12v1 chip with custom content (N = 1,818 SNPs) (Table 1).

Genome-wide association analysis
In total, 5,806 women with genotypes were available for Stage 1 analysis. To minimize bias due to population stratification, we used ~7,600 ancestry informative markers to identify and exclude women with <80 % European ancestry (N = 146). An additional four participants were excluded because they self-reported non-European descent. We also identified four unexpected inter-study duplicates (all EC cases) and removed one subject from each unexpected duplicate pair. Because the scan was based on women of European descent with Type I EC, 180 cases of Type II EC were excluded, for a final sample size of 5,472 women (2,695 cases, 2,777 controls) eligible for Stage 1. After removing SNPs with completion rates <90 %, minor allele frequencies <1 %, or deviation from Hardy-Weinberg equilibrium (P < 0.0001), we had >524K genotyped SNPs in each Stage 1 study for a combined total of >873K unique SNPs across all studies. Concordance between known duplicates was >99.9 %.
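The sample exclusions described above can be tallied step by step to confirm that they lead to the final Stage 1 sample size. The counts come from the paragraph; the variable names are illustrative only.

```python
# Counts reported in the Stage 1 sample quality control description.
genotyped = 5806
exclusions = [
    ("less than 80 % European ancestry by ancestry informative markers", 146),
    ("self-reported non-European descent", 4),
    ("one subject from each of four unexpected duplicate pairs", 4),
    ("Type II EC cases", 180),
]

remaining = genotyped
for reason, count in exclusions:
    remaining -= count
    print(f"excluded {count:>3} ({reason}); {remaining} remain")

# The remainder should equal the reported cases plus controls.
final_cases, final_controls = 2695, 2777
print(remaining == final_cases + final_controls)
```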
We applied similar filters to the newly genotyped Stage 2 samples. Four pairs of unexpected duplicates (eight total samples) and 30 samples with <90 % SNP completion rate were removed. One genetically male sample and seven samples that did not cluster with other samples from their self-reported ancestry group were also excluded, leaving 2,975 samples for analysis. SNPs with <90 % completion rate were removed from analysis, as were SNPs that showed deviation from HWE at P < 10^-5 in any ethnic group.
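The SNP-level filters (completion rate, minor allele frequency and Hardy-Weinberg equilibrium) can be sketched as a single pass/fail function. This is an illustration only: it uses a 1-df chi-square goodness-of-fit test for HWE, whereas the test actually used is not specified in this section, and the function names and genotype counts are hypothetical. The default thresholds are the Stage 1 values; Stage 2 would use `hwe_p_cutoff=1e-5`, applied within each ethnic group.

```python
from math import sqrt
from statistics import NormalDist

def hwe_chi2_p(n_aa, n_ab, n_bb):
    """P value of a 1-df chi-square goodness-of-fit test for HWE."""
    n = n_aa + n_ab + n_bb
    freq_a = (2 * n_aa + n_ab) / (2 * n)
    freq_b = 1.0 - freq_a
    expected = (n * freq_a ** 2, 2 * n * freq_a * freq_b, n * freq_b ** 2)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Upper tail of a 1-df chi-square equals a two-sided normal tail.
    return 2.0 * NormalDist().cdf(-sqrt(chi2))

def passes_snp_qc(n_aa, n_ab, n_bb, n_missing,
                  min_call_rate=0.90, min_maf=0.01, hwe_p_cutoff=1e-4):
    """Return True if a SNP survives all three filters."""
    called = n_aa + n_ab + n_bb
    if called == 0:
        return False  # no genotype calls at all
    if called / (called + n_missing) < min_call_rate:
        return False
    freq_a = (2 * n_aa + n_ab) / (2 * called)
    if min(freq_a, 1.0 - freq_a) < min_maf:
        return False
    return hwe_chi2_p(n_aa, n_ab, n_bb) >= hwe_p_cutoff

# Hypothetical genotype counts (AA, AB, BB, missing).
print(passes_snp_qc(250, 500, 250, 0))    # in exact HWE proportions
print(passes_snp_qc(500, 0, 500, 0))      # no heterozygotes at all
print(passes_snp_qc(250, 500, 250, 200))  # low completion rate
```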
Genotyping procedures, quality control and analysis procedures for the ANECS/SEARCH and SECGS GWAS have been reported previously (Spurdle et al. 2011; Long et al. 2012).
In all analyses, genotypes were coded log additively (0, 1, 2 copies of the minor allele) and logistic regression was used to model associations. Stage 1 analyses were adjusted for study and the first two principal components. Analyses of the newly genotyped Stage 2 data (i.e., all Stage 2 studies except ANECS/SEARCH or SECGS) were adjusted for study and the first four principal components. Principal components for Stage 1 were calculated using ~7,600 independent markers; principal components for Stage 2 were calculated using 47,097 common SNPs on the exome chip. Of the 1,818 SNPs selected for replication in Stage 2, 1,371 loci included additional in silico data from two previously reported GWAS (Spurdle et al. 2011; Long et al. 2012) in a total of 2,121 cases and 10,209 controls from the SEARCH/ANECS and SECGS studies. Study populations were analyzed separately and results were combined using fixed effects meta-analysis. Association analyses of SNPs selected for fast-track replication were conducted in SAS Version 9.2 (SAS Institute, Cary, NC, USA). All other analyses were performed using the PLINK software package (v 1.07, October 2009).
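Combining per-study results by fixed effects meta-analysis is commonly done with inverse-variance weighting of the per-allele log odds ratios, which the sketch below illustrates. Inverse-variance weighting is assumed here rather than confirmed as the exact procedure used; the function name and the per-study estimates are hypothetical. Cochran's Q is returned as a simple heterogeneity statistic.

```python
from math import exp, sqrt
from statistics import NormalDist

def fixed_effects_meta(betas, standard_errors):
    """Inverse-variance weighted pooling of per-study log odds ratios."""
    weights = [1.0 / se ** 2 for se in standard_errors]
    total_weight = sum(weights)
    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    pooled_se = sqrt(1.0 / total_weight)
    z = pooled_beta / pooled_se
    p = 2.0 * NormalDist().cdf(-abs(z))
    # Cochran's Q: weighted squared deviations from the pooled estimate,
    # compared with a chi-square on (number of studies - 1) df.
    q = sum(w * (b - pooled_beta) ** 2 for w, b in zip(weights, betas))
    return pooled_beta, pooled_se, p, q

# Hypothetical per-allele log odds ratios and standard errors from two studies.
pooled_beta, pooled_se, p, q = fixed_effects_meta([0.10, 0.20], [0.05, 0.05])
print(f"pooled OR = {exp(pooled_beta):.3f}, SE = {pooled_se:.4f}, "
      f"P = {p:.2e}, Q = {q:.2f}")
```

Studies with smaller standard errors receive more weight, so large studies dominate the pooled estimate.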

Supplementary acknowledgments
The Nurses' Health Study (NHS) is supported by the NCI, NIH Grants Number 1R01 CA134958, 2R01 CA082838, P01 CA087969, and R01 CA49449. The authors would like to thank the participants and staff of the Nurses' Health Study for their valuable contributions as well as the fol-  The Alberta Health Services Study (AHS) was supported by operating grants obtained from the National Cancer Institute of Canada with funds from the Canadian Cancer Society and the Canadian Institute for Health Research. The AHS is also supported by NCI, NIH Grant Number 2R01 CA082838. Dr Christine Friedenreich is supported by career awards from Alberta Innovates-Health Solutions and the Alberta Cancer Foundation through the Weekend to End Women's Cancers Breast Cancer Chair. Dr Linda Cook was supported through the Canada Research Chairs program.
The Turin endometrial cancer case control study was supported by the Italian Association for Research on Cancer (AIRC) and Ricerca Finalizzata Regione Piemonte.
Open Access This article is distributed under the terms of the Creative Commons Attribution License which permits any use, distribution, and reproduction in any medium, provided the original author(s) and the source are credited.