Introduction

The bladder exstrophy-epispadias complex (BEEC) is a spectrum of congenital abnormalities involving the abdominal wall, the bony pelvis, the urinary tract, the external genitalia, and, in the most severe cases, the gastrointestinal tract. The BEEC represents the severe end of the spectrum of human congenital anomalies of the kidney and urinary tract. The most common form, classic bladder exstrophy (CBE), is characterized by pubic diastasis, the evaginated bladder plate template, and an epispadic urethra. At birth, the visible bladder mucosa appears reddish, and mucosal polyps may be seen on the surface. CBE is associated with kidney and other upper urinary tract anomalies with a higher occurrence in males than in females1. Associated long-term complications include malignancies of the bladder, mainly urothelial cell carcinoma and adenocarcinoma2,3. Recently, the CBE live prevalence for Germany has been estimated to be ~1:30,7004. Given an overall European population of ~450,000,000 citizens (https://ec.europa.eu/), an estimated ~15,000 CBE patients live in Europe. State-of-the-art health care for this population should take the genetic and bladder cancer predisposition into account.

To determine the genetic contribution to CBE, we previously performed two genome-wide association studies (GWAS) with subsequent meta-analysis and identified a susceptibility locus on chromosome 5q11.15,6. The present study aimed to identify further risk loci. Furthermore, we investigated whether the identified genetic risk loci might be involved in the associated bladder cancer susceptibility. For this purpose, we performed the largest GWAS for CBE to date, comprising 628 patients and 7,352 ethnically matched controls. In detail, the present meta-analysis included seven independent discovery samples (Supplementary Information: Supplementary Table 1): 98 patients of Central European origin and 526 ethnically matched controls5, 110 patients of Central European origin and 1,177 ethnically matched controls6, 172 patients of Central European origin and 2,588 ethnically matched controls, 57 patients of Italian origin and 1,325 ethnically matched controls, 62 patients of Spanish origin and 279 ethnically matched controls, 80 patients of Swedish origin and 238 ethnically matched controls, and 49 patients of UK origin and 1,219 ethnically matched controls. The meta-analysis identified eight genome-wide significant risk loci, seven of which are novel. Within these loci reside 10 coding genes (LPHN2, EFNA1, SLC50A1, DPM3, KRTCAP2, ISL1, TRIM29, SYT1, PAWR, GOSR2) and four non-coding genes (one pseudogene, HMGB1P47, and three long non-coding RNAs, ISL1-DT, LINC01974, and LINC01716). Among the coding genes, EFNA1 has previously been shown to be strongly expressed in the mouse embryonic genital tubercle, urethra, and primitive bladder, prompting us to re-sequence this gene in our cohort. To assess the embryonic and fetal expression of the identified genes, we generated mouse embryonic bladder total RNA-seq data at the CBE-relevant developmental stages E10.5, E12.5, and E15.5, and human embryonic and fetal urinary bladder and genital tissue total RNA-seq data at gestational weeks 7, 7 to 7.5, 7.5, 8, and 9.
Finally, to evaluate the possible role of these genes in the overall CBE bladder cancer susceptibility, we analyzed their expression in urothelial carcinoma tissues and in different bladder cancer cell lines obtained from the Cancer Cell Line Encyclopedia (EMBL-EBI) compared to healthy bladder tissue transcriptomic data (GEO).

Results

GWAS meta-analysis

The meta-analysis of 628 patients with CBE and 7,352 ethnically matched controls comprised seven independent GWAS. These included the first two GWAS cohorts5,6 and the five new CBE cohorts described above from Central Europe, Italy, Spain, Sweden, and the UK, along with ethnically matched control samples. We used a total of 8,289,003 SNPs with an info score >0.4 and a mean dosage for the minor allele >1% in cases and controls in at least one sample, obtaining a genomic inflation factor λ of 1.068. The respective Q-Q plot is shown in Supplementary Information (Supplementary Fig. 16). Single-marker analysis identified eight genome-wide significant loci, shown in the Manhattan plot in Supplementary Information (Supplementary Fig. 17). The strongest signal, at rs6874700 (p = 5.58 × 10^−24), corresponds to the previously reported 5q11.1 locus (Fig. 1)6. Table 1 shows the relative risks in each sample and in the meta-analysis for the most strongly associated SNP (top SNP) from each locus. Notably, with the exception of the UK sample, where the top associated SNP on chromosome 12 was not significant, the direction of effect was consistent between all studies for these top SNPs. A complete list of all genome-wide significant SNPs is given in Supplementary Data 2. Regional association results for all eight genome-wide significant loci are shown in Fig. 1 and in Supplementary Information (Supplementary Figs. 1–8). For the conditional logistic regression analyses, the regional association plots are presented in Supplementary Information (Supplementary Figs. 20–27). The results provide no evidence of secondary signals in any of the eight loci.

Fig. 1: Chromosome regional association loci of CBE.

The eight CBE regional association loci (red boxes on the chromosomes) reside on chromosomes 1, 3, 5, 11, 12, 17, and 20. The panels show details of the genome-wide association loci: every dot represents an SNP, plotted by position (x axis) against its –log10(P value) (y axis). SNPs are colored according to their r2 value.

Table 1 Top SNPs from genome-wide significant loci.

Re-sequencing of EFNA1

Among the most significant markers, marker rs4745 resides directly in EFNA1. Mouse Efna1 has been shown to be expressed in CBE-relevant embryonic anatomical structures (https://www.gudmap.org/). This prompted us to re-sequence EFNA1 in 580 CBE patients. We identified 14 rare variants in 14 independent patients (Supplementary information, Supplementary Table 4). Four of these variants, residing in the coding region of EFNA1, were found to be novel: two heterozygous missense variants, c.116T>C (p.Ile39Thr) and c.503C>T (p.Ala168Val); one homozygous missense variant, c.167A>G (p.Asp56Gly); and a heterozygous loss-of-function (LoF) frameshift variant, c.341delT (p.Phe114Serfs*28). Parental samples were only available for the patient carrying variant c.116T>C and demonstrated paternal transmission. The in silico prediction tools MutationTaster, PolyPhen-2, and SIFT classified the missense variant c.116T>C (p.Ile39Thr) as disease-causing, deleterious, and possibly damaging. The CADD score of 25.3 supports a functional implication of this variant on EFNA1 regulation. None of the other missense variants was scored as deleterious. The LoF variant c.341delT (p.Phe114Serfs*28) has a CADD score of 25.6. To estimate the enrichment of rare protein-altering variants in EFNA1 in our cohort compared to the general population as represented by gnomAD, we used a very conservative comparison. Hence, we only included the three novel coding variants with a CADD score >20 identified here in our re-sequencing approach of EFNA1. We compared these to missense or LoF variants in gnomAD with an allele count of five or fewer (≤5 in 250,000; MAF ≤ 0.00002), consistent with rare penetrant dominant phenotypes. These criteria identified 162 missense and LoF variants in gnomAD (baseline 250,000 alleles; https://gnomad.broadinstitute.org/gene/ENSG00000169242?dataset=gnomad_r2_1).
It is possible that some of these variants occur in cis or in trans in the same individuals, but if we were able to determine this, it would only make the association stronger. For the comparison, we used Fisher’s exact test. Under this assumption, the test yielded a chi-square statistic of 18.0159 and a p value of 0.000022.

Analysis of mouse and human embryonic total RNA-sequencing data of the identified genes in mouse and human embryonic and fetal urinary bladder and genital tissues

In the linkage disequilibrium blocks of all eight top SNPs reside 10 coding genes and four non-coding genes, the latter comprising one pseudogene and three long intergenic non-protein-coding RNAs (Table 2). All 10 coding genes showed expression in mouse embryonic bladder at E10.5, E12.5, and E15.5. Isl1, Trim29, Syt1, and Pawr showed differential expression across the different mouse embryonic stages (Table 3, Fig. 2a). As in the mouse transcriptome, all 10 coding genes showed expression in the human embryonic bladder during different developmental stages. DPM3, TRIM29, SYT1, and PAWR showed differential expression at different human embryonic and fetal stages (Table 3, Fig. 2b). While LINC01974 and LINC01716 are not expressed in any of the bladder developmental stages, the pseudogene HMGB1P47 and the long non-coding RNA ISL1-DT are strongly downregulated: the former from week 7 to 7.5, followed by gene silencing at weeks 8 and 9, and the latter with a downregulation trend from weeks 7 to 9.

Table 2 Coding and non-coding genes in the LD blocks of the most significant markers for CBE.
Table 3 RNA expression patterns of coding and non-coding genes in the LD blocks of the most significant GWAS markers in mouse and human embryonic urogenital tissue.
Fig. 2: Expression heatmaps of genes that reside in the LD blocks of the eight significant genetic markers.

a Gene expression pattern in mouse embryonic bladder from embryonic day E10.5 to E12.5, E12.5 to E15.5, and E10.5 to E15.5. b Gene expression pattern in human embryonic and fetal bladder from week 7 to 7–7.5, 7–7.5 to 7.5, 7 to 7.5, 7.5 to 8, 8 to 9, and 7 to 9. c Gene expression pattern of 3-year-old control bladder tissue compared to bladder carcinoma (Bladder Ca), bladder squamous cell carcinoma (Squamous cell Ca), bladder transitional cell carcinoma (Transit. cell Ca), ureter urothelial carcinoma (Ur. Urothelial Ca), and muscle-invasive urothelial cancer (Mus. Inv. Urothelial Ca). Legend: Suppressed = gene is silenced and no expression is detected. Activated = gene shows expression after a silenced state.

Comparison of RNA-sequencing data of the identified genes in healthy bladder tissue and different types of bladder cancer cell lines and muscular invasive urothelial carcinoma

Expression of non-coding genes in healthy bladder tissue and muscle-invasive urothelial carcinoma could not be detected because the data were generated by polyA RNA-sequencing (Table 4). Nevertheless, we found all coding genes to be differentially expressed when compared to healthy bladder tissue (Table 5, Fig. 2c). In detail, SLC50A1 and SYT1 were significantly upregulated in all cancers compared to healthy bladder tissue. In contrast, DPM3 and KRTCAP2 were significantly downregulated. All other coding genes (LPHN2, EFNA1, ISL1, TRIM29, PAWR, and GOSR2) were up- or downregulated depending on the cancer type (Table 5, Fig. 2c).

Table 4 Average TPM of coding and non-coding genes in the LD blocks of the most significant GWAS markers calculated for each bladder cancer cell line.
Table 5 log2-fold change of genes in the LD blocks of the most significant GWAS markers of bladder cancer cell types and muscle-invasive urothelial cancer over bladder control tissue.

Discussion

Recently, we described SLC20A1, encoding a sodium-phosphate symporter, as the putative monogenic dominant disease gene for isolated BEEC7. We were able to support our genetic data through functional studies in non-BEEC human embryos, mouse embryos, and zebrafish Morpholino knockdown experiments. To our knowledge, the present genetic study with a focus on the multifactorial genetic background of the BEEC is the largest study on CBE to date. We identified eight genome-wide significant risk loci. Within these loci, we determined possible CBE candidate genes using transcriptome datasets of CBE-relevant mouse embryonic, human embryonic, and fetal urogenital tissues at different developmental stages. Additionally, we provide a possible link between the identified putative candidate genes and CBE-associated bladder cancer susceptibility.

In detail, in direct proximity to the most significant markers of all eight risk loci reside nine coding genes that are expressed in CBE-relevant mouse and human urogenital tissues during different embryonic stages. Four of these candidate genes (Isl1, Trim29, Syt1, Pawr) showed differential expression in mouse embryonic urogenital tissues; five of these candidate genes (DPM3, ISL1, TRIM29, SYT1, and PAWR) and two of the non-coding genes (HMGB1P47, ISL1-DT) showed differential expression in human embryonic urogenital tissues. Previous reports on transgenic mouse lines for Isl1 and Syt1 revealed phenotypic overlap with the human CBE phenotypic spectrum. Hoxb6Cre;Isl1 cKO hindlimb skeletons exhibited proximal defects: in particular, the os pubis and ischium, two posterior segments of the pelvic girdle, were missing, resembling pubic diastasis, a human BEEC-specific feature1,8. Transgenic Syt1tm1a(EUCOMM)Wtsi/Syt1tm1a(EUCOMM)Wtsi mice develop, among other features, thoracoschisis9, a rare congenital anomaly characterized by the evisceration of intra-abdominal organs through a thoracic wall defect10, mirroring the BEEC-associated infraumbilical abdominal wall defect1.

One of the most significant markers identified in the present GWAS resides within EFNA1. In general, a probability of being LoF intolerant (pLI) score of 0.46 for EFNA1 is suggestive of LoF intolerance for this gene in the context of the CBE condition. Although a pLI of 0.46 is only suggestive of LoF intolerance, we have to consider that bladder exstrophy is not a lethal condition at birth. Hence, we believe that a value of 0.46 supports an involvement of EFNA1 in CBE11. Previously, only two LoF variants in EFNA1 were observed in the entire gnomAD database (frequency of 0.000016). Here, we observed one in 580 (frequency of 0.0017). Furthermore, in the entire gnomAD database, 96 missense variants in EFNA1 were observed in 125,099 individuals (frequency of 0.00077). Here, we identified three in 580 (0.0052). Based on this observation, we performed a conservative estimation of whether LoF or missense variants might be enriched in EFNA1 in our CBE cohort compared to the general population, which showed a significant difference between both cohorts (p = 0.000022). This finding suggests a possible implication of these variants in CBE formation in a multifactorial inheritance model among the affected.

Comparative analysis of control and bladder cancer tissues showed that all ten candidate genes were differentially expressed in bladder cancers. SLC50A1 and SYT1 were significantly upregulated in all cancers compared to healthy bladder tissue. In contrast, DPM3 and KRTCAP2 were significantly downregulated. LPHN2 has been suggested to have a regulatory role in urothelial bladder cancer12. EFNA1 plays a pivotal role in the pathogenesis of several tumors, including renal cell carcinoma, bladder, and prostate cancer13,14. Mapping all putative candidate genes prioritized in the present study to the search tool for retrieval of interacting genes (STRING), we found probable interactions among three proteins: (i) gene fusions between EFNA1 and SLC50A1, (ii) co-expression between EFNA1 and DPM3, and (iii) co-expression between EFNA1 and SLC50A1 (Supplementary information: Supplementary Fig. 28). The PPI enrichment p value was 0.000205. Gene clustering analysis suggested clustering for EFNA1, DPM3, and SLC50A1 (Supplementary information: Supplementary Fig. 28). All three genes (EFNA1, DPM3, and SLC50A1), and thus the genomic region 1q21-q22, have previously been associated with the 2D:4D ratio, a sexually dimorphic trait that has been extensively used in adults as a biomarker for prenatal androgen exposure15. Markers in the region of EFNA1, DPM3, and SLC50A1 have previously been associated with prostate cancer risk16. Prostate cancer risk, in turn, correlates with serum testosterone levels17. All of these observations suggest a possible gene-environment interaction for this region. Adding to this hypothesis in the context of embryonic CBE formation, CBE presents with a higher occurrence rate in males compared to females1, a skewed sex ratio that is so far unexplained but could be influenced by differences in intrauterine androgen exposure between males and females.

The tumor suppressor gene TRIM29 is upregulated during early and late embryonic bladder development but is downregulated in three different bladder cancers18. More specifically, TRIM29 protein has been shown to be a driver of invasive and non-invasive bladder cancer. Interestingly, the gene expression signatures of TRIM29-driven bladder cancers in transgenic mice were indistinguishable from those of human bladder cancers19. PAWR has previously been shown to be a key altered gene in human bladder cancer stem cells20. SYT1 has been reported as a possible oncogene in colon cancer: knockdown of SYT1 markedly inhibits colon cancer cell proliferation, migration, and invasion, and induces cell apoptosis21. ISL1 has been associated with high-risk non-muscle-invasive bladder cancer in several studies22,23. Here, we found downregulation of ISL1 in embryonic stages of mouse and human CBE urogenital tissues. Conversely, we found dysregulation of ISL1 expression in three bladder cancers. Hence, dysregulation of ISL1 expression in human embryonic and adult bladder tissues might contribute to both CBE and bladder malignancies.

To date, this is the largest genetic study on CBE. We have identified eight genome-wide significant risk loci. Our transcriptomic analysis of CBE-relevant mouse embryonic, human embryonic, and fetal urogenital tissues suggests candidate genes within these loci. Bladder cancer transcriptomic data suggest that these candidate genes play a possible role in the CBE-associated bladder cancer susceptibility. Identifying the expression changes that turn these developmental genes on later in life might ultimately lead to preventive strategies for bladder cancer per se.

Methods

Patients and recruitment

This study was approved by the institutional ethics committee of each participating center. All experimental protocols were approved by the institutional committee of the University of Bonn (Lfd.Nr.031/19). The study was conducted according to the Declaration of Helsinki principles. Written informed consent was obtained from all patients, guardians, and healthy controls. We included 420 newly recruited isolated CBE patients and 5,649 healthy controls of European origin. Details can be found in the Supplementary information (Supplementary Table 1). Details about the 208 CBE patients and 1,703 ethnically matched controls of our previous studies, included in the present meta-analysis, are described in [5,6]. In summary, CBE patients were recruited under written informed consent by BEEC expert physicians.

Sample description

In addition to the two previously described samples GWAS1 and GWAS26, five new samples of patients with bladder exstrophy and representative controls were obtained from Central Europe, the United Kingdom, Italy, Spain, and Sweden. The numbers of cases and controls used in this study are shown in Supplementary Information (Supplementary Table 1).

Genotyping

All samples, cases and controls, were genotyped on Illumina human genotyping arrays. In GWAS 1 (Reutter et al. 2014), cases and controls were genotyped in two batches. Due to the discontinuation of the genotyping arrays utilized for earlier batches, different arrays were used, comprising Illumina’s Human610-Quad (H610Q) and Human660W-Quad Bead Chips and the Illumina HumanOmni1-Quad-v1 Bead Chip. In GWAS 26, all cases and controls were genotyped using the Illumina BeadChip HumanOmniExpress. The five novel GWAS case samples were newly genotyped simultaneously using the Illumina “Infinium Global Screening Array-24 v2.0”. The five novel ethnically matched control samples were also genotyped using the Illumina “Infinium Global Screening Array-24 v2.0”. However, the five novel control samples were not genotyped together with the five novel case samples but independently of each other.

Quality control of individuals

An individual was excluded if (i) the call rate was <97%; (ii) the rate of autosomal heterozygosity deviated by more than six standard deviations from the mean; or (iii) the rate of X-chromosomal heterozygous genotypes was >2% for a supposed male individual or <10% for a supposed female individual. PLINK version 1.9 and KING were used to detect pairs of closely related individuals within and between samples24,25. From each pair of individuals with an estimated identity-by-descent probability >0.2 or a kinship coefficient >0.0884, the individual with the higher rate of missing genotypes was discarded. Individuals who were outliers in a multidimensional scaling (MDS) analysis were removed. The post-quality control sample sizes are presented in Supplementary Table 1.
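The per-individual exclusion rules above can be sketched as a simple predicate. This is only an illustration of the stated thresholds, not the pipeline actually used (which relied on PLINK and KING); the names `Individual`, `fails_individual_qc`, and `discard_from_related_pair` are hypothetical, and rates are assumed to be expressed as fractions between 0 and 1.

```python
from dataclasses import dataclass


@dataclass
class Individual:
    # All field names are illustrative assumptions, not taken from the study.
    call_rate: float       # fraction of successfully genotyped SNPs
    autosomal_het: float   # rate of heterozygous autosomal genotypes
    x_het: float           # rate of heterozygous X-chromosomal genotypes
    reported_sex: str      # "male" or "female"


def fails_individual_qc(ind, het_mean, het_sd):
    """Return True if the individual meets any exclusion criterion (i)-(iii)."""
    if ind.call_rate < 0.97:                              # (i) call rate <97%
        return True
    if abs(ind.autosomal_het - het_mean) > 6 * het_sd:    # (ii) >6 SD from mean
        return True
    if ind.reported_sex == "male" and ind.x_het > 0.02:   # (iii) male, X-het >2%
        return True
    if ind.reported_sex == "female" and ind.x_het < 0.10: # (iii) female, X-het <10%
        return True
    return False


def discard_from_related_pair(missing_a, missing_b, ibd=0.0, kinship=0.0):
    """For a pair flagged as related, name the member to discard.

    Returns "a" or "b" (the one with the higher missing-genotype rate),
    or None if the pair is below both relatedness thresholds.
    Ties are resolved arbitrarily in favour of discarding "b".
    """
    if ibd > 0.2 or kinship > 0.0884:
        return "a" if missing_a > missing_b else "b"
    return None
```

The mean and standard deviation of autosomal heterozygosity are passed in explicitly because the text does not state over which set of individuals they were computed.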

Principal component analysis

Individuals with:

$$\left(\frac{c_1-m_1}{s_1}\right)^2+\left(\frac{c_2-m_2}{s_2}\right)^2\ge 81 \tag{1}$$

were considered to be outliers. c1 and c2 denote the first two MDS coordinates of the individual, and m1, m2 and s1, s2 denote the mean and standard deviation, respectively, of the first two MDS coordinates in European HapMap individuals. For the five new samples (Central Europe, Italy, Spain, Sweden, and the UK), Supplementary Information (Supplementary Figs. 9–13) shows the first two MDS coordinates for all genotyped individuals together with Asian and African HapMap individuals. For the samples of GWAS1 and GWAS2, the MDS coordinates used in our previous studies are shown in Supplementary Information (Supplementary Figs. 14, 15).
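As a minimal sketch, the outlier rule of Eq. (1) amounts to a standardized squared distance from the European HapMap centroid with a cut-off of 81 (that is, 9 squared). The function names below are hypothetical, and the use of the sample standard deviation for s1 and s2 is an assumption, since the text does not specify the estimator.

```python
from statistics import mean, stdev


def reference_stats(coords):
    """Mean and standard deviation of one MDS coordinate in the reference group.

    Assumption: the sample standard deviation (n - 1 denominator) is used.
    """
    return mean(coords), stdev(coords)


def is_mds_outlier(c1, c2, m1, m2, s1, s2, threshold=81.0):
    """Apply Eq. (1): flag the individual if the sum of squared,
    standardized deviations on the first two MDS coordinates is >= 81."""
    d2 = ((c1 - m1) / s1) ** 2 + ((c2 - m2) / s2) ** 2
    return d2 >= threshold
```

With unit standard deviations and a centroid at the origin, an individual at (9, 0) lies exactly on the boundary and is flagged, whereas one at (6, 6) has a squared distance of 72 and is retained.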

Quality control of variants and imputation

Separately in each of the different ethnicity samples, SNPs were removed if (i) the minor allele frequency was <1% in either cases or controls; (ii) the successful genotyping rate was <95% in either cases or controls; or (iii) the p value for Hardy–Weinberg equilibrium was <10^−4 in controls or <10^−6 in cases. SNPs satisfying the quality filters were uploaded for imputation to the University of Michigan Imputation Server using the Haplotype Reference Consortium panel26.
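The variant-level filters can be illustrated in the same way. The sketch below assumes that SNPs are removed when the genotyping rate falls below 95% (the usual direction of this filter) and takes the Hardy–Weinberg p values as precomputed inputs; the function name and parameter names are hypothetical.

```python
def snp_passes_qc(maf_cases, maf_controls,
                  call_rate_cases, call_rate_controls,
                  hwe_p_cases, hwe_p_controls):
    """Return True if a SNP survives all three pre-imputation filters."""
    # (i) minor allele frequency <1% in either group
    if maf_cases < 0.01 or maf_controls < 0.01:
        return False
    # (ii) genotyping rate below 95% in either group (assumed direction)
    if call_rate_cases < 0.95 or call_rate_controls < 0.95:
        return False
    # (iii) Hardy-Weinberg p value <1e-4 in controls or <1e-6 in cases
    if hwe_p_controls < 1e-4 or hwe_p_cases < 1e-6:
        return False
    return True
```

The looser Hardy–Weinberg threshold in cases reflects the stated criteria: a true association can itself distort genotype proportions in cases, so only extreme deviations are removed there.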

Statistical analysis for genome-wide association analysis

Association testing was performed by logistic regression using SNPTEST version 2.5.2 for the allele dosage, adjusted for the sample-specific top five MDS coordinates27. For each SNP, a meta-analysis with the fixed-effects inverse variance-weighting approach was conducted by including only those samples in which the info score was >0.4 and the mean dosage for the minor allele was >1% in cases and controls for the respective SNP. SNPs reaching a p value <5 × 10^−8 in the meta-analysis were considered to be genome-wide significant. Q-Q and Manhattan plots for the meta-analysis were created with SAS28. Regional association plots for genome-wide significant loci were generated with LocusZoom29. To look for secondary signals of association in loci of genome-wide significance, logistic regression using SNPTEST conditioned on the most associated SNP in the locus was carried out.
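For a single SNP, the fixed-effects inverse variance-weighted combination described above can be sketched as follows. This is a generic textbook formulation, not the authors' code: the `SampleResult` container and its field names are hypothetical, the per-sample effect is assumed to be a log odds ratio with its standard error, and the ">1%" dosage criterion is interpreted on a 0 to 1 scale.

```python
import math
from dataclasses import dataclass

GENOME_WIDE_P = 5e-8  # significance threshold stated in the text


@dataclass
class SampleResult:
    beta: float            # per-sample log odds ratio for the allele dosage
    se: float              # standard error of beta
    info: float            # imputation info score
    minor_cases: float     # minor-allele measure in cases (0..1, assumed scale)
    minor_controls: float  # minor-allele measure in controls (0..1, assumed scale)


def eligible(r):
    """Sample-level inclusion: info >0.4 and minor allele >1% in both groups."""
    return r.info > 0.4 and r.minor_cases > 0.01 and r.minor_controls > 0.01


def fixed_effects_meta(results):
    """Inverse variance-weighted pooled effect, its SE, and a two-sided p value.

    Returns None if no sample passes the inclusion criteria for this SNP.
    """
    used = [r for r in results if eligible(r)]
    if not used:
        return None
    weights = [1.0 / r.se ** 2 for r in used]
    total = sum(weights)
    beta = sum(w * r.beta for w, r in zip(weights, used)) / total
    se = math.sqrt(1.0 / total)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p value
    return beta, se, p
```

Because each sample is weighted by the inverse of its variance, large samples with precise estimates dominate the pooled effect, and a SNP is only analyzed in those samples where it was imputed well enough.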

Protein–protein interaction networks analysis

Putative candidate genes within the identified risk loci were mapped to STRING to obtain protein–protein interaction (PPI) networks (https://string-db.org/) and to seek potential interactions among them. The search tool integrates both known and predicted PPIs; here, it was used to predict functional interactions of proteins30,31. To construct the PPI networks, we applied active interaction sources including text mining, experiments, databases, and co-expression (experimental repositories, computational prediction methods, and public text collections), limited the species to “Homo sapiens”, and required a combined interaction score >0.4. In the networks, the nodes correspond to the proteins and the edges represent the interactions.

Re-sequencing of EFNA1

Re-sequencing of all coding exons of EFNA1 (transcript ENST00000368407.3) was performed in 580 CBE patients, all of whom were included in the current GWAS. PCR conditions can be obtained upon request; primer sequences are shown in Supplementary information (Supplementary Table 2). Sequencing files for patient, parent, and control DNA were added to databases created using PreGap4 software, with control DNA processed as the reference sequence.

Genes prioritization

The SNP with the lowest p value in each associated region was entered into the LDproxy Tool (https://ldlink.nci.nih.gov/?tab=ldproxy) for the European populations CEU (Utah residents from north and west Europe), TSI (Toscani in Italia), FIN (Finnish in Finland), GBR (British in England and Scotland), and IBS (Iberian population in Spain). Genes residing in the linkage disequilibrium blocks defined by variants with an r^2 above 0.8 to the top SNPs were taken into consideration for this study (Supplementary information, Supplementary Figs. 1–8). The coordinates of the LD block regions (hg19) are described in Supplementary Information (Supplementary Table 3).

No variants were in significant LD with rs1924557 on chromosome 1, so no LD block region could be determined for this marker.

RNA isolation and mRNA library preparation of mouse embryonic urinary bladder and genital tissues

Animals were anesthetized with isoflurane and sacrificed by cervical dislocation. Ethical consent is documented and approved by the local authorities of the Regierungspräsidium Darmstadt. Embryos from timed-pregnant females of the SWISS strain were harvested at embryonic days (E) E10.5, E12.5, and E15.5 (Supplementary information: Supplementary Fig. 18). The respective developmental Theiler stages were determined as 18 (TS18), TS21, and TS23. From E10.5 embryos, the urogenital ridge was dissected under an M205C stereo microscope (Leica Microsystems, Germany), surgically isolated, and transferred into QIAzol®. Embryos were pooled for each time point: for the E10.5 stage, biopsies from three embryos were pooled to prepare RNA; for the E12.5 and E15.5 stages, two embryos were pooled for RNA preparation. From E12.5 (primitive bladder) and E15.5 (bladder) embryos, the distinct, visible structures of the developing bladder were surgically isolated (Supplementary information: Supplementary Fig. 18), combined, and transferred into QIAzol®.

Processing of total mouse embryonic RNA-sequencing data

About 50 million uniquely mapped reads per sample were obtained from each RNA-seq experiment. The reads were aligned using the STAR aligner32. Read counts were calculated with the GenomicFeatures Bioconductor package. Calculation and normalization of “transcripts per kilobase million (TPM)”, accounting for reads per kilobase (RPK), was performed as described elsewhere33. The fold change was calculated by dividing the value of the subsequent stage by that of the preceding one, and the log2 function was applied to the quotient as follows: log2 (FoldChange) = log2 (subsequent embryonic stage/preceding embryonic stage). Genes with values ≤ −1.5 or ≥ 1.5 were considered differentially expressed. The same algorithm was applied to calculate TPM for the human embryonic and fetal RNA-seq data already deposited at the EMBL-EBI Expression Atlas (accession code: E-MTAB-6592).
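The TPM normalization via RPK can be sketched as below. This follows the standard definition of TPM and is assumed, not confirmed, to match the procedure of the cited reference; the function name and the use of plain lists for counts and gene lengths are illustrative choices.

```python
def counts_to_tpm(counts, lengths_bp):
    """Convert raw read counts to TPM for one sample.

    counts:     read count per gene
    lengths_bp: gene (or transcript) length in base pairs, same order
    """
    # Step 1: reads per kilobase (RPK) corrects for gene length.
    rpk = [c / (length / 1000.0) for c, length in zip(counts, lengths_bp)]
    # Step 2: the per-million scaling factor corrects for sequencing depth.
    scale = sum(rpk) / 1e6
    if scale == 0:
        return [0.0 for _ in rpk]
    return [r / scale for r in rpk]
```

Because the length correction is applied before the depth correction, TPM values sum to one million in every sample, which makes expression values comparable across developmental stages before fold changes are taken.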

The raw RNA-sequencing data of mouse embryonic urinary bladder are deposited at GEO with the accession id: GSE190641.

Processing of bladder cancer RNA-sequencing data

Total RNA was purified using the QuantSeq library (Lexogen) with 500 ng RNA input. QuantSeq polyA RNA-tail libraries were sequenced (single end, 1 × 75 bp) on an Illumina HiSeq platform, and the generated data were further processed according to GRCh38, TPM-transformed, and further normalized. Sequencing, alignment, and TPM calculation were performed by ImmunityBio™. Visualization of results in heatmaps was performed using GraphPad Prism 9.0.0.

RNA isolation and mRNA library preparation of human embryonic and fetal urinary bladder and genital tissues

Embryonic and fetal bladders and genital tissues were obtained by surgeons from terminated pregnancies after informed consent was obtained and with ethics approval. Pregnancies were terminated for social indications, and the respective fetuses and embryos were healthy. The embryonic tissues comprised 7-week embryos, 7–7.5-week embryos, 7.5-week embryos, late 8-week embryos, and late 9-week embryos (Supplementary information: Supplementary Fig. 19). Samples comprised week 7 (n = 2), 7.5 (n = 1), 8 (n = 3), and 9 (n = 4) for the bladder tissues and week 7 (n = 3), 8 (n = 3), 9 (n = 3), and 10 (n = 4) for the genital tissues. Gene expression data were extracted and analyzed after high-throughput sequencing of paired-end mRNA libraries (Illumina). Data were deposited at the EMBL-EBI Expression Atlas (accession code: E-MTAB-6592). The fold change of the already deposited human embryonic RNA-seq data was calculated in the same way as for the mouse embryonic data (see Methods: Processing of total mouse embryonic RNA-sequencing data).

Processing of bladder cancer RNA-sequencing data

Cancer RNA-seq data were obtained from data already deposited at the EMBL-EBI Expression Atlas (Cancer Cell Line Encyclopedia, experiment E-MTAB-2770). The deposited data do not include samples derived from CBE patients. Out of 1019 different cancer cell lines, the following were analyzed: 20 cell lines of bladder carcinoma, one cell line of bladder squamous cell carcinoma, four cell lines of bladder transitional cell carcinoma, and one cell line of ureter urothelial carcinoma. The TPM average was then calculated for each carcinoma cell type, and the data were compared by fold change to the TPM of deposited mature urinary bladder polyA RNA-seq data (GEO accession: GSM1067793). In addition, polyA RNA-seq data were available from 38 cases of the CCC-EMN bladder cancer cohort [12]; these were generated from FFPE tissue, all classified as muscle-invasive urothelial carcinoma. Demographic data are found in Supplementary Data 1.

Statistics and reproducibility

Quality control of individuals, principal component analysis, quality control of variants and imputation, statistical analysis for GWAS, and gene prioritization are described in the Methods sections above.

The average TPM was calculated in R from biological replicates. Log2FC of human and mouse bladder was calculated as follows: log2(AVERAGE_TPMnext_stage/AVERAGE_TPMprevious_stage). Log2FC of cancer cells was calculated as follows: log2(AVERAGE_TPMcancer_line/AVERAGE_TPMcontrol_bladder_tissue). The mouse sample size was n = 3 for embryonic bladder at stage E10.5 and n = 2 for E12.5 and E15.5. Human fetal bladder samples comprised week 7 (n = 2), 7.5 (n = 1), 8 (n = 3), and 9 (n = 4). Human cancer cell lines comprised n = 20 bladder carcinoma cell lines, n = 1 bladder squamous cell carcinoma cell line, n = 4 bladder transitional cell carcinoma cell lines, and n = 1 ureter urothelial carcinoma cell line, in addition to n = 38 cases of muscle-invasive urothelial carcinoma. Replicates are defined as a minimum of three technical replicates per sample size.
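The log2FC formulas above, combined with the ±1.5 cut-off given in the Methods and the "Suppressed"/"Activated" categories from the Fig. 2 legend, can be sketched as follows. The analysis itself was done in R; this Python version is only an illustration, and the function names, the label strings, and the handling of genes with zero expression in both groups are assumptions.

```python
import math
from statistics import mean


def log2_fold_change(tpm_next, tpm_previous):
    """log2(AVERAGE_TPM_next / AVERAGE_TPM_previous) over replicate TPM values."""
    return math.log2(mean(tpm_next) / mean(tpm_previous))


def classify_change(tpm_next, tpm_previous, threshold=1.5):
    """Label the expression change between two stages or tissues."""
    avg_next, avg_prev = mean(tpm_next), mean(tpm_previous)
    if avg_next == 0 and avg_prev == 0:
        return "not expressed"   # assumed label; no ratio can be formed
    if avg_prev == 0:
        return "activated"       # expression after a silenced state
    if avg_next == 0:
        return "suppressed"      # gene silenced, no expression detected
    lfc = math.log2(avg_next / avg_prev)
    if lfc >= threshold:
        return "upregulated"
    if lfc <= -threshold:
        return "downregulated"
    return "unchanged"
```

The same function covers both comparisons in the text: for the developmental series, `tpm_next` and `tpm_previous` hold the replicates of consecutive stages, and for the cancer comparison they hold the cancer cell lines and the control bladder tissue, respectively.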

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.