Background

The term teratoma is derived from the Greek word teraton, meaning monster, and was first used by Rudolf Virchow in 1863 [1]. Teratomas are true neoplasms arising from totipotential germ cells [2] and classified as mature or immature, depending on the degree of differentiation of their components [3]. They are defined histologically as containing tissues derived from all 3 germ layers: ectoderm, mesoderm, and endoderm [3]. Extragonadal germ cell tumors represent only 1–5% of all germ cell tumors and frequently present as sacrococcygeal teratomas [4]. One of the most common locations is the ovary, although they also occur in the testes [5]. Ovarian teratomas include mature cystic teratomas (dermoid cysts), immature teratomas, and monodermal teratomas (e.g., struma ovarii, carcinoid tumors, neural tumors) [3]. In the ovaries, most benign teratomas are cystic and referred to in clinical parlance as dermoid cysts [5]. Mature cystic teratomas of the ovary are the most common germ cell tumor, comprising 33% of ovarian tumors [6].

The presence of three somatic germ layers within teratomas is considered the best indicator of the pluripotency of human embryonic stem (hES) cell lines [7, 8]. Studying teratomas may aid in the development of safer hES cell therapies [9]. As developmental processes cannot be investigated in intact mammalian embryos [10], teratomas represent an alternative development model. The arrangement of different tissue types in teratomas in many ways recapitulates organogenesis within the embryo [11]. It is also important to elucidate the stepwise developmental processes and molecular bases of teratomas, as these may provide useful information for the development of tissue-engineering technologies [12]. The genetic and environmental conditions that confer teratoma susceptibility remain poorly understood [13], although mutations in several genes might underlie increased tumor incidence. Limited studies, most of which have been case reports, have demonstrated mutations in mature cystic teratomas [14, 15]. In a recent study, an attempt was made to identify the genomic abnormalities in squamous cell carcinomas (SCCs) arising from ovarian mature cystic teratomas using next-generation sequencing [16]. The most frequently altered genes in SCC are TP53 (20/25 cases, 80%), PIK3CA (13/25 cases, 52%), and CDKN2A (11/25 cases, 44%) [16]. The aim of this study was to elucidate the possible etiological roles of genetic alterations identified on whole-exome sequencing (WES) in teratoma formation.

Materials and methods

Clinical samples

The Institutional Review Board of Chung Shan Medical University Hospital approved all procedures, and informed consent was obtained from all subjects prior to collecting their genetic material for the study (reference CS19118). Eight 18–46-year-old patients with ovarian teratoma(s) were enrolled, including one woman with bilateral mature cystic teratomas (Tera-10R and Tera-10L) of the ovary [17, 18], totaling 9 samples (Table 1).

Table 1 Baseline characteristics of 9 mature cystic teratomas of the ovary

Histological examination

Upon cutting, the cystic masses were found to contain fat, hair, and bony tissue. Histological sections were prepared from formalin-fixed paraffin-embedded (FFPE) blocks and stained with hematoxylin and eosin for histopathological review. Microscopically, sebaceous glands, skin appendages, and thyroid follicles were also evident (Table 1). After examination of multiple sections, the tumors were considered mature, without immature components (Table 1). There was no evidence of malignancy (Table 1).

Isolation of DNA from FFPE tissue

Genomic DNA was extracted from paraffin-embedded sections of the teratomas with the DNA FFPE Tissue Kit (Qiagen, Hilden, Germany) according to the manufacturer’s instructions. DNA was obtained from a solid nodule at the inner site of each teratoma and finally dissolved in 100 μl of TE buffer (10 mM Tris–HCl, pH 8.0, and 1 mM EDTA). The DNA concentration of each sample was measured using a NanoDrop UV–VIS spectrophotometer.

Library preparation and whole-exome sequencing (WES)

WES was carried out at a biotechnology company (Genomics BioSci & Tech, Taipei, Taiwan). A total of 200 ng DNA per sample served as the input material. Sequencing libraries were generated using the Agilent SureSelect Human All Exon V6 kit (Agilent Technologies, California, USA) following the manufacturer’s recommendations, with index codes added to each sample.

Briefly, DNA was fragmented using a hydrodynamic shearing system (Covaris, Massachusetts, USA) to generate 180–280 bp fragments. Remaining overhangs were converted into blunt ends via exonuclease/polymerase activities. After adenylation of the 3’ ends of the DNA fragments, adapter oligonucleotides were ligated. DNA fragments with ligated adapter molecules on both ends were selectively enriched by PCR. After PCR, the libraries were hybridized in liquid phase with biotin-labeled probes, and streptavidin-coated magnetic beads were then used to capture the exons of genes. Captured libraries were enriched by PCR to add index tags in preparation for sequencing. Products were purified using the AMPure XP system (Beckman Coulter Inc, California, USA) and quantified with the Agilent high sensitivity DNA assay on the Agilent Bioanalyzer 2100 system. Libraries were sequenced on the Illumina NovaSeq 6000 platform, and 150 bp paired-end reads were generated by Genomics BioSci & Tech Co.

Bioinformatics analysis

The bioinformatics analysis pipeline followed the sequencing step. Low-quality bases and sequencing adapters in the raw data generated by the Illumina sequencer were removed using Trimmomatic. The reads were then aligned to the reference genome (human genome build GRCh37/UCSC hg19) using the Burrows-Wheeler Aligner (BWA) [19]. The alignment results were recorded in .bam format, and the .bam files were sorted and duplicate-marked using Picard tools. Variant calling was then performed with the HaplotypeCaller tool of the Genome Analysis Toolkit (GATK), and variants were annotated with the Ensembl Variant Effect Predictor (VEP) [20].
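The order of these steps can be sketched as a list of command lines. This is a minimal illustration under stated assumptions, not the pipeline actually run for this study: the file names, the read-group string, the Trimmomatic trimming steps, and the presence of `trimmomatic` and `picard` wrapper scripts on the PATH are all hypothetical, and the tool versions and parameters used by the sequencing provider are not given in the text.

```python
import shlex


def build_alignment_and_calling_commands(sample, fastq_r1, fastq_r2,
                                         reference="hg19.fa",
                                         adapters="adapters.fa"):
    """Return the command lines for one sample, in execution order.

    Every file name and tuning value here is a hypothetical placeholder.
    """
    paired_r1 = f"{sample}_R1.paired.fq.gz"
    unpaired_r1 = f"{sample}_R1.unpaired.fq.gz"
    paired_r2 = f"{sample}_R2.paired.fq.gz"
    unpaired_r2 = f"{sample}_R2.unpaired.fq.gz"
    sam = f"{sample}.sam"
    sorted_bam = f"{sample}.sorted.bam"
    dedup_bam = f"{sample}.dedup.bam"
    raw_vcf = f"{sample}.vcf.gz"
    annotated_vcf = f"{sample}.vep.vcf"
    # HaplotypeCaller needs read groups; bwa expands the literal "\t" itself.
    read_group = f"@RG\\tID:{sample}\\tSM:{sample}\\tPL:ILLUMINA"

    return [
        # 1. Adapter and low-quality base trimming (trimming steps assumed).
        ["trimmomatic", "PE", fastq_r1, fastq_r2,
         paired_r1, unpaired_r1, paired_r2, unpaired_r2,
         f"ILLUMINACLIP:{adapters}:2:30:10", "SLIDINGWINDOW:4:20", "MINLEN:36"],
        # 2. Alignment to the GRCh37/hg19 reference.
        ["bwa", "mem", "-R", read_group, "-o", sam,
         reference, paired_r1, paired_r2],
        # 3. Coordinate sorting and duplicate marking with Picard.
        ["picard", "SortSam", f"I={sam}", f"O={sorted_bam}",
         "SORT_ORDER=coordinate"],
        ["picard", "MarkDuplicates", f"I={sorted_bam}", f"O={dedup_bam}",
         f"M={sample}.dup_metrics.txt", "CREATE_INDEX=true"],
        # 4. Variant calling.
        ["gatk", "HaplotypeCaller", "-R", reference,
         "-I", dedup_bam, "-O", raw_vcf],
        # 5. Annotation with VEP against the GRCh37 cache.
        ["vep", "-i", raw_vcf, "-o", annotated_vcf,
         "--vcf", "--cache", "--offline", "--assembly", "GRCh37"],
    ]


if __name__ == "__main__":
    # Sample label taken from the text; FASTQ names are hypothetical.
    for cmd in build_alignment_and_calling_commands(
            "Tera-11", "Tera-11_R1.fq.gz", "Tera-11_R2.fq.gz"):
        print(shlex.join(cmd))
```

The function only builds the commands, so the sequence can be inspected without any of the tools installed. Running them would additionally require an indexed reference (BWA index, `.fai`, and `.dict` files) and a local VEP cache.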

Results

Tumor-only whole-exome sequencing

Next-generation WES was performed on 9 teratoma-derived FFPE specimens without matched normal controls (a tumor-only design). On average, 6.97 Gb of high-quality clean bases were generated per sample, and 99.9% of sequence reads were uniquely aligned to the human reference genome. The on-target coverage depth of each exome ranged from 15 to 132, with an average of 78.91. Somatic variants were called with Mutect2 from GATK and then filtered with FilterMutectCalls to obtain a more confident set of somatic variant calls. Variants identified on WES were compared against the Catalogue of Somatic Mutations in Cancer (COSMIC) and dbSNP databases.
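The two-step tumor-only calling described here can be sketched as follows. The sketch assumes GATK4 is installed as `gatk`; the file names are hypothetical, and the text does not state whether a germline resource or panel of normals was supplied, so neither is shown.

```python
import subprocess


def build_tumor_only_commands(tumor_bam, reference="hg19.fa", prefix="sample"):
    """Return the Mutect2 and FilterMutectCalls commands for one tumor BAM.

    No matched normal is passed, which is what makes this tumor-only mode.
    """
    unfiltered = f"{prefix}.unfiltered.vcf.gz"
    filtered = f"{prefix}.filtered.vcf.gz"
    return [
        # Step 1: raw somatic candidates from the tumor alignments alone.
        ["gatk", "Mutect2", "-R", reference, "-I", tumor_bam, "-O", unfiltered],
        # Step 2: annotate the FILTER column; confident calls are marked PASS.
        ["gatk", "FilterMutectCalls", "-R", reference,
         "-V", unfiltered, "-O", filtered],
    ]


def run_all(commands):
    """Run each command in order and stop at the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)
```

Without a matched normal, germline variants cannot be subtracted directly, which is one reason the calls are subsequently compared against dbSNP and COSMIC.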

Spectrum of putative somatic mutations

We extracted the variants annotated as PASS in the VCF file and found 38,633 putative somatic mutations, including non-synonymous and splicing mutations, according to dbSNP filtering and COSMIC criteria. There were 26,132 single nucleotide variants (SNVs), of which 15,099 were within exons. These variants were functionally annotated and their impact was predicted using VEP software [21]. Of the exonic SNVs, 159, 5,561, 7,460, and 1,919 were of high, low, moderate, and modifier impact, respectively.
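The PASS extraction and impact tally can be illustrated with a small parser. This is a sketch under assumptions: it expects a VEP-annotated VCF whose header declares the `CSQ` field order, and it counts only the first annotation of each record, which is a simplification chosen here rather than a rule stated in the text. The demonstration records are invented.

```python
from collections import Counter


def tally_pass_impacts(vcf_lines):
    """Count VEP IMPACT values over records whose FILTER column is PASS."""
    impact_idx = None
    counts = Counter()
    for line in vcf_lines:
        line = line.rstrip("\n")
        if line.startswith("##INFO=<ID=CSQ"):
            # VEP lists the CSQ sub-field order after "Format: " in the header.
            fmt = line.split("Format: ", 1)[1].rstrip('">')
            impact_idx = fmt.split("|").index("IMPACT")
            continue
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if fields[6] != "PASS":          # column 7 is FILTER
            continue
        if impact_idx is None:
            raise ValueError("CSQ header line not found before first record")
        info = dict(kv.split("=", 1) for kv in fields[7].split(";") if "=" in kv)
        csq = info.get("CSQ")
        if csq is None:
            continue
        first_annotation = csq.split(",")[0].split("|")
        counts[first_annotation[impact_idx]] += 1
    return counts


# Invented records, for illustration only.
demo = [
    "##fileformat=VCFv4.2",
    '##INFO=<ID=CSQ,Number=.,Type=String,Description="Consequence annotations '
    'from Ensembl VEP. Format: Allele|Consequence|IMPACT|SYMBOL">',
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t100\t.\tC\tT\t.\tPASS\tDP=50;CSQ=T|missense_variant|MODERATE|GENE1",
    "chr1\t200\t.\tG\tA\t.\tgermline\tDP=40;CSQ=A|synonymous_variant|LOW|GENE1",
    "chr2\t300\t.\tA\tG\t.\tPASS\tDP=30;CSQ=G|intron_variant|MODIFIER|GENE2",
]
demo_counts = tally_pass_impacts(demo)

# Consistency check on the reported figures: the four impact classes
# add up to the number of exonic SNVs.
reported = {"HIGH": 159, "LOW": 5561, "MODERATE": 7460, "MODIFIER": 1919}
reported_total = sum(reported.values())
```

The second demonstration record is skipped because its FILTER value is not PASS, and the reported impact counts sum to 15,099, matching the number of exonic SNVs.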

Somatic mutation profiles of the ovarian teratomas are shown in Fig. 1. In terms of variant classification (Fig. 1A), missense mutation was the most common, followed by frame shift insertion, frame shift deletion, in frame insertion, and in frame deletion. Mutations were validated in 15 genes with alterations in 9 (100%) samples and changes in protein coding (Fig. 2). The top 10 mutated genes were FLG, MUC17, MUC5B, RP1L1, NBPF1, GOLGA6L2, SLC29A3, SGK223, PTGFRN, and FAM186A (Fig. 1B). Genetic variants detected in exons with a change in protein coding in the top 10 mutated genes are shown in Additional file 1. DUSP5, KRTAP4-2, MPP2, PHLDA1, and PRR21 were added to complete the list of the top 15 most frequently mutated genes. Oncoplot of the 15 most frequently mutated genes with changes in protein coding in 9 ovarian teratomas is shown in Fig. 2.

Fig. 1 Somatic mutation profiles of mature cystic teratomas of the ovary. A Variant classification, B top 10 mutated genes, C variants per sample, and D % of mutation type per sample for 9 mature cystic teratomas of the ovary

Fig. 2 Oncoplot of the 15 most frequently mutated genes with changes in protein coding in teratomas. Each column represents a different sample and each row a different gene. Colored squares represent mutated genes. Mutations are shown according to variant type as indicated in the legend. Genes annotated as “Multi_Hit” have more than one mutation in the same sample. The barplot at the top shows the number of mutated genes for each patient according to mutation type. The barplot on the right presents the numbers of mutated teratomas for each gene according to mutation type

The number of variants per teratoma exome ranged from 347 to 3,009, with an average of 1,845 (Fig. 1C). Among the missense mutations, the most common substitution was C>T, followed by T>C and C>G (Fig. 1D). The patterns of substitutions for each mutational signature are shown in Fig. 3.

Fig. 3 Mutational signature found in mature cystic teratomas of the ovary. Pattern of substitutions for signatures according to the 96 substitutions defined by substitution class and sequence context immediately 3′ and 5′ of the mutated base. Probability bars are shown for the 6 types of substitutions. Mutational signature is based on the trinucleotide frequency of the human genome
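The 96 substitution classes arise from 6 pyrimidine-centred substitution types combined with 4 possible 5′ bases and 4 possible 3′ bases. The short sketch below enumerates them and shows how a single-base change is assigned to a class; the function names are illustrative and are not taken from any signature-analysis package.

```python
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")
# By convention, substitutions are reported from the pyrimidine (C or T) strand.
SUBSTITUTIONS = ["C>A", "C>G", "C>T", "T>A", "T>C", "T>G"]


def all_96_classes():
    """Enumerate every 5'-base[ref>alt]3'-base class: 6 x 4 x 4 = 96."""
    return [f"{five}[{sub}]{three}"
            for sub in SUBSTITUTIONS
            for five, three in product("ACGT", repeat=2)]


def classify_snv(five, ref, alt, three):
    """Assign one SNV, with its flanking bases, to a pyrimidine-centred class."""
    if ref in ("A", "G"):
        # Purine reference: describe the change on the opposite strand.
        # Reverse-complementing swaps the flanks and complements every base.
        five, ref, alt, three = (three.translate(COMPLEMENT),
                                 ref.translate(COMPLEMENT),
                                 alt.translate(COMPLEMENT),
                                 five.translate(COMPLEMENT))
    return f"{five}[{ref}>{alt}]{three}"


classes = all_96_classes()
example_cpg = classify_snv("A", "C", "T", "G")      # C>T in an NpCpG context
example_purine = classify_snv("A", "G", "A", "T")   # G>A, folded onto C>T
```

Here `example_cpg` is `A[C>T]G`, and the G>A change in `example_purine` maps to `A[C>T]T`, which is why C>T and G>A are counted together as one substitution type.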

Mutational landscape of teratomas

Among the prevalent mutated genes, 22 common somatic variants were detected in the 9 paraffin-embedded tumor specimens (Table 2; Fig. 4). Mutations were validated in 15 genes with alterations in all 9 (100%) samples (Fig. 2): 7 genes with the same variant in exon and changes in protein coding (Fig. 4A) and 8 leftover genes with different variants (Fig. 4B). There were 12 variants in exons (Fig. 4A, C) and 10 variants in introns (Fig. 4D). Seven of the 12 variants in exons were associated with changes in protein coding (Fig. 4A): PTGFRN, DUSP5, MPP2, PHLDA1, PRR21, GOLGA6L2, and KRTAP4-2. Three variants were substitutions (shown in red in Fig. 4A) with moderate impact: rs71483896 (c.828_829delinsGA, p.Ser277Thr) in exon 3 of the PTGFRN gene on chromosome 1 (missense variant, depth 631, average depth 70.11); rs35834951 (c.658_659delinsAT, p.Ala220Met) in exon 3 of the DUSP5 gene on chromosome 10 (missense variant, depth 760, average depth 84.44); and rs70964679 (c.240_241delinsGC, p.His80_Val81delinsGlnLeu) in exon 4 of the MPP2 gene on chromosome 17 (missense variant, depth 727, average depth 80.78). One variant was a three-nucleotide deletion with moderate impact, rs71716769 (c.582_584del, p.Gln204del) in exon 1 of the PHLDA1 gene on chromosome 12 (inframe deletion, depth 615, average depth 68.33) (shown in magenta in Fig. 4A). There were three SNVs with moderate impact and changes in protein coding (shown in pink in Fig. 4A): rs6732185 (c.1025A>T, p.Lys342Met) in exon 1 of the PRR21 gene on chromosome 2 (missense variant, depth 1044, average depth 116); rs59122400 (c.949C>T, p.Arg317Trp) in exon 8 of the GOLGA6L2 gene on chromosome 15 (missense variant, depth 745, average depth 82.77); and rs389784 (c.284A>G, p.Tyr95Cys) in exon 1 of the KRTAP4-2 gene on chromosome 17 (missense variant, depth 985, average depth 109.44). They were all of moderate impact, such that a non-disruptive variant might change protein effectiveness [21]. 
Among the 15 prevalent mutated genes, 8 with different variants in exons with changes in protein coding are important for the development of mature cystic teratomas of the ovary: FLG, MUC17, MUC5B, RP1L1, NBPF1, SLC29A3, SGK223, and FAM186A (Fig. 4B). The variants are shown in Additional file 1. Five of the 12 variants were in exons without changes in protein coding (Table 2; Fig. 4C): a non-coding transcript exon variant of modifier impact in ZNF806 (depth 224); a variant of modifier impact within the 3′-untranslated region in ATP5G1 (depth 586); and three synonymous variants with low impact in RP11-166B2.1 (depth 1122), CACNA1A (depth 209), and NEFH (depth 1076).
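Two pieces of arithmetic behind the variant descriptions above can be checked directly. The quoted average depths appear to equal the total depth divided by the 9 samples, and for a single-base substitution the affected codon follows from the coding-sequence position as the position divided by 3, rounded up. The sketch below reproduces both; reading "depth" as a total across samples is an inference from the numbers, not something stated in the text.

```python
N_SAMPLES = 9

# Total depths quoted for the 7 protein-changing exonic variants.
TOTAL_DEPTH = {
    "PTGFRN": 631, "DUSP5": 760, "MPP2": 727, "PHLDA1": 615,
    "PRR21": 1044, "GOLGA6L2": 745, "KRTAP4-2": 985,
}


def average_depth(total, n_samples=N_SAMPLES):
    """Mean depth per sample, assuming `total` is summed over all samples."""
    return total / n_samples


def codon_number(cds_position):
    """Codon containing a 1-based coding-sequence position (ceil(pos / 3))."""
    return (cds_position + 2) // 3


averages = {gene: round(average_depth(d), 2) for gene, d in TOTAL_DEPTH.items()}

# Single-base substitutions from the text: c.1025A>T, c.949C>T, c.284A>G.
codons = {
    "PRR21 c.1025": codon_number(1025),
    "GOLGA6L2 c.949": codon_number(949),
    "KRTAP4-2 c.284": codon_number(284),
}
```

The computed codons are 342, 317, and 95, matching p.Lys342Met, p.Arg317Trp, and p.Tyr95Cys. With standard rounding, 745/9 gives 82.78 for GOLGA6L2, so the quoted 82.77 is presumably truncated rather than rounded. The codon check is limited to substitutions here, because deletions in repetitive sequence can be described at a shifted position.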

Table 2 Common genetic variants detected in all 9 paraffin-embedded tumor specimens on whole-exome sequencing

Fig. 4 Word cloud artwork illustrates the important genes in ovarian mature cystic teratomas identified on WES. Mutations were validated in 15 genes with alterations in all 9 (100%) samples: 7 variants in exons with changes in protein coding (A) and 8 leftovers (B). There were 12 variants in exons: 7 variants with changes in protein coding (A) and 5 variants without changes in protein coding (C). A 7 variants in exons with changes in protein coding; 3 substitutions are shown in red, 1 deletion is shown in magenta, and 3 SNVs are shown in pink. C 5 variants in exons without changes in protein coding; 1 deletion is shown in light blue and 4 SNVs are shown in dark blue. D 10 variants in introns without changes in protein coding; 4 substitutions are shown in black, 1 deletion and 1 insertion are shown in dark gray, and 4 SNVs are shown in light gray

Four of the 10 variants in introns were substitutions, resulting in modifier impact without changes in protein coding (Table 2, shown in black in Fig. 4D): rs370476236 (c.504+63_504+64delinsGC) in intron 1 of the XXYLT1 gene on chromosome 3 (depth 131); rs386695380 (c.148+27_148+28delinsTT) in intron 3 of the DBN1 gene on chromosome 5 (depth 502); rs71526806 (c.1399-34_1399-33delinsCT) in intron 12 of the POR gene on chromosome 7 (depth 394); and rs71212741 (c.178-14_178-13delinsTC) in intron 2 of the ADAM33 gene on chromosome 20 (depth 611). Four of the 10 variants in introns were SNVs of modifier or low impact: rs10804167 (c.42-84G>A) in intron 2 of the C2orf80 gene on chromosome 2 (depth 173); rs2731436 (c.301+86C>T) in intron 3 of the DIP2B gene on chromosome 12 (depth 186); rs4964884 (c.422+8T>C) in intron 3 of the MMP17 gene on chromosome 12 (splice region variant, depth 211); and rs6115307 (c.4-84C>G) in intron 1 of the NOP56 gene on chromosome 20 (depth 181). In addition, rs59886367 (c.985-40del) in intron 4 of the ACTG1 gene on chromosome 17 (depth 1076) and rs6147585 (c.921+23_921+24insGGGGAGCACCAAGGGCTGGGGCAG) in intron 8 of the GATSL3 gene on chromosome 22 (depth 609) were deletion and insertion with modifier impact, respectively.

Discussion

The most common variant was C>T, followed by T>C and C>G (Figs. 1D, 3). Most variants were C>T/G>A, similar to Signature 6, which is characterized predominantly by C>T mutations at NpCpG sites in the mutational signature analysis by Alexandrov et al. [22]. Davies et al. reported that MLH1-inactivated breast cancers show a combination of predominantly C>T/G>A and T>C/A>G transitions (classified as Signature 6) with overwhelming indel mutagenesis, particularly deletions at polynucleotide repeat tracts [23]. However, only one MLH1 missense variant was observed in this study, in Tera-11: rs63750447 (c.1151T>A, p.Val384Asp) in exon 12.

Except for some case reports, few studies have been published on the mutations in mature cystic teratomas [14, 15]. Point mutations in the p53 gene and p16 gene are associated with SCCs that arise in mature cystic teratomas [24]. In a recent study, genomic abnormalities in such SCCs were identified using next-generation sequencing [16]. The most frequently altered genes in SCC are TP53 (20/25 cases, 80%), PIK3CA (13/25 cases, 52%), and CDKN2A (11/25 cases, 44%) [16]. However, only one TP53 missense variant in exon with changes in protein coding was observed in Tera-10R: rs1042522 (c.215C>G, p.Pro72Arg) in exon 4. No PIK3CA or CDKN2A variants were detected in the 9 paraffin-embedded tumor specimens in this study.

Among the prevalent mutated genes, 7 with the same variant in exons with changes in protein coding are important for the development of mature cystic teratomas of the ovary: PTGFRN, DUSP5, MPP2, PHLDA1, PRR21, GOLGA6L2, and KRTAP4-2 (Fig. 4A). PTGFRN encodes a 135-kDa protein, prostaglandin F2 receptor negative regulator (PTGFRN), that inhibits binding of PGF2-α to its specific receptor [25]. Also known as CD315, EWI-F, CD9P-1, and SMAP-6, PTGFRN has been shown to interact with CD9 and CD81 [26,27,28] and is potentially an important regulated protein in the development of the antral follicle. Down-regulation of PTGFRN in granulosa cells (GCs) may lead to follicular atresia [29]. Differentially expressed genes, especially those in five modules, including OAS1, IFI27, LPAR1, PTGFR, ITGB4, and ITGA6, might participate in the epithelial-mesenchymal transition process in breast cancer cell line DKTA [30]. The DUSP5 gene encodes dual specificity phosphatase 5, a member of the dual specificity phosphatase (DUSP) family that inactivates ERK 1/2 through dephosphorylation and inhibits inflammatory gene expression [31]. DUSP5 protein plays an important role in the maintenance of pluripotency in mouse embryonic stem cells and may be required for embryoid body development [32]. DUSP5 promotes osteogenic differentiation through SCP1/2-dependent phosphorylation of SMAD1 [33]. The MPP2 gene encodes palmitoylated membrane protein 2, which is a member of the membrane-associated scaffold protein family known as MAGUKs (membrane-associated guanylate kinase homologs) [34]. MPP2 is expressed in multiple cell types and plays important roles in cellular proliferation, differentiation and tumorigenesis [35]. MPP2 protein interacts with c-Src in epithelial cells to control c-Src activity and morphological function [36]. The c-Src proto-oncogene has been strongly implicated in the development, growth, progression, and metastasis of a number of human cancers including those of the colon, breast, pancreas, and brain [37].
PHLDA1 gene encodes Pleckstrin homology-like domain family A member 1 (PHLDA1 protein) that is involved in the regulation of apoptosis [38] and serves as a follicular stem cell marker [39]. PHLDA1 protein inhibits Akt and has tumor‐suppressive ability in breast and ovarian cancers [40]. PRR21 (putative proline-rich protein 21, PRR21) is a single exon gene, previously annotated as “uncertain” by UniProtKB but since removed from the UniProtKB proteome [41]. GeneCards database summarizes diseases associated with PRR21 including Bardet–Biedl Syndrome 4 and Peroxisome Biogenesis Disorder 2A. GOLGA6L2 gene encodes golgin A6 family-like 2 [42]. Mutations in this gene have been reported in a breast cancer sample from The Cancer Genome Atlas project and in three patients with fibrolamellar hepatocellular carcinoma [43]. KRTAP4-2 gene encodes keratin-associated protein 4–2. In the GeneCards database summary among its related pathways are keratinization and developmental biology. Keratin-associated proteins are the structural proteins of hair fibers and thought to play an important role in determining the physical properties of hair fibers [44]. These genetic alterations may play an important etiological role in teratoma formation.

The novel mutations in the DUSP5 and PHLDA1 genes (bold in Fig. 4A) found on WES of mature cystic teratomas of the ovary may help to explain the presence of hair within these tumors. For example, the mutation p.Gln204del of the PHLDA1 gene was observed in all teratomas in this study. The PHLDA1 protein is localized in the follicular bulge and acts as a stem cell marker of hair follicles [45]. Marchiori et al. also noted that estrogen can up-regulate PHLDA1 transcription [46]. Further clarification of how this mutant affects the hair follicle may shed light on the treatment of diseases involving hair loss, e.g., alopecia. Another interesting finding in this study is the mutation p.Ala220Met of the DUSP5 gene in all teratomas. DUSP5 protein plays an important role in the maintenance of stem cell pluripotency [32] and osteogenic differentiation [33]. Osteogenic differentiation, i.e., bone or tooth formation, is another characteristic of teratomas. Further research on this mutant may point to a treatment method for diseases involving bone loss, e.g., osteoporosis.

Conclusions

In summary, some important genes were identified in mature cystic teratomas of the ovary via WES in this study (Fig. 4). Mutations were validated in 15 genes with alterations in 9 samples (100%) and changes in protein coding (Fig. 4A, B). The top 10 mutated genes were FLG, MUC17, MUC5B, RP1L1, NBPF1, GOLGA6L2, SLC29A3, SGK223, PTGFRN, and FAM186A. Among the prevalent mutated genes, 7 variants in exons with changes in protein coding are important for the development of mature cystic teratomas of the ovary, including PTGFRN, DUSP5, MPP2, PHLDA1, PRR21, GOLGA6L2, and KRTAP4-2. These genetic alterations may play an important etiological role in teratoma formation. Moreover, novel mutations in the DUSP5 and PHLDA1 genes found on WES may help to explain the characteristics of teratoma.