Cancer genomics: why rare is valuable

Rare conditions are sometimes ignored in biomedical research because of difficulties in obtaining specimens and limited interest from fund raisers. However, the study of rare diseases such as unusual cancers has again and again led to breakthroughs in our understanding of more common diseases. It is therefore unsurprising that with the development and accessibility of next-generation sequencing, much has been learnt from studying cancers that are rare and in particular those with uniform biological and clinical behavior. Herein, we describe how shotgun sequencing of cancers such as granulosa cell tumor, endometrial stromal sarcoma, epithelioid hemangioendothelioma, ameloblastoma, small-cell carcinoma of the ovary, clear-cell carcinoma of the ovary, nonepithelial ovarian tumors, chondroblastoma, and giant cell tumor of the bone has led to rapidly translatable discoveries in diagnostics and tumor taxonomies, as well as providing insights into cancer biology.


Impact of study of rare tumors
The history of biomedical research has repeatedly shown that with the advent of new methodologies, the study of rare, but clinically well-defined disease entities has led to the generation of a disproportionate amount of knowledge. For instance in the eighteenth century, the peculiar scrotal skin cancers of adolescent males with a history of occupation as chimney sweeps led Percivall Pott to describe one of the earliest associations of workplace hazards and cancer [1]. His study shed light on the potential role of environmental factors in cancer formation and helped lay the foundations for the science of epidemiology. Another example happened in 1961 when Sir Anthony Epstein came across D.P. Burkitt's description of an unusual new children's cancer in Africa with a geographic distribution related to certain rainfall and temperature patterns [2]. Epstein thought of a possible climatedependent vector and initiated a study, during a time of great advancements in molecular biology and virology, that led to the discovery of the Epstein Barr virus, the first proven human cancer virus [3]. Retinoblastoma was another relatively rare condition that led to the conceptualization of tumor suppressor genes. The comparison of kinetics of unilateral sporadic versus bilateral familial cases led Alfred Knudson Jr. to describe the two-hit hypothesis, which revolutionized cancer biology [4].
It has long been clear that mutations play a critical role in the development of cancer. However, many common cancers are both biologically and clinically complicated and their mutational landscape reflects such complexity [5,6]. On the other hand, some perhaps more obscure tumors, especially ones affecting younger patients and specific sites, reveal simpler genomic landscapes with characteristic mutations that allow a more focused look at the oncogenic processes. That is not to say that such scenarios are absent in more common cancers, especially when examined as specific subtypes. However, rare cancers should not simply be ignored because of their rarity or the logistical difficulty of working with them. These anomalies in nature, because of the very fact of their unusual patterns, can hold the key to understanding more common tumors. We term such neoplasms forme fruste tumors. Forme fruste refers to an attenuated manifestation of disease. We recognize that these tumors are true neoplasms and in many cases, cancers. However, they do not have the genomic baggage and heterogeneity due to genomic instability as seen in common cancers and it is this attenuated genomic phenotype that makes them such tractable and useful targets for genomic research. In this review, we will go over recent sequencing studies of some forme fruste tumors which led to discoveries of profound importance.

Era of next-generation sequencing
Ever since the discovery of DNA and its association with human cancer, scientists and clinicians have dreamt of the possibility to scrutinize it base by base. The ability to sequence DNA, which quickly developed into a robust method by Sanger sequencing [7], was a solid step toward this goal. With the Human Genome Project establishing a map of the human genetic code and rapid advances in computer technology, everything seemed to be in place other than cost and efficiency. Billions of dollars and years of multi-institutional efforts would not make nucleotide sequencing an accessible tool for scientists to ask questions on a regular basis, and the limited resources were not earmarked for the study of rare specimens. The limitations of Sanger sequencing were in the termination of polymerase reactions as well as in the need to separate the products of these reactions by gel or other electrophoretic systems [8]. Additionally, preparation of sequencing libraries was necessary via transformation in E. coli or by an incredibly large number of separate PCR reactions. However, with massively parallel sequencing platforms, the first shortcoming was overcome by reversible fluorescent nucleotide addition and imaging (used in Illumina platforms) or through monitoring nucleotide addition via ion detection (used in Ion Torrent platforms (Life Technologies)) both achieved by cyclic manipulation of polymerase or ligase enzymes [9]. Moreover, the second shortcoming was resolved by in vitro library preparation via techniques such as emulsion PCR [10] (Ion Torrent) or bridge PCR on solid surfaces [11,12] (Illumina). With these improvements, the sequencing cost and time requirements have been vastly reduced. Advancements in bioinformatics and the ability to more readily distinguish signal from noise have also increased the feasibility of large-and small-scale genomic studies. Thus, today, sequencing whole genomes and transcriptomes is more accessible and has become a reality for individual laboratories. We argue that we are in an incredibly exciting era of molecular medicine where a new Bmolecular microscope^in the form of massively parallel sequencing, also commonly referred to as next-generation sequencing (NGS) or second-generation sequencing, is giving rise to a whole new paradigm for the understanding of human diseases.
We will discuss recent attempts to study forme fruste tumors using NGS. The general approach and representative bioinformatic tools employed in such studies are summarized in Fig. 1. Specific forme fruste tumor types will be discussed that exemplify the impact of such studies in three categories: a better definition of an already known disease, establishment of new disease subtypes, and development of novel insights into oncogenic mechanisms. Table 1 includes a more comprehensive summary of discoveries in forme fruste tumors with a focus on recent NGS findings and their study designs. In addition, Fig. 2, accompanied by Table 2, demonstrates the broad range of pathways affected in such tumors. Lastly, we will go over some of the challenges and scenarios where the study of certain forme fruste-like tumors shows that their genomic behavior is not always straightforward. In addition to the broad array of discoveries made, the small number of cases used in each successful discovery process is perhaps noteworthy (Table 1) with the single case of endometrial stromal sarcoma where, as described below, sequencing led to a diagnostic and formal disease reclassification.

Better definition of an existing pathology
Pathognomic mutations allow a more specific definition of a previously established histological diagnosis. Below, we will discuss the cases of the FOXL2 mutation in adult-type granulosa cell tumors (GCTs) as well as the WTR1-CAMTA1 fusion in epithelioid hemangioendotheliomas (EHEs).

Granulosa cell tumors and FOXL2 mutations
GCTs are rare, constituting only about 5 % of ovarian tumors [13]. Until 2009, the molecular biology of this tumor remained a mystery and hence, there was limited success in development of therapeutics for aggressive cases [14]. There are two subtypes of this tumor: adult-type and juvenile. These have similar biomarker profiles but occur in different age groups and have different histopathological features [13]. Adult-type GCTs fit the concept of a forme fruste cancer: cytogentics had shown a more stable genome compared to other ovarian tumors [15], and the tumor subtypes have a consistent pathological presentation with cells that have maintained some levels of differentiation expressing follicule-stimulating hormone receptors and inhibin [16]. There were no associations between expression of common oncogenes and tumor suppressors, such as MYC, TP53, ERBB2, or RAS family and outcomes in GCTs [17]. Based on these facts, Shah et al. reasoned that with sequencing of very few adult-type GCT cases, rather than the massive sample sets needed for the more common genetically complex tumors, considerable insight into the biology of GCTs could be attained. Thus, only four samples of adulttype GCTs were used for whole transciptome sequencing as a discovery cohort and 11 other ovarian tumors were sequenced as a comparative cohort [18]. After alignment and removal of previously reported germline insertions and deletions, there were between 289 and 495 somatic nonsynonymous variants in the GCTs. Genes with mutations in at least three of the four cases that were not mutated in the comparative cohort were considered for further follow-up [18]. The only potential mutation found in all four cases was a C134W mutation in FOXL2 [18]. The resulting mutant protein was still expressed in GCTs as observed by immunohistochemistry in cases with apparent homozygosity (likely through loss of the normal allele), meaning most likely there was a gain/switch of function. In a validation cohort of an additional 95 sex-cord stromal ovarian tumors, specificity and sensitivity of the C134W mutation in adult-type GCTs was established [18]. This study was significant in three aspects: it was the first time that a consistent genetic event was associated with GCTs, the first time FOXL2 had been indicated to have an oncogenic role in any tumor, and the first example of a novel disease-defining pathognomonic driver mutation being discovered using massively parallel sequencing.
Although FOXL2 is known to be critical for the development of ovaries and is one of the early differentiation markers [19], somatic mutations in this transcription factor had not been linked to pathology before this study. The diagnostic implications of the C134W mutant FOXL2 have already become apparent [20,21], and some studies have since looked at potential mechanistic pathways. It has been suggested that the hotspot FOXL2 mutation might have very particular effect in a specific context: the mutant FOXL2 reduces the expression of gonadotropin-releasing hormone (GnRH) receptor and limits the GnRH-induced apoptosis seen in normal human granulosa cells [22]. This finding shows that tissue-specific pathways may be the bottlenecks that limit driver mutations that can arise in a specific cell of origin. Furthermore, the mutant FOXL2 has also been suggested to be less stable because of increased phosphorylation via GSK3β and MDM2mediated ubiquitination and proteasome degradation [23].  . WWTR1 (also known as TAZ) encodes a transcriptional coactivator containing the WW domain. This domain, which is named as such because of the two conserved tryptophans, mediates specific protein-protein interactions and has been implicated in Hippo signaling, a critical pathway in regulating organ size and keeping proliferation in check. WWTR1 is phosphorylated by lats tumor suppressor kinases (LATS2) which are key components of Hippo signaling and this in turn leads to binding by 14-3-3 proteins which lead to cytoplasmic localization and hence inactivation of WWTR1 [29]. Interestingly, in the EHE fusion, the 14-3-3 binding domain of WWTR1 is maintained; however, one of the critical LATS2 phosphorylation sites, namely Ser311, is lost. This perhaps could render the fusion protein partly resistant to inhibition by Hippo signaling. CAMTA1 is a transcriptional regulatory protein with the capacity to bind DNA. Because WWTR1 has no known DNA-binding motifs and since the DNA-binding domain of CAMTA1 is maintained, the new fusion protein might giveWWTR1 a new ability to bind DNA [25]. Fig. 2 Recent discoveries of mutations in forme fruste tumors revealing a broad range of pathways involved. Driver mutations from membrane receptors, to signal transducers, chromatin modifying, and remodeling complexes, as well as transcription factors and microRNA processing factors have been described in a variety of specific tumor pathologies, which may occur at low frequencies in the population. However, such insight, when followed by an understanding of the tumorigenic mechanisms involved, can vastly improve understanding of more common cancers and tumor biology in general. This figure is accompanied by Table 2 which includes a list of such tumors as well as the indicated mutations (marked by asterisks)

Establishment of new classification/subtype
In addition to identifying pathognomonic mutations by deep sequencing a few samples, sequencing studies of forme fruste tumors have led to new classification and subtype establishments. Three examples are discussed as follows.

YWHAE fusions in high-grade endometrial stromal sarcoma
Endometrial stromal sarcoma (ESS) is a malignancy of the uterus that had been previously linked with recurrent fusions: the fusion of JAZF1, a transcriptional repressor, with members of the polycomb complex including SUZ12, PHF1, and EPC1 [30, 31]. Yet, there remained a subset of ESS tumors, often with a higher histologic grade, that could not be demonstrated to carry fusions involving these genes. Lee and colleagues thus decided to look in depth at the genomics of such cases and ended up discovering recurrent fusions involving YWHAE and the FAM22 family [32]. As in earlier work, Lee and colleagues first drew on results from cytogenetic studies and noted a recurrent t(10;17)(q22;p13). Similar to the case with the WWTR1-CAMTA1 fusion in EHEs, this karyotype information greatly aided the analysis of the next-generation sequencing data such that, by the use of just one sample, they were able to focus on the YWHAE-FAM22A translocation event, later showing that in cases missing this particular fusion, YWHAE was fused to homologs of FAM22A such as

BCOR-CCNB3 bone sarcoma
An exemplary case of NGS defining a new pathology came through the study of peculiar small round cell bone sarcomas that lacked the EWSR1-ETS fusions of the top candidate in the . Although ameloblastomas are benign and rare tumors, this study emphasizes the mutational heterogeneity of histologically indistinguishable tumors depending on their location and highlights the significance of molecular classification. As mutant BRAF, commonly seen in melanomas, can be targeted with new therapies, this finding also has immediate therapeutic implications. Associations of tumor location and defining mutations have also been identified, for instance, in mengiomas: those that arise in the lateral and posterior regions bear NF2 mutation whereas those in the anterior and medial regions do not [40]. Even in rare tumors with seemingly distinctive histology, there exist subsets defined by specific molecular aberrations. The new disease subclassifications thereby identified may be of great significance for development and application of targeted therapeutics.

Insights into cancer mechanisms
The study of rare tumors has also expanded our knowledge about cancer pathways. We will focus on recent findings of recurrent mutations in chromatin remodelers, microRNA processors, and histones.

SWI/SNF mutations in ovarian epithelial tumors and meningiomas
Clear-cell ovarian carcinomas are the second most common type of ovarian cancer [41] and until 2010 were not very well studied despite evidence of relative genomic stability [42,43]. With whole transcriptome sequencing/exome sequencing, recurrent mutations in ARID1A, a member of the already established SWI/SNF chromatin remodeling complex were found [44,45]. The mutations were spread across the ARID1A gene and led to its inactivation, thus suggesting that this gene may function as a tumor suppressor. Although other core members of the SWI/SNF complex had been linked to cancer previously (SMARCB1 and SMARCA4 are known to have lost expression in a variety of tumors), this study showed that noncanonical members of the SWI/SNF complex could also play important roles in tumorigenesis. Furthermore, lack of evidence for mutations in other members of the complex hinted at a context-specific tumor suppressor role for the individual members of the SWI/SNF complex. Additionally, Wiegand and colleagues also showed that the mutation was present in precursor atypical endometriotic lesions of the tumor, and thus was likely an early driver of ovarian clear-cell carcinoma. ARID1A mutations were later found in a variety of other more common types of cancer including gastric adenocarcinomas [46] and colorectal cancers [47]. Small-cell carcinoma of the ovary of the hypercalcemic type (SCCOHT) is another rare but genetically stable tumor [48] that was discovered to have abnormalities in the SWI/ SNF complex. In this case, the core enzymatic unit of the protein complex, SMARCA4, was mutated in an inactivating fashion in the majority of cases, and almost all tumors of this specific diagnosis stained negatively for SMARCA4's protein product BRG1 [49][50][51][52]. Although mutations in SMARCA4 have been described in more common cancers such as lung adenocarcinomas [53], they occur in a fraction of cases and are not the obvious drivers of oncogenesis. The studies in SCCOHT with loss of SMARCA4 in almost all cases emphasized the driver role of SMARCA4 loss. Another example of a critical driver role of SWI/SNF mutations came through the NGS study of familial multiple spinal meningiomas [54]. In familial cases, which tested negative for previously described NF2 or SMARCB1 mutations, germline SMARCE1 mutations were identified through exome sequencing. Again, the protein was lost in the tumor samples but not in normal tissue, thus suggesting a classic Kundson biallelic inactivation and a tumor suppressor role of SMARCE1.
It should be noted that the reason for disease specificity of SWI/SNF member mutations and indeed the steps in tumorigenesis associated with their loss are not clear. SWI/SNF is thought to regulate the expression of many genes and interacts with many critical cancer pathways from cell cycle regulation to hedgehog and Wnt signaling. Indeed, it has been suggested that perhaps the remaining complex, which still assembles without the mutated members, might act as an oncoprotein and drive tumorigenesis [55]. Thus, much is still to be clarified in this area; however, since the establishment of the association of the SWI/SNF complex with cancer in rather rare entities, we know today that about 20 % of all cancers have mutations in this complex [56]. However, the impact of these mutations is by and large yet to be established.

MicroRNA processing mutations in nonepithelial ovarian tumors
Given abnormalities in microRNA levels in certain cancer, it was thought that the genes encoding proteins involved in microRNA processing might also be of significance in oncogenesis. Germline mutations in DICER1 were identified in the rare familial pleuropulmonary blastoma-family tumor and dysplasia syndrome [57]. However, the first evidence for somatic oncogenic mutations of DICER1 came from the study of nonepithelial ovarian tumors [58]. Recurrent somatic hotspot mutations in DICER1 were identified across nonepithelial ovarian tumor types and were most predominantly seen in Sertoli-Leydig cell tumors [59]. Although low expression of DICER1 has been previously associated with worse prognosis in breast cancer [60] and ovarian tumors [61], the study of these nonepithelial ovarian tumors changed the paradigm as for the first time it was found that a hotspot genetic aberration in DICER1 can drive cancer through the combination of loss of one allele and a functionally deficient protein, this is an aberration of the classic two-hit hypothesis [59]. In actuality, DICER1 in nonepithelial ovarian tumors does not fit traditional tumor suppressor or oncogene models. Rather, there seems to be a mix of the two models involved in tumorigenesis. There is an inherited inactivation of one copy of the genes, and the remaining allele is not totally inactivated somatically, which would be lethal in most cells rather is hypomorphic via hotspot mutations (Fig. 2). The hotspot mutations are found in the RNaseIIIb metal-binding site, reducing RNaseIIIb activity and leading to a global loss in the processing of mature 5p microRNAs but maintenance of 3p processing [62]. Later studies showed that oncogenic mutations in DROSHA, another microRNA processing gene, and associated global microRNA changes also occur in Wilm's tumor [63]. Therefore, processors of microRNA represent another family of cancer-associated proteins and forme fruste tumors were significant in this realization.
Histone mutation in bone and central nervous system tumors Another prime example of insights into cancer biology comes from the identification of mutations in histones in forme fruste tumors. Mutations in H3F3A, which encodes histone 3.3, were identified in pediatric diffuse intrinsic pontinegliomas (DIPGs) [64] and pediatric glioblastomas [65]. Histone 3.3 is a member of the histone 3 family which is associated with active chromatin and is incorporated into chromatin throughout the cell cycle [66][67][68]. Interestingly, an independently regulated gene named H3F3B also seemingly encodes the same histone 3.3 protein; however, mutations in this gene were not identified in DIPGs or glioblastomas. In a seminal study, Behjati et al. described H3F3A driver mutations in another tumor type: chondroblastomas [69]. Additionally, they also discovered novel H3F3B mutations in giant cell tumors of bone [69]. Chondroblastomas and giant cell tumors of bone have similarities such as clinical presentation in the bone epiphysis and the presence of large numbers of osteoclastic giant cells; however, they tend to affect different age groups and have different clinical outcomes. As mentioned, the two genes encode the same protein, yet Behjati et al. showed a clear predilection toward H3F3A or H3F3B depending on tumor type. Since there is no expression difference between these genes in giant cell bone tumor versus chondroblastomas, temporal expression, for instance at the time of tumor formation, is a possibility suggested by the authors [69]. The above studies were of great value shifting the focus from histone modifying complexes to histones themselves and showing that mutations in histones can be driver mutations.

Concluding remarks
It should be noted that rare tumors with homogenous clinical behavior are not always easy to study, and the examples used above are success stories that have benefitted from the relative ease of interpreting NGS results when the tumors are truly simple genomically. Embryonic rhabdomyosarcomas have clinical and morphologic features of forme fruste tumors but ended up revealing a complex genome with various tumorigenic mechanisms identified in different cases, unlike the more consistent drivers seen in the tumor types described above [70,71]. Similarly, our own group's study of epithelioid sarcoma has revealed that despite its unique and consistent pathology and biology, this tumor has a relatively complex genome.
Yet, as a whole, forme fruste tumors have been particularly informative in deep sequencing studies, expanding on our knowledge of cancer biology in a resource-efficient manner. Here, we have cited several successful examples of recent findings that have lead to the discovery of pathognomonic mutations, the establishing new subtypes and classifications, such as the case of high-grade ESS, and providing insight into mechanisms of cancer formation such as findings of SWI/SNF and microRNA processing gene abnormalities. This is not to say that such discoveries are not possible in more common, genetically complex cancers, but in forme fruste tumors, the reduced complexities in the genome allows for identification of driver oncogenic events with the use of very few samples. Part of the reason for success in studying these tumors can also be attributed to the fact that they have tended to be understudied and not so much is known about them. However, their rarity comes with the challenge of a lack of banked samples appropriate for the nucleic acid extractions needed for deep sequencing. Recent advancements in sequencing technologies mean that formalin-fixed paraffin-embedded tissues can now be also used for deep sequencing, and hence, some of the challenges in studying forme fruste tumors are already being overcome [39,72].

Beyond next-generation sequencing
The focus of this review has been on next-generation sequencing and its role as a molecular microscope helping define tumors in a new way. However, sequencing technologies and associated analytic capacities are advancing at a rapid rate. The ability to study clonal evolution and diversity, which has been successfully utilized in the breast cancer field for instance [73], can be of great value if applied to rare conditions to see the degree of genomic heterogeneity in these seemingly simpler tumors (which should have a more interpretable signal-to-noise ratio for subclone tracking). Additionally, tools such as single-cell RT-PCR and single-cell next-generation sequencing that are in development and reviewed elsewhere [74] can reveal a whole new window on intratumoral heterogeneity, and forme fruste tumors, in particular biphasic cancers like synovial sarcoma, can again provide models that may well prove easier to study. We believe that with all these developing methodologies, rare tumors can be a source of breakthroughs that give clearer answers at lower cost, with fewer samples needed to make discoveries.