Introduction

Colorectal cancer (CRC) is a molecularly heterogeneous disease that arises from a gradual accumulation of genetic and epigenetic changes [1,2,3,4]. Most CRC are sporadic, while a small proportion show a hereditary background leading to their occurrence in the context of a syndrome. Based on extensive transcriptome and genome analysis, different molecular subtypes of CRC have been described in recent years [5, 6]. Furthermore, there are molecular biomarkers that are relevant for prognostic assessment and predicting response to specific forms of therapy. In the following, we describe the current body of knowledge on the molecular mechanisms of colorectal carcinogenesis, the distinct molecular subtypes, and the role of molecular tests in the treatment of CRC.

The adenoma–carcinoma sequence

Most CRC arise from precancerous polyps, which are broadly categorized as conventional adenomas or serrated lesions [1, 6,7,8]. These precursors result from alterations in DNA repair and cell proliferation, with sequential changes in genes crucial for growth regulation. The genetic changes are accompanied by visible and increasing histological aberrations, so this gradual progression may be considered a prime example of sequential tumorigenesis. Mutations in the APC (adenomatous polyposis coli) gene are characteristic of conventional adenomas, whereas BRAF mutations are found in serrated lesions. Subsequent genetic alterations vary by pathway, as does the time it takes for invasive carcinoma to develop. The different pathways are also reflected in different histological and clinical presentations (Fig. 1, Table 1). In less than 10% of patients, a hereditary syndrome underlies the disease. The most common is Lynch syndrome (also known as hereditary non-polyposis colorectal cancer syndrome, HNPCC), followed by familial adenomatous polyposis (FAP) and other rare polyposis syndromes. The lifetime risk of developing CRC in such patients is around 80–100%, depending on the syndrome. The tumors associated with these syndromes appear in young to middle-aged persons and include not only CRC but also other intestinal cancers or, depending on the syndrome, cancers of the genital and urogenital tract.

Fig. 1 Examples of colorectal cancer. a, b Microsatellite-stable colon cancer with the typical picture of an adenocarcinoma with gland formation (a HE stain; b MLH‑1 immunohistochemistry with preserved nuclear expression). c–f Microsatellite-unstable cecum carcinoma with solid growth pattern and pronounced accompanying inflammatory infiltrate (c HE stain, overview; d MLH‑1 immunohistochemistry with loss of expression and a positive internal control in the accompanying inflammatory infiltrate; e transition from a serrated lesion on the right to dysplasia on the left as precursor of the carcinoma; f loss of MLH‑1 expression in the dysplastic areas but not in the non-dysplastic serrated epithelium)

Table 1 Comparison of molecular classifications of colorectal cancer: The Cancer Genome Atlas (TCGA) classification and consensus molecular subtype (CMS) classification

Chromosomal instability

Around 80% of CRCs arise via the chromosomal instability (CIN) pathway [1,2,3, 5]. In most cases, these tumors are characterized by numerous numerical (somatic copy number alterations, SCNA) and structural chromosomal aberrations (losses and amplifications, aneuploidy, translocations, loss of heterozygosity [LOH]). These aberrations are associated with alterations in genes such as APC, KRAS, SMAD4, or TP53, but the tumors are not hypermutated. CRCs that arise from conventional adenomas usually follow the CIN pathway. Clinically relevant signaling pathways involved in CIN include the Wnt and MAPK pathways; in Wnt signaling, for example, beta-catenin accumulates in the nucleus and activates transcription. The latency to cancer development via CIN is often more than 10 years.

Microsatellite instability

Microsatellite instability (MSI), which can be found in about 15% of CRCs, is characterized by a generalized instability of short, tandemly repeated DNA sequences called microsatellites. MSI tumors have up to 100 times more somatic mutations than non-hypermutated cancers such as CIN-CRC [1, 6, 8]. MSI may be due to mutations in the mismatch repair (MMR) genes MLH1, MSH2, MSH6, or PMS2, or to silencing of the MLH1 promoter through hypermethylation. Lynch syndrome is associated with MSI and is based on a germline mutation in one of the MMR genes [3]. In the sporadic setting, MSI is indicative of the serrated pathway [9]. In contrast to CIN-CRC, MSI-CRC show distinct morphological features, including mucinous histology and high numbers of tumor-infiltrating lymphocytes [9]. Diagnostically, MSI is determined either by molecular pathology, using polymerase chain reaction (PCR) analysis of a defined marker panel or more complex methods such as next-generation sequencing (NGS), or, as an approximation, by immunohistochemistry for the products of the MMR genes [10, 11].

CpG island methylator phenotype

Sporadic MSI tumors can also result from methylation of CpG-rich promoter sequences (CpG island methylator phenotype, CIMP), e.g., in the MLH1 gene, which then leads to MMR deficiency [1, 2, 5]. MSI tumors with hypermethylation account for three quarters of hypermutated CRC, and those with somatic mutations in the MMR genes for one quarter. The CIMP pathway thus overlaps with the MSI pathway. However, CIMP-positive tumors are a specific subgroup with a high proportion of BRAF mutations and therefore show a strong association with serrated lesions as precursors. A high percentage of so-called interval carcinomas (i.e., carcinomas that develop within the recommended screening interval of 10 years) arise in this way, with a latency for tumor development of 3–5 years [3].

Molecular subtyping

Since the molecular basis of tumors not only determines their biology but also the response to various forms of therapy, a large number of studies have attempted to comprehensively characterize CRC molecularly and to work out particular subtypes that may have prognostic and therapeutic relevance. In addition to genetic and epigenetic classifications such as those by Jass or Ogino, which are based on combinations of CIN, CIMP, and MSS/MSI and the BRAF status [12], CRC was analyzed by The Cancer Genome Atlas Project (TCGA) through a genome-wide analysis including mRNA and miRNA expression analyses and classified into different subtypes [13]:

  a) hypermutated cancers (approximately 15%): in this group, approximately 75% had MSI due to hypermethylation and MLH1 silencing, while 25% of the tumors showed somatic MMR gene or polymerase ε (POLE) mutations;

  b) non-hypermutated cancers (85%): these tumors generally show CIN. Both colon and rectal cancers fell into this group and, contrary to expectations, showed considerable similarities in their genomic alterations.
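
The within-group percentages above can be converted into shares of all CRC with a short calculation. The following is only an illustrative sketch: the variable and function names are hypothetical, and the inputs are the rounded figures quoted in the text, so the results (roughly 11% and 4% of all CRC for the two hypermutated groups) are approximations rather than study data.

```python
# Rounded shares quoted in the text for the TCGA classification.
HYPERMUTATED_SHARE = 0.15        # hypermutated cancers among all CRC
MLH1_METHYLATION_WITHIN = 0.75   # within hypermutated: MSI via MLH1 hypermethylation
MMR_OR_POLE_WITHIN = 0.25        # within hypermutated: somatic MMR or POLE mutations


def share_of_all_crc(group_share: float, within_group: float) -> float:
    """Convert a within-group fraction into a fraction of all CRC."""
    return group_share * within_group


# Hypothetical labels, chosen only for this illustration.
shares = {
    "hypermutated, MLH1 hypermethylation": share_of_all_crc(
        HYPERMUTATED_SHARE, MLH1_METHYLATION_WITHIN
    ),
    "hypermutated, somatic MMR/POLE mutation": share_of_all_crc(
        HYPERMUTATED_SHARE, MMR_OR_POLE_WITHIN
    ),
    "non-hypermutated (CIN)": 1.0 - HYPERMUTATED_SHARE,
}

for label, value in shares.items():
    print(f"{label}: about {value:.0%} of all CRC")
```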

The Colorectal Cancer Subtyping Consortium compiled genomic databases of over 4000 tumors, including the TCGA source, and combined them with their own transcriptome analyses [8]. In this way, four so-called consensus molecular subtypes (CMS) with different molecular properties and different histomorphologies could be identified; a remaining group of tumors showed a mixed phenotype without clear assignment:

  a) CMS1 (“immune type,” approximately 15%): hypermutated, CIMP-positive phenotype with BRAF mutation and a high level of MSI. This leads to upregulation of immune genes. CMS1-CRC are associated with the serrated pathway, show a rather solid, “medullary” growth pattern with a pronounced accompanying inflammatory reaction, and are more likely to be found in the right colon. They are considered candidates for immunotherapy.

  b) CMS2 (“canonical type,” approximately 40%): not hypermutated, with the highest chromosomal instability of all four groups. These tumors are MSS, have high SCNA, and often show activated WNT and Myc pathways. Most arise within the conventional adenoma–carcinoma sequence and are “classical,” non-mucinous, gland-forming carcinomas without conspicuous infiltration of immune cells.

  c) CMS3 (“metabolic type,” approximately 15%): characterized by the deregulation of metabolic pathways and mutations in the KRAS gene. In addition, CMS3 tumors show mixed genomic and epigenomic patterns (some hypermutated, some with moderate or low MSI and intermediate CIMP status) and also a mixed phenotype.

  d) CMS4 (“mesenchymal type,” approximately 30%): MSS, with CpG hypermethylation and a strikingly high number of SCNA. CMS4 tumors show infiltrative growth with pronounced angiogenesis via activation of an epithelial–mesenchymal transition. They respond very poorly to standard therapy and have the worst prognosis of the four CMS types.
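
For readers who want the four subtypes in a structured form, the list above can be encoded as a small lookup table. This is a minimal sketch under stated assumptions: the class name, field names, and helper function are hypothetical, the feature strings are shortened paraphrases of the text, and the shares are the rounded figures given above. It summarizes the description only and is not a CMS classifier.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class CmsSubtype:
    name: str                  # e.g. "CMS1"
    label: str                 # descriptive label used in the text
    approx_share: float        # approximate fraction of CRC quoted in the text
    msi_status: str            # MSI/MSS status as described in the text
    features: Tuple[str, ...]  # key features named in the text (paraphrased)


# Hypothetical encoding of the four subtypes described above.
CMS_SUBTYPES = {
    s.name: s
    for s in (
        CmsSubtype("CMS1", "immune", 0.15, "MSI-high",
                   ("hypermutated", "CIMP-positive", "BRAF mutation")),
        CmsSubtype("CMS2", "canonical", 0.40, "MSS",
                   ("high SCNA", "WNT and Myc activation")),
        CmsSubtype("CMS3", "metabolic", 0.15, "mixed",
                   ("metabolic deregulation", "KRAS mutation")),
        CmsSubtype("CMS4", "mesenchymal", 0.30, "MSS",
                   ("high SCNA", "epithelial-mesenchymal transition",
                    "angiogenesis")),
    )
}


def subtypes_with(feature: str) -> List[str]:
    """Return the names of all subtypes that list the given feature."""
    return sorted(s.name for s in CMS_SUBTYPES.values() if feature in s.features)


print(subtypes_with("high SCNA"))  # prints ['CMS2', 'CMS4']
```

A flat table like this makes the overlaps easy to query: for example, high SCNA is shared by CMS2 and CMS4, while MSI status separates CMS1 from both.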

The identification of the CMS subtypes, particularly CMS2–4, was based on a comprehensive molecular workup. Recently, RNA-based [14, 15] and miRNA-based [16] classifiers intended to also work on formalin-fixed, paraffin-embedded routine pathological material have been developed and showed very good accuracy in predicting CMS. Moreover, a panel of immunohistochemical markers has been proposed for classifying CRC according to CMS subtypes, but the discrimination between CMS2 and CMS3 in particular seems to be challenging. Owing also to the lack of clinical consequences, CMS subtyping beyond MSI (i.e., CMS1) is currently not performed in routine pathologic diagnostics [17,18,19].

Predictive molecular biomarkers

Diagnostically used molecular alterations associated with a response to certain treatment concepts and substances are mutations in the RAS genes, the BRAF gene, the PIK3CA gene, and MSI [2, 7]:

The RAS-encoded proteins are involved in cellular signal transduction. RAS genes (especially KRAS, but also NRAS and HRAS) are mutated in up to 50% of sporadic CRCs, with KRAS mutations being observed early in carcinogenesis. The mutations occurring in KRAS codons 12 and 13 of exon 2 and codon 61 of exon 3 lead to activation of the RAS/MAPK signaling pathway and the PI3K-AKT signaling pathway. CRC with wildtype KRAS and NRAS respond better to anti-EGFR therapy with cetuximab or panitumumab [20].

BRAF, an oncogene of the RAF family, encodes a serine/threonine kinase that is activated by its interaction with RAS-GTP. Mutations in the BRAF gene, which are almost exclusively the V600E missense mutation, occur in a total of about 10% of CRC. The mutation induces activation of the MEK pathway independently of KRAS. While a BRAF mutation is detected in less than 10% of sporadic tumors, the rate in sporadic MSI cancers is significantly higher, at up to 80%. BRAF mutation testing is therefore used to help rule out Lynch syndrome in MSI-CRCs, since a BRAF mutation points to a sporadic origin. RAS and BRAF mutations are usually mutually exclusive. “Quadruple negative” CRCs with wildtype KRAS, NRAS, BRAF, and PIK3CA are more responsive to targeted anti-EGFR therapy [20].

PIK3CA mutations, predominantly in exons 9 and 20, are found in approximately 10–20% of CRC. In RAS-wildtype CRC, the presence of a PIK3CA mutation is associated with higher tumor aggressiveness and poorer response to anti-EGFR therapy. However, these tumors respond better to adjuvant therapy with acetylsalicylic acid [21].
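
The “quadruple negative” definition used above can be written down as a simple check over four gene results. This is a hedged sketch for illustration only: the function names and the input format (a mapping from gene name to a mutated/wildtype flag) are assumptions, and the code encodes nothing beyond the definition in the text. It is not a treatment algorithm.

```python
from typing import List, Mapping

# The four genes named in the text's definition of "quadruple negative".
QUADRUPLE_GENES = ("KRAS", "NRAS", "BRAF", "PIK3CA")


def missing_results(status: Mapping[str, bool]) -> List[str]:
    """List the genes of the panel for which no result was supplied."""
    return [gene for gene in QUADRUPLE_GENES if gene not in status]


def is_quadruple_negative(status: Mapping[str, bool]) -> bool:
    """Return True if KRAS, NRAS, BRAF, and PIK3CA are all wildtype.

    `status` maps a gene name to True (mutated) or False (wildtype).
    An untested gene is never treated as wildtype: a missing result
    raises ValueError.
    """
    missing = missing_results(status)
    if missing:
        raise ValueError("no result for: " + ", ".join(missing))
    return not any(status[gene] for gene in QUADRUPLE_GENES)


# Hypothetical example results, not patient data.
all_wildtype = {"KRAS": False, "NRAS": False, "BRAF": False, "PIK3CA": False}
braf_mutated = {"KRAS": False, "NRAS": False, "BRAF": True, "PIK3CA": False}

print(is_quadruple_negative(all_wildtype))  # prints True
print(is_quadruple_negative(braf_mutated))  # prints False
```

Raising an error for a missing result, rather than defaulting to wildtype, reflects the design choice that an untested gene should not count toward the “negative” label.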

Testing for MSI is essential for diagnosing Lynch syndrome. Apart from that, sporadic microsatellite-unstable CRC show a better overall prognosis than MSS CRC. However, MSI-CRC respond less well to 5‑fluorouracil-based chemotherapy. Molecular properties of tumors also influence the tumor microenvironment and the immune response to the tumor. Hypermutated CRC (such as MSI tumors) usually show a more pronounced immune response and an upregulation of immune checkpoint molecules (e.g., programmed cell death protein 1 [PD‑1], programmed death‑ligand 1 [PD‑L1], cytotoxic T‑lymphocyte‑associated protein 4 [CTLA‑4]). MSI is thus an important predictive biomarker for the efficacy of immune checkpoint inhibitors, with the anti-PD‑1 antibody pembrolizumab currently being used as a first-line therapy for MSI‑H metastatic CRC (as approved by the EMA and G‑BA after the KEYNOTE-177 study) [11, 12]. In contrast to colon cancer, the frequency of MSI in rectal cancer is rather low. Data on its prognostic value or on an association with response to conventional neoadjuvant radiochemotherapy or radiotherapy are conflicting [9]. However, the extremely promising results of a recent study investigating immunotherapy as a neoadjuvant therapy concept point to the need for MSI testing in rectal cancer patients: patients with mismatch repair-deficient, locally advanced rectal cancer were treated with neoadjuvant PD‑1 blockade using the anti-PD‑1 antibody dostarlimab alone, which resulted in a complete clinical response in all examined patients, as measured by the combination of rectal MRI, visual endoscopic inspection, and digital rectal examination, for at least 6 months of follow-up [22]. Despite the very low number of patients included in this study and the current lack of long-term results, these findings suggest that both radiochemotherapy and surgery could be omitted in this molecularly defined subgroup of rectal cancer, with substantial implications for the quality of life of patients.

In contrast to MSI status, the impact of immunohistochemical analysis of PD-L1 as a biomarker for immunotherapy has yet to be clinically established [23].

In addition to immunotherapy, there are other promising and targeted therapies, e.g., directed against c‑Met or HER2, in which molecular, “druggable” alterations in these genes or their gene products could serve as potential predictive biomarkers [3]. Regarding HER2, recent results of clinical trials and retrospective analyses show that up to 7% of CRCs harbor HER2 amplification or HER2 somatic mutations [24, 25]. Immunohistochemistry (IHC) and fluorescence in situ hybridization (FISH) are typically used to identify HER2 amplification in CRC, using modified and customized criteria for assessing HER2 positivity in CRC [26]. Next-generation sequencing (NGS) can be applied for detection of HER2 gene alterations with a strong correlation to FISH as well [24, 27]. The role of HER2 mutations for response prediction, however, has still to be determined [26].

For standardized and routine pathological reporting, national (e.g., the Royal College of Pathologists, UK [28]) and international (e.g., the International Collaboration for Cancer Reporting, ICCR [29]) guidelines recommend mismatch repair (MMR) protein immunohistochemistry and/or microsatellite instability (MSI) testing, MLH1 promoter hypermethylation testing, and testing for the BRAF V600E mutation as well as KRAS and NRAS mutations. The role of HER2 is considered as emerging [30]. The College of American Pathologists (CAP) expands this list to PIK3CA and PTEN immunohistochemical and mutational analysis despite limited therapeutic consequences outside of study protocols [31].

Crosstalk between histopathology, molecular subtyping, and treatment

Some histopathological tumor characteristics show a very strong association with a specific molecular genetic profile. For example, the so-called medullary growth pattern and increased intraepithelial lymphocytes are associated with MSI/CMS1 [12].

Digital image analysis of scanned histology slides in combination with machine learning and other techniques of artificial intelligence is a rapidly emerging field in pathology, in both the research and the diagnostic setting. Initial studies have shown the potential for highly accurate prediction of molecular features such as MSI and BRAF mutations in CRC using deep learning methods [32,33,34]. Such digital tests could provide the opportunity to identify therapeutic targets based on morphomolecular features without elaborate and cost-intensive laboratory tests. Besides MSI/CMS1, an example could be the identification of CMS4 tumors, which show a highly infiltrative growth pattern with high-grade tumor budding; tumor budding has itself been shown to be a prognostic factor associated with a poorer prognosis [35].

The mesenchymal colon cancer subtype CMS4 appears to be amenable to imatinib therapy. This therapy has been shown to shift CMS4 tumors toward more epithelial phenotypes, which could eventually sensitize them to standard chemotherapy regimens. In addition, imatinib appears to induce a gene expression program in CMS4 colon cancers that is associated with improved prognosis [36]. Further research is required in this context.

Conclusion

CRC is not a homogeneous tumor entity. Rather, different molecular mechanisms of development usually give rise to tumors that differ genetically, morphologically, and phenotypically. These molecular genetic differences are reflected in different tumor biology and aggressiveness, and they potentially also provide the basis for applying different therapeutic options tailored to the molecular subtype [2, 20, 37].

Take home message

CRC is a molecularly heterogeneous disease. The consensus molecular subtype (CMS) classification recognizes four subtypes, including one characterized by microsatellite instability, which has highly relevant predictive value.