Introduction

Cancer is a class of complex diseases characterized by abnormal cellular growth and the potential to invade healthy tissues and organs. Cancer incidence and cancer-related mortality have been rising globally [1]. With more than 18 million new cases and more than 9 million deaths per year, cancer is the first or second leading cause of death before the age of 70 in 91 of 172 countries [1].

Cancer survival rates vary substantially between cancer types, and diagnosis at a late stage worsens prognosis even for treatable cancers. Metastatic spread to distant sites, which defines late-stage cancer, accounts for 90% of cancer-related deaths [2]. There is a case to be made that population-wide screening and early cancer detection could substantially reduce cancer morbidity and mortality [3]. Despite this need for earlier detection, screening tests with proven clinical utility remain uncommon. High-throughput technologies and computational tools are likely to facilitate early diagnosis in the years to come. However, adopting new biomarkers for early cancer diagnosis requires careful consideration of the available evidence on associated benefits, costs, and potential harms [4].

For cancers in which early diagnosis is possible and treatment options exist, favorable outcomes are often impeded by our limited ability to stratify patients to guide treatment decisions. Current clinical practice for diagnosis and treatment decisions commonly relies on tissue biopsy, imaging (CT, MRI, or PET), and cytology. The information gained from these approaches is coarse-grained, because they reveal little about the underlying cancer at the molecular level. This can complicate treatment decisions, because interpatient tumor heterogeneity often dictates the response to available therapies [5, 6].

Rapid advances in omics technologies, such as genomics, transcriptomics, epigenomics, proteomics, and metabolomics, make it possible to profile biopsied tumor samples in great detail, enabling precision oncology [7, 8]. However, even such detailed analyses yield only static snapshots of the tumor tissue. Rather than being homogeneous, tumors exhibit spatial heterogeneity and undergo Darwinian evolution [9,10,11,12], which can confound prognosis or render treatment decisions ineffective. Because solid tumors can be biopsied repeatedly only through invasive procedures, alternative non-invasive diagnostic strategies are needed.

Precision oncology is turning towards liquid biopsy as an approach for non-invasive, low-risk detection and monitoring of cancer. Liquid biopsy derives diagnostic information about cancer by detecting and measuring tumor-related biomarkers in body fluids (most commonly blood, urine, and stool) [13,14,15,16]. Liquid biopsy can target diverse classes of biomarkers, such as circulating tumor cells (CTCs), cell-free DNA and circulating tumor DNA (cfDNA and ctDNA, respectively), RNA, exosomes and their biological cargo, circulating proteins, and metabolites [16,17,18]. Profiling these biomarkers with omics technologies and integrating the results enables interrogation of clinical samples for early diagnosis, prediction of therapy response for patient stratification [18], and longitudinal monitoring of cancer patients. For treatable cancers, precise localization, burden quantification, and knowledge of molecular signatures can be used to tailor treatments over time. Furthermore, accurate cancer profiling could help identify non-aggressive cancers and spare patients overtreatment.

Biomarkers are defined as a “characteristic that is measured as an indicator of normal biological processes, pathogenic processes, or a response to an exposure or intervention” [19]. Effective implementation of personalized medicine therefore depends on identifying substances, patterns, or activities that can be reliably assayed as indicators for differential diagnosis (diagnostic biomarkers), for classifying tumors by their probable outcome in the absence of treatment (prognostic biomarkers), or for assessing the probability that a patient will respond positively to a particular treatment (predictive biomarkers). The goal of cancer biomarker research is to develop robust, sensitive, specific, and cost-effective strategies for these clinical uses.

A survey of the cancer biomarker literature shows steadily increasing interest in the field (Fig. 1). However, only a fraction of putative cancer biomarkers have made it to clinical trials, and only a precious few have been validated in them (Fig. 1). This discrepancy has been the topic of several reviews [20, 21] and is discussed in later sections. Briefly, to have clinical utility, cancer biomarkers must meet a set of analytical and clinical performance requirements [22]. Analytical requirements, such as precision, trueness, limits of detection and quantitation, linearity range, and specificity, are necessary but not sufficient evidence of clinical validity or utility. The clinical validity of a biomarker is its ability to accurately identify patients with the targeted pathological state [23], whereas clinical utility measures the benefit (such as reduced mortality) of using the biomarker in clinical settings [24]. Evaluating the clinical performance of a biomarker [24] (intended use, clinical sensitivity and specificity, ROC analysis, positive and negative predictive values, cost-effectiveness, and fast turnaround) requires carefully designed studies with large cohorts. Adopting biomarkers into standard clinical use without evidence of their clinical utility can be highly problematic. For example, population-wide screening based on biomarkers with poor specificity or low positive predictive values can subject large numbers of people to unnecessary, expensive, and potentially harmful procedures. However, validation and utility studies are lengthy and expensive, and often out of reach for laboratories conducting basic research. On the other hand, rigorously validated biomarkers and molecular-level tumor profiles may in the future become more valuable than histopathological information for therapeutic decisions and drug approval, which would justify investing time and resources in their validation [25].
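The dependence of positive predictive value on disease prevalence, which underlies the screening concern above, can be made concrete with Bayes' theorem. The sketch below assumes a hypothetical assay with 90% sensitivity and 95% specificity (illustrative values, not taken from any cited study); the function name `predictive_values` is our own.

```python
def predictive_values(sensitivity: float, specificity: float, prevalence: float) -> tuple[float, float]:
    """Return (PPV, NPV) for a test applied to a population with the given prevalence."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    true_neg = specificity * (1 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv

# Hypothetical assay: 90% sensitivity, 95% specificity.
for prevalence in (0.005, 0.05, 0.5):
    ppv, npv = predictive_values(0.90, 0.95, prevalence)
    print(f"prevalence {prevalence:.1%}: PPV {ppv:.1%}, NPV {npv:.2%}")
```

Under these assumptions the PPV is only about 8% at a prevalence of 0.5%, closer to what an unselected screening population would show, and rises to roughly 49% at 5% prevalence, while the NPV stays above 99.9% at the lower prevalence. Most positive screening results would therefore be false positives unless specificity is very high.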

Fig. 1

Cancer biomarker studies reported in PubMed. Grey bars show the percentage of biomarker articles among all English-language journal articles and letters indexed in PubMed (articles with substance name “Biomarkers, Tumor” and MeSH term “Humans”, excluding MeSH terms “Tumor Cells, Cultured” and “Cell Line”). Orange bars show the percentage of PubMed articles on circulating biomarkers (biomarker articles and letters with MeSH terms “blood”, “plasma”, “serum”, “urine”, or “saliva”). Cyan bars show the subset of biomarker articles and letters from trial, multicenter, validation, clinical, or evaluation studies.
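The selection criteria in the Fig. 1 caption can be approximated as PubMed search strings. The sketch below builds NCBI E-utilities `esearch` URLs that return only a hit count (`rettype=count`). The exact queries, field tags, and date handling used for Fig. 1 are not given in the text, so these strings, including the use of `[mh]` for the body-fluid terms and the publication-type list, are assumptions.

```python
from urllib.parse import urlencode

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

# Denominator: English-language journal articles and letters from one publication year.
def all_articles(year: int) -> str:
    return f'English[la] AND ("Journal Article"[pt] OR "Letter"[pt]) AND {year}[dp]'

# Grey bars: tumor-biomarker articles in humans, excluding cell-culture studies.
def biomarker_articles(year: int) -> str:
    return (f'("Biomarkers, Tumor"[nm] AND "Humans"[mh] AND {all_articles(year)}) '
            'NOT ("Tumor Cells, Cultured"[mh] OR "Cell Line"[mh])')

# Orange bars: biomarker articles indexed with a body-fluid term (assumed [mh] tags).
def circulating_articles(year: int) -> str:
    fluids = " OR ".join(f'"{t}"[mh]' for t in ("Blood", "Plasma", "Serum", "Urine", "Saliva"))
    return f"({biomarker_articles(year)}) AND ({fluids})"

# Cyan bars: biomarker articles from trial-type publications (assumed [pt] list).
def clinical_articles(year: int) -> str:
    types = ("Clinical Trial", "Multicenter Study", "Validation Study",
             "Clinical Study", "Evaluation Study")
    pts = " OR ".join(f'"{t}"[pt]' for t in types)
    return f"({biomarker_articles(year)}) AND ({pts})"

def count_url(term: str) -> str:
    """URL whose XML response reports the number of matching PubMed records in <Count>."""
    return ESEARCH + "?" + urlencode({"db": "pubmed", "term": term, "rettype": "count"})

print(count_url(biomarker_articles(2015)))
```

Each bar would then be the count for a subset query divided by the count for `all_articles(year)`, times 100. Fetching the URLs (for example with `urllib.request.urlopen`) requires network access and is subject to NCBI's usage limits.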

This review discusses current standard and emerging approaches in liquid biopsy. We categorize cancer biomarkers by omics layer (Fig. 2), and for each layer we first evaluate the scientific rationale for its relevance in cancer. Second, we review progress and challenges associated with established or commercially exploited emerging technologies. Lastly, we highlight recent success stories and challenges in the clinical translation of cancer biomarkers. Recently developed technologies that are still far from producing evidence of clinical validity are outside the scope of this paper and are reviewed elsewhere [17].

Fig. 2

Liquid biopsy and biomarkers for precision medicine. Liquid biopsy is used to collect body fluid samples from individual patients. Samples typically contain multiple sources of biomarkers, which can in turn be analyzed using modern high-throughput omics technologies. High-dimensional, heterogeneous data can be integrated into biological networks of interacting biochemical circuits and pathways. These complex multidimensional data can be analyzed and reduced using statistical and machine learning algorithms, with the end goal of producing robust and accurate classifiers for diagnosis, prognosis, and prediction of treatment response. Finally, the same scheme can be used for longitudinal monitoring of patients, obviating the need for repeated invasive biopsies.

Sources of Biomarkers

Genome

Scientific Rationale

The onset and progression of cancer depend on alterations to the DNA sequence of cancer cell genomes. Elucidating these alterations and their functional consequences can help improve the clinical management of cancer patients. Investigation of the genetic underpinnings of cancer led to the discovery of genetic and genomic cancer risk factors and biomarkers, some of which are well established in cancer research and clinical use. A classic example of genetic risk assessment is the identification of inherited BRCA1 and BRCA2 alleles associated with increased breast cancer risk [26]. Genotyping cancer biopsy samples by PCR, Sanger sequencing, and next-generation sequencing is becoming standard clinical practice [27], a trend that is likely to continue as whole genome sequencing (WGS) costs approach $1,000 per genome [28] (however, see refs [29, 30] for a critical evaluation of the costs of NGS in clinical practice). Unfortunately, tissue genotyping requires invasive sample collection and returns only static information about a disease that is ultimately spatiotemporally dynamic. Thus, circulating DNA [31,32,33] is being investigated extensively as an emerging biomarker for personalized medicine.

Circulating or cell-free DNA (cfDNA) is released from healthy as well as cancerous tissues. cfDNA probably originates from apoptotic and necrotic cells, but its exact origin and mechanism of release are still under investigation [34, 35]. Primary tumors, circulating tumor cells, and occult and overt metastases can release DNA, raising cfDNA concentrations in the bodily fluids of cancer patients relative to healthy people [31, 36]. Circulating tumor DNA (ctDNA) has been found in blood, urine, stool, and saliva. In people with cancer, about 1% of circulating DNA is ctDNA [35, 37], but this percentage varies substantially between patients and is affected by tumor stage and volume [34]. Because ctDNA carries mutations characteristic of its tumor of origin, it can serve as a diagnostic, prognostic, and predictive biomarker.

The half-life of ctDNA is short (2–5 h in blood and urine [33, 34, 38]), and concentrations can decrease rapidly in response to treatment [33]. ctDNA levels are therefore useful as a monitoring biomarker [33] for estimating cancer burden and assessing treatment response.
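Under a simple first-order clearance model, the half-life range above translates directly into how quickly ctDNA should fall once its source is removed. The sketch assumes exponential decay with no further ctDNA release (for example, immediately after complete resection); real kinetics are more complex, and `fraction_remaining` is an illustrative name.

```python
def fraction_remaining(hours: float, half_life_h: float) -> float:
    """Fraction of the initial ctDNA level left after `hours`, assuming first-order clearance."""
    return 0.5 ** (hours / half_life_h)

for half_life in (2.0, 5.0):
    print(f"t1/2 = {half_life} h: {fraction_remaining(24, half_life):.2%} left after 24 h")
```

With a 2 h half-life roughly 0.02% of the initial level would remain after 24 h, and with a 5 h half-life about 3.6%. Under this model, ctDNA that persists days after surgery is hard to explain by slow clearance alone, which suggests residual disease.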

Current and Emerging Technologies

Current and upcoming genomics technologies for detecting and quantifying ctDNA range from targeted to unbiased genome-wide methods and vary dramatically in analytical sensitivity, specificity, cost, and throughput. Despite technological progress, reliable detection of low ctDNA amounts and of variants with low mutant allele frequencies (MAFs) remains challenging. Because ctDNA typically makes up only a small fraction of cfDNA, detecting rare tumor alleles may require large amounts of input material and prohibitively large sample volumes. Genomics technologies and their limitations were the topic of a recent comprehensive review [34], so we cover them only briefly in the following paragraphs.
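The input-material problem follows from counting statistics: a plasma sample contains a finite number of genome equivalents, and at low MAF the expected number of mutant fragments at a given locus can fall below one. The sketch assumes Poisson sampling, about 3.3 pg of DNA per haploid human genome, and hypothetical inputs (5 ng of cfDNA per mL of plasma, 4 mL of plasma); it ignores extraction losses and assay efficiency.

```python
import math

PG_PER_HAPLOID_GENOME = 3.3  # approximate DNA mass of one haploid human genome (pg)

def genome_equivalents(cfdna_ng_per_ml: float, plasma_ml: float) -> float:
    """Haploid genome copies (i.e., copies of any single locus) in the extracted cfDNA."""
    return cfdna_ng_per_ml * plasma_ml * 1000 / PG_PER_HAPLOID_GENOME

def p_any_mutant(genome_eq: float, maf: float) -> float:
    """Poisson probability that at least one mutant fragment of the locus is in the sample."""
    return 1 - math.exp(-genome_eq * maf)

ge = genome_equivalents(cfdna_ng_per_ml=5, plasma_ml=4)  # hypothetical inputs
for maf in (1e-2, 1e-3, 1e-4):
    print(f"MAF {maf:.2%}: P(>=1 mutant copy) = {p_any_mutant(ge, maf):.3f}")
```

With these assumptions the sample holds roughly 6,000 genome equivalents. A variant at 0.1% MAF is then almost certainly present (probability above 99%), whereas at 0.01% MAF the chance that even one mutant copy is in the tube is below 50%, however sensitive the downstream assay is.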

Detection and quantification of predetermined ctDNA alleles by Sanger sequencing, single- and multi-locus PCR [39], and qPCR [40, 41] are established in clinical oncology (e.g., the cobas assay [40, 42, 43]). However, these methods are typically limited to MAFs well above 1% [34, 44, 45], which would leave many cancers smaller than 10 cm³ undetected, because their expected plasma MAFs can be 0.1% or lower [17, 46]. Advances such as droplet digital PCR (ddPCR) [47] and BEAMing [48, 49] allow absolute quantification of allele frequencies as low as 0.01%.
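Digital PCR achieves absolute quantification by partitioning the sample and applying a Poisson correction to the fraction of positive partitions. The sketch below shows that calculation for a two-channel (mutant and wild-type) droplet assay. The 0.85 nL droplet volume and the droplet counts are assumed values for illustration, and vendor software applies additional quality filters.

```python
import math

DROPLET_NL = 0.85  # assumed droplet volume in nanolitres (instrument-specific)

def mean_copies_per_droplet(positive: int, total: int) -> float:
    """Poisson correction: lambda = -ln(1 - fraction of positive droplets)."""
    if not 0 <= positive < total:
        raise ValueError("need 0 <= positive < total (saturated or empty run)")
    return -math.log(1 - positive / total)

def copies_per_ul(positive: int, total: int, droplet_nl: float = DROPLET_NL) -> float:
    """Target concentration in the reaction, in copies per microlitre."""
    return mean_copies_per_droplet(positive, total) / (droplet_nl * 1e-3)

def mutant_allele_fraction(mut_pos: int, wt_pos: int, total: int) -> float:
    """MAF from the two channels; the droplet volume cancels out."""
    mut = mean_copies_per_droplet(mut_pos, total)
    wt = mean_copies_per_droplet(wt_pos, total)
    return mut / (mut + wt)

# Hypothetical run: 15,000 accepted droplets, 12 mutant-positive, 6,000 wild-type-positive.
print(round(copies_per_ul(6000, 15000)), f"{mutant_allele_fraction(12, 6000, 15000):.4%}")
```

Here the MAF comes out at roughly 0.16%, well below the ~1% floor quoted above for conventional methods. The Poisson correction matters mostly for the abundant wild-type channel, where many droplets contain more than one copy.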

Next-generation sequencing (NGS) methods can be used for both targeted and unbiased (whole genome sequencing, WGS) identification of patient-specific structural aberrations [50], copy number variations, and SNPs. However, NGS approaches are limited by the low fraction of ctDNA in total cfDNA, the short length of ctDNA fragments, and high error rates that preclude reliable detection of rare variants. The sensitivity of targeted NGS methods can be boosted by first identifying cancer-specific mutations (from patient-specific tumor tissue samples or known cancer-related alleles) and then using this information as a selector for liquid biopsy samples. This approach is exemplified by tagged-amplicon deep sequencing (TAm-Seq) [51] and by methods based on hybrid-capture enrichment and molecular barcoding, such as Safe-Seq [52] and CAPP-Seq [53], which have been used to detect MAFs as low as 0.00025%. The record analytical sensitivity of detecting 4 in 10⁵ cfDNA molecules was achieved by combining CAPP-Seq with molecular barcoding and in silico digital error suppression [54]. The sensitivity of unbiased methods can be improved with approaches such as whole genome amplification (WGA) [55].
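Molecular barcoding suppresses sequencing and PCR errors by tagging each original cfDNA fragment with a unique molecular identifier (UMI) and collapsing all reads that share a UMI into a consensus. A true variant is present in every copy of its source molecule, whereas most errors appear in only a minority of copies. The sketch below shows single-strand UMI consensus calling on pre-aligned, equal-length reads. The family-size and agreement thresholds are illustrative assumptions, and this is not the specific algorithm of Safe-Seq, CAPP-Seq, or the digital error suppression in [54].

```python
from collections import Counter, defaultdict

def umi_consensus(reads, min_family_size=3, min_agreement=0.7):
    """Collapse (umi, sequence) reads into one consensus sequence per original molecule.

    Assumes reads are already aligned and trimmed to equal length.
    Positions where no base reaches `min_agreement` are masked as 'N'.
    """
    families = defaultdict(list)
    for umi, seq in reads:
        families[umi].append(seq)

    consensus = {}
    for umi, seqs in families.items():
        if len(seqs) < min_family_size:
            continue  # too few copies to outvote a random error
        called = []
        for column in zip(*seqs):  # one tuple of bases per position
            base, count = Counter(column).most_common(1)[0]
            called.append(base if count / len(seqs) >= min_agreement else "N")
        consensus[umi] = "".join(called)
    return consensus

reads = [
    ("AAGT", "ACGTAC"), ("AAGT", "ACGTAC"), ("AAGT", "ACGTAC"), ("AAGT", "ACTTAC"),  # one error read
    ("CCTA", "ACTTAC"), ("CCTA", "ACTTAC"), ("CCTA", "ACTTAC"),  # variant in every copy
    ("GGAA", "ACTTAC"),  # singleton family
]
print(umi_consensus(reads))
```

In this toy input the error in one read of family `AAGT` is outvoted, the variant shared by all reads of family `CCTA` is retained, and the singleton family `GGAA` is discarded because a single read cannot be error-corrected.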

Whole exome sequencing (WES) of liquid biopsy samples makes it possible to follow cancer patients longitudinally under treatment and to monitor the emergence of resistance-conferring mutations. For example, in a proof-of-principle study, plasma WES was used to monitor six patients with metastatic breast, ovarian, and lung cancers for up to 2 years, revealing resistance-conferring mutations that evolved in response to treatment [56]. However, such unbiased approaches have low sensitivity and can therefore be used only in patients with advanced cancer.

Short-read massively parallel sequencing is increasingly used in biomarker discovery and routine clinical applications. However, short-read technologies suffer from GC bias and have difficulty resolving complex and repetitive sequences and phasing alleles. Genotyping liquid biopsy samples with long-read single-molecule sequencing offers a new approach that mitigates some of these limitations. Two long-read single-molecule sequencing platforms are currently commercially available, from Pacific Biosciences and Oxford Nanopore Technologies. Both platforms enable phasing of nucleotide variants [57], sequencing of complex repetitive regions and structural rearrangements [58], and detection of epigenetic modifications [59, 60]. Furthermore, they allow rapid sequencing runs, which is desirable in diagnostic settings [61].

The single molecule real-time (SMRT) sequencing platform [62] by Pacific Biosciences is the more mature of the two. SMRT technology derives sequence data from real-time recording of fluorescent nucleotide incorporation by polymerases immobilized on a solid phase [62]. The polymerase reads a circularized template, producing long reads (> 10 kb on average, > 80 kb maximum read length [63]), albeit with a relatively high error rate (~ 13%) [63]. Where shorter (< 10 kb) reads suffice, a template can be read multiple times, allowing highly accurate circular consensus sequences (CCS) to be computed from the individual passes (subreads). CCS-based sequencing has already been used to detect mutations at frequencies below 0.5% in the stool of colorectal cancer (CRC) patients [64].
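The gain from circular consensus can be approximated with a binomial model: if each pass has an independent per-base error rate and the consensus takes a majority vote, the consensus is wrong only when errors occur in at least half of the passes. This is a deliberately simplified sketch. It treats all errors as substitutions that agree with one another (a pessimistic assumption) and ignores the indel-dominated error profile and alignment-based models that real CCS software uses. The 13% single-pass rate is the figure quoted above.

```python
from math import ceil, comb

def consensus_error(per_pass_error: float, passes: int) -> float:
    """Probability that a majority vote over `passes` independent reads is wrong at a base.

    Ties (possible when `passes` is even) are counted as errors.
    """
    return sum(
        comb(passes, k) * per_pass_error**k * (1 - per_pass_error) ** (passes - k)
        for k in range(ceil(passes / 2), passes + 1)
    )

for n in (1, 3, 5, 9, 15):
    print(n, f"{consensus_error(0.13, n):.2e}")
```

Under this model the per-base error drops from 13% for a single pass to under 2% with five passes and keeps falling with more passes. This is why shorter inserts that can be read repeatedly yield highly accurate consensus reads.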

The long-read platform commercialized by Oxford Nanopore Technologies (ONT) [65] is a relative newcomer to the market. Nanopore sequencing derives sequence information from nucleotide-specific changes in ionic current as single-stranded DNA (ssDNA) passes through a membrane-bound nanopore [66]. Standard ONT protocols produce DNA fragments of ~ 8 kb; however, read length has no technical limit apart from the length of the DNA templates in the sample [63]. The biggest drawback of the ONT platform is its high error rate (~ 15%) [67], combined with the inability to sequence the same strand multiple times. Accuracy can be improved by 2D nanopore sequencing, in which the two strands of a dsDNA template are linked by a hairpin and sequenced consecutively as ssDNA, allowing a consensus sequence to be computed.

The potential of SMRT and ONT platforms for advancing liquid biopsy is considerable, but long-read technologies suffer from several drawbacks that currently limit their wider use and implementation in routine clinical settings. Notably, per-base costs are considerably higher than for short-read sequencing, and available bioinformatics algorithms are less mature than those for short-read analysis [68]. However, recent developments in base-calling software [69, 70] and future improvements in sequencing chemistries are expected to facilitate the adoption of long-read platforms in biomarker development and clinical use.

Translational Status

Molecular profiling of tumor tissues by genomics techniques plays a critical role in advancing cancer research and is gradually becoming standard clinical practice. Given the wealth of knowledge these approaches have accumulated about cancer molecular biology and the technical advances in sequencing, it is not surprising that ctDNA from liquid biopsies is already yielding biomarkers validated in clinical trials. Furthermore, some of these assays have already been FDA-approved and are moving into clinical practice.

The first FDA-approved liquid biopsy test on the market is the cobas EGFR Mutation Test v2 (Roche Diagnostics Inc) [42, 43]. The assay detects cancer-specific mutations in cfDNA and is used as a companion diagnostic test for mutations in the epidermal growth factor receptor (EGFR) gene, to identify patients with metastatic non-small cell lung cancer (NSCLC) eligible for treatment with erlotinib.

One of the standing problems in clinical practice is identifying patients at high risk of recurrence after surgery. Because plasma ctDNA can reflect minimal residual disease, cancer-specific MAFs can serve as proxies for recurrence risk. Patient-specific cancer alleles can be identified by analyzing tumor tissue at loci known to be recurrently mutated in cancer. The locus with the highest MAF can then be chosen, and its abundance in plasma ctDNA used as a prognostic biomarker [71]. This approach was used to identify high-risk colon cancer patients. Specifically, in a prospective multicenter cohort of patients with resected stage II colon cancer (N = 230), quantification of patient-specific mutated alleles in post-operative ctDNA by dPCR showed that ctDNA-positive patients were at higher risk of recurrence (HR = 18) [71].
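The tumor-informed strategy described above can be sketched in two steps: choose the tumor mutation with the highest MAF, then ask whether the number of mutant reads at that locus in plasma exceeds what a background error rate alone would plausibly produce. The sketch uses a one-sided binomial test. The background error rate (1e-4), significance threshold, read counts, and locus labels are hypothetical, and the study in [71] used its own assay-specific calling rules.

```python
def pick_tracking_locus(tumor_mafs: dict[str, float]) -> str:
    """Return the tumor mutation with the highest MAF (the plasma tracking target)."""
    return max(tumor_mafs, key=tumor_mafs.get)

def binomial_upper_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p), using the term recurrence to avoid huge integers."""
    term = (1 - p) ** n  # P(X = 0)
    below = 0.0
    for i in range(k):
        below += term
        term *= (n - i) / (i + 1) * p / (1 - p)  # P(X = i + 1) from P(X = i)
    return max(0.0, 1.0 - below)

def ctdna_positive(mutant_reads: int, depth: int,
                   background_error: float = 1e-4, alpha: float = 0.01) -> bool:
    """Call the sample positive if background errors alone are unlikely to explain the reads."""
    return binomial_upper_tail(mutant_reads, depth, background_error) < alpha

tumor = {"locus_A": 0.31, "locus_B": 0.42, "locus_C": 0.18}  # hypothetical tumor MAFs
target = pick_tracking_locus(tumor)
print(target, ctdna_positive(8, 20_000), ctdna_positive(4, 20_000))
```

With 20,000 reads and an assumed background error rate of 1e-4, about two error reads are expected by chance, so 8 mutant reads would be called positive while 4 would not.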

In addition to cancer-specific SNPs, ctDNA reflects other tumor-specific changes, such as chromosomal rearrangements. The potential of combining patient-specific somatic mutations and structural variants (first identified in tumor samples by SNP arrays [72] and WGS [72, 73], and then assayed in ctDNA by ddPCR) to estimate the risk of recurrence has been tested in two studies of patients with colorectal cancer [72, 73]. The first study investigated the ability of ctDNA to predict recurrence in six relapsing and five non-relapsing patients, retrospectively selected, following colorectal surgery [72]. Somatic structural variants in ctDNA proved to be valuable prognostic biomarkers, detecting post-surgery relapse with 100% sensitivity and 100% specificity. Importantly, relapse was detected on average 10 months earlier than with conventional follow-up. The second study [73] was part of a single-site prospective observational cohort of colorectal cancer patients undergoing surgical treatment, split into two cohorts. Analysis of the first cohort (N = 27, retrospectively selected with a 1:1 ratio of relapsing to non-relapsing patients) showed that patients treated with curative intent for localized disease who were ctDNA-positive within 3 months of surgery had a very high risk of relapse (100%; HR 37.7). ctDNA analysis of an independent validation cohort (18 retrospectively selected CRC patients with liver metastases) showed that ctDNA positivity within the first 3 months was associated with a high relapse risk (HR 4.9). These estimates should be interpreted with caution given the small sample sizes. Indeed, patients are being recruited for a multi-site prospective observational cohort (N = 1800, NCT03637686) to validate ctDNA as a biomarker for detecting subclinical residual disease and to precisely estimate the risk of colorectal cancer recurrence.
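The caution about small samples can be quantified with a binomial confidence interval. The sketch below uses the Wilson score interval (one of several standard choices; the exact Clopper-Pearson interval is wider) for the 100% sensitivity (6 of 6 relapsing patients) and 100% specificity (5 of 5 non-relapsing patients) reported in [72].

```python
from math import sqrt

def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Approximate 95% Wilson score interval for a binomial proportion."""
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half_width = z * sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return max(0.0, center - half_width), min(1.0, center + half_width)

sens_lo, sens_hi = wilson_interval(6, 6)  # relapses detected / relapsing patients
spec_lo, spec_hi = wilson_interval(5, 5)  # true negatives / non-relapsing patients
print(f"sensitivity 95% CI {sens_lo:.0%}-{sens_hi:.0%}, "
      f"specificity 95% CI {spec_lo:.0%}-{spec_hi:.0%}")
```

The 95% intervals extend down to roughly 61% for sensitivity and 57% for specificity, so perfect observed performance in 11 patients is compatible with a considerably less accurate test.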

In a single-center retrospective study of patients with non-metastatic breast cancer, the abundance of rearrangements in ctDNA was used to identify patients who would eventually develop metastases [74]. The authors first performed whole genome sequencing of primary tumors to identify patient-specific chromosomal rearrangements. Next, they analyzed plasma ctDNA from follow-up blood samples and quantified tumor-specific rearrangements using ddPCR. The assay discriminated patients with eventual metastasis from those with long-term disease-free survival with 93% sensitivity and 100% specificity (AUC = 0.98, N = 20) [74]. This exploratory study shows the potential of ctDNA for detecting occult metastases. SAGA Diagnostics is deploying two ultrasensitive ddPCR assays (IBSAFE and KROMA) based on these results.
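The AUC reported above has a direct probabilistic reading: it equals the probability that a randomly chosen patient who developed metastasis has a higher ctDNA signal than a randomly chosen patient who did not (the Mann-Whitney formulation, with ties counted as one half). The sketch computes it from per-patient scores; the values are invented for illustration and are not data from [74].

```python
def auc(cases: list[float], controls: list[float]) -> float:
    """Empirical ROC AUC via pairwise comparison (Mann-Whitney U / (n_cases * n_controls))."""
    wins = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(cases) * len(controls))

# Hypothetical ctDNA rearrangement signals (arbitrary units).
metastatic = [0.90, 0.40, 0.70, 0.05]
disease_free = [0.00, 0.00, 0.10, 0.02]
print(auc(metastatic, disease_free))
```

Here one metastatic patient scores below one disease-free patient, giving an AUC of 15/16 = 0.9375. The pairwise loop scales with the product of the group sizes, which is adequate for cohorts of this size.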

A similar approach has been successfully implemented in cancer personalized profiling by deep sequencing (CAPP-Seq) in NSCLC patients (5 healthy controls and 13 patients undergoing treatment for newly diagnosed or recurrent NSCLC) [53]. Developing a personalized profile is a multistep process. The first step is to identify recurrently mutated cancer-specific regions in The Cancer Genome Atlas (TCGA) [75]. The next step is to design a library of biotinylated DNA oligonucleotides (the CAPP-Seq “selector” library) that can selectively enrich the identified targets in a sample. The selector library is then used to identify patient-specific aberrations in tumor tissue samples. In the last step, CAPP-Seq is applied to plasma cfDNA to enrich and quantify ctDNA [75]. The assay had a maximal sensitivity of 85% and a maximal specificity of 96% for discriminating NSCLC patients from healthy controls. CAPP-Seq has been adopted by Roche in the AVENIO ctDNA NGS liquid biopsy kit.
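Selector design is essentially a coverage optimization: choose genomic regions so that as many patients as possible carry at least one mutation inside a panel of limited size. The sketch below is a greedy set-cover heuristic that repeatedly adds the region contributing the most newly covered patients per kilobase. It is a simplification loosely modeled on the recurrence-based region selection described above, not the published CAPP-Seq design procedure, and the region names, sizes, and patient sets are hypothetical.

```python
def design_selector(region_hits: dict[str, set[str]], region_kb: dict[str, float],
                    max_kb: float) -> tuple[list[str], set[str]]:
    """Greedily choose regions maximizing newly covered patients per kb within a size budget."""
    covered: set[str] = set()
    selected: list[str] = []
    used_kb = 0.0
    while True:
        # Regions that still fit in the panel and would cover at least one new patient.
        candidates = [r for r in region_hits
                      if r not in selected
                      and used_kb + region_kb[r] <= max_kb
                      and region_hits[r] - covered]
        if not candidates:
            break
        best = max(candidates, key=lambda r: len(region_hits[r] - covered) / region_kb[r])
        selected.append(best)
        covered |= region_hits[best]
        used_kb += region_kb[best]
    return selected, covered

# Hypothetical regions: patients (from a reference cohort) with a mutation in each region.
hits = {"R1": {"p1", "p2", "p3"}, "R2": {"p3", "p4"}, "R3": {"p5"}, "R4": {"p1", "p2"}}
sizes_kb = {"R1": 1.5, "R2": 0.5, "R3": 2.0, "R4": 0.4}
print(design_selector(hits, sizes_kb, max_kb=3.0))
```

In the toy example the compact region `R4` is chosen first, `R2` adds two new patients, and `R1` is never selected because its patients are already covered, which leaves room in the budget for `R3`.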

Signatera RUO (Natera) is a multiplex-PCR NGS ctDNA test for monitoring and minimal residual disease assessment in NSCLC patients. The Signatera RUO technology is being validated in an observational prospective cohort study called TRACERx (Tracking Non-Small-Cell Lung Cancer Evolution Through Therapy (Rx); NCT01888601), which aims to define the relationship between intratumor heterogeneity over 5 years and clinical outcome following surgery and adjuvant therapy. Analysis of ctDNA samples from the first 96 participants with NSCLC [46] showed evidence of resistance to adjuvant chemotherapy. Furthermore, the assay identified patients with recurrence with a sensitivity of 93% and a specificity of 90% in a subgroup of 24 patients from the TRACERx cohort (10 control cases and 14 confirmed relapses, retrospectively selected).

Similar efforts have been undertaken by other companies. For example, Guardant Health has completed a trial comparing its cfDNA assay Guardant360 with tissue biopsy for detecting predictive and prognostic genetic markers in NSCLC patients (NCT03615443). Similarly, Foundation Medicine has established a companion diagnostic test, FoundationOne Liquid, which profiles more than 70 genes as well as genomic biomarkers of microsatellite instability.

The approaches described above are applied once cancer has been diagnosed and treatment initiated. However, most cancers present symptomatically and are diagnosed only at difficult-to-treat stages. Arguably, pre-symptomatic population-wide screening for early cancer detection is one of the most attractive applications of liquid biopsy and could in principle improve clinical outcomes [3]. However, this application comes with a unique set of challenges that must be addressed before clinical adoption. First and foremost, in asymptomatic individuals, liquid biopsy biomarkers are likely to be the only source of diagnostic information. Second, any tumors present will tend to be small and at early stages, so the concentration of cancer biomarkers may be close to the methods’ limit of detection. Third, mutational analysis will need to account for somatic mutations that do not originate from the tumor (for example, those arising from clonal hematopoiesis). Finally, establishing the true clinical sensitivity and specificity of the assay will be of utmost importance, which will require large cohorts of healthy individuals with longitudinal follow-up.
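A back-of-the-envelope calculation shows why screening studies need such large cohorts: sensitivity can only be estimated from the cancers that actually arise during follow-up. The sketch uses the normal-approximation (Wald) sample-size formula n = z² p(1 - p) / d² and then divides by the expected cumulative incidence. The 50% expected sensitivity, ±5% margin, 0.5% annual incidence, and 3-year follow-up are hypothetical inputs, and the calculation ignores stage mix, loss to follow-up, and differences between cancer types.

```python
from math import ceil

def cases_needed(expected_sensitivity: float, margin: float, z: float = 1.96) -> int:
    """Incident cancers needed for a ~95% Wald CI of +/- `margin` around the sensitivity."""
    p = expected_sensitivity
    return ceil(z**2 * p * (1 - p) / margin**2)

def cohort_needed(cases: int, annual_incidence: float, years: float) -> int:
    """Screened participants needed to accrue `cases` cancers during follow-up."""
    return ceil(cases / (annual_incidence * years))

n_cases = cases_needed(0.50, 0.05)
print(n_cases, cohort_needed(n_cases, annual_incidence=0.005, years=3))
```

Under these assumptions about 385 incident cancers are needed, which requires screening roughly 26,000 people for three years. Halving the margin or assuming a lower incidence pushes the required cohort beyond 100,000 participants.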

The genomics company GRAIL is setting out to tackle these challenges by leveraging cfDNA sequencing for early detection in one of the largest genomic medicine studies to date. GRAIL is currently conducting two clinical studies. The first, STRIVE (NCT03085888), is a prospective observational cohort study enrolling approximately 100,000 women undergoing mammography. Participants will be followed for five years to record clinical outcome data, with the ultimate goal of validating GRAIL’s blood-based assay for early breast cancer detection. The second study (NCT02889978), the Circulating Cell-Free Genome Atlas (CCGA), is a prospective observational case-control study currently in the recruitment stage. CCGA is enrolling more than 10,000 subjects to characterize the landscape of cfDNA in the blood of cancer patients and healthy individuals. During 2019, GRAIL also plans to launch a clinical study (SUMMIT) to evaluate the blood assay and sequence cfDNA in 50,000 participants with no cancer diagnosis at the time of enrollment. This study will follow participants for 3 years and then track them through medical registries for an additional five years to evaluate clinical outcomes. The planned sample sizes should be sufficient to show the clinical validity of the screening approach. However, adoption of population-wide screening will require evidence of clinical utility, which may require additional large cohorts with long follow-up.

Circulating Tumor Cells

Scientific Rationale

Circulating tumor cells (CTCs) were first described in 1869 [76]. CTCs have re-emerged as a topic of interest in the last couple of decades because of their role in tumor biology and their potential to serve as liquid biopsy biomarkers. CTCs are cells that enter the circulation from the primary tumor site during tumor growth and metastasis [77, 78]. The release of CTCs into the bloodstream involves at least two independent mechanisms. The first requires epithelial-mesenchymal transition (EMT), in which tumor cells lose their epithelial phenotype and acquire mesenchymal traits [79, 80]. The second does not require EMT and may depend on external forces, such as surgery [79, 81]. How these two mechanisms of CTC release contribute to the metastatic spread of cancer is still under intensive investigation.

The potential of CTCs for liquid biopsy is manifold. To date, CTC enumeration has been the most intensively investigated CTC biomarker. CTCs are typically scarce (fewer than 10 cells per milliliter of blood [79, 82, 83]), and higher CTC counts correlate with reduced progression-free and overall survival [84], so CTC abundance can be used as a liquid biopsy biomarker of cancer burden. CTCs can retain some markers of their tissue of origin, allowing the primary tumor to be localized [85]. Furthermore, microfluidic and single-cell omics methodologies [86] allow CTCs to be analyzed with technologies that were previously limited to bulk tissue samples [87]. However, the effective use of CTCs in clinical practice is hampered mainly by methodological and standardization challenges related to CTC enrichment and selection, as outlined below.

Current and Emerging Technologies

The main technological challenge for analyzing CTCs in basic research and clinical use is their low abundance in blood. Therefore, the first step in any detection and analysis procedure is enrichment and selection. Enrichment methods (reviewed in [79, 88]) exploit either biological properties of CTCs (expression of cell-surface proteins [89, 90]) or physical properties (size and shape [91, 92], electric charge [93, 94]). Once enriched, CTCs can be separated from normal cells on the basis of cell-surface antigen expression.

To complicate matters further, CTCs exist as apoptotic or viable populations, each of which harbors subpopulations with different phenotypic characteristics [79, 95]. Consequently, different enrichment procedures bias the enriched CTC pool towards specific subpopulations. Depending on the intended use of CTCs in liquid biopsy, this methodological challenge will need to be addressed differently [79]. For example, early cancer detection requires extreme sensitivity to scarce CTCs. Prognostic use of CTCs requires comprehensive enrichment of all CTC subpopulations, so that all relevant tumor heterogeneity is captured. Using CTCs as predictive and monitoring biomarkers requires enrichment of all viable CTCs as well as their characterization to determine the tumor’s sensitivity or resistance to therapeutic interventions.
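The sensitivity requirement for early detection follows from sampling statistics: when CTCs are rare, a single blood draw may simply contain none. The sketch treats CTCs as randomly dispersed (Poisson) and includes a capture efficiency for the enrichment step. The 7.5 mL draw volume, the concentrations, and the 80% capture efficiency are assumptions for illustration, and CTC clusters are ignored.

```python
import math

def p_detect(ctc_per_ml: float, blood_ml: float, capture_efficiency: float,
             min_cells: int = 1) -> float:
    """Poisson probability of recovering at least `min_cells` CTCs from one blood draw."""
    lam = ctc_per_ml * blood_ml * capture_efficiency  # expected number of recovered CTCs
    p_fewer = sum(math.exp(-lam) * lam**k / math.factorial(k) for k in range(min_cells))
    return 1.0 - p_fewer

for conc in (0.1, 1.0):
    print(conc, round(p_detect(conc, 7.5, 0.8), 3), round(p_detect(conc, 7.5, 0.8, min_cells=5), 3))
```

At 0.1 CTC per mL, a 7.5 mL draw recovers at least one CTC less than half the time (about 45%), so a negative result carries little information. At 1 CTC per mL, recovering at least one cell is nearly certain, but reaching a count-based cutoff of five cells is still far from guaranteed (about 71%).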

Despite these challenges, there have been many attempts to use CTCs for liquid biopsy. To date, most progress has been made using CTC enumeration as a prognostic biomarker in patients treated for metastatic breast [96], colorectal (CRC) [97], or prostate cancer (PCa) [98].

In addition to CTC enumeration, single-cell omics methodologies are being leveraged to characterize tumors and extract clinically meaningful information. Whole genome amplification (WGA) can facilitate the genomic analysis of CTCs [99,100,101]. Multi-gene panel sequencing of CTCs from patients with stage IV CRC uncovered genotypes that were subsequently identified at subclonal levels in primary tumors and metastases of the same patients [102]. Whole genome sequencing of CTCs has been performed in patients with metastatic breast cancer [103]; the use of long-read technology enabled identification of driver mutations as well as of the primary tissue of origin. A similar study based on whole exome sequencing in metastatic prostate cancer [104] found concordance between mutations in CTC genomes and in tumor tissues (primary and metastatic), demonstrating that CTC genomic profiling could be a valuable diagnostic and prognostic tool [103, 104].

Genome-wide epigenomic analysis of DNA from liquid biopsy samples may not be possible at the moment, but there are already attempts to probe the methylation status of targeted genes [105] and promoters in CTCs [106]. Single-cell RNA sequencing has already been used to investigate CTC transcriptomes in hepatocellular carcinoma [87], melanoma [107], and pancreatic [108] and breast cancer [109]. Established protein analysis of CTCs relies on identifying the cells by antibody staining [96], but recent improvements in microfluidics are expected to increase throughput and coverage [110]. Microfluidic devices are already enabling the analysis of proteomic [111], secretomic [112], and metabolic [113] biomarkers from CTCs.

Translational Status

Despite the promise of CTC omics assays for personalized oncology, more clinical trials are needed to establish their clinical validity and utility. As mentioned earlier, the field is still hampered by technical and reproducibility issues related to CTC enrichment.

To date, the only FDA-approved CTC assay on the market is CellSearch (Menarini Silicon Biosystems, Inc, acquired from Janssen Diagnostics LLC/Veridex LLC). The assay exploits the epithelial cell adhesion molecule (EpCAM), which is expressed by many carcinoma cells, to isolate CTCs [82, 114]. Because EpCAM expression is linked to poor survival, CellSearch enumeration of EpCAM-positive CTCs can serve as a prognostic biomarker in combination with other clinical information. In a pivotal prospective, double-blind, multi-center clinical trial, a total of 177 patients with metastatic breast cancer were recruited to investigate CTC counts as predictors of progression-free survival (PFS) and overall survival (OS) [96]. The results showed that patients with higher CTC counts (≥ 5 CTCs per 7.5 mL of blood) had significantly shorter PFS and OS. Later, three prospective multi-center cohort studies showed similar results for metastatic breast, colorectal, and prostate cancer [115].
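In these studies the CTC count is used as a dichotomized prognostic marker. The sketch below assumes the cutoff commonly reported for CellSearch in metastatic breast cancer (5 CTCs per 7.5 mL of whole blood); the function name and group labels are hypothetical, cutoffs can differ between cancer types, and this is not a clinical decision tool.

```python
# Assumed cutoff: >= 5 CTCs per 7.5 mL whole blood (metastatic breast cancer setting)
DEFAULT_CUTOFF = 5

def ctc_risk_group(ctc_count_per_7_5_ml, cutoff=DEFAULT_CUTOFF):
    """Dichotomize a CellSearch-style CTC count into a prognostic group (hypothetical helper)."""
    if ctc_count_per_7_5_ml < 0:
        raise ValueError("CTC count cannot be negative")
    return "unfavorable" if ctc_count_per_7_5_ml >= cutoff else "favorable"

for count in (0, 4, 5, 23):
    print(count, ctc_risk_group(count))
```

Exposing the cutoff as a parameter keeps the function usable for other validated thresholds without changing its logic.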

The aforementioned trials demonstrate the clinical validity of CellSearch-based CTC enumeration as a predictive and prognostic biomarker. However, the utility of CTC liquid biopsy in specific clinical contexts is still unclear and is the subject of ongoing clinical trials (reviewed in ref [116]). For example, the HER2 status of CTCs enriched by CellSearch in patients with metastatic breast cancer showed potential as a prognostic biomarker and as a predictive biomarker of response to HER2-targeted therapies [117, 118]. One therapy for HER2-positive metastatic breast cancer is trastuzumab emtansine (trastuzumab-DM1 or T-DM1), a conjugate of a tumor-activated prodrug and a humanized anti-HER2 monoclonal antibody [119]. An interventional multi-site trial (NCT01975142) tested whether patients with HER2-negative metastatic breast cancer but HER2-positive CTCs respond to treatment with T-DM1. The study found that HER2-positive CTCs can be detected in a subpopulation of patients with HER2-negative metastatic breast cancer, but the overall response to anti-HER2 therapy was low [120].

CellMaxLife is currently conducting clinical trials of the automated liquid biopsy platform CellMax CTC (CMx) [85, 121] for early detection of colorectal (NCT03476122), prostate (NCT03488706), and breast cancer (NCT03511859).

In addition to CTC enumeration, omics profiling is also being investigated in clinical trials. Oncotype DX AR-V7 is an assay that detects AR-V7 proteins in the nuclei of CTCs. The assay has shown efficacy in identifying patients with metastatic prostate cancer who will not respond to androgen receptor (AR)-targeted therapies. Clinical studies show that Oncotype DX AR-V7 can guide the choice of treatment between taxanes and androgen receptor signaling (ARS) inhibitors [122,123,124]. A cross-sectional single-site cohort study of patients with histologically confirmed metastatic castration-resistant prostate cancer (mCRPC) undergoing a change in therapy investigated whether pretherapy nuclear AR-V7 in CTCs is a treatment-specific marker. The results showed that in patients with AR-V7-positive CTCs, taxanes resulted in improved OS compared to ARS inhibitors (HR 0.24) [122]. These findings were later validated in an independent, multi-site, blinded, cross-sectional cohort (N = 225) [124].

Epigenome

Scientific Rationale

Epigenetic modifications, such as DNA and histone methylation, play a critical role in regulating core cellular processes, such as transcription, DNA repair, and replication. Because dysregulation of DNA-templated processes is a crucial step in neoplastic progression, proteins and complexes involved in epigenetic modifications are often found mutated across different cancer types [125]. Changes in epigenomic regulators during tumorigenesis in turn lead to changes in methylation patterns compared to normal cells [125, 126].

Exactly how epigenomic changes modulate cellular processes in normal and cancer cells is a field of intense research. This research has established a causal relationship between epigenetic changes and some of the hallmarks of cancer [125, 126]. DNA methylation is the most widely researched epigenetic modification. Repression of tumor suppressor genes can be readily achieved through epigenetic changes [127, 128]. Specifically, methylation patterns at CpG sites in promoter regions can regulate expression of downstream genes. In tumor suppressor-specific hypermethylation, cancer cells methylate CpG islands in the promoters of tumor suppressor genes, thus downregulating their expression. Furthermore, genome-wide hypomethylation has also been observed in cancer [129, 130]. Hypomethylation of promoters that are methylated in normal cells can dysregulate gene expression, which can promote tumorigenesis when the targets of dysregulation are proto-oncogenes. In combination with hypomethylation of repetitive sequences [131], this can elevate mutation rates, cause genomic instability, and promote tumor formation [126, 132]. Because these processes are crucial for cancer progression, key players are being investigated as therapeutic targets [125, 133,134,135] (reviewed recently in ref [136]).

The epigenomic aberrations characteristic of cancer cells can distinguish healthy from cancer tissues with high accuracy [137]. Therefore, epigenomic changes such as DNA methylation patterns have been recognized as potential diagnostic, prognostic, and predictive biomarkers. Furthermore, there is evidence of concordance between epigenetic changes found in ctDNA and in its tissue of origin [138,139,140], making epigenomic biomarkers promising candidates for liquid biopsy.

DNA methylation patterns are highly tissue-specific [141], enabling identification of the tissue of origin of ctDNA [142]. A recently published atlas of human cfDNA methylation patterns [143] could bolster efforts for early cancer detection and help pinpoint the tissue of origin.
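Tissue-of-origin inference from such reference profiles is often framed as a deconvolution problem: the cfDNA methylation profile is modeled as a non-negative mixture of reference tissue profiles. The sketch below is our own illustration, not the method of the cited atlas: it uses a toy reference matrix (6 markers, 3 hypothetical tissues) and plain projected gradient descent in place of a dedicated non-negative least squares solver.

```python
def deconvolve(reference, mixture, iters=2000):
    """Estimate tissue proportions w >= 0 such that reference @ w approximates mixture.

    reference: one row per CpG marker, one column per tissue (methylation fraction 0-1).
    mixture:   observed methylation fraction of each marker in cfDNA.
    Minimizes 0.5 * ||reference @ w - mixture||^2 by projected gradient descent.
    """
    n_markers, n_tissues = len(reference), len(reference[0])
    # 1 / ||R||_F^2 is at most 1 / (largest eigenvalue of R^T R), a stable step size
    step = 1.0 / sum(v * v for row in reference for v in row)
    w = [1.0 / n_tissues] * n_tissues
    for _ in range(iters):
        residual = [
            sum(reference[i][j] * w[j] for j in range(n_tissues)) - mixture[i]
            for i in range(n_markers)
        ]
        grad = [
            sum(reference[i][j] * residual[i] for i in range(n_markers))
            for j in range(n_tissues)
        ]
        # Gradient step, then projection onto the non-negative orthant
        w = [max(0.0, w[j] - step * grad[j]) for j in range(n_tissues)]
    total = sum(w)
    return [x / total for x in w] if total > 0 else w

# Toy reference: 6 markers x 3 hypothetical tissues (columns: blood, liver, colon)
REFERENCE = [
    [0.9, 0.1, 0.1],
    [0.1, 0.9, 0.1],
    [0.1, 0.1, 0.9],
    [0.8, 0.2, 0.1],
    [0.2, 0.8, 0.2],
    [0.1, 0.2, 0.8],
]
true_w = [0.7, 0.2, 0.1]
mixture = [sum(r * w for r, w in zip(row, true_w)) for row in REFERENCE]
estimate = deconvolve(REFERENCE, mixture)
print([round(x, 3) for x in estimate])  # approximately [0.7, 0.2, 0.1]
```

With noise-free synthetic data the true proportions are recovered; real cfDNA data add sampling noise and incomplete references, which is why marker selection matters.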

Current and Emerging Technology and Biomarkers

Epigenomic biomarkers can be assessed in cfDNA, exosomes and CTCs. It is possible to approach epigenomic biomarkers by looking at specific changes at predetermined loci (for example at recurring cancer-specific epigenetic changes or at patient-specific methylation patterns), or to probe genome-wide epigenomic patterns characteristic of cancer.

5-Methylcytosine (5mC) is the most extensively studied DNA modification in epigenomics [125]. Most techniques for assessing DNA methylation patterns rely on bisulfite treatment, which converts unmethylated cytosine to uracil but leaves 5mC unaffected. Bisulfite-treated DNA is then analyzed with methods that distinguish uracil (read as thymine after PCR amplification) from cytosine in the template [126, 144,145,146].
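The logic of bisulfite-based methylation calling can be illustrated with a small simulation. This is a minimal sketch under simplifying assumptions: only the top strand is modeled, conversion is complete, there are no sequencing errors, and the sequence and methylated positions are made up.

```python
def bisulfite_convert(seq, methylated_positions):
    """Simulate bisulfite treatment plus PCR on the top strand.

    Unmethylated C becomes U, which is read as T after amplification;
    cytosines listed in methylated_positions (5mC) stay C.
    """
    return "".join(
        "T" if base == "C" and i not in methylated_positions else base
        for i, base in enumerate(seq)
    )

def call_cpg_methylation(reference, read):
    """Return {position: True/False} for each CpG cytosine of the reference."""
    calls = {}
    for i in range(len(reference) - 1):
        if reference[i:i + 2] == "CG":
            if read[i] == "C":
                calls[i] = True   # protected from conversion: methylated
            elif read[i] == "T":
                calls[i] = False  # converted: unmethylated
    return calls

reference = "ACGTCCGATCGA"
read = bisulfite_convert(reference, methylated_positions={1, 9})
print(read)                                   # ACGTTTGATCGA
print(call_cpg_methylation(reference, read))  # {1: True, 5: False, 9: True}
```

Note that the non-CpG cytosine at position 4 is also converted; in practice such conversions are used to estimate bisulfite conversion efficiency.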

Bisulfite PCR methods for assessing methylation in liquid biopsy biomarkers include single [147] or multiplex [144, 148] quantitative real-time PCR, bisulfite dPCR (methyl-BEAMing) [146], and methylation-specific PCR [145]. While it is possible to make diagnostic decisions by assaying the methylation status of one marker or of small gene panels [144], accurate identification of individual epigenomic biomarkers is often hampered by low sensitivity caused by background plasma DNA noise [149]. Furthermore, DNA methylation profiles are strongly influenced by sex, ethnicity, interindividual differences, and spatial heterogeneity within tumors [149, 150]. This necessitates large biomarker panels to increase accuracy, as well as large and diverse cohorts to validate the clinical utility of epigenomic biomarkers for liquid biopsy.
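Digital PCR readouts count positive partitions rather than measuring amplification curves, and a standard step is Poisson correction, because a positive partition may contain more than one template copy. The sketch below assumes a hypothetical two-probe run (one probe for the methylated and one for the unmethylated bisulfite-converted sequence, read in the same partitions); it illustrates the general correction and is not the specific methyl-BEAMing protocol.

```python
import math

def copies_per_partition(positive, total):
    """Poisson-corrected mean number of template copies per partition: -ln(1 - p)."""
    if not 0 <= positive < total:
        raise ValueError("require 0 <= positive < total")
    return -math.log(1.0 - positive / total)

def methylated_fraction(meth_positive, unmeth_positive, total):
    """Methylated share of templates from a hypothetical two-probe digital PCR run."""
    lam_meth = copies_per_partition(meth_positive, total)
    lam_unmeth = copies_per_partition(unmeth_positive, total)
    if lam_meth + lam_unmeth == 0:
        raise ValueError("no positive partitions in either channel")
    return lam_meth / (lam_meth + lam_unmeth)

# Hypothetical run: 20,000 partitions, 40 methylated-positive, 4,000 unmethylated-positive
print(round(methylated_fraction(40, 4000, 20000), 4))  # 0.0089
```

Without the correction, the raw ratio 40 / 4040 would overstate the methylated share, because the more heavily occupied unmethylated channel hides extra copies in multiply occupied partitions.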

The aforementioned approaches rely on quantifying changes in methylation at one or more predetermined loci. An alternative is to assay methylation patterns in cfDNA in a highly parallel manner, for example, by using methylation microarrays [151]. Similarly, massively parallel bisulfite sequencing offers an unbiased way to assay genome-wide methylation patterns [129] in circulating DNA. A recent study showed that unbiased DNA methylome profiling can predict the outcome of patients with juvenile myelomonocytic leukemia in small discovery and validation cohorts [152]. A modification of the general method, methylated CpG tandems amplification and sequencing (MCTA-Seq) [153], can detect methylated alleles at frequencies as low as 0.25%.
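Whether a low-frequency signal such as a 0.25% methylated allele can be detected also depends on how many genome copies are in the sample. The sketch below uses a simple binomial sampling model that ignores assay errors and losses during conversion or library preparation; the function names and the 95% target are our own assumptions.

```python
import math

def prob_detect(n_copies, fraction, min_hits=1):
    """P(at least min_hits methylated molecules among n_copies), binomial model.

    Intended for small min_hits (large binomial coefficients would overflow floats).
    """
    p_fewer = sum(
        math.comb(n_copies, k) * fraction**k * (1.0 - fraction) ** (n_copies - k)
        for k in range(min_hits)
    )
    return 1.0 - p_fewer

def min_copies(fraction, target=0.95):
    """Smallest number of genome copies giving P(>= 1 methylated molecule) >= target."""
    return math.ceil(math.log(1.0 - target) / math.log(1.0 - fraction))

n = min_copies(0.0025)  # methylated allele fraction of 0.25%
print(n, round(prob_detect(n, 0.0025), 4))
print(round(prob_detect(n, 0.0025, min_hits=3), 3))
```

Requiring several supporting molecules (`min_hits`) lowers false calls but, for the same input, sharply reduces the detection probability, so assays at this frequency need on the order of thousands of genome copies.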

The major limitation of bisulfite-based approaches is that the initial bisulfite conversion step degrades most of the input DNA [154]. To address this issue, a bisulfite-free method for epigenomic profiling was developed [155]. Cell-free methylated DNA immunoprecipitation and high-throughput sequencing (cfMeDIP-seq) is an unbiased method that captures and enriches methylated cfDNA fragments to assess genome-wide methylation patterns in low-input DNA samples, albeit at a lower resolution (100–300 bp) [156].

Another DNA modification is 5-hydroxymethylcytosine (5-hmC), which is formed by oxidation of 5mC. The regulation of 5-hmC is known to be affected in some human cancer types, making 5-hmC a potential cancer biomarker [157,158,159]. A recently developed bisulfite-free method targeting 5-hmC takes advantage of the fact that the hydroxymethyl group in 5-hmC can be selectively labeled chemically [160]. This reduces background noise and allows higher analytical sensitivity in samples with low input DNA [160].

Given that dramatic epigenomic changes are universal across cancers, it might be possible in the future to go beyond cancer type-specific biomarkers and develop pan-cancer methylation biomarkers. An innovative approach along these lines investigates how the methylation landscape (“Methylscape” [161]) affects the physicochemical properties of DNA. Specifically, the authors exploited changes in solvation and gold affinity of plasma cfDNA caused by cancer-specific methylation and developed rapid electrochemical and colorimetric assays for the presence of breast and colorectal cancer (case-control study with 45 healthy individuals and 100 breast and CRC patients combined; AUC = 0.887 for discriminating healthy individuals from cancer patients) [161].
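AUC values such as the one above summarize how well a continuous assay readout separates cases from controls. One standard way to compute the empirical AUC uses its equivalence with the Mann-Whitney statistic: the probability that a randomly chosen case scores higher than a randomly chosen control. The scores below are hypothetical and unrelated to the cited study.

```python
def empirical_auc(case_scores, control_scores):
    """Empirical ROC AUC via the Mann-Whitney formulation (ties count as 0.5)."""
    if not case_scores or not control_scores:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))

# Hypothetical assay readouts (arbitrary units)
cancer = [0.9, 0.8, 0.4]
healthy = [0.3, 0.5, 0.1]
print(round(empirical_auc(cancer, healthy), 3))  # 0.889
```

The pairwise loop is quadratic, which is fine for case-control studies of this size; rank-based formulas give the same value faster for large cohorts.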

In general, epigenomic biomarkers have great potential for liquid biopsy. However, major technological limitations are low sensitivity, high costs, and the need for expert knowledge. Furthermore, different protocols for isolating cfDNA show significant variation in the amount of recovered DNA, necessitating analytical validation and standardization of new methods before clinical implementation.

Translational Status

Despite being a field in its infancy, epigenomics is already yielding biomarkers and assays for liquid biopsy in clinical practice. Assays based on even a single epigenetic marker can serve as powerful diagnostic tools. The first FDA-approved blood test for CRC diagnosis, Epi proColon (Epigenomics, Inc), uses qualitative real-time PCR to detect methylated Septin 9 DNA in plasma. In a multicenter observational case-control study (NCT01580540) [147], Epi proColon showed sensitivity similar to that of the fecal immunochemical test (FIT), albeit at a lower specificity. The study, whose primary goal was to establish non-inferiority of Epi proColon compared to FIT, was conducted on two cohorts: the first had subjects with diagnosed CRC (N = 102) and the second had subjects scheduled to undergo colonoscopy (N = 199). Sensitivity for discriminating between CRC and non-CRC was 73%, and specificity was 82%. Interestingly, a randomized controlled two-site trial with average-risk adults overdue for screening (NCT02251782) showed better patient adherence to screening with Epi proColon than with FIT (N = 413, 99.5% vs 88.1%) [162].
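Sensitivity and specificity, as reported above, are ratios taken from a 2×2 table of test results against the reference standard. The sketch below computes them together with PPV and NPV; the counts are hypothetical, chosen only so that 100 cases and 100 controls reproduce 73% sensitivity and 82% specificity. In case-control designs like this, PPV and NPV reflect the study's case mix rather than screening prevalence.

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Sensitivity, specificity, PPV, and NPV from a 2x2 confusion table."""
    return {
        "sensitivity": tp / (tp + fn),  # detected cases / all cases
        "specificity": tn / (tn + fp),  # negative controls / all controls
        "ppv": tp / (tp + fp),          # true cases among positive results
        "npv": tn / (tn + fn),          # true non-cases among negative results
    }

# Hypothetical counts: 100 cases (73 detected), 100 controls (82 test negative)
metrics = diagnostic_metrics(tp=73, fn=27, tn=82, fp=18)
print({k: round(v, 3) for k, v in metrics.items()})
```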

Another FDA-approved assay for liquid biopsy is the Cologuard assay (Exact Sciences), a multitarget stool DNA test for colorectal cancer. Aberrant methylation of NDRG4 and BMP3, detected through allele-specific real-time PCR, is used as a diagnostic biomarker in combination with hemoglobin and the mutational status of the KRAS gene [148]. The sensitivity and specificity of the assay were determined in a large observational, case-only prospective study of asymptomatic persons at average risk for colorectal cancer who were scheduled to undergo screening colonoscopy (N = 9989). The sensitivity for discriminating between CRC and non-CRC patients (including those with negative findings on colonoscopy, precancerous lesions, and non-advanced adenomas) was 92.3%, versus 73.8% for FIT. Among patients with negative results on colonoscopy, Cologuard had a lower specificity than FIT (89.8% vs 96.4%) [148].

Nucleix has recently published a study detailing the clinical performance of its Bladder EpiCheck test, which assays a panel of 15 DNA methylation biomarkers in urine [163] for the follow-up of patients with non-muscle-invasive bladder cancer (blinded, single-arm, prospective multicenter study, N = 353; AUC = 0.82 including and AUC = 0.94 excluding low-grade Ta (LG Ta) tumors). With a high NPV (99.3%), EpiCheck could be used in follow-up to reduce the burden of repeated cystoscopies, which are the standard of care for diagnosing bladder cancer progression.
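Unlike sensitivity and specificity, an NPV such as the one reported for EpiCheck also depends on how common the disease is in the tested population. The relation follows from Bayes' theorem; the sketch below uses hypothetical sensitivity, specificity, and prevalence values (not those of the cited study) to show that a high NPV at low prevalence can coexist with a modest PPV.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) from Bayes' theorem for a given disease prevalence."""
    tp = sensitivity * prevalence              # expected true positives per person tested
    fn = (1 - sensitivity) * prevalence        # expected false negatives
    tn = specificity * (1 - prevalence)        # expected true negatives
    fp = (1 - specificity) * (1 - prevalence)  # expected false positives
    return tp / (tp + fp), tn / (tn + fn)

# Hypothetical test with 90% sensitivity and 90% specificity
for prevalence in (0.05, 0.5):
    ppv, npv = predictive_values(0.9, 0.9, prevalence)
    print(prevalence, round(ppv, 3), round(npv, 3))
```

At 5% prevalence the hypothetical test gives an NPV above 99% but a PPV near 32%, which is why high-NPV tests are most naturally positioned as rule-out tools.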

Transcriptome

Scientific Rationale

The pool of all expressed RNA species is collectively called the transcriptome, which plays a pivotal role in cellular processes. The human genome contains approximately 20,000 protein-coding genes that are transcribed into mRNA, in addition to genes encoding rRNA and tRNA. The importance of these RNA species for regulation and protein synthesis has been recognized since the inception of molecular biology. Many other classes of non-coding RNAs (ncRNAs) have been discovered in the last couple of decades, expanding the concept of the transcriptome to include microRNA (miRNA) [164], piRNA [165], tiRNA [166], snoRNA [167], Y RNA [168], PASR [169], TSS-RNA [170], snRNA [171], and lncRNA [169, 172, 173].

While DNA content is mostly identical in different cells of an organism, transcriptional profiles can vary dramatically across cell types, space, and time. Therefore, changes in the transcriptome offer an opportunity to associate cellular phenotypes with underlying molecular mechanisms and potential genotypic changes. The experimental methods, algorithms, and underlying domain knowledge have been developing for more than four decades [174]. With the advent of RNA-Seq, transcriptomics has arguably become the most mature omics approach in the functional genomics toolset.

The role of transcriptional changes in healthy and diseased states, particularly in carcinogenesis, has been the focus of intense investigation. Notably, aberrant mRNA expression levels are associated with dysregulation in cancer. Comprehensive profiling of gene expression patterns across many tissues and cancers has yielded molecular classifications of cancer (sub)types [75, 175]. Furthermore, unbiased sequencing of transcripts has enabled detection of cancer- and patient-specific somatic mutations [176, 177] and fusions/rearrangements, spearheading the discovery of novel mRNA biomarkers [178].

Methodological advances in transcriptomics have also helped uncover the role of non-coding RNAs in health and disease. Small ncRNAs have been implicated in the regulation of transcription, post-transcriptional processing pathways, gene silencing, epigenetic processes, translation, and protein activity [179, 180]. The most widely studied ncRNAs in cancer are miRNAs, which are known to regulate tumor suppressors and oncogenes [181]. Aberrant miRNA expression has been implicated in the occurrence and progression of tumors [182]. More importantly, miRNA expression patterns are cancer-specific and can accurately identify the tissue of origin in metastatic cancer [183]. Because miRNAs are dysregulated in all stages of cancer, they can be used as biomarkers for early detection, prognosis, or treatment selection. Similarly, lncRNAs have a wide range of regulatory functions in cancer and normal cells [174, 184]. The functions of other ncRNAs in carcinogenesis are still poorly understood.

The potential of the circulating transcriptome as a source of liquid biopsy biomarkers was first recognized when cell-free RNA (cfRNA) from Epstein-Barr virus was discovered in the blood of nasopharyngeal carcinoma patients [185] and circulating miRNAs were found in the serum of B-cell lymphoma patients [186]. cfRNAs can be released into body fluids passively, through processes such as cell death, or actively, by secretion in exosomes or in complexes with proteins [187].

miRNAs are remarkably stable and abundant in fluids like blood, making them the most widely investigated transcriptomic biomarkers for liquid biopsy [187, 188]. The major impediment to the discovery and clinical use of other cfRNA biomarkers, particularly mRNA, is their poor stability in body fluids, mostly due to degradation by RNases [187]. Recent studies have shown that longer circulating RNA species can be found in exosomes or in complexes with (lipo)proteins, both of which increase their stability and resistance to RNases [184, 188].

Current and Emerging Technology and Biomarkers

Quantification methods include qRT-PCR, dPCR, microarrays, and RNA-Seq (for both miRNA and lncRNA). PCR-based methods tend to be quick, sensitive, and easy to interpret, but they lack throughput and can only analyze a small panel of predetermined RNAs [189]. Microarrays have the advantage of analyzing many biomarkers in parallel [190, 191], but they have lower analytical sensitivity and specificity [192]. RNA-Seq offers high-throughput analysis and the capacity to identify novel fusions, but at the cost of more complex analysis and a larger required sample input.
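For qRT-PCR, a widely used way to turn threshold cycles (Ct) into relative expression is the 2^-ΔΔCt (Livak) method, which normalizes the target RNA to a reference RNA and then to a control sample. The sketch below assumes close to 100% amplification efficiency for both assays, and the Ct values are hypothetical; the choice of reference RNA for circulating miRNAs is itself a known difficulty that the sketch does not address.

```python
def fold_change_ddct(ct_target_sample, ct_ref_sample, ct_target_control, ct_ref_control):
    """Relative expression by the 2^-ddCt method (assumes ~100% PCR efficiency)."""
    d_ct_sample = ct_target_sample - ct_ref_sample     # normalize target to reference
    d_ct_control = ct_target_control - ct_ref_control
    dd_ct = d_ct_sample - d_ct_control                 # compare sample to control
    return 2 ** (-dd_ct)

# Hypothetical Ct values: target crosses threshold 2 cycles earlier in the patient sample
print(fold_change_ddct(26.0, 20.0, 28.0, 20.0))  # 4.0
```

Each cycle corresponds to a doubling only under the efficiency assumption; when efficiencies differ between assays, efficiency-corrected models are used instead.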

There are a number of studies exploring the potential of circulating RNAs as diagnostic, predictive, and prognostic biomarkers. Plasma [193, 194] and serum [195] miRNAs have shown potential as biomarkers for early detection of NSCLC. Furthermore, patterns of miRNAs and their precursors (pri-miRNAs) can be used to detect NSCLC and to discriminate between the squamous cell carcinoma and adenocarcinoma subtypes of NSCLC [196]. Similarly, circulating miRNAs have been proposed as markers for guiding therapy by identifying triple-negative breast cancer [197]. Finally, miRNA species in blood show potential as biomarkers of metastasis in osteosarcoma [198], bladder cancer [199], and ovarian cancer [200]. However, one of the major drawbacks of miRNAs as cancer biomarkers is that most species are present in both healthy individuals and cancer patients, and their expression differences can be rather small [187].

Many studies have investigated potential miRNA biomarkers, but recent examples also show the potential of other RNA species. For example, several mRNAs detected in liquid biopsies are potential biomarkers for predicting sensitivity to chemotherapy in gastric cancer [201,202,203]. Currently, one of the most promising mRNA biomarkers is telomerase reverse transcriptase (TERT), which shows potential as a diagnostic, prognostic, and predictive biomarker in gastric [204], prostate [205], hepatocellular [206], colorectal [207], and breast cancer [208].

A recent study [188] examined combined patterns of altered expression across different RNA classes, using NGS followed by validation with qPCR and ddPCR. The study uncovered a number of novel miRNA, mRNA, and Y RNA candidates in the plasma of melanoma patients compared to healthy controls. However, these novel biomarkers will need to be validated in larger studies [188].

The choice of RNA quantification technology needs to be accompanied by other analytical considerations. The first step in any transcriptomics protocol is the isolation and purification of RNA. Extraction from body fluids often results in samples with low RNA quality and quantity. Furthermore, the use of different isolation protocols can affect the purity and quality of isolated cfRNA [184]. To complicate matters further, cfRNA concentrations and fractions can vary drastically between body fluids, tissues, organs, and individuals [184]. In addition to these technical challenges, data analysis protocols are not standardized [209, 210].

Translational Status

The published results on transcriptomics-based liquid biopsy highlight an abundance of potential (mostly miRNA) biomarkers. However, many proposed circulating RNA biomarkers were validated only in underpowered retrospective single-center studies with small cohorts [187], often leading to contradictory results [211]. Nevertheless, RNA-based assays are being validated in clinical trials, and there are already tests ready for commercial and clinical use.

A large study was conducted to validate miR-Test, a screening blood assay based on a signature of 13 serum miRNAs quantified by microarrays [212]. Specifically, high-risk individuals enrolled in a single-center non-randomized lung cancer screening trial (COSMOS study) [213] were selected, split into four cohorts, and screened with the miR-Test [212]. The first (calibration) cohort (N = 24) was screened using an extended panel of 34 miRNAs. The initial miRNA set was then reduced to the 13 most informative miRNAs, whose discriminatory power was validated in a second cohort (N = 1008 from the COSMOS trial, of whom 36 had lung cancer detected by low-dose computed tomography (LDCT)). In this cohort, miR-Test showed an overall sensitivity of 77.8% across all tumors and a specificity of 74.8%. In the third cohort, miR-Test was assessed for its ability to discriminate non-malignant from malignant disease (N = 83 patients who never developed cancer during a 5-year follow-up), showing a specificity of 86.7%. The fourth cohort was a clinical validation set (N = 74 patients diagnosed with stage I-III lung cancer outside of the COSMOS study), in which the sensitivity to detect cancer was 70.3%, with similar performance across stages [212]. A similar result was obtained in a randomized multi-center prospective trial (N = 939) carried out to test the clinical utility of qRT-PCR-based miRNA signatures in plasma [214]. The test showed good diagnostic performance (NPV = 99%) and also identified malignancy and its aggressiveness, predicting death from the disease with a sensitivity of 95% and an NPV of 100% [214]. Cumulatively, these two examples show that miRNA-based liquid biopsy assays could serve as biomarkers for early detection of lung cancer with sensitivity and specificity similar to those of LDCT, and could potentially reduce the false-positive rates associated with LDCT screening alone [212, 214].

Another example is the Progensa PCA3 assay [215] (Gen-Probe), which targets PCA3, a prostate-specific lncRNA overexpressed in primary and metastatic prostate adenocarcinoma [216]. This urine-based test [217] has shown good analytical [218] and clinical performance in a number of clinical studies, with greater specificity but lower sensitivity than the classical PSA biomarker [216]. In a multicenter prospective cohort of men scheduled for repeat prostate biopsy (N = 441), the assay showed 77.5% sensitivity and 57.1% specificity for the repeat biopsy outcome [215]. The PCA3 test has been approved by the FDA to support clinical decisions about repeat prostate biopsy [217].

SelectMDx (MDxHealth, Inc) is a liquid biopsy test based on detecting HOXC6 and DLX1 mRNA levels in urine to predict clinically significant prostate cancer prior to prostate biopsy. The performance of the test was investigated in a trial with two multicenter cohorts, a discovery cohort (N = 519) and a validation cohort (N = 386). A classifier combining the mRNA levels with PSA levels and clinical risk factors, excluding digital rectal examination (DRE) results, achieved an AUC of 0.90 [219]. In two cost-effectiveness simulations, the clinical use of SelectMDx showed potential to improve health outcomes and reduce overall treatment costs by eliminating unnecessary biopsies [220, 221].

Cxbladder is a urine-based multiplex RNA test for the detection, monitoring, and stratification of bladder cancer [222, 223]. The test derives its score from the expression levels of five genes: MDK, HOXA13, CDC2, IGFBP5, and CXCR2. In a multi-center study of prospectively recruited patients with hematuria (N = 485), Cxbladder detected 82% of urothelial carcinoma cases (at a specificity of 85%) from urine samples taken prior to cystoscopy, outperforming other tests and cytology [224]. Furthermore, in the same cohort, it distinguished low-grade Ta tumors from other detected urothelial carcinomas with a sensitivity of 91% and a specificity of 90% [224].

Proteome

Scientific Rationale

The proteome is the entire set of proteins expressed in a cell, tissue, or organism. Unlike the genome, the composition of the proteome can change with time and in response to varying intracellular and extracellular conditions. Furthermore, because expression of a single gene can involve alternative splicing, recoding events, and a wide range of post-translational modifications, the number of expressed proteins and proteoforms vastly exceeds the number of genes. The identification and quantification of expressed proteoforms can illuminate molecular pathways, interactions, and events underlying cellular phenotypes in healthy and disease contexts.

Oncoproteomics investigates proteins involved in the carcinogenic processes with the aim of understanding the underlying molecular mechanisms, deriving novel targets for therapy, and identifying proteins that can serve as diagnostic, prognostic, and predictive biomarkers. Historically, the proteome has been the major source of circulating biomarkers for clinical use. Due to analytical and clinical limitations of these early markers, the field of oncoproteomics has been focused on discovering new protein molecules and signatures that can inform clinical decisions.

The advancement in proteomics methods and instrumentation, paired with other omics methodologies, raised hopes for the development of protocols to aid the rapid discovery of novel biomarkers and to implement personalized approaches in clinical practice. The promise of proteomics to advance basic biology and cancer research has culminated in projects to map the human (tissue) proteome [225,226,227,228] and the cancer proteome [229]. However, despite the advances in methods and the accumulation of domain knowledge, much of the current oncoproteomics research on biomarkers remains in the discovery phase. The introduction of novel protein biomarkers and proteomic technologies into the clinic is still hampered by technical challenges, poor analytical performance, reproducibility issues, a lack of standards, and a lack of validation in large and rigorously designed clinical studies. Careful analysis of past successes and failures has been critical for shaping the guidelines and regulations for advancing biomarker discovery and for bringing new biomarkers into standard clinical practice [20, 21, 230,231,232,233].

Current and Emerging Technology and Biomarkers

Of all the sources of biomolecular information that can be used to diagnose and characterize cancers through liquid biopsy, protein biomarkers have the longest-standing tradition in clinical practice. These circulating protein biomarkers include CEA [234], PSA [235], β-hCG [236], AFP [237, 238], FDP [239, 240], HE4 [241], ALT/AST [242], LDH [243, 244], CA 125 [245, 246], CA 15-3 [247], CA 19-9 [248], and CA 27.29 [247] (reviewed in [230, 249,250,251,252]). Standard clinical protein biomarker assays typically target a single or a small number of prespecified tumor-associated antigens using immunoassays [253]. These methods have high analytical sensitivity, and the assays can easily be automated using liquid-handling robots [254]. Their limitation is that antibodies can detect multiple proteoforms [255] and interact non-specifically with interfering compounds, generating false positives [256, 257]. Limited dynamic range is another technical issue affecting immunoassays [254].
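The limited dynamic range of immunoassays is easy to see in the standard curve used to convert signal into concentration. A common choice for this curve is the four-parameter logistic (4PL) model; the sketch below shows the model and its inverse, with hypothetical parameter values and without the curve-fitting step (usually nonlinear least squares on calibrators).

```python
def four_pl(x, a, b, c, d):
    """4PL response: a = response at zero, d = response at saturation,
    c = inflection concentration, b = slope factor (x >= 0, b > 0)."""
    return d + (a - d) / (1.0 + (x / c) ** b)

def inverse_four_pl(y, a, b, c, d):
    """Back-calculate concentration from response y (must lie strictly between a and d)."""
    lo, hi = min(a, d), max(a, d)
    if not lo < y < hi:
        raise ValueError("response outside the quantifiable range of the curve")
    return c * ((a - d) / (y - d) - 1.0) ** (1.0 / b)

# Hypothetical standard-curve parameters (signal in absorbance units, concentration in ng/mL)
params = dict(a=0.05, b=1.5, c=10.0, d=2.0)
signal = four_pl(3.7, **params)
print(round(inverse_four_pl(signal, **params), 6))  # 3.7
```

Near the two plateaus (responses close to `a` or `d`), small signal errors translate into large concentration errors, and beyond them concentrations cannot be recovered at all, which is one way to view the dynamic-range limitation mentioned above.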

High-throughput proteomics techniques have emerged as a viable alternative to address some of these issues. Importantly, newer proteomics methods can go beyond detecting predetermined panels of biomarkers, complementing genomic and transcriptomic approaches in probing the molecular signature of the underlying cancer. Furthermore, proteomics can provide additional information about concentrations of expressed proteins and their post-translational modifications, offering unique approaches to analyze and stratify cancer types.

The most widely used proteomics approach is mass spectrometry (MS) [254]. Mass spectrometry is an umbrella term for a wide array of methods that ionize analytes and then detect and analyze the ions in the gas phase. Different ionization methods (e.g., electrospray ionization (ESI) [258], matrix-assisted laser desorption/ionization (MALDI) [259], or surface-enhanced laser desorption/ionization (SELDI) [259, 260]) can be coupled with different analyzers (quadrupoles, time-of-flight, orbitrap analyzers, etc.) to achieve different analytical characteristics [254]. Furthermore, MS instruments can be combined in tandem (MS/MS) to obtain structural information on analyzed ions. This setup allows implementation of selected and multiple reaction monitoring (SRM [261] and MRM [262], respectively) to detect and quantify pre-specified ions. This mode of analysis is highly specific and can be used to quantify approximately 200 proteins in a single multiplexed assay [263, 264]. In general, MS-based proteomic methods can yield large targeted or unbiased datasets with high precision and resolution [265]. In the clinical laboratory, they offer lower routine costs, higher specificity and throughput, and the possibility to multiplex assays. Limitations of MS-based proteomics include reproducibility issues, limited analytical sensitivity for low-abundance proteoforms, the need for method validation and standardization, instrument cost, and the need for expert interpretation.
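Targeted SRM/MRM assays monitor pre-specified precursor ions, whose mass-to-charge ratios (m/z) are computed from peptide sequences. The sketch below computes the monoisotopic mass and the m/z of the [M + zH]z+ ion for an unmodified linear peptide. It ignores fixed and variable modifications (e.g., carbamidomethylated cysteine), fragment-ion (transition) calculation, and isotope selection, and the example peptide is arbitrary.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056   # H2O completing the free N- and C-termini
PROTON = 1.00728   # mass of a proton

def peptide_mass(sequence):
    """Neutral monoisotopic mass of an unmodified linear peptide."""
    return sum(RESIDUE_MASS[aa] for aa in sequence) + WATER

def mz(sequence, charge):
    """m/z of the protonated [M + zH]z+ precursor ion."""
    return (peptide_mass(sequence) + charge * PROTON) / charge

print(round(peptide_mass("PEPTIDE"), 4))  # 799.3599
print(mz("PEPTIDE", 2))
```

Note that leucine and isoleucine have identical masses, one reason targeted assays rely on carefully chosen proteotypic peptides and fragment ions.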

An emerging platform for proteomics biomarker discovery is protein array technology [266]. Protein (micro)arrays, or chips, are solid surfaces on which up to thousands of purified or synthesized proteins are immobilized at high density [266,267,268,269]. Protein arrays are used to quantify large predetermined panels of proteins by capturing a wide range of protein binding activities [270]. The technology is characterized by high throughput, high sensitivity, and robustness. The discovery of protein biomarkers for liquid biopsy is an important application of protein arrays. Arrays with immobilized antibodies are a low-cost method to profile the expression of many proteins in parallel [269], but they rely on pre-existing antibodies for targets of interest. Functional protein arrays [271] contain complements of purified proteins and can be used for unbiased assays of the entire circulating proteome. While functional protein arrays are difficult and costly to fabricate, commercial variants are available on the market [266]. An interesting variation of the technology is the reverse-phase lysate microarray, which allows lysates (e.g., from liquid biopsy) to be printed on a microarray and quantified by immunochemical methods in a massively parallel fashion [272, 273]. These arrays are typically cheaper than functional arrays but are limited by the availability of antibodies.

Another high-throughput proteomics technology with potential use in biomarker discovery is an aptamer-based platform called SOMAscan (Somalogic Inc.) [274]. SOMAscan is a multiplex quantitative affinity-based assay where immobilized aptamers bind to proteins with high specificity and affinity [274].

Proteomics technologies for liquid biopsy are improving rapidly, and some of the technical challenges are being addressed. Further progress in this field will benefit from organized and institutionalized efforts to characterize cancers by integrating proteogenomic technologies and to establish standards for the translation of research in cancer biology. An example of such an effort is the National Cancer Institute’s Clinical Proteomic Tumor Analysis Consortium (CPTAC), which is advancing integrated proteogenomic analysis to illuminate the molecular basis of colorectal, breast, and ovarian cancer [275,276,277]. The CPTAC initiative led to the creation of two additional programs, the Applied Proteogenomics OrganizationaL Learning and Outcomes (APOLLO) network [278] and the International Cancer Proteogenome Consortium (ICPC) [279]. These two networks are advancing the translation of proteogenomic research into routine clinical practice and the enhancement of precision oncology through international data sharing.

Translational Status

Assays for classic circulating protein biomarkers have been in clinical use for decades. Systematic reviews call into question the indiscriminate use of some of these biomarkers for population-wide screening [280,281,282,283], citing as major limitations a lack of specificity for a single cancer type and high false positive rates associated with non-cancerous conditions [284]. These issues can translate into unnecessary treatments that harm patients and increase healthcare costs.

Despite the general issues with classical protein biomarkers, they show utility in appropriately defined clinical settings, especially when used in combination with other clinical variables. CEA has limited specificity and sensitivity for routine follow-up of early-stage CRC patients [285], but in combination with clinical, radiological, and histological findings, it can predict recurrence of CRC after resection [286, 287]. Likewise, PSA-based screening is associated with false positives and overdiagnosis [288, 289], but it was shown to reduce PCa mortality when used in conjunction with digital rectal examination to screen high-risk patients aged 55 to 69 years [288]. PSA can also detect PCa recurrence after resection and predict response to treatment [290]. Similarly, while LDH is unsuitable as a biomarker for diagnosis of early-stage melanoma, it is a validated prognostic biomarker in metastatic melanoma (and currently the only serologic marker for melanoma) [244, 291].

New proteomics technologies have been used extensively in biomarker discovery in the last two decades. However, these methods are not easily translated into routine clinical use due to technical complexity, low throughput, and low reproducibility [292]. There are efforts to address these issues in a systematic fashion. A multi-site study has been implemented to standardize protocols and limit variability and irreproducibility in multiple reaction monitoring for plasma proteomics [293]. Attempts to translate proteomics-based biomarkers into clinical trials can be improved by following fit-for-purpose recommendations for designing analytical validation experiments and reporting findings [294].

Poorly designed and underpowered trials have hampered the translation of proteomic biomarkers into the clinic [20]. Known examples of biomarker failure include the use of exoprotease-generated peptidome patterns in serum for detection of various cancer types [295]. In the original study, MALDI-TOF MS analysis of peptidomic patterns yielded a panel of biomarkers that accurately discriminated three cancer types from healthy controls [295]. However, subsequent reanalysis revealed several flaws in the study design, namely improper case-control matching (different age and sex brackets in case and control patients) and an inappropriately selected patient population (a cohort consisting mostly of patients with late-stage cancer, whereas the intended purpose was early cancer detection) [20]. A later validation study found no evidence that serum peptidomic patterns can be used as cancer biomarkers [296]. Another example of a flawed design is a study that proposed a four-biomarker panel with 95% sensitivity and 95% specificity, thus outperforming the standard CA125 biomarker for detection of ovarian cancer [297]. A subsequent study found that expanding the panel to six analytes could further improve specificity for OvCa (99%) [298]. However, an independent, blinded attempt to validate these findings showed that none of the biomarkers outperformed CA125 in sensitivity and specificity for OvCa [20, 299]. Reanalysis of the original study design revealed that the authors used partially overlapping training and validation cohorts, causing overfitting [300]. Furthermore, case and control samples came from different sources, and the sample collection procedure was not standardized, introducing potential biases into the analysis [300]. Similar study design flaws prevented OvaCheck, an MS-based serum test to identify ovarian cancer patients [301], from being marketed for public use. Namely, differential handling of samples from different sources created batch effects that confounded the signal [302,303,304].
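The overfitting described above arose because the training and validation cohorts partially overlapped. A minimal Python sketch of a subject-level split with an explicit overlap check follows; it assumes each sample is labeled with a subject identifier, and the helper names (`split_by_subject`, `overlapping_subjects`) and the synthetic IDs are hypothetical, not taken from the cited studies.

```python
"""Guard against training/validation leakage at the subject level.

Hypothetical helpers for illustration; subject IDs and cohort sizes are made up.
"""
import random


def split_by_subject(subject_ids, validation_fraction=0.3, seed=0):
    """Split unique subject IDs into disjoint training and validation sets."""
    unique_ids = sorted(set(subject_ids))  # collapse repeat samples from one subject
    rng = random.Random(seed)              # fixed seed for a reproducible split
    rng.shuffle(unique_ids)
    n_val = round(len(unique_ids) * validation_fraction)
    validation = set(unique_ids[:n_val])
    training = set(unique_ids[n_val:])
    return training, validation


def overlapping_subjects(training_ids, validation_ids):
    """Return subjects present in both cohorts (should be empty)."""
    return set(training_ids) & set(validation_ids)


# Two serum aliquots per subject: a sample-level split could place one aliquot
# in each cohort, whereas a subject-level split cannot.
samples = [f"S{i:03d}" for i in range(50) for _ in range(2)]
train, val = split_by_subject(samples)
print(len(train), len(val), overlapping_subjects(train, val))
```

Splitting on subjects rather than samples keeps repeat aliquots from the same person on one side of the split; it does not address the other flaws named above, such as mismatched sample sources or batch effects.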

To address the aforementioned study design issues, recommendations have been put in place for a phased design of biomarker discovery [305], aimed at better evaluation of classification accuracy in the intended clinical context. A novel candidate protein biomarker for pancreatic ductal adenocarcinoma (PDAC) has been put forth using the recommended study design [306, 307]. The authors first reprogrammed human PDAC cells into an induced pluripotent stem cell-like line (10-22 cell line) [308]. A proteomic analysis of media from 10-22 cell-derived precursor pancreatic intraepithelial neoplasia cultured as organoids revealed 107 candidate proteins. The authors cross-referenced this candidate set against human plasma proteins [308] and focused on proteins with low abundance in healthy human plasma [306, 309]. They further reduced the set to three candidates, matrix metalloproteinase 2 (MMP2), MMP10, and thrombospondin-2 (THBS2), and assayed these in human plasma samples from a biospecimen repository (N = 10 cancer cases and N = 10 controls). MMP2 and MMP10 were uninformative, but THBS2 could discriminate pancreatic cancer from healthy controls (AUC = 0.76), and resectable and locally advanced PDAC from healthy controls (AUC = 0.886), in this small discovery cohort. A combined panel of THBS2 and CA19-9 performed well across all stages of PDAC in a larger validation cohort (AUC = 0.970, N = 161; 81 cancer cases and 80 healthy controls). The authors then validated the two-marker panel in an independent cohort (N = 337) and found that it can discriminate between cancer and healthy subjects with AUC = 0.97 [306].
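Most discrimination results in this section are reported as the area under the ROC curve (AUC). As a minimal sketch, the empirical AUC equals the probability that a randomly chosen case scores higher than a randomly chosen control, with ties counted as one half (the Mann-Whitney formulation). The function name and the marker values below are hypothetical and are not data from the cited studies.

```python
def auc_mann_whitney(case_scores, control_scores):
    """Empirical ROC AUC: probability that a random case outscores a random control.

    Ties count as one half, matching the Mann-Whitney U statistic.
    """
    wins = 0.0
    for c in case_scores:
        for h in control_scores:
            if c > h:
                wins += 1.0
            elif c == h:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Hypothetical marker levels (arbitrary units), not data from the cited studies.
cases = [4.1, 3.8, 5.2, 2.9, 4.7]
controls = [2.5, 3.0, 3.9, 2.2, 2.8]
print(auc_mann_whitney(cases, controls))  # 22 of 25 case-control pairs ranked correctly
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation. A combined panel such as THBS2 plus CA19-9 would be evaluated the same way once its markers are merged into a single score; the merging rule itself is a separate modeling choice not shown here.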

In 2009, the FDA approved OVA1, the first in vitro diagnostic multivariate index assay (IVDMIA) for ovarian cancer [231]. OVA1 combines the results of assays for CA-125 II, prealbumin, apolipoprotein A-1, β2-microglobulin, and transferrin into a single score that can distinguish malignant ovarian tumors from non-malignant forms that do not warrant referral to surgery. The assay was initially developed as a SELDI-TOF assay, but due to reproducibility issues, it was ultimately implemented as an immunoassay [310]. The OVA1 test was first validated in combination with physician assessment to predict malignancy in a prospective multicenter cohort of women scheduled for surgery for an ovarian tumor (N = 516, sensitivity 96%, specificity 35%) [311]. In a second study, the assay was validated in the context of its intended use, for risk stratification of ovarian malignancy after enrollment by non-gynecologic oncology providers. This prospective multicenter trial assayed pre-operative serum from 494 women and correlated the results with surgical pathology findings. In combination with clinical variables, the assay could distinguish malignant from benign adnexal masses with a sensitivity of 95.7% and a specificity of 50.7% [312]. A recent improvement of the assay, OVA2, showed increased specificity (69%) for OvCa in a multi-site prospective cohort (N = 493) [313].
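To illustrate what combining several assay results into a single score can look like, the sketch below uses a logistic model over standardized analyte values. The analyte names follow the text, but the weights, intercept, z-score inputs, and the function name `index_score` are all hypothetical; the actual OVA1 scoring algorithm is not described here and is not reproduced.

```python
import math

# Hypothetical coefficients for illustration only; these are NOT the OVA1 algorithm.
WEIGHTS = {
    "ca125_ii": 1.2,
    "prealbumin": -0.8,
    "apolipoprotein_a1": -0.6,
    "beta2_microglobulin": 0.7,
    "transferrin": -0.5,
}
INTERCEPT = -0.3


def index_score(z_scores):
    """Combine per-analyte z-scores into one score in (0, 1).

    A weighted sum passed through the logistic function is one common way to
    build a multivariate index; it is an illustrative modeling choice here.
    """
    linear = INTERCEPT + sum(w * z_scores[name] for name, w in WEIGHTS.items())
    return 1.0 / (1.0 + math.exp(-linear))


baseline = dict.fromkeys(WEIGHTS, 0.0)  # every analyte at its reference mean
elevated = {**baseline, "ca125_ii": 2.0, "prealbumin": -1.0}
print(round(index_score(baseline), 3), round(index_score(elevated), 3))
```

In a real assay the weights, any analyte transformations, and the decision cutoff would be fitted and locked on development data before validation.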

Another successful example of a commercialized liquid biopsy test is Veristrat (Biodesix). Veristrat is a multivariate MALDI-TOF MS proteomic blood test that classifies NSCLC patients as “Good” or “Poor” by detecting inflammatory states associated with aggressive lung cancer [314, 315]. Veristrat was used for patient stratification in a randomized phase III multi-center clinical trial with the goal of measuring survival and response to an EGFR tyrosine kinase inhibitor (erlotinib) or chemotherapy [316]. In this cohort of 285 patients with confirmed stage IIIB or IV NSCLC receiving second-line treatment [316], those classified as “Poor” by Veristrat had shorter overall survival on erlotinib than on chemotherapy (3 vs 6.4 months, HR = 1.72). Biodesix has since partnered with MRM Proteomics Inc. to implement the iMALDI platform for proteomic biomarkers to further enhance diagnosis and prognosis for lung cancer.

In 2018, Integrated Diagnostics announced the results of its multi-site prospective PANOPTIC (Pulmonary Nodule Plasma Proteomic Classifier) Trial [317] (NCT01752114). PANOPTIC was designed to validate Xpresys Lung 2, a liquid biopsy assay that integrates clinical data and MRM quantification of two plasma proteins, LG3BP and C163A, to distinguish benign from malignant lung nodules. The assay showed high sensitivity (97%), a specificity of 44%, and an NPV of 98%, and could in principle reduce the number of procedures carried out on benign nodules by 40% (N = 392 patients with pulmonary nodules) [317].
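The NPV reported above depends on how common malignancy is in the tested population, not only on sensitivity and specificity. A short sketch of Bayes' rule makes this explicit; the sensitivity and specificity are the values reported for Xpresys Lung 2, whereas the prevalences looped over are hypothetical and are not those of the PANOPTIC cohort. The function name is illustrative.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) from test characteristics and disease prevalence (Bayes' rule)."""
    tp = sensitivity * prevalence              # true positives per tested subject
    fn = (1 - sensitivity) * prevalence        # false negatives
    tn = specificity * (1 - prevalence)        # true negatives
    fp = (1 - specificity) * (1 - prevalence)  # false positives
    return tp / (tp + fp), tn / (tn + fn)


# Sensitivity and specificity as reported in the text; prevalences are hypothetical.
for prevalence in (0.10, 0.25, 0.50):
    ppv, npv = predictive_values(0.97, 0.44, prevalence)
    print(f"prevalence {prevalence:.0%}: PPV {ppv:.2f}, NPV {npv:.3f}")
```

With these inputs, NPV falls and PPV rises as prevalence increases, so an NPV measured in one clinical setting does not transfer automatically to a population with a different pretest risk of malignancy.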

Array-based proteomics technologies are also showing their potential for translation. IMMray™ PanCan-d (Immunovia) is a blood test that uses machine learning to derive a diagnostic classifier from antibody microarray data. The test classified samples from pancreatic cancer patients and healthy controls with AUC > 0.95 in six retrospective cohorts to date [318,319,320,321,322]. IMMray™ PanCan-d is currently undergoing a clinical trial to investigate its diagnostic accuracy for detection of pancreatic cancer in high-risk groups (NCT03693378). The antibody microarray technology has the potential to be used in diagnostics of other types of cancer and has already been tested to longitudinally monitor sera of patients with breast cancer [323] and to classify patients based on their risk of developing prostate cancer [324].

The potential of the SOMAscan assay to bridge the biomarker discovery and validation phases was tested in three multi-site, prospectively designed case/control studies. These studies used archived samples to discover and validate protein biomarkers for NSCLC detection in high-risk patient populations [325]. The first cohort (N = 363) was used to discover a robust panel of protein NSCLC biomarkers from 1033 SOMAscan analytes. The analysis resulted in a 7-marker panel with an AUC of 0.85 for all cases of NSCLC vs benign nodule controls. The histopathological sensitivity of the 7-protein panel was validated in a second cohort, which showed similar discrimination between cancer and healthy subjects (AUC = 0.81, N = 138) and an AUC of 0.89 for squamous cell carcinoma. The authors performed an additional validation study on an EDRN multicenter reference set for validating biomarkers for detection of lung cancer (N = 135). In this cohort, the biomarker panel could detect NSCLC vs healthy samples with AUC = 0.77 and squamous cell carcinoma with AUC = 0.87 [325].

Metabolome

Scientific Rationale

Cancer metabolism differs dramatically from that of normal tissue. This phenomenon, dubbed metabolic reprogramming, is recognized as one of the hallmarks of cancer [326]. The first known example of metabolic reprogramming in cancer cells, the Warburg effect, was discovered 90 years ago [327]. The Warburg effect describes a metabolic phenotype in which cancer cells display higher glycolytic flux and produce lactate at a higher rate than normal cells despite oxygen availability.

The acquisition and maintenance of neoplastic processes such as abnormal cellular proliferation and metastasis generally increases the demand for energy and biosynthetic building blocks, and changes the redox balance [328, 329]. Consequently, the cellular metabolism changes to accommodate those requirements and increase the fitness of cancer cells. The exact mechanisms of metabolic reprogramming and how they contribute to malignant phenotypes are active topics of investigation.

Oncometabolites are small molecules whose abundance is drastically increased as a consequence of cancer-associated metabolic reprogramming or because of somatic mutations in specific enzymes [328]. The best-known oncometabolite is D-2-hydroxyglutarate (D2HG), a reduced form of α-ketoglutarate. The abundance of D2HG is low in normal tissues but increases in tumors harboring mutations in isocitrate dehydrogenase 1 or 2 (IDH1 or IDH2) [328, 330, 331].

The discovery of oncometabolites prompted the search for metabolomic biomarkers. However, looking for a single oncometabolite to serve as an accurate diagnostic, prognostic, or predictive biomarker for complex diseases like cancer might prove futile. A more promising strategy might be to rely on metabolite panels or signatures. This approach better reflects the reality of dysregulated pathways and improves the statistical robustness of biomarker-informed decision-making.

Discovery in cancer metabolism can proceed along two routes: quantifying metabolites (metabolomics) or measuring the activities of metabolic pathways (e.g., by metabolic flux analysis) [332, 333]. These two approaches are not interchangeable and provide complementary information. A combination of both can yield important insights into metabolic phenotypes in cancer and uncover oncometabolite biomarkers for precision medicine.

Current and Emerging Technologies and Biomarkers

Metabolomic experiments typically use analytical platforms such as nuclear magnetic resonance (NMR) [334], liquid chromatography mass spectrometry (LC-MS) [335, 336], gas chromatography mass spectrometry (GC-MS) [335], and capillary electrophoresis mass spectrometry (CE-MS) [335, 337] to analyze and quantify metabolites in a sample [338]. These experiments can be targeted to a specific metabolite or a class of metabolites, or they can attempt to comprehensively assess all metabolites in an unbiased manner. Targeted approaches focus on up to a hundred metabolites, while untargeted analyses can cover hundreds or even thousands in a single experiment.

Different platforms are suitable for different applications or for the analysis of different metabolites. GC-MS is typically used to assay metabolites smaller than 1,000 Da that are volatile or can be made volatile via chemical derivatization [338]. In a single acquisition, GC-MS can resolve a few hundred metabolites with different properties (such as sugars and their derivatives, amino and organic acids, amines, sterols, and fatty acids) [339]. The advantages of GC-MS are reproducibility across different platforms, the existence of comprehensive spectral libraries, robustness, and relatively low cost. LC-MS is mostly used for larger metabolites that are non-volatile [339] and can generally resolve a larger number of molecular species than GC-MS. CE-MS is ideal for the analysis of polar and ionic compounds, especially from low sample volumes. However, its sensitivity is generally lower, and its variability higher, than those of GC-MS and LC-MS [338]. NMR has an advantage over MS methods in being a non-destructive analytical technique. Furthermore, in addition to quantification, NMR can be used to unambiguously resolve the structures of unknown metabolites. Results of 1H-NMR analyses are highly reproducible, but the technique typically has lower sensitivity than MS-based approaches [338].

Targeted approaches are empowered by known changes in cancer metabolism. In some cases, it might be possible to associate cancer-specific genetic alterations with changes in metabolite concentrations and devise a scheme to discriminate between different types of cancer. For example, a mutation in isocitrate dehydrogenase (IDH) is a driver event in malignant gliomas [330]. 2-Hydroxyglutarate (2HG) accumulates in glioma cells with IDH1 mutations [340], and plasma and urine levels of 2HG can be used to predict response to treatment [341].

Targeted metabolomics can also focus on oncometabolite concentration changes caused by dysregulated metabolic pathways. A known example is the change in prostaglandin concentrations resulting from reprogramming of the eicosanoid pathway, which was shown to promote tumor growth [342]. These changes can in turn create distinct metabolic signatures in urine, which can be detected by targeted methods such as SRM. This approach showed the potential of urine prostaglandins as biomarkers for identifying patients at high risk of developing pancreatic cancer [343]. Similar studies highlighted prostaglandins as biomarkers for identifying patients at high risk of breast cancer [344] and for prognosis of lung metastases in breast cancer patients [345]. Another example concerns polyamines as circulating biomarkers. Dysregulation of intracellular polyamine metabolism is a hallmark of most cancers [346] and a potential target for therapy. Furthermore, urine and plasma polyamine content in cancer patients can mirror intracellular levels of these metabolites, presenting an opportunity to use them as liquid biopsy biomarkers. The biomarker potential of polyamines was recently investigated using targeted LC-MS/MS analysis in colorectal [347, 348], ovarian [349], and prostate cancer [350].

The second approach uses untargeted (unbiased) analysis of circulating metabolites to find differences between cancer patients and healthy individuals. Unsurprisingly, the findings of these untargeted approaches can sometimes recapitulate known cancer-related metabolic changes. For example, dysregulation of the citric acid cycle can alter plasma and urine concentrations of pathway intermediates, such as succinate and fumarate [328]. The potential of these two metabolites for diagnosing and staging renal cell carcinoma was demonstrated using untargeted 1H-NMR and GC-MS [351]. Similarly, extracellular protein breakdown is a hallmark of pancreatic cancer [352] and leads to the release of branched-chain amino acids (BCAA). A recent study that relied on LC-MS metabolic profiling of individuals from four prospective cohorts found that elevated plasma levels of three BCAAs (isoleucine, leucine, and valine) indicate a twofold increased risk of a future PDAC diagnosis [353].

An innovative unbiased metabolomic approach that bypasses urine and plasma sampling relies on analyzing volatile organic compounds (VOC) in breath and breath condensate. The analytical platform can be MS or a proprietary Field Asymmetric Ion Mobility Spectrometry (FAIMS) technology (Owlstone Ltd.). This approach is being investigated to uncover novel metabolic biomarkers of lung cancer [354, 355].

Metabolites can differ drastically in size and physicochemical properties. Because no single method can separate, detect, and quantify such a wide range of molecular species, multiple sample preparation and analytical methods can be combined to capture the metabolome. Multiplatform metabolomics is therefore a powerful strategy for identifying biomarker signatures. For example, combined metabolomics and lipidomics approaches using 1H-NMR, GC-MS, and LC-MS have uncovered potential biomarkers originating from alterations in lipid metabolism that could help identify breast cancer [357]. Similarly, a combination of Fourier transform ion cyclotron resonance mass spectrometry (FTICR-MS), liquid chromatography (LC) MS/MS, and NMR was used to discover a set of biomarkers from altered lipid metabolism with the potential for early detection of colorectal cancer [358].

One example of discovering cancer-specific metabolomic signatures by first computationally identifying dysregulated pathways and then confirming these predictions with metabolomics comes from our group. We explored the integration of multiomics data with genome-scale metabolic models and showed that genetic alterations in clear cell renal cell carcinoma (ccRCC) were associated with ccRCC-specific metabolic reprogramming [359, 360]. Detailed computational analysis of ccRCC metabolism showed altered regulation of glycosaminoglycan (GAG) biosynthesis [360]. CE-based analysis of plasma and urine samples from ccRCC patients showed that a GAG panel (19 metabolic species) can be used to derive a GAG signature with diagnostic and prognostic potential [360,361,362].

The studies cited above demonstrate that instrumentation and computational methods have advanced enough to enable metabolomic biomarker discovery. However, the field is still hindered by several analytical challenges. Most importantly, experimental procedures and materials, as well as data handling and statistical analyses, will need to be standardized to ensure analytical validation of biomarkers.

Translational Status

Arguably, translational efforts in oncometabolomics are still in their infancy. There are currently no FDA-approved metabolomic liquid biopsy tests on the market. Many challenges will need to be addressed before metabolic biomarkers in the discovery phase can be validated and translated into clinical practice. Typically, only a small number of samples is collected during clinical trials, making the subsequent identification and validation of discovered metabolites challenging. Another challenge is the difficulty of measuring small differences in metabolite levels between healthy individuals and cancer patients, which can be masked by high inter-individual variation due to genetic and environmental factors [363]. Some of these challenges can be partially addressed by the Human Metabolome Database (HMDB) [364], which contains information on the concentration ranges of specific metabolites in human populations and their links to cancer pathways and disease phenotypes. However, as with other classes of biomarkers, conclusive evidence of a biomarker’s validity can only be established through clinical trials with large and diverse cohorts.
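The point about small differences masked by inter-individual variation can be made quantitative with a standard normal-approximation sample-size formula for comparing two group means. The sketch below assumes normally distributed metabolite levels, equal variances in both groups, and a two-sided two-sample test; the function name, effect size, and standard deviations are hypothetical.

```python
import math
from statistics import NormalDist


def subjects_per_group(delta, sigma, alpha=0.05, power=0.8):
    """Approximate subjects per group to detect a mean difference `delta`
    between two groups with common standard deviation `sigma`.

    Normal approximation for a two-sided two-sample test:
        n = 2 * ((z_{1 - alpha/2} + z_{power}) * sigma / delta) ** 2
    """
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    n = 2 * ((z_alpha + z_beta) * sigma / delta) ** 2
    return math.ceil(n)  # round up to whole subjects


# Hypothetical: a 10% shift in a metabolite's mean level, with between-subject
# SD of 20% versus 40% of the mean.
print(subjects_per_group(delta=0.10, sigma=0.20))
print(subjects_per_group(delta=0.10, sigma=0.40))
```

Doubling the between-subject standard deviation at a fixed difference roughly quadruples the required cohort size (63 versus 252 subjects per group here), which is one reason small discovery cohorts struggle to confirm modest metabolite differences.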

Some of these issues with metabolomic biomarkers are exemplified by sarcosine. Plasma sarcosine has shown promise as a biomarker for prostate cancer [365]. However, its validity and utility as a viable biomarker were questioned when it failed in later validation trials [366,367,368], and it then resurfaced as a biomarker in a recent study [369]. A definitive decision about its clinical utility will require large and rigorously designed validation trials.

The sensitivity of circulating metabolites to external factors means that biomarker validation studies need to account for ethnic background and dietary habits. The Aminoindex Cancer Screening (AICS; Ajinomoto Co., Inc.) system is a blood test based on multivariate analysis of plasma free amino acids (PFAA) [370, 371]. The AICS assay has been validated in a series of clinical trials covering approximately 2,500 patients with 7 different types of cancer and 15,000 healthy controls [372,373,374]. The AICS test is currently used in Japan to screen for early detection of lung cancer [372, 375]. Ajinomoto is sponsoring a single-site observational prospective case-control study to investigate the performance characteristics of the AICS test for gynecological cancers in the US population (NCT02178462).

Our group has completed a number of prospective and retrospective studies to establish the clinical validity of GAGs as biomarkers for liquid biopsy in ccRCC. We found that plasma and urine GAG scores readily distinguish cancer patients from healthy controls (100% specificity and 100% sensitivity in a discovery cohort with 34 mccRCC patients and 16 healthy individuals, and in a validation cohort with 18 mccRCC patients and 9 healthy subjects) [360]. Furthermore, the diagnostic and prognostic value of GAG scores was investigated in two additional studies [361, 362]. The first study explored the association of urine and plasma GAG scores with progression-free survival (PFS) and overall survival (OS) in a prospective cohort of 31 patients diagnosed with ccRCC [361]. The urine GAG score was a predictor of PFS and OS (hazard ratio (HR) 4.62 and HR 10.13, respectively). The second study [362] investigated the plasma GAG score as a biomarker for pre-operative detection of early-stage RCC and prediction of recurrence and death after RCC surgery. This retrospective case-control study consisted of a consecutive series of 175 surgically treated RCC patients and 19 healthy controls [362]. We found that the GAG score could correctly classify RCC versus healthy subjects with AUC = 0.999 in the discovery part of the cohort (N = 67). In the validation set (N = 108), the GAG score achieved an AUC of 0.991 and 93.5% sensitivity at the predetermined cutoff [362]. This test will be validated in the multicenter prospective clinical study AURORAX-087A for detection of post-surgical recurrence in ccRCC (NCT04006405).
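Reporting sensitivity "at the predetermined cutoff" means that the threshold was fixed before the validation data were examined. A minimal sketch of that workflow follows: choose a cutoff on discovery data, then apply it unchanged to validation data. How the cited GAG studies selected their cutoff is not stated here; Youden's J (sensitivity + specificity - 1) is assumed purely for illustration, and the function names and scores are hypothetical.

```python
def sens_spec(case_scores, control_scores, threshold):
    """Sensitivity and specificity when a score >= threshold is called positive."""
    sens = sum(s >= threshold for s in case_scores) / len(case_scores)
    spec = sum(s < threshold for s in control_scores) / len(control_scores)
    return sens, spec


def youden_cutoff(case_scores, control_scores):
    """Pick the observed score that maximizes Youden's J = sens + spec - 1.

    On ties, the lowest such threshold is kept.
    """
    best_t, best_j = None, -1.0
    for t in sorted(set(case_scores) | set(control_scores)):
        sens, spec = sens_spec(case_scores, control_scores, t)
        if sens + spec - 1 > best_j:
            best_t, best_j = t, sens + spec - 1
    return best_t


# Hypothetical scores, not data from the cited studies.
disc_cases, disc_controls = [0.9, 0.8, 0.75, 0.6], [0.2, 0.3, 0.65, 0.1]
val_cases, val_controls = [0.85, 0.7, 0.55, 0.95], [0.25, 0.4, 0.72]

cutoff = youden_cutoff(disc_cases, disc_controls)  # fixed using discovery data only
print(cutoff, sens_spec(val_cases, val_controls, cutoff))
```

Locking the cutoff before unblinding the validation set keeps the validation estimates free of the optimism that re-tuning the threshold on the same data would introduce.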

A systematic review of VOC studies for early detection of lung cancer found that 2-butanone and 1-propanol are commonly reported as the best discriminators between healthy and lung cancer subjects [356]. However, there is some discordance between relevant VOCs across studies, as one study points to n-dodecane as having the highest discriminatory power between patients with histologically proven lung cancer and healthy controls (sensitivity of 76% and specificity of 100% when using a decision tree based on n-dodecane and 9 other peaks; N = 50 cancer patients and N = 39 healthy subjects) [354]. Instead of relying on the identification of individual VOCs, entire VOC signatures could be used for lung cancer detection. For example, one pilot study compared the VOC patterns of 32 patients with a cytological or histological diagnosis of lung cancer and 54 healthy controls. Combinations of VOC peaks could discriminate cancer from healthy subjects with a leave-one-out cross-validation accuracy of 100% [355]. Owlstone Ltd. is currently sponsoring a multi-center case-control study on patients suspected to have lung cancer, aimed at evaluating VOC analysis using their breath biopsy technology for early detection of lung cancer (NCT02612532).
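Leave-one-out cross-validation (LOOCV), as used in the pilot study above, holds out each subject in turn, trains on the remaining subjects, and scores the prediction for the held-out subject. The sketch below illustrates the procedure with a nearest-centroid classifier swapped in for simplicity; the classifier used in [355] is not specified here, and the function names and three-peak intensity profiles are hypothetical.

```python
import math


def nearest_centroid_predict(train_X, train_y, x):
    """Assign x to the class whose mean profile (centroid) is closest (Euclidean)."""
    centroids = {}
    for label in set(train_y):
        rows = [r for r, lab in zip(train_X, train_y) if lab == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return min(centroids, key=lambda lab: math.dist(x, centroids[lab]))


def loocv_accuracy(X, y):
    """Train on all samples but one, predict the held-out one, repeat for each."""
    correct = 0
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        correct += nearest_centroid_predict(train_X, train_y, X[i]) == y[i]
    return correct / len(X)


# Hypothetical normalized intensities of three VOC peaks per breath sample.
X = [[0.9, 0.2, 0.7], [0.8, 0.3, 0.6], [1.0, 0.1, 0.8],
     [0.2, 0.7, 0.1], [0.3, 0.8, 0.2], [0.1, 0.9, 0.3]]
y = ["cancer"] * 3 + ["control"] * 3
print(loocv_accuracy(X, y))
```

Because every subject is tested exactly once, LOOCV uses small cohorts efficiently, but with only tens of subjects a perfect score carries wide uncertainty and still calls for an independent validation cohort.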

‘Breathomics’ is not limited to lung cancer detection. An observational cohort trial in patients referred for CRC surgery or diagnostic colonoscopy was completed (NCT02332213), in which breath biopsy samples were collected prior to surgery or colonoscopy. GC-MS analysis of VOCs revealed that acetone and ethyl acetate were elevated in CRC patients compared with healthy controls. A discriminant function analysis of breath VOC patterns could discriminate CRC from healthy controls with 85% sensitivity and 94% specificity (N = 209) [376]. Two recently published studies have shown that VOCs have the potential to discriminate esophagogastric cancer patients from healthy controls [377, 378]. Another study shows that VOCs can be used for early detection of pancreatic cancer [379]. Owlstone Ltd. is currently recruiting patients for a pan-cancer prospective cross-sectional observational case-control study to evaluate whether breath biopsy can differentiate patients with gastric, esophageal, pancreatic, renal, prostate, and bladder cancer from matched healthy controls (NCT03756597). Breath biopsies are a promising source of biomarkers for precision medicine. The ongoing trials will help address some of the challenges in the field, such as low reproducibility and low VOC concordance between different studies.

Exosomes

Scientific Rationale

Exosomes are bioactive nanovesicles (30–150 nm in size) enclosed in lipid bilayer membranes [380,381,382]. Exosomes are released from endosomes of almost all cell types. The molecular cargo of exosomes includes diverse classes of biomolecules (proteins, DNA, various RNA species, lipids, and other metabolites). The exact composition of exosomes can be very heterogeneous, and likely reflects the composition and the phenotypic state of the cell of origin. The exact mechanisms of exosome biogenesis and function are still under investigation, but there is evidence showing that exosomes have an important role in inter-cellular communication, both in healthy and diseased states [383].

Exosomes are being investigated as important factors in carcinogenesis, with the potential both to promote and to restrain tumor growth [384]. Studies show that tumor-derived exosomes can modulate tumor progression, angiogenesis, and metastasis [385]. Given their role in cancer biology, exosomes are being studied as targets for anticancer therapy [386], as potential drug delivery vectors [387], and as biomarkers [383].

Exosomes carry biomolecules that reflect the state and composition of their cells of origin, which makes them a rich source of biomarkers [383]. Indeed, genomic [388], transcriptomic [388], proteomic [389], and metabolomic [390] analyses can be applied to discover exosome-derived biomarkers. Importantly, because exosomes can be readily isolated from blood, plasma, saliva, urine, breast milk, semen, ascites fluid, amniotic fluid, and cerebrospinal fluid, they are an ideal target for liquid biopsy. Finally, exosomes could obviate the need for repeated tumor biopsies: because exosome heterogeneity can be associated with intra-tumor heterogeneity, a single liquid biopsy could capture the phenotypic and genotypic tumor landscape.

Current and Emerging Technologies and Biomarkers

To fully exploit exosomes as a source of robust and accurate biomarkers, it is important to carefully consider methods for sample handling, isolation, and enrichment. The main methods for isolation of exosomes are differential ultracentrifugation [391, 392], density-gradient ultracentrifugation [391, 392], polymer-facilitated precipitation [393, 394], immunoaffinity capture [391, 395], and size-exclusion chromatography [396]. The most commonly used method is differential ultracentrifugation [392]. However, it can cause vesicle aggregation and co-isolation of protein contaminants [397]. Density-gradient ultracentrifugation can produce samples of higher purity, but it is laborious and lengthy, making it unsuitable for clinical applications. Size-exclusion chromatography can produce similarly pure samples and minimally affects exosome characteristics [398]. Exosome purification by commercial polymer-facilitated precipitation is rapid and easy to implement, but the quality of the resulting preparations can be poor [399]. Immunoaffinity isolation can be used to enrich specific subpopulations of exosomes based on surface antigens [391].

One of the major challenges in the field of exosome biology is the lack of standardized protocols for exosome enrichment and characterization. Biased isolation of exosome components can introduce variability in results [400], and differences in experimental procedures can affect results across studies. To tackle this issue and establish experimental guidelines, a crowdsourced knowledgebase called EV-TRACK has been established [401]. Once exosomes are isolated, they can be used as starting material for further characterization. Exosomes can be characterized by electron and atomic force microscopy methods [402,403,404,405], but here we focus on molecular characterization drawing on the omics technologies described earlier.

Exosomal nucleic acid content reflects the genetic variants of the originating cancer and could therefore be used to tailor therapeutic decisions and monitor response to therapy. Given the high abundance and stability of exosomes compared with cfDNA and CTCs in patients who have undergone therapy, exosome-derived DNA (exoDNA) is a robust source of genetic biomarkers to guide therapy. Indeed, driver mutations such as BRAF V600E in melanoma and EGFR L858R and T790M in lung cancer have been readily detected in exoDNA [406]. On the other hand, results of exosomal liquid biopsies in population-wide screening should be interpreted carefully, because mutations in driver genes such as KRAS and TP53 can be found in exosomes of healthy individuals [407, 408].

In addition to DNA, exosomes are a rich source of a variety of RNA species [184, 409]. Among exosomal RNA (exoRNA) species, miRNAs have been the main focus of exosomal biomarker discovery [410]. However, long RNAs, such as lncRNA [411] and mRNA [412], are also informative and can be used to identify somatic mutations and changes in gene expression characteristic of cancers. hTERT mRNA can be detected in exosomes from patients with different malignancies [412]. The utility of this potential pan-cancer marker will need to be validated in larger cohorts.

Because exosomes are significantly more abundant than CTCs in body fluids, exosome-derived proteins are more amenable to proteomic analyses. Exosome proteomes can be probed for biomarkers in an unbiased way using untargeted MS-based proteomics [413, 414], or with affinity-based approaches such as aptamer-based SOMAscan arrays [415]. Indeed, studies have shown that single exosomal proteins and protein panels have potential as diagnostic and prognostic biomarkers in pancreatic cancer [416, 417], melanoma [418], lung cancer [389], and colorectal cancer [419].

Combining multiple omics approaches can yield more robust biomarkers than relying on a single method. For example, exoDNA and exoRNA can be investigated in tandem to detect oncogenic fusion transcripts [420]. This approach can be very useful for combined genomics and transcriptomics profiling of cancers that are not amenable to solid tissue biopsies [420]. Similarly, improvements in sensitivity and specificity can be achieved by simultaneous investigation of protein and miRNA panels derived from exosomes [421].

Recent developments in the field use various strategies to bypass purification steps. In one such strategy, called ExoScreen, the authors used two types of antibodies and photosensitizer beads to directly capture and detect cancer-derived circulating EVs [422]. ExoScreen uses antibodies against the CD9 and CD147 antigens to capture CD9/CD147 double-positive EVs, which were enriched in the serum of patients with stage I colorectal cancer [422]. Similarly, protein microarrays printed with cocktails of antibodies against exosome-specific tetraspanins can be used to capture exosomes specifically and comprehensively. This approach, called Extracellular Vesicle Array (EV Array) [423], enables high-throughput detection of exosomes directly from crude biofluid samples.

Microfluidic platforms [424,425,426] are a new avenue towards integrated isolation, detection, and multi-omics characterization of exosomes for liquid biopsies. Microfluidic devices obviate the need for lengthy and laborious protocols, work with smaller sample volumes, and offer higher throughput. The clinical validity of microfluidic devices is an active area of research, but early efforts have shown promise for liquid biopsies. Devices like ExoChip [424] offer integrated quantification of exosome levels in biofluids. ExoChip relies on immuno-isolation of exosomes via CD63 [424, 427], followed by fluorescence staining and detection/quantification using a standard plate reader. A similar platform, ExoSearch [426], uses continuous flow to isolate CD9-positive exosomes, which are then stained with fluorescently labeled antibodies against exosomal tumor markers (CA125, EpCAM, and CD24) and analyzed by multiplex fluorescence imaging. Another innovative method for exosome capture and characterization, the nano-plasmonic exosome (nPLEX) sensor, uses an array of periodic nanoholes in a gold film [419]. nPLEX arrays are functionalized with exosome-specific affinity ligands, and exosome capture changes the local refractive index in proportion to target protein levels. Importantly, captured exosomes can be released, enabling analysis of their mRNA cargo by qRT-PCR [419]. Microfluidic exosome isolation can also be carried out by size selection, as in the Exodisc lab-on-a-chip [428], which uses tandem nano-filters to enrich exosome subpopulations in the 20–600 nm range. An integrated immunoassay can then be used to quantify and characterize the isolated exosomes.

These emerging technologies offer higher throughput and require much smaller sample volumes than standard exosome isolation methods. Whether this will promote wider adoption of exosome-derived biomarkers for clinical decision-making will become clear only after some of these methods have been validated in independent prospective clinical trials.

Translational Status

The clinical use of exosome-based biomarkers is fraught with challenges. Exosome size and concentration can vary between biological samples [429]. Moreover, external factors such as physical activity prior to sampling or the time of sampling can influence the composition of exosomes in liquid biopsy samples [397]. Once samples are taken, rapid processing and appropriate storage are critical, because circulating cells in the sample can continue producing exosomes [397]. Additionally, downstream biomarker analysis relies on costly specialized instrumentation and kits for exosome isolation [430]. Isolation techniques are also often matrix-specific and lack the standardization necessary for clinical use. Future improvements in the standardization, scalability, and turnaround time of exosome isolation will pave the way for routine clinical use of this important source of biomarkers.

Despite these challenges, exosome-based liquid biopsy tests might make it to the clinic in the near future. Exosome Diagnostics has two assays on the market. ExoDx Prostate (IntelliScore) is a urine-based liquid biopsy test that uses exosomal RNA to quantify the expression of three genes and predict the aggressiveness of prostate cancer [431]. The ExoDx assay was validated in a multi-site prospective cohort study of patients undergoing prostate biopsy for its ability to discriminate aggressive prostate cancer (Gleason score (GS) 7 or higher) from GS 6 cancer or benign disease. In a training cohort (N = 255), the exosome assay score combined with PSA and clinical variables detected GS 7 or higher prostate cancer with AUC = 0.77 [431]. In the validation cohort (N = 519), performance was similar (AUC = 0.73 for detecting GS 7 or higher), outperforming the PSA-based standard-of-care test (AUC = 0.63) [431]. The test is available for clinical use, and Exosome Diagnostics is currently conducting a trial to investigate its utility and evaluate its potential to reduce the number of initial prostate biopsies (NCT03235687). The company's second test, ExoDx Lung (ALK), currently available for research use only, is a qPCR test that detects EML4-ALK fusion transcripts in plasma exosomes to inform therapy selection for lung cancer. ExoDx Lung (ALK) was evaluated for longitudinal monitoring of treatment response in a prospective cohort of ALK-positive patients (N = 52, 144 longitudinal samples in total) [432]. The assay detected ALK fusions in exoRNA in 50% of patients at baseline. Furthermore, 98% of samples from patients with an objective response or stable disease tested negative, suggesting that ExoDx Lung (ALK) has potential as a monitoring biomarker for NSCLC [432].

Systems Biomarkers

Scientific Rationale

The biology of cancer is characterized by complex phenotypes such as genomic instability, metabolic reprogramming, altered proliferative signaling, evasion of apoptosis and of the immune response, induction of angiogenesis, and invasion and metastasis, collectively known as the hallmarks of cancer [433]. To understand, identify, and target these neoplastic processes, we need systems-level integration of information from multiple layers of biological activity (Fig. 2).

The first application of systems biology in precision oncology is the discovery of new (multiomics) biomarkers. Systems biology provides a framework to investigate complex cancer phenotypes in terms of pathways and networks. State-of-the-art statistical and computational algorithms can be applied to accumulated multi-layered biological data and integrated with known cancer-related biochemical pathways to guide the discovery of new biomarker panels [434]. The resulting biomarker candidates must then be validated analytically and, subsequently, clinically in clinical trials.

The second application addresses the problem of intra- and inter-patient cancer heterogeneity. Despite general commonalities [433], cancers are defined by distinct background genotypes and molecular signatures. To implement successful personalized treatment protocols, we need to account for genetic and environmental differences between individuals, as well as temporal and spatial heterogeneity of cancer cells within patients. Multiomics patient data and assays for known biomarkers can be combined with machine learning approaches [435] for precise patient stratification. The complexity and costs of these approaches are still a barrier to implementing systems biology in clinical oncology. However, as the costs of omics analyses keep declining and analysis tools become more powerful and easier to use, multiomics strategies might approach routine use in the next decade.

Current and Emerging Technology

The main drivers of systems biology are big data sets generated by the omics technologies described in previous sections. Comprehensive multiomics profiling of thousands of cancer patients has resulted in large databases of cancer-related biological data. Arguably, the most important publicly available resource for cancer systems biology is The Cancer Genome Atlas (TCGA) [75]. TCGA contains petabytes of data from (epi)genomics, transcriptomics, and proteomics experiments, together with clinicopathologic annotation data, describing 33 cancer types from 11,160 patients, and has been instrumental for translational cancer research [436]. Another similar resource is the cancer data portal of the International Cancer Genome Consortium (ICGC), which contains multiomics data from 84 cancer projects and more than 20,000 patients worldwide [437].

The availability of big multiomics datasets is not enough on its own to produce biological or clinical insight. In fact, our ability to generate big data sets vastly exceeds our ability to analyze and integrate them [438]. Combining the results of biological assays with imaging, biopsy, and clinical data is routine in clinical practice, but extracting useful information from high-dimensional and heterogeneous biological data sets requires a different approach. The data can be combined with models of cellular processes and pathways [18] to reduce dimensionality and generate a candidate list of features. Alternatively, biological information from different omics platforms can be combined directly (the multianalyte approach) to derive classifiers and diagnostic scores using sophisticated computational methods [435]. These methods include network topology analyses, dimensionality reduction, anomaly detection, supervised and unsupervised machine learning algorithms, and summarization and visualization techniques for complex high-dimensional data [439]. Such methods have also been used for feature selection directly on big data sets [440, 441].
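As a minimal sketch of the direct (multianalyte) route, assuming every omics block has already been measured on the same samples, the Python code below performs a simple form of early integration: each feature is standardized, and each block is down-weighted by the square root of its feature count before concatenation. This block-scaling scheme is one common choice among many, not the method of the cited studies, and the toy data are hypothetical.

```python
import math
from statistics import mean, pstdev


def zscore_columns(block):
    """Standardize each feature (column) of a samples-by-features block."""
    columns = list(zip(*block))
    scaled = []
    for col in columns:
        mu, sd = mean(col), pstdev(col)
        # Constant features carry no information; map them to zero.
        scaled.append([(v - mu) / sd if sd > 0 else 0.0 for v in col])
    return [list(row) for row in zip(*scaled)]


def early_integration(blocks):
    """Concatenate standardized omics blocks measured on the same samples.

    Each block is down-weighted by 1/sqrt(number of features) so that a block
    with thousands of features (e.g. transcripts) does not swamp a block with
    a handful of features (e.g. plasma proteins).
    """
    n_samples = len(blocks[0])
    if any(len(b) != n_samples for b in blocks):
        raise ValueError("all blocks must describe the same samples")
    fused = [[] for _ in range(n_samples)]
    for block in blocks:
        weight = 1.0 / math.sqrt(len(block[0]))
        for i, row in enumerate(zscore_columns(block)):
            fused[i].extend(v * weight for v in row)
    return fused


# Hypothetical toy data: 3 samples, 2 protein features and 4 miRNA features.
proteins = [[1.0, 10.0], [2.0, 20.0], [3.0, 60.0]]
mirnas = [[5.0, 1.0, 0.0, 2.0], [6.0, 3.0, 1.0, 2.5], [9.0, 2.0, 4.0, 7.0]]
fused = early_integration([proteins, mirnas])
print(len(fused), len(fused[0]))  # 3 samples, 6 fused features
```

The fused matrix could then be passed to any dimensionality-reduction or classification method. The block weighting gives each omics layer the same total variance, so no layer dominates merely because it contributes more features.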

The sensitivity, specificity, and confidence of clinical decision-making can be boosted by leveraging orthogonal multianalyte panels. However, using multi-layered information and multiple markers runs the risk of overfitting predictive models [439]. The importance of the identified multiomics biomarkers (genes, transcripts, proteins, and metabolites from the candidate list) needs to be confirmed with targeted assays and omics experiments and validated in retrospective clinical studies. Robust statistical methods can then be used to remove biomarkers with minimal impact on accuracy, identify meaningful correlations, and build predictive models [439]. The final set of biomarkers, together with appropriate statistical methods, can then be validated in larger cohorts [439].
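The overfitting risk can be made concrete with a small simulation. The Python sketch below, our own illustration with arbitrary sample sizes and seed, screens 1,000 pure-noise candidate "biomarkers" on 20 training samples, keeps the best single cut-off, and then scores that rule on 200 independent samples.

```python
import random


def best_cutoff(values, labels):
    """Exhaustively search single-feature cut-offs; return (accuracy, cutoff, direction)."""
    best = (0.0, None, 1)
    for cutoff in sorted(set(values)):
        for direction in (1, -1):  # 1: high value = case, -1: low value = case
            preds = [1 if direction * (v - cutoff) >= 0 else 0 for v in values]
            acc = sum(p == y for p, y in zip(preds, labels)) / len(labels)
            if acc > best[0]:
                best = (acc, cutoff, direction)
    return best


def simulate(n_train=20, n_test=200, n_features=1000, seed=0):
    rng = random.Random(seed)
    y_train = [i % 2 for i in range(n_train)]  # balanced cases and controls
    y_test = [i % 2 for i in range(n_test)]
    best_acc, best_feature, best_rule = 0.0, None, None
    test_data = []
    for j in range(n_features):
        # Pure noise: no feature is associated with the label.
        train_values = [rng.gauss(0, 1) for _ in range(n_train)]
        test_data.append([rng.gauss(0, 1) for _ in range(n_test)])
        acc, cutoff, direction = best_cutoff(train_values, y_train)
        if acc > best_acc:
            best_acc, best_feature, best_rule = acc, j, (cutoff, direction)
    cutoff, direction = best_rule
    preds = [1 if direction * (v - cutoff) >= 0 else 0
             for v in test_data[best_feature]]
    test_acc = sum(p == y for p, y in zip(preds, y_test)) / n_test
    return best_acc, test_acc


train_acc, test_acc = simulate()
print(f"apparent training accuracy: {train_acc:.2f}, "
      f"independent test accuracy: {test_acc:.2f}")
```

Because none of the features carries any signal, the high apparent training accuracy reflects selection among many candidates rather than biology; on independent data, the selected rule falls back to near-chance accuracy. This is why candidate lists need confirmation in independent sample sets.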

Multi-omics technologies and their integration with diverse clinical data will become even more important for robust patient stratification and cancer diagnostics with the advent of artificial intelligence and deep learning (DL) algorithms [442]. Deep learning is a subclass of machine learning that uses neural networks, multi-layered data processing networks capable of feature extraction and pattern recognition in large and diverse data sets [443]. While successful applications of DL in medicine mostly focus on automated classification of medical imaging data [444, 445], there are promising examples of applications to (multi-)omics in precision medicine. In one study, the authors used an artificial neural network (ANN) to distinguish multiple myeloma patients from healthy subjects with 95% sensitivity at 95% specificity (N = 84, case-control study), based on MALDI-TOF MS low-mass spectral fingerprinting (metabolomic analysis) of peripheral plasma samples [446]. In another study, miRNA sequencing data from serum samples of patients with epithelial ovarian cancer (EOC) were used to train a neural network, yielding a diagnostic algorithm that outperformed CA125 in distinguishing cancer patients from healthy controls and patients with benign tumors (AUC = 0.9, N = 179, case-control studies) [447]. The authors then reduced the set of diagnostic miRNAs to seven that could be detected by qPCR, adapted the neural network to this reduced set, and validated the classifier on 51 pre-operative clinical samples, achieving an AUC of 0.85 [447].

The power of deep learning lies in its ability to integrate disparate data, such as multi-omics data, medical images, and clinical information, to enhance prediction accuracy [443]. However, progress is currently limited by small data sets, and the full potential of DL will only be realized once sufficiently large, matched, and carefully annotated datasets become available. Even with perfect datasets of sufficient size, validation and assurance of proper use might require interpretable predictions before DL is adopted into routine clinical use [448, 449].

Translational Status

Translational use of systems biology and multi-analyte biomarkers is a relatively new addition to the field. However, there have been some notable studies yielding new biomarker panels for liquid biopsy. Some of these biomarkers show promise in early clinical trials and await validation in larger cohorts, while others are on their way to being commercialized and used in clinical practice.

CancerSEEK is a promising multianalyte blood test with potential for pan-cancer diagnosis [450]. The test quantifies levels of protein markers in plasma and cancer-specific mutations in cfDNA. The initial case-control study (N = 1005) of patients with clinically detected stage I-III cancers showed that the test detected cancer with a median sensitivity of 70% across eight common cancer types. Importantly, CancerSEEK protein markers helped identify the candidate tissue of origin, which is a critical feature for population-wide screening for early pan-cancer detection. However, the test's specificity needs to be assessed and validated in large prospective cohorts [450].

Large biomarker panels can in some cases be replaced by only a few biomarkers. For example, combining droplet digital PCR (ddPCR) measurements of KRAS mutant allele fraction (MAF) in cfDNA and exoDNA can yield a classifier that predicts outcomes in pancreatic cancer. This classifier was tested for clinical utility in a longitudinal prospective cohort of 194 patients undergoing treatment for clinically and histologically confirmed localized or metastatic pancreatic adenocarcinoma [32]. At baseline, multianalyte analysis of the cohort showed that ctDNA and exoDNA MAFs ≥ 5% were a significant predictor of overall survival (OS; HR, 7.73). Furthermore, longitudinal multianalyte monitoring of exoDNA showed that an MAF peak above 1% was associated with radiologic progression (sensitivity 79% and specificity 100%) [32]. This study shows that longitudinal monitoring of circulating nucleic acids can provide useful predictive and prognostic information.
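MAFs such as these are typically derived from ddPCR droplet counts. The Python sketch below applies the standard Poisson correction used in ddPCR quantification, lambda = -ln(1 - p) copies per droplet, treating the mutant and wild-type channels independently. It is an illustration, not the cited study's pipeline, and the droplet counts are hypothetical.

```python
import math


def copies_per_droplet(positive, total):
    """Poisson-corrected mean target copies per droplet: lambda = -ln(1 - p)."""
    if not 0 <= positive < total:
        raise ValueError("need 0 <= positive droplets < total droplets")
    return -math.log(1.0 - positive / total)


def mutant_allele_fraction(mut_positive, wt_positive, total):
    """MAF = lambda_mut / (lambda_mut + lambda_wt), each channel treated independently."""
    lam_mut = copies_per_droplet(mut_positive, total)
    lam_wt = copies_per_droplet(wt_positive, total)
    return lam_mut / (lam_mut + lam_wt)


# Hypothetical run: 20,000 droplets, 100 KRAS-mutant-positive, 1,900 wild-type-positive.
maf = mutant_allele_fraction(100, 1900, 20000)
print(f"MAF = {maf:.2%}")  # prints MAF = 4.78%
```

A naive ratio of positive droplets (100/2,000 = 5%) slightly overestimates the MAF here, because wild-type-positive droplets are more likely to contain more than one copy. In a study like the one above, the corrected value would then be compared with prespecified cut-offs such as the 5% baseline and 1% longitudinal thresholds.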

A larger multianalyte test has been investigated for the prediction of pancreatic cancer in individuals at risk of familial pancreatic cancer [451]. The panel used multiple analytes across different sample matrices: tissue (miRNA: miR-196b), serum (snRNA: RNU2-1f; protein: LCN2, TIMP1, Glypican-1, and CA 19-9), duodenal juice exosomes (protein: Glypican-1), and duodenal cfDNA (KRAS mutations). Validation in a small cohort showed that the entire panel could be reduced to three plasma analytes (miR-196b, TIMP1, and LCN2), which distinguished stage I PDAC (N = 5) from healthy individuals (N = 20) with an AUC of 1 and 100% sensitivity and specificity [451]. The specificity of this multianalyte panel for early-stage PDAC still needs to be validated; however, this might be challenging, because clinically validated stage I PDAC samples are extremely rare [451].
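Perfect performance in very small groups still leaves wide statistical uncertainty. As a rough illustration, assuming the two-sided exact (Clopper-Pearson) interval, the Python sketch below computes the 95% lower confidence bound when all n results are correct, which for this special case reduces to (alpha/2)^(1/n).

```python
def exact_lower_bound_all_correct(n, confidence=0.95):
    """Two-sided Clopper-Pearson lower limit when all n results are correct.

    For x = n successes, the lower limit reduces to (alpha / 2) ** (1 / n).
    """
    if n < 1:
        raise ValueError("n must be a positive integer")
    alpha = 1.0 - confidence
    return (alpha / 2.0) ** (1.0 / n)


sens_lower = exact_lower_bound_all_correct(5)   # 5/5 stage I PDAC cases detected
spec_lower = exact_lower_bound_all_correct(20)  # 20/20 healthy controls negative
print(f"sensitivity 95% CI: {sens_lower:.1%}-100%")
print(f"specificity 95% CI: {spec_lower:.1%}-100%")
```

With 5 of 5 cases detected, the lower bound on sensitivity is about 48%, and with 20 of 20 controls negative, the lower bound on specificity is about 83%, which underlines why validation in larger cohorts is needed.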

A successful example of a multianalyte test that also incorporates clinical data is the Stockholm 3 model (STHLM3). This test combines plasma protein biomarkers (PSA, free PSA, intact PSA, hK2, MSMB, MIC1), 232 genetic polymorphisms associated with prostate cancer in earlier studies, and clinical variables to identify high-risk prostate cancer at biopsy [324]. STHLM3 was validated in an independent multi-center community cohort of 533 patients scheduled for prostate biopsy [452]. Blood samples were drawn prior to biopsy and analyzed to compare STHLM3 with PSA-based diagnosis of clinically significant prostate cancer (ISUP Grade Group (GG) 2 or higher). STHLM3 showed better diagnostic performance than PSA alone (AUC = 0.859 vs 0.642 for PSA and 0.748 for PSA density) for detecting GG ≥ 2 cancer versus benign findings or GG 1 prostate cancer, with the potential to reduce the total number of biopsies by 38% [452]. The Stockholm3 test is now entering clinical use in Sweden, Norway, and Finland.
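Risk models of this kind are commonly built as logistic regressions that combine marker levels, a polygenic risk score (PRS), and clinical variables into a single probability. The Python sketch below illustrates only that general structure; the coefficients, SNP weights, and patient values are hypothetical placeholders, not the published STHLM3 model.

```python
import math


def polygenic_risk_score(allele_counts, log_odds_per_allele):
    """Weighted sum of risk-allele counts (0, 1, or 2 copies per SNP)."""
    if len(allele_counts) != len(log_odds_per_allele):
        raise ValueError("one weight per SNP is required")
    return sum(c * w for c, w in zip(allele_counts, log_odds_per_allele))


def combined_risk(psa, prs, age, coef):
    """Logistic model: P(clinically significant cancer) = 1 / (1 + exp(-z)).

    coef = (intercept, b_log_psa, b_prs, b_age); the values used below are
    hypothetical placeholders, not fitted STHLM3 parameters.
    """
    intercept, b_log_psa, b_prs, b_age = coef
    z = intercept + b_log_psa * math.log(psa) + b_prs * prs + b_age * age
    return 1.0 / (1.0 + math.exp(-z))


HYPOTHETICAL_COEF = (-6.0, 1.2, 1.0, 0.04)
prs = polygenic_risk_score([0, 1, 2], [0.10, 0.20, 0.05])  # 3 toy SNPs
low = combined_risk(psa=3.0, prs=prs, age=65, coef=HYPOTHETICAL_COEF)
high = combined_risk(psa=12.0, prs=prs, age=65, coef=HYPOTHETICAL_COEF)
print(f"risk at PSA 3 ng/mL: {low:.2f}; at PSA 12 ng/mL: {high:.2f}")
```

In practice, the fitted probability would be compared with a prespecified risk cut-off to decide whether to recommend biopsy.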

Freenome, an up-and-coming AI genomics start-up, is recruiting patients for an observational study focused on colorectal cancer (CRC) screening. Its approach is to analyze all cfDNA (most of which originates from immune cells [453]), cfRNA, and circulating proteins as potential biomarkers. The company's first report, on cfDNA sequencing for early CRC detection, was available as a preprint at the time of writing this review [454]. The study was performed on 817 retrospectively collected plasma samples from international institutions and commercial biobanks (546 predominantly early-stage CRC cases and 271 non-cancer controls). The authors estimated the ctDNA fraction in plasma cfDNA from copy number variation and used machine learning (logistic regression and support vector machines) to discriminate between healthy and CRC samples. Using k-fold cross-validation, the procedure achieved a sensitivity of 85% at 85% specificity for CRC versus healthy subjects.
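A figure such as "85% sensitivity at 85% specificity" is read off the score distributions by fixing the cut-off on the controls. The Python sketch below shows that step with hypothetical scores, calling a sample positive when its score exceeds the cut-off. In a cross-validated analysis like the one described, it would be applied to held-out fold predictions; we do not reproduce the authors' actual pipeline.

```python
import math


def sensitivity_at_specificity(case_scores, control_scores, target_specificity):
    """Pick the cut-off that keeps at least `target_specificity` of controls
    negative (score <= cut-off), then report sensitivity (cases > cut-off).
    """
    if not 0 < target_specificity <= 1:
        raise ValueError("target specificity must be in (0, 1]")
    controls = sorted(control_scores)
    # k-th smallest control score, k = ceil(target * n_controls); the small
    # epsilon guards against floating-point round-up (e.g. 85.0000000001).
    k = max(1, math.ceil(target_specificity * len(controls) - 1e-9))
    cutoff = controls[k - 1]
    specificity = sum(s <= cutoff for s in controls) / len(controls)
    sensitivity = sum(s > cutoff for s in case_scores) / len(case_scores)
    return cutoff, sensitivity, specificity


controls = [i / 100 for i in range(100)]  # hypothetical control scores
cases = [0.5, 0.9, 0.95, 0.99]            # hypothetical case scores
cutoff, sens, spec = sensitivity_at_specificity(cases, controls, 0.85)
print(f"cut-off {cutoff:.2f}: sensitivity {sens:.0%} at specificity {spec:.0%}")
```

Choosing the cut-off on the same samples used to report performance is optimistic; within k-fold cross-validation, the cut-off would be fixed using the training folds only and then applied to the held-out fold.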

Conclusions and Future Directions

The molecular characterization of tumor tissue biopsy samples is currently the gold standard of precision and personalized medicine. However, the invasiveness of (repeated) biopsies is one of the main drivers of research on liquid biopsy biomarkers for clinical decision making. Moreover, liquid biopsies can offer insights into biological phenomena and clinically important information about spatiotemporal heterogeneity of tumors that is not readily accessible through tissue biopsy.

Towards Best Practices for Discovery and Validation of Biomarkers

The growing understanding of cancer biology and advances in omics technologies have already yielded many potential circulating biomarkers. The declining costs of high-throughput assays and the spread of efficient computational methods have enabled both targeted and unbiased genome-wide biomarker studies for clinical applications. Emerging liquid biopsy tests increasingly rely on multiparametric assays involving multiple analytes, either from a single layer of biological information or from multiple omics layers. Such studies are propelled by advances in statistical and machine learning methods for analyzing big data. However, while current research highlights the promise of liquid biopsy biomarkers for precision oncology, the majority of studies are still in an early proof-of-concept phase.

Many challenges must be overcome before new omics biomarkers can enter standard clinical practice. One reason for the discrepancy between the number of biomarkers reported in the primary literature and those used in clinical practice is the gulf between the experimental evidence needed to establish a finding in basic science and the requirements for a robust diagnostic assay [230]. Moreover, even when analytical validity is established, assays need to be clinically validated in well-designed trials. Many biomarkers currently show promise in retrospective and case-control pilot studies, but there is a general lack of large prospective studies demonstrating, at the very least, clinical validity. For example, a recent systematic assessment of the clinical proteomics literature [455] revealed that only 10–20% of studies mention a potential clinical application, while the rest focus mainly on technical aspects of assay development or sample preparation. Furthermore, even where the reviewed studies included clinical validation, the validation suffered from one or more problems: it was underpowered for the specific context of use (small sample size), the candidate biomarker was not compared with current methods, the study was not performed in the population of interest (e.g., a screening biomarker tested in a case-control study whose prevalence differs from that in reality), or the biomarker was not tested in the intended context of use [455]. Moreover, evidence of clinical utility is lacking for the majority of liquid biopsy biomarkers.

Demonstrating clinical validity and utility is perhaps even more challenging for circulating biomarkers intended for population-wide screening and early detection of tumors. While early data may show promising diagnostic performance, caution is necessary, because large numbers of false positives can occur even when clinical specificity is high (> 99%). Because the incidence of and mortality from any specific cancer are low, clinical utility studies need to be long and to cover a sufficiently large population.
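Why high specificity is not enough can be shown with Bayes' rule. The Python sketch below computes the positive and negative predictive values; the sensitivity (90%), specificity (99%), and prevalence (0.1%) are illustrative assumptions, not figures from a specific test.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) from test accuracy and disease prevalence (Bayes' rule)."""
    tp = sensitivity * prevalence
    fn = (1.0 - sensitivity) * prevalence
    fp = (1.0 - specificity) * (1.0 - prevalence)
    tn = specificity * (1.0 - prevalence)
    return tp / (tp + fp), tn / (tn + fn)


# Hypothetical screening scenario: 90% sensitivity, 99% specificity, 0.1% prevalence.
ppv, npv = predictive_values(0.90, 0.99, 0.001)
print(f"PPV = {ppv:.1%}, NPV = {npv:.3%}")
```

Per 100,000 people screened, these assumptions give about 90 true positives and 999 false positives, so roughly 11 of every 12 positive results would be false alarms, even though the NPV is about 99.99%.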

The weight of evidence and the scope of clinical trials needed to complete a phased biomarker research program for screening applications are well beyond the reach of most research groups. In fact, multidisciplinary collaborations are often needed to carry biomarker discovery and validation through multiple phases that require diverse and complementary expertise: biological, analytical, statistical, clinical, ethical, and regulatory [456]. Inadequate attention to one or more of these facets of biomarker development has led to failures in validation studies. Specifically, past biomarker failures have been linked to poor patient selection; low-quality sample acquisition, processing, and storage; insufficient statistical power; a lack of standards for biomarker profiling; and problems with reporting and analysis [20, 21, 456, 457].

To address common problems in biomarker development, various authors and organizations have devised standards and guidelines for study design and reporting. Important resources that provide guidelines for phased biomarker development, study design, and reporting standards for protocols and results include: guidelines for phased development of biomarkers for early detection of cancer [307], guidelines for case-control study design [305], validation steps for omics biomarkers [304], the Biospecimen Reporting for Improved Study Quality (BRISQ) [458], PRospective-specimen-collection, retrospective-Blinded-Evaluation (PRoBE) [305], design strategies for identification of predictive biomarkers [459], REporting recommendations for tumor MARKer prognostic studies (REMARK) [460], Consolidated Standards of Reporting Trials (CONSORT) [461], and standards for reporting of diagnostic accuracy (STARD) [462]. The National Biomarker Development Alliance (NBDA) set out to establish standards and to identify the phases of systems-based, end-to-end biomarker development that still need to be standardized [463]. Some critical points along the biomarker development trajectory are addressed below.

Many research and clinical ambiguities can be avoided by defining the clinical question at the inception of the biomarker research program [464]. This should be done in consultation with clinicians, taking unmet clinical needs into account (see the following subsection). Prespecifying the context of use enables early and productive engagement with regulatory bodies. Moreover, the predetermined intended use of the biomarker dictates the details of all downstream phases. Specifically, it ensures that patient selection, sample sources, sample quality, and adequate sample size are relevant to the intended clinical utility [299].

A large sample size is especially important when multi-omics data are analyzed for biomarker discovery, because a small N leads to overfitting and false positives [463]. Equally important are independent confirmatory studies, in which separate training and blinded test sets increase confidence in the validity of a biomarker. When biomarkers are developed for a cancer (sub)type with low prevalence, it might be necessary to use biobanks and samples from multiple research centers to reach the requisite numbers for validation. Furthermore, access to all clinical data is necessary to rule out patient characteristics other than disease state as sources of variation in biomarker levels.
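To give a sense of scale, one common normal-approximation formula estimates the number of cases needed to measure sensitivity within a chosen margin, n = z²·Se·(1 − Se)/d², and divides by prevalence to obtain the size of a prospective cohort. The Python sketch below applies it; the expected sensitivity, margin, and prevalence are assumptions chosen for illustration.

```python
import math


def _raw_cases(expected_sensitivity, margin, z):
    """Unrounded case count: z^2 * Se * (1 - Se) / d^2."""
    return z ** 2 * expected_sensitivity * (1 - expected_sensitivity) / margin ** 2


def cases_needed(expected_sensitivity, margin, z=1.96):
    """Cases needed to estimate sensitivity within +/- margin (95% by default)."""
    return math.ceil(_raw_cases(expected_sensitivity, margin, z))


def total_needed(expected_sensitivity, margin, prevalence, z=1.96):
    """Subjects to enroll prospectively so that enough of them are cases."""
    return math.ceil(_raw_cases(expected_sensitivity, margin, z) / prevalence)


print(cases_needed(0.85, 0.05))        # 196 cases
print(total_needed(0.85, 0.05, 0.01))  # 19593 subjects at 1% prevalence
```

Estimating an expected sensitivity of 85% to within ±5 percentage points requires 196 cases; at a prevalence of 1%, a prospective cohort would have to enroll about 19,600 subjects to accumulate them, which illustrates why multi-center collaborations and biobanks are often required.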

Before clinical validation, the analytical validity of the biomarker assay must be established. This requires the use of reference standards and can be assessed by measuring different parameters: accuracy, trueness, precision, reproducibility, robustness, linearity, analytical sensitivity and specificity, the limit of detection, and interfering substances. Notably, multiomics procedures require adherence to strict quality control standards [304, 463]. As a final step in ensuring the validity of the assay, it is important to establish multi-site assay precision and reproducibility. Adoption of FDA Good Laboratory Practice [465] and CLIA laboratory proficiency testing [466] can enhance analytical and statistical rigor.
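As an example of one of these parameters, limits of blank (LoB) and detection (LoD) are often estimated parametrically, as in the CLSI EP17 framework: LoB = mean(blank) + 1.645·SD(blank) and LoD = LoB + 1.645·SD(low-level sample). The Python sketch below implements those two formulas on hypothetical replicate data; real studies use many more replicates, reagent lots, and days.

```python
from statistics import mean, stdev

Z_95 = 1.645  # one-sided 95th percentile of the standard normal distribution


def limit_of_blank(blank_measurements):
    """LoB = mean(blank) + 1.645 * SD(blank)."""
    return mean(blank_measurements) + Z_95 * stdev(blank_measurements)


def limit_of_detection(blank_measurements, low_level_measurements):
    """LoD = LoB + 1.645 * SD(low-concentration sample)."""
    return limit_of_blank(blank_measurements) + Z_95 * stdev(low_level_measurements)


# Hypothetical replicate measurements (e.g. mutant copies per reaction).
blanks = [0, 1, 0, 2, 1, 0, 1, 1]
low_positive = [4, 5, 6, 5, 4, 6]
print(f"LoB = {limit_of_blank(blanks):.2f}")
print(f"LoD = {limit_of_detection(blanks, low_positive):.2f}")
```

The parametric form assumes approximately normal measurement error; when blank results are skewed or cluster at zero, nonparametric percentile estimates are generally preferred.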

Protocols for all pre-analytical (processing, handling, transport, and storage), analytical (assay methods and instrumentation) and post-analytical (statistical/computational pipeline for data analysis and interpretation) procedures need to be standardized and strictly defined (“locked down”) at this point in order to mitigate reproducibility issues downstream [463].

Data collection, storage, and analysis algorithms are another critical area where special care is necessary [463]. Increasingly complex multi-omics approaches require validation and application of new analytical and statistical approaches, the development of analytical standards, and robust open-source classification algorithms. Collection, storage, and sharing of high-quality data and metadata is important throughout the development process. Importantly, data sharing and publication of negative and contradictory results can simplify investigation of reasons for biomarker failure. Researchers need to adhere to data quality standards, use established ontologies, vocabularies, minimum reporting standards, and utilize accepted exchange formats. FDA and NIH jointly developed BEST (Biomarkers, EndpointS, and other Tools) Resource [19] to facilitate this process.

Clinical validation of a biomarker requires careful study design according to its intended use. Prospective randomized trials are considered the gold standard for biomarker validation; however, some cancer biomarkers currently in use were validated using retrospective analyses of clinical trials [459] (e.g., KRAS [467, 468] and BRCA mutations [469]). Critically, statistical procedures and threshold values for evaluating biomarker utility need to be prespecified. Depending on the intended use, biomarker performance is estimated by receiver operating characteristic (ROC) curve analysis and reported as clinical sensitivity and specificity, and as positive and negative predictive values (PPV and NPV, respectively).
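The ROC area under the curve (AUC) equals the probability that a randomly chosen case scores higher than a randomly chosen control, with ties counted as one half (the normalized Mann-Whitney U statistic). The Python sketch below computes the AUC this way, together with sensitivity and specificity at a fixed cut-off; the scores and cut-off are hypothetical. PPV and NPV additionally depend on the prevalence in the intended-use population.

```python
def roc_auc(case_scores, control_scores):
    """ROC AUC as P(case score > control score), counting ties as 1/2
    (equivalent to the normalized Mann-Whitney U statistic)."""
    wins = 0.0
    for c in case_scores:
        for h in control_scores:
            if c > h:
                wins += 1.0
            elif c == h:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


def sens_spec(case_scores, control_scores, cutoff):
    """Clinical sensitivity and specificity when score >= cutoff is called positive."""
    sensitivity = sum(s >= cutoff for s in case_scores) / len(case_scores)
    specificity = sum(s < cutoff for s in control_scores) / len(control_scores)
    return sensitivity, specificity


cases = [0.9, 0.8, 0.7, 0.4]     # hypothetical biomarker scores
controls = [0.6, 0.5, 0.3, 0.2]
print(roc_auc(cases, controls))          # 0.875
print(sens_spec(cases, controls, 0.55))  # (0.75, 0.75)
```

The pairwise loop is quadratic in sample size, which is fine for illustration; for large cohorts, a rank-based computation of the same statistic is more efficient.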

The final goal of biomarker development is to establish clinical utility. In this phase, it is necessary to demonstrate that use of the biomarker improves patient outcomes (e.g., overall or progression-free survival, or mortality), to show economic utility, and to compare the biomarker with established biomarkers and the standard of care.

The number of aspects that must be considered to translate newly discovered biomarkers into the clinic may seem overwhelming and beyond the capacity of academic research. In our experience, however, biomarker validation is an iterative process (Fig. 3). Each iteration redefines the level of readiness of the liquid biopsy technology. With each “cycle” of validation, evidence accumulates on the clinical validity and utility of the biomarker, until the ultimate decision can be made about commercialization of the assay. Notably, reaching the market does not mark the end of the process, because biomarker performance needs to be monitored even after it has been approved for commercialization [463].

Fig. 3

The process of biomarker validation in practice. Biomarker validation can be seen as an iterative process in which the liquid biopsy assay, at an increasing level of analytical validation, is tested for clinical validity for an intended purpose in a fit-for-purpose patient population. The study design and endpoint need to be predetermined, and sample and clinical data collection need to be carried out in accordance with clinical and biomarker development guidelines. Upon completion of the study, assay performance characteristics need to be reported to the requisite regulatory bodies. After the analytical performance has been enhanced, all protocols need to be locked down and validated again.

Towards Unmet Clinical Needs

Despite all the challenges, translational efforts in liquid biopsy have resulted in validated and commercialized biomarkers, some of which are already in clinical use (Table 1). In particular, diagnostic, prognostic, and predictive circulating biomarker options exist for lung, breast, colorectal, bladder, ovarian, cervical, and prostate cancer. Additionally, the first liquid biopsy assay for monitoring treatment response in chronic myeloid leukemia, the QXDx AutoDG ddPCR System (Bio-Rad Laboratories), received FDA clearance at the time of writing this review. However, there are still no validated and established liquid biopsy biomarkers for managing patients with solid tumors such as gastric, esophageal, liver, pancreatic, endometrial, brain, thyroid, head and neck, and renal cancers, and melanoma.

Table 1 Examples of FDA-approved, commercialized, and upcoming liquid biopsy tests

Gastric cancer (GC) and esophageal cancer were estimated to cause more than 1.2 million deaths worldwide in 2019 [1]. There is an unmet need for diagnostic and prognostic liquid biopsy biomarkers for these two types of cancer. However, the available evidence comes only from basic research or from retrospective and prospective studies with small cohorts.

Hepatocellular carcinoma is another example of a common and deadly cancer with an urgent need for liquid biopsy biomarkers [470]. Many liver cancer patients are diagnosed at late stage, where curative treatment is no longer an option. Importantly, sorafenib, a drug used to treat advanced hepatocellular carcinoma, currently has no clinically validated biomarkers for response prediction.

Similarly, pancreatic cancer, one of the most aggressive tumors, is asymptomatic at an early stage, and most diagnoses are made at a late stage, when treatment options are limited [471]. The established biomarker for pancreatic cancer, CA19-9, is not suitable for screening and diagnosis. Moreover, pancreatic cancer has four different subtypes with complex molecular signatures that cannot be resolved using current diagnostic procedures [471].

Endometrial cancer is another example; it is typically diagnosed at an early stage, when it is treatable. However, about 20% of cases are diagnosed at a late stage, when 5-year survival is drastically lower [472]. Late-stage endometrial cancer is treated with surgery, radiotherapy, and chemotherapy. However, chemotherapy is less effective than in other cancers, and different subtypes require different therapeutic decisions, warranting the development and validation of biomarkers for monitoring recurrence and response to therapy [472].

Tumors of the central nervous system perhaps present the most compelling case for developing liquid biopsy biomarkers. Clinical decision-making, including monitoring of treatment response, relies heavily on neuroimaging. However, chemoradiation and antiangiogenic therapy can alter contrast enhancement and confound imaging results by affecting the permeability of the blood-brain barrier and the tumor vasculature [473]. Furthermore, brain tumor tissue biopsy carries significant risk for the patient, complicating diagnosis, prediction, and prognosis. Thus, there is an unmet clinical need for non-invasive liquid biopsy biomarkers for brain tumors.

Precision and personalized medicine will benefit greatly when analytical and clinical challenges affecting circulating biomarker development are addressed. While it might take a long time before clinical validity and utility are demonstrated for early detection, liquid biopsy biomarkers are becoming crucial for patient stratification and therapy response prediction. We believe that these trends will continue, and that liquid biopsy will play an increasingly important role in personalized cancer patient management in the future.