Background

Artificial intelligence (AI) encompasses multiple technologies with the common aim of computationally simulating human intelligence. Machine learning (ML) is a subgroup of AI that focuses on making predictions by identifying patterns in data using mathematical algorithms. Deep learning (DL) is a subgroup of ML that focuses on making predictions using multi-layered neural network algorithms inspired by the neurological architecture of the brain. Compared to other ML methods such as logistic regression, the neural network architecture of DL enables models to keep improving as the quantity and dimensionality of data grow [1]. This makes DL particularly useful for solving complex computational problems such as large-scale image classification, natural language processing, and speech recognition and translation [1].

Cancer care is undergoing a shift towards precision healthcare enabled by the increasing availability and integration of multiple data types including genomic, transcriptomic and histopathologic data (Fig. 1). The use and interpretation of diverse, high-dimensional data types for translational research or clinical tasks require significant time and expertise. Moreover, the integration of multiple data types is more resource-intensive than the interpretation of individual data types and requires modelling algorithms that can learn from tremendous numbers of intricate features. The use of ML algorithms to automate these tasks and aid cancer detection (identifying the presence of cancer) and diagnosis (characterising the cancer) has become increasingly prevalent [2, 3]. Excitingly, DL models have the potential to harness this complexity to provide meaningful insights and identify relevant granular features from multiple data types [4, 5]. In this review, we describe the latest applications of deep learning in cancer diagnosis, prognosis and treatment selection. We focus on DL applications for omics and histopathological data, as well as the integration of multiple data types. We provide a brief introduction to emerging DL methods relevant to the applications covered in this review. Next, we discuss specific applications of DL in oncology, including cancer origin detection, molecular subtype identification, prognosis and survivability prediction, histological inference of genomic traits, tumour microenvironment profiling, and future applications in spatial transcriptomics, metagenomics and pharmacogenomics. We conclude with an examination of current challenges and potential strategies that would enable DL to be routinely applied in clinical settings.

Fig. 1

Deep learning may impact clinical oncology during diagnosis, prognosis and treatment. Specific areas of clinical oncology where deep learning is showing promise include cancer of unknown primary, molecular subtyping of cancers, prognosis and survivability, and precision oncology. Examples of deep learning applications within each of these areas are listed. The data modalities utilised by deep learning models are numerous and include the genomic, transcriptomic and histopathology data categories covered in this review

Emerging deep learning methods

Covering all DL methods in detail is outside the scope of this review; rather, we provide a high-level summary of emerging DL methods in oncology. DL utilises artificial neural networks to extract non-linear, entangled and representative features from massive and high-dimensional data [1]. A deep neural network is typically constructed from millions of densely interconnected computing neurons organised into consecutive layers. Within each layer, a neuron is connected to neurons in the layer before it, from which it receives data, and neurons in the layer after it, to which it sends data. When presented with data, a neural network feeds each training sample, with known ground truth, to its input layer before passing the information down to all succeeding layers (usually called hidden layers). This information is then multiplied, divided, added and subtracted millions of times before it reaches the output layer, where it becomes the prediction. For supervised deep learning, each training sample and its label are fed through the neural network while its weights and thresholds are adjusted to bring the prediction closer to the provided label. When faced with unseen (test) data, these trained weights and thresholds are frozen and used to make predictions.
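To make this training procedure concrete, the following minimal sketch (in Python with PyTorch) illustrates supervised training of a small network; all data, layer sizes and hyperparameters are synthetic placeholders rather than values from any study cited here.

```python
import torch
from torch import nn

# Minimal sketch of the supervised training loop described above.
# `X` (samples x features) and `y` (integer class labels) stand in for
# a labelled training set, e.g. expression profiles with known subtypes.
X = torch.randn(128, 2000)          # 128 samples, 2000 features (synthetic)
y = torch.randint(0, 2, (128,))     # binary ground-truth labels (synthetic)

model = nn.Sequential(              # input -> hidden layers -> output
    nn.Linear(2000, 64), nn.ReLU(),
    nn.Linear(64, 32), nn.ReLU(),
    nn.Linear(32, 2),               # output layer: one score per class
)
loss_fn = nn.CrossEntropyLoss()
optimiser = torch.optim.Adam(model.parameters(), lr=1e-3)

for epoch in range(10):             # each pass adjusts the weights to move
    optimiser.zero_grad()           # the predictions closer to the labels
    loss = loss_fn(model(X), y)
    loss.backward()                 # backpropagate the prediction error
    optimiser.step()

model.eval()                        # weights are now frozen for inference
with torch.no_grad():
    predictions = model(X).argmax(dim=1)
```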

Fundamental neural network methods

There are multiple neural network-based methods, all with different advantages and applications. The multilayer perceptron (MLP), recurrent neural network (RNN) and convolutional neural network (CNN) are the most fundamental and are frequently used as building blocks for more advanced techniques. MLPs are the simplest type of neural network, where neurons are organised in consecutive layers so that signals travel through the network in one direction (from input to output) [1]. Although MLPs can perform well for generic predictions, they are also prone to overfitting [6]. RNNs process an input sequence one element at a time, while maintaining a history of all past elements in hidden ‘state vector(s)’. Output predictions are made at every element using information from the current and all previous elements [1, 7]. RNNs are typically used for analysing sequential data such as text, speech or DNA sequences. By contrast, CNNs are designed to draw spatial relationships from image data. CNNs traverse an image and apply small feature-filter matrices, i.e. convolution filters, to extract granular features [1]. Features extracted by the last convolution layer are then used for making predictions. CNNs have also been adapted for the analysis of non-image data, e.g. genomic data represented in a vector, matrix or tensor format [8]. A review by Dias and Torkamani [7] described in detail how MLPs, RNNs and CNNs operate on biomedical and genomics data. Moreover, the use of MLPs, RNNs and CNNs to assist clinicians and researchers has been proposed across multiple oncology areas, including radiotherapy [9], digital histopathology [10, 11] and clinical and genomic diagnostics [7]. While routine clinical use is still limited, some models have already been FDA-approved and adopted into clinical settings, for example CNNs for the prediction of malignancy in pulmonary nodules detected by CT [12], and for prostate and breast cancer diagnosis prediction using digital histopathology [13, 14].
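As a rough illustration of how these three building blocks differ, the sketch below (PyTorch, with toy dimensions that are our own assumptions) defines a minimal MLP, RNN and CNN.

```python
import torch
from torch import nn

# Illustrative skeletons of the three fundamental architectures.
# All shapes are arbitrary placeholders, not taken from any cited study.

mlp = nn.Sequential(                      # MLP: signals flow one way,
    nn.Linear(100, 64), nn.ReLU(),        # input -> hidden -> output
    nn.Linear(64, 2),
)

rnn = nn.GRU(input_size=4, hidden_size=32, batch_first=True)
# RNN: processes a sequence (e.g. one-hot-encoded DNA bases) one element
# at a time, carrying history forward in its hidden state vector.
seq = torch.randn(1, 50, 4)               # batch of 1 sequence, length 50
outputs, state = rnn(seq)                 # one output per sequence element

cnn = nn.Sequential(                      # CNN: small convolution filters
    nn.Conv2d(3, 16, kernel_size=3), nn.ReLU(),   # slide across the image
    nn.MaxPool2d(2),
    nn.Conv2d(16, 32, kernel_size=3), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(32, 2),                     # last-layer features -> prediction
)
logits = cnn(torch.randn(1, 3, 64, 64))   # e.g. a 64x64 RGB image patch
```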

Advanced neural-network methods

Graph convolutional neural networks (GCNNs) generalise CNNs beyond regular structures (Euclidean domains) to non-Euclidean domains such as graphs, which have arbitrary structure. GCNNs are specifically designed to analyse graph data, e.g. by using prior biological knowledge of an interconnected network of proteins, with nodes representing proteins and pairwise connections representing protein–protein interactions (PPI) [15], drawn from resources such as the STRING PPI database [16] (Fig. 2a). This enables GCNNs to incorporate known biological associations between genetic features and perceive their cooperative patterns, which has been shown to be useful in cancer diagnostics [17].
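The following sketch illustrates the core of one graph convolution layer in the style of Kipf and Welling; the adjacency matrix here is randomly generated for illustration, whereas in practice it would be derived from a PPI resource such as STRING.

```python
import torch
from torch import nn

# Sketch of a single graph convolution layer applied to gene expression
# signals on a protein-protein interaction graph. The random adjacency
# matrix is a stand-in for a real PPI network.
n_genes = 200
adj = (torch.rand(n_genes, n_genes) < 0.02).float()
adj = ((adj + adj.T) > 0).float()           # the graph must be undirected
adj = adj + torch.eye(n_genes)              # add self-loops

deg = adj.sum(dim=1)
norm_adj = adj / torch.sqrt(deg[:, None] * deg[None, :])  # D^-1/2 (A+I) D^-1/2

class GraphConv(nn.Module):
    def __init__(self, in_feats, out_feats):
        super().__init__()
        self.linear = nn.Linear(in_feats, out_feats)

    def forward(self, x, norm_adj):
        # Each gene aggregates signals from its graph neighbourhood,
        # then a shared linear filter extracts expression patterns.
        return torch.relu(self.linear(norm_adj @ x))

x = torch.rand(n_genes, 1)                  # one expression value per gene
hidden = GraphConv(1, 8)(x, norm_adj)       # (n_genes, 8) neighbourhood features
```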

Fig. 2

An overview of deep learning techniques and concepts in oncology. a Graph convolutional neural networks (GCNNs) are designed to operate on graph-structured data. In this particular example, inspired by [17,18,19], gene expression values (upper left panel) are represented as graph signals structured by a protein–protein interaction graph (lower left panel) that serve as inputs to the GCNN. For a single sample (highlighted with a red outline), each node represents one gene with its expression value assigned to the corresponding protein node, and inter-node connections represent known protein–protein interactions. GCNN methods covered in this review require the graph to be undirected. Graph convolution filters are applied to each gene to extract meaningful gene expression patterns from the gene’s neighbourhood (nodes connected by orange edges). Pooling, i.e. combining clusters of nodes, can be applied following graph convolution to obtain a coarser representation of the graph. The output of the final graph convolution/pooling layer is then passed through fully connected layers to produce the GCNN’s decision. b Semantic segmentation is applied to image data, where it assigns a class label to each pixel within an image. A semantic segmentation model usually consists of an encoder, a decoder and a softmax function. The encoder consists of feature extraction layers that ‘learn’ meaningful and granular features from the input, while the decoder learns to generate a coloured map of the major object classes in the input (through the use of the softmax function). The example shows an H&E tumour section with an infiltrating lymphocyte map generated by the Saltz et al. [20] DL model. c Multimodal learning allows multiple datasets representing the same underlying phenotype to be combined to increase predictive power. Multimodal learning usually starts with encoding each input modality into a representation vector of lower dimension, followed by a feature combination step that aggregates these vectors together. d Explainability methods take a trained neural network and mathematically quantify how each input feature influences the model’s prediction. The outputs are usually feature contribution scores, capable of explaining the most salient features that dictate the model’s predictions. In this example, each input gene is assigned a contribution score by the explainability model (colour scale indicates the influence on the model prediction). An example of a gene interaction network is shown, coloured by contribution scores (links between red dots represent biological connections between genes)

Semantic segmentation is an important CNN-based visual learning method specifically for image data (Fig. 2b). The purpose of semantic segmentation is to produce a class label for every single pixel in an image and thereby cluster parts of the image into classes, where each class represents an object or component of the image. Semantic segmentation models are generally supervised, i.e. they are given class labels for each pixel and are trained to detect the major ‘semantics’ of each class.
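A minimal encoder-decoder sketch of semantic segmentation is shown below (PyTorch); the layer sizes and the three hypothetical tissue classes are illustrative assumptions, not taken from any cited model.

```python
import torch
from torch import nn

# Minimal encoder-decoder sketch for semantic segmentation. Every pixel
# receives class probabilities via a softmax over classes.
n_classes = 3   # e.g. tumour, lymphocyte-rich, background (hypothetical)

encoder = nn.Sequential(                    # 'learns' granular features
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
    nn.MaxPool2d(2),                        # downsample
    nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
)
decoder = nn.Sequential(                    # upsamples back to a class map
    nn.ConvTranspose2d(32, 16, 2, stride=2), nn.ReLU(),
    nn.Conv2d(16, n_classes, 1),            # one score per class per pixel
)

image = torch.randn(1, 3, 128, 128)         # e.g. an H&E tile (synthetic)
logits = decoder(encoder(image))            # (1, n_classes, 128, 128)
pixel_probs = torch.softmax(logits, dim=1)  # per-pixel class probabilities
label_map = pixel_probs.argmax(dim=1)       # coloured map of object classes
```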

To enhance the predictive power of DL models, different data types (modalities) can be combined using multimodal learning (Fig. 2c). In clinical oncology, data modalities can include image, numeric and descriptive data. Cancer is a complex and multi-faceted disease with layers of microscopic, macroscopic and molecular features that can separately or together influence treatment responses and patient prognosis. Therefore, combining clinical data (e.g. diagnostic test results and pathology reports), medical images (e.g. histopathology and computed tomography) and different types of omics data, such as genomic, transcriptomic and proteomic profiles, may be useful. The two most important requirements for a multimodal network are the ability to create representations that contain dense, meaningful features of the original input, and a mathematical method to combine the representations from all modalities. Several methods are capable of performing the representation learning task, e.g. CNNs, RNNs, deep belief networks and autoencoders (AE) [21]; the representations can then be combined through approaches such as score-level fusion [22] or multimodal data fusion [23]. The multimodal learning applications discussed in this review are based on AE models. In simple terms, the AE architecture comprises an encoder and a decoder working in tandem. The encoder is responsible for creating a representation vector of lower dimension than the input, while the decoder is responsible for reconstructing the original input from this low-dimensional vector [24]. This forces the encoder to ‘learn’ to encapsulate meaningful features from the input and has been shown to give good generalisability [24]. Moreover, it provides DL models with the ability to readily integrate different data modalities, e.g. medical images, genomic data and clinical information, into a single ‘end-to-end optimised’ model [8].
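The sketch below illustrates this AE-based fusion pattern with two hypothetical omics modalities; the encoders, feature sizes and concatenation strategy are simplifying assumptions rather than the architecture of any specific published model.

```python
import torch
from torch import nn

# Sketch of autoencoder-based multimodal fusion: each modality is
# encoded into a low-dimensional representation, the representations
# are concatenated, and a shared head makes the prediction.

expr_encoder = nn.Sequential(nn.Linear(2000, 64), nn.ReLU())   # transcriptome
cna_encoder = nn.Sequential(nn.Linear(500, 64), nn.ReLU())     # copy number
expr_decoder = nn.Linear(64, 2000)   # reconstruction forces the encoder to
                                     # retain meaningful features of the input

fusion_head = nn.Sequential(         # operates on the combined representation
    nn.Linear(64 + 64, 32), nn.ReLU(),
    nn.Linear(32, 2),                # e.g. a binary clinical endpoint
)

expr = torch.randn(8, 2000)          # 8 samples per modality (synthetic)
cna = torch.randn(8, 500)

z = torch.cat([expr_encoder(expr), cna_encoder(cna)], dim=1)
prediction = fusion_head(z)
recon_loss = nn.functional.mse_loss(expr_decoder(expr_encoder(expr)), expr)
# During training, the reconstruction loss would be combined with the
# prediction loss so the whole model is optimised end to end.
```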

A major challenge with implementing DL in clinical practice is the ‘black box’ nature of the models [25]. High-stakes medical decisions, such as diagnosis, prognosis and treatment selection, require trustworthy and explainable decision processes. Most DL models have limited interpretability, i.e. it is very difficult to dissect a neural network and understand how millions of parameters work simultaneously. Some even argue that more interpretable models, such as decision trees, should ultimately be preferred for making medical decisions [26]. An alternative approach is explainability: the mathematical quantification of how influential, or ‘salient’, features are towards a certain prediction (Fig. 2d). This information can be used to ‘explain’ the decision-making process of a neural network model and identify the features that contribute to a prediction. This knowledge can enable resolution of potential disagreements between DL models and clinicians and thus increase trust in DL systems [27]. Moreover, DL models do not always perform perfectly, due either to imperfect training data (e.g. assay noise or errors in recording) or to systematic errors caused by bias within the DL models themselves, which can result from the training data not being representative of the population where the DL is later applied [27]. In these circumstances, explainability can assist clinicians in evaluating predictions [27]. While some explainability methods were developed specifically for neural networks [28, 29], others offer a more model- and data-agnostic solution [30,31,32,33]. Excitingly, explainability methods can be used in conjunction with multimodal learning for data integration and the discovery of cross-modality insights, e.g. how cancer traits across different omics types correlate and influence each other.
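As a simple illustration of the idea behind explainability, the sketch below computes gradient-based saliency scores, i.e. how strongly each input gene influences the predicted score; published methods such as LRP and SHAP are more sophisticated, but the output (a contribution score per feature) is analogous.

```python
import torch
from torch import nn

# Sketch of a simple gradient-based explainability method ('saliency'):
# the gradient of the predicted score with respect to each input feature
# quantifies how influential that feature is for the prediction.
model = nn.Sequential(nn.Linear(2000, 64), nn.ReLU(), nn.Linear(64, 2))

x = torch.randn(1, 2000, requires_grad=True)   # one sample, e.g. gene expression
score = model(x)[0, 1]                         # score for the predicted class
score.backward()                               # backpropagate to the inputs

contribution = x.grad.abs().squeeze()          # one score per input gene
top_genes = contribution.topk(10).indices      # the most salient features
```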

Another challenge in applying DL in oncology is the requirement for large amounts of robust, well-phenotyped training data to achieve good model generalisability. Large curated ‘ground-truth’ datasets of matched genomic, histopathological and clinical outcome data are scarce beyond the publicly available datasets, such as The Cancer Genome Atlas (TCGA) [34], the International Cancer Genome Consortium (ICGC) [35], the Gene Expression Omnibus (GEO) [36], the European Genome-Phenome Archive (EGA) [37] and the Molecular Taxonomy of Breast Cancer International Consortium (METABRIC) [38]. Pre-training on abundant datasets from other domains may help overcome the challenge of limited data (a process known as transfer learning). The pre-trained neural network is then reconfigured and trained again on data from the domain of interest. Compared to training on small domain-specific datasets alone, this approach usually yields a considerable reduction in the computational and time resources needed for model training, and a significant increase in predictive performance [39].
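A minimal transfer learning sketch is shown below, assuming an ImageNet-pre-trained ResNet-18 from torchvision (version 0.13 or later for the `weights` argument); the two-class task head is a hypothetical example.

```python
import torch
from torch import nn
from torchvision import models

# Sketch of transfer learning: start from a CNN pre-trained on an
# abundant out-of-domain dataset (ImageNet), freeze its feature
# extractor and retrain only a new output head on a small
# domain-specific dataset.
backbone = models.resnet18(weights="IMAGENET1K_V1")
for param in backbone.parameters():
    param.requires_grad = False              # keep pre-trained weights fixed

# Replace the classification head for the new task (e.g. tumour vs
# normal, a hypothetical example); only this layer will be trained.
backbone.fc = nn.Linear(backbone.fc.in_features, 2)

optimiser = torch.optim.Adam(backbone.fc.parameters(), lr=1e-4)
```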

Deep learning in oncology

A variety of DL approaches that utilise a combination of genomic, transcriptomic or histopathology data have been applied in clinical and translational oncology with the aim of enhancing patient diagnosis, prognosis and treatment selection (Fig. 1, Table 1). However, even with the emerging DL approaches, human intervention remains essential in oncology. Therefore, the goal of DL is not to outperform or replace humans, but to provide decision support tools that assist cancer researchers in studying the disease and health professionals in the clinical management of people with cancer [79].

Table 1 Summary of deep learning methods, their relevant applications and brief technical descriptions of each DL model

Deep learning for microscopy-based assessment of cancer

Cancers are traditionally diagnosed by histopathology or cytopathology to confirm the presence of tumour cells within a patient sample, assess markers relevant to cancer and characterise features such as tumour type, stage and grade. This microscopy-based assessment is crucial; however, the process is relatively labour-intensive and somewhat subjective [80, 81]. A histology image viewed at high magnification (typically 20x or 40x) can reveal millions of subtle cellular features, and deep CNN models are exceptionally good at extracting features from high-resolution image data [82]. Automating cancer grading with histology-based deep CNNs has proven successful, with studies showing that the performance of deep CNNs can be comparable with that of pathologists in grading prostate [40,41,42], breast [43] and colon cancer [44] and lymphoma [45]. Explainability methods can enhance histology-based classification models by allowing pathologists to validate DL-generated predictions. For example, Hägele et al. applied the Layer-wise Relevance Propagation (LRP) [29] method to DL models classifying healthy versus cancerous tissues using whole-slide images of lung cancer [46]. The LRP algorithm assigned a relevance score to each pixel, and pixel-wise relevance scores were aggregated into cell-level scores and compared against pathologists’ annotations. These scores were then used to evaluate DL model performance and identify how multiple data biases affected performance at the cellular level [46]. Such analyses allow clinicians and software developers to gain insights into DL models during the development and deployment phases.

In addition to classification and explainability, semantic segmentation approaches can also be applied to histopathology images to localise specific regions. One notable approach to semantic segmentation is the use of generative adversarial networks (GANs) [47]. A GAN is a versatile generative DL method comprising two neural networks: a generator and a discriminator [83]. In the context of semantic segmentation, the generator learns to assign each pixel of an image to a class object (Fig. 2b), while the discriminator learns to distinguish the predicted class labels from the ground truth [84]. This ‘adversarial’ mechanism forces the generator to be as accurate as possible in localising objects, so that the discriminator cannot recognise the difference between predicted and ground-truth class labels [84]. Using this approach, Poojitha and Lal Sharma trained a CNN-based generator to segment cancer tissue to ‘help’ a CNN-based classifier predict prostate cancer grading [47]. The GAN-annotated tissue maps helped the CNN classifier achieve accuracy comparable to the grading produced by anatomical pathologists, indicating that DL models can detect the cell regions in pathology images that are relevant for decision making.
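The sketch below illustrates this adversarial training signal in simplified form; the toy architectures and synthetic data are our own assumptions and do not reproduce the models of [47, 84].

```python
import torch
from torch import nn

# Simplified sketch of adversarial segmentation: a generator predicts a
# per-pixel label map, and a discriminator learns to tell predicted maps
# from ground-truth maps.
generator = nn.Sequential(                       # image -> class-label map
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 2, 1),                         # 2 classes per pixel
)
discriminator = nn.Sequential(                   # label map -> real/fake score
    nn.Conv2d(2, 16, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(16, 1),
)

image = torch.randn(1, 3, 64, 64)                # synthetic H&E-like tile
true_map = nn.functional.one_hot(
    torch.randint(0, 2, (1, 64, 64)), 2).permute(0, 3, 1, 2).float()

fake_map = torch.softmax(generator(image), dim=1)
bce = nn.BCEWithLogitsLoss()
# Discriminator: score ground truth as real (1) and predictions as fake (0).
d_loss = bce(discriminator(true_map), torch.ones(1, 1)) + \
         bce(discriminator(fake_map.detach()), torch.zeros(1, 1))
# Generator: fool the discriminator so predicted maps resemble ground truth.
g_loss = bce(discriminator(fake_map), torch.ones(1, 1))
```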

Molecular subtyping of cancers

Transcriptomic profiling can be used to assign cancers to clinically meaningful molecular subtypes that have diagnostic, prognostic or treatment selection relevance. Molecular subtypes were first described for breast cancer [85, 86] and later for other cancers including colorectal [87], ovarian cancer [88] and sarcomas [89]. Standard computational methods used to subtype cancers, such as support vector machines (SVMs) or k-nearest neighbours, can be prone to errors due to batch effects [90] and may rely on only a handful of signature genes, omitting important biological information [91,92,93]. Deep learning algorithms can overcome these limitations by learning patterns from the whole transcriptome. A neural network model, DeepCC, trained on TCGA RNA-seq colon and breast cancer data and then tested on independent gene expression microarray data, showed superior accuracy, sensitivity and specificity when compared to traditional ML approaches including random forest, logistic regression, SVM and gradient boosting machine [48]. Neural networks have also been successfully applied to transcriptomic data for molecular subtyping of lung [94], gastric and ovarian cancers [95]. DL methods have the potential to be highly generalisable in profiling cancer molecular subtypes due to their ability to train on the large number of features generated by transcriptomic profiling. Furthermore, due to their flexibility, DL methods can incorporate prior biological knowledge to achieve improved performance. For example, Rhee et al. trained a hybrid GCNN model on expression profiles of a cancer hallmark gene set, connected in a graph using the STRING PPI network [16], to predict PAM50 breast cancer molecular subtypes [18]. This approach outperformed other ML methods in subtype classification. Furthermore, the granular features extracted by the GCNN model naturally clustered tumours into PAM50 subtypes without relying on a classification model, demonstrating that the method successfully learned the latent properties in the gene expression profiles [18].

The use of multimodal learning to integrate transcriptomic with other omics data may enable enhanced subtype predictions. A novel multimodal method using two CNN models trained separately on copy number alterations (CNAs) and gene expression before concatenating their representations for predictions was able to predict PAM50 breast cancer subtypes better than CNNs trained on individual data types [54]. As multi-omics analysis becomes increasingly popular, multimodal learning methods are expected to become more prevalent in cancer diagnostics. However, the challenges of generating multi-omic data from patient samples in the clinical setting, as opposed to samples bio-banked for research, may hinder the clinical implementation of these approaches.

Digital histopathology images are an integral part of the oncology workflow [11] and can be an alternative to transcriptomic-based methods for molecular subtyping. CNN models have been applied on haematoxylin and eosin (H&E) sections to predict molecular subtypes of lung [49], colorectal [50], breast [51, 52] and bladder cancer [53], with greater accuracy when compared to traditional ML methods.

Diagnosing cancers of unknown primary

Determining the primary cancer site can be important during the diagnostic process, as it can be a significant indicator of how the cancer will behave clinically, and the treatment strategies are sometimes decided by the tumour origin [96, 97]. However, 3–5% of cancer cases are metastatic cancers of unknown origin, termed cancers of unknown primary (CUPs) [98, 99]. Genomic, methylation and transcriptomic profiles of metastatic tumours have unique patterns that can reveal their tissues of origin [100,101,102].

Traditional ML methods, such as regression and SVMs, applied to these omics data can predict tumour origin; however, they usually rely on a small subset of genes, which can be limiting when predicting a broad range of cancer types and subtypes. In contrast, DL algorithms can utilise large numbers of genomic and transcriptomic features. The Pan-Cancer Analysis of Whole Genomes (PCAWG) Consortium [103] used a DL model to predict the origins of 24 cancer types individually and collectively using thousands of somatic mutation features across two different classes (mutational distribution, and driver gene and pathway features) [55]. Remarkably, the study found that driver genes and pathways were not among the most salient features, highlighting why previous efforts in panel and exome sequencing for CUP produced mixed results [104,105,106,107]. Deep learning approaches utilising transcriptome data have also shown utility in predicting tumour site of origin [56, 57]. A neural network called SCOPE, trained on whole-transcriptome TCGA data, was able to predict the origins of treatment-resistant metastatic cancers, even for rare cancers such as metastatic adenoid cystic carcinoma [56]. The CUP-AI-Dx algorithm, built upon a widely used CNN model called Inception [108], achieved similar results on 32 cancer types from TCGA and ICGC [57]. As whole genome sequencing becomes increasingly available, these models show great potential for future DL methods to incorporate multiple omics features to accurately categorise tumours into clinically meaningful subtypes by their molecular features.

In addition to genomic and transcriptomic data, a new model called TOAD, trained on whole-slide images (WSIs), was able to simultaneously predict metastasis status and origin for 18 tumour types [58]. Moreover, the model employed an explainability method called attention [109, 110] to assign diagnostic relevance scores to image regions, revealing that regions containing cancer cells contributed most to both metastasis and origin decision making [58]. These results suggest that TOAD can ‘focus’ on biologically relevant image patterns and is a good candidate for clinical deployment.

Cancer prognosis and survival

Prognosis prediction is an essential part of clinical oncology, as the expected disease path and likelihood of survival can inform treatment decisions [111]. DL applied to genomic, transcriptomic and other data types has the potential to predict prognosis and patient survival [59,60,61,62, 112]. The most common survival prediction method is the Cox proportional hazards regression model (Cox-PH) [113,114,115], a multivariate regression model that relates survival time to predictor variables. A challenge of applying Cox-PH to genomic and transcriptomic data is its linear nature, which can neglect complex, possibly non-linear relationships between features [116]. By contrast, deep neural networks are naturally non-linear and in theory could excel at this task. Interestingly, many studies have incorporated the Cox regression used for survival analysis into DL and trained these models on transcriptomic data for enhanced prognosis predictions [59,60,61,62, 112]. Among them, Cox-nnet was a pioneering approach that made Cox regression the output layer of the neural network, effectively using the deep features extracted by the hidden layers as input for the Cox regression model [59]. Cox-nnet was trained on RNA-seq data from 10 TCGA cancer types and benchmarked against two variations of Cox-PH (Cox-PH and CoxBoost). Cox-nnet showed superior accuracy and was the only model to identify important pathways including p53 signalling, endocytosis and adherens junctions [59], demonstrating that the combination of Cox-PH and neural networks has the potential to capture biological information relating to prognosis. The potential of DL was confirmed by Huang et al. [62], who found that three different DL versions of Cox regression (Cox-nnet, DeepSurv [60] and AECOX [62]) outperformed Cox-PH and traditional ML models. These results suggest that DL models can provide better accuracy than traditional models in predicting prognosis by learning from complex molecular interactions using their flexible architecture.
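To show how Cox regression can serve as the output layer of a neural network, the sketch below implements a negative Cox partial log-likelihood loss on synthetic data; this follows the general Cox-nnet idea [59] but is not the published implementation.

```python
import torch
from torch import nn

# Sketch of a neural Cox model: the network produces a risk score per
# patient, and the output layer is trained with the Cox partial
# likelihood instead of a standard classification loss. Data are synthetic.
net = nn.Sequential(nn.Linear(2000, 64), nn.Tanh(), nn.Linear(64, 1))

expr = torch.randn(32, 2000)               # gene expression (32 patients)
time = torch.rand(32)                      # follow-up time
event = torch.randint(0, 2, (32,)).bool()  # death observed (1) or censored (0)

def cox_partial_loss(risk, time, event):
    # For each patient with an observed event, compare their risk score
    # with those of all patients still 'at risk' (survival time >= theirs).
    risk = risk.squeeze()
    at_risk = time[None, :] >= time[:, None]          # risk-set membership
    masked = torch.where(at_risk, risk[None, :], torch.tensor(-1e9))
    log_denom = torch.logsumexp(masked, dim=1)
    return -((risk - log_denom)[event]).mean()        # events only

loss = cox_partial_loss(net(expr), time, event)
loss.backward()        # trains the deep features and the Cox output together
```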

The incorporation of biological pathways into DL has enabled the elucidation of key survival drivers among thousands of features. PASNet [63] and its Cox-regression version, Cox-PASNet [64], are among the most advanced DL models in this area. Both models incorporate a pathway layer between the input and hidden layers of the neural network, where each node of the pathway layer represents a pathway (based on pathway databases such as Reactome [117] and KEGG [118]), and the connections between the two layers represent gene-pathway relationships. After training, the pathway nodes carry different weights; by analysing the weight differences across survival groups and identifying the genes connected to each node, PASNet and Cox-PASNet were able to identify clinically actionable genetic traits of glioblastoma multiforme (GBM) and ovarian cancer [63, 64]. In GBM, Cox-PASNet correctly identified the PI3K cascade, a pathway highly involved in tumour proliferation, invasion and migration in GBM [119]. Cox-PASNet also correctly detected MAPK9, a gene strongly associated with GBM carcinogenesis and a potential novel therapeutic target, as one of the most influential genes [120]. The GCNN-explainability model from Chereda et al. is the latest example of incorporating molecular networks into cancer prognosis [19]. The study used gene expression profiles, structured by a PPI network from the Human Protein Reference Database (HPRD) [121], to predict metastasis in breast cancer samples. The explainability method LRP [29] was then used to identify the genes most relevant to the predictions and analyse their biological relevance [19]. Pathway analysis showed that these genes include oncogenes, molecular-subtype-specific genes and therapeutically targetable genes, such as EGFR and ESR1 [19].
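The sketch below illustrates the central idea of a pathway layer: a fixed binary gene-pathway mask restricts the network's connections to known memberships. The random mask stands in for the Reactome/KEGG-derived masks used by PASNet and Cox-PASNet.

```python
import torch
from torch import nn

# Sketch of a pathway layer in the spirit of PASNet/Cox-PASNet: a binary
# gene-pathway mask zeroes out all connections except known memberships.
# The random mask here is a placeholder for one built from Reactome/KEGG.
n_genes, n_pathways = 2000, 100
mask = (torch.rand(n_pathways, n_genes) < 0.01).float()

class PathwayLayer(nn.Module):
    def __init__(self, mask):
        super().__init__()
        self.mask = mask                       # fixed biological structure
        self.weight = nn.Parameter(torch.randn(mask.shape) * 0.01)
        self.bias = nn.Parameter(torch.zeros(mask.shape[0]))

    def forward(self, x):
        # Only gene-pathway connections present in the mask carry signal,
        # so each node's activation summarises one biological pathway.
        return torch.tanh(x @ (self.weight * self.mask).T + self.bias)

expr = torch.randn(16, n_genes)
pathway_activations = PathwayLayer(mask)(expr)   # (16, n_pathways)
# Comparing pathway node weights across survival groups then points to
# the pathways driving the model's survival predictions.
```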

In addition to prognosis predictions from transcriptomic data, CNN models trained on histopathology images have been used to infer survival in several cancers including brain [122], colorectal [123], renal cell [124] and liver cancers [125] and mesothelioma [65]. Among them, MesoNet [65] stands out for incorporating a feature contribution explainability algorithm called CHOWDER [126] on H&E tissue sections of mesothelioma, identifying that the features contributing most to survival predictions were primarily stromal cells associated with inflammation, cellular diversity and vacuolisation [65]. The CHOWDER algorithm enabled MesoNet to utilise large H&E images and to segment and detect regions important for survival predictions without any local annotations by pathologists [65]. These findings suggest that ‘white-box’ DL models like MesoNet could be useful companion diagnostic tools in the clinical setting, assisting clinicians in identifying known and novel histological features associated with a survival outcome.

Multimodal DL analysis integrating histopathology images and, if available, omics data has the potential to better stratify patients into prognostic groups, as well as to suggest more personalised and targeted treatments. Most multimodal prognostic studies have focussed on three aspects: individual feature extraction from a single modality, multimodal data integration and cross-modal analysis of prognostic features. The model PAGE-Net performed these tasks by using a CNN to create representations of WSIs and Cox-PASNet [64] to extract genetic pathway information from gene expression [66]. This architecture allowed PAGE-Net not only to integrate histopathological and transcriptomic data, but also to identify patterns across both modalities that lead to different survival rates [66]. More interestingly, the combination of multimodal and explainability methods is particularly promising. PathME [67] pioneered this approach by bringing together representation-extracting AEs and an explainability algorithm called SHAP [31,32,33, 127]. The AEs captured important features from gene expression, miRNA expression, DNA methylation and CNAs for survival prediction, while SHAP scored each feature from each omics layer according to its relevance to the prediction [67]. Together, the two algorithms detected clinically relevant cross-omics features that affect survival across GBM, colorectal, breast and lung cancer [67]. The PathME methodology is cancer-agnostic, which makes it a strong candidate for clinical implementation to explore actionable biomarkers in large-scale multi-omics data. Additionally, other studies [128,129,130] have employed principal component analysis (PCA) [131] to compress gene expression, mutational signatures and methylation status into eigengene vectors [132], which were then combined with CNN-extracted histopathology features for survival predictions. While these methods can integrate histopathology data with multi-omics, they are not as explainable as PAGE-Net [66] or PathME [67] and are thus less clinically suitable, as the conversion of genes into eigengenes makes exploration of cross-modality interactions challenging.

Precision oncology

The promise of precision medicine is to use high-resolution omics data to enable optimised management and treatment of patients to improve survival. An important part of precision oncology involves understanding cancer genomics and the tumour microenvironment (TME). DL offers the potential to infer important genomic features from readily available histopathology data, as well as disentangle the complex heterogeneity of TME to enable precision oncology.

Genomic traits such as tumour mutation burden (TMB) and microsatellite instability (MSI) have been shown to be important biomarkers of immunotherapy response across cancer types [133,134,135,136]. Assessment of these traits requires sequencing (comprehensive panel, exome or whole genome), which is still expensive and is not readily available in the clinic.

Routinely used histopathological images are a potential window to genomic features and may in future prove useful for predictions of specific clinically meaningful molecular features without the need for tumour sequencing. Several CNN methods have been developed to infer TMB, MSI and other clinically relevant genomic features from H&E sections [68,69,70, 137]. A model called Image2TMB used ensemble learning to predict TMB in lung cancer using H&E images. Image2TMB was able to achieve the same average accuracy as large panel sequencing with significantly less variance. It also attempted to estimate TMB for each region of an image [69], which could enable studies of histological features associated with molecular heterogeneity.

Another DL model called HE2RNA used weakly supervised learning to infer gene expression from histopathology images, which were then used to infer MSI status in colorectal cancer [68]. When compared with another DL method to predict MSI directly from H&E slides [137], HE2RNA showed superior performance on both formalin-fixed paraffin-embedded (FFPE) and frozen sections, indicating a high level of robustness across tissue processing approaches.

Kather et al. [70] also showed that CNN models trained and evaluated on TCGA H&E slides can accurately predict a range of actionable genetic alterations across multiple cancer types, including the mutational status of key genes, molecular subtypes and gene expression of standard biomarkers such as hormone receptor status. While these molecular inference methods demonstrate an intriguing application of DL in histopathology, their current clinical utility is likely to be limited, as features such as MSI and hormone receptor status are already part of routine diagnostic workflows (immunohistochemistry staining for mismatch-repair proteins in colorectal and endometrial cancer, or for ER and PR in breast cancer). However, these studies serve as proof-of-concept, and the developed models could in future be adapted to predict clinically important molecular features that are not routinely assessed. Thus, future investigations into histopathology-based genomic inference are warranted, with the understanding that the accuracy of such DL models needs to be exceptional for them to replace current assays.

The tumour microenvironment

The TME plays a key role in cancer progression, metastasis and response to therapy [138]. However, many unknowns remain in the complex molecular and cellular interactions within the TME. The rise of DL in cancer research, coupled with large publicly available catalogues of genomic, transcriptomic and histopathology data, has created a strong technical framework for the use of neural networks in profiling the heterogeneity of the TME.

Infiltrating immune cell populations, such as CD4+ and CD8+ T cells, are potentially important biomarkers of immunotherapy response [139, 140]. Traditional ML methods can accurately estimate TME cell compositions using transcriptomic [141, 142] or methylation data [143]. However, most of these methods rely on the generation of signature gene expression profiles (GEPs) or the selection of a limited number of CpG sites, biased towards previously known biomarkers. This can lead to models that are susceptible to noise and bias and unable to discover novel genetic biomarkers. DL methods can be trained on the whole dataset (i.e. the whole transcriptome) to identify the optimal features without relying on GEPs. Recently developed DL TME methods include Scaden [71], a transcriptomic-based neural network model, and MethylNet, a methylation-based model [72]. MethylNet also incorporated the SHAP explainability method [31,32,33, 127] to quantify how relevant each CpG site is for deconvolution. While these methods currently focus on showing that DL models are more robust against noise, bias and batch effects than traditional ML models, future follow-up studies are likely to reveal additional cellular heterogeneity traits of the TME and possibly inform treatment decisions. For example, a CNN trained on H&E slides of 13 cancer types [20] showed a strong correlation between spatial tumour-infiltrating lymphocyte (TIL) patterns and cellular compositions derived by CIBERSORT (a support vector regression model) [141]. These models have significant clinical implications, as rapid and automated identification of the composition, amount and spatial organisation of TILs can support clinical decision making for prognosis prediction (for example, in breast cancer) and inform treatment options, specifically immunotherapy. We expect future DL methods will further explore the integration of histopathology and omics in profiling the tumour immune landscape [144]. We also expect future DL methods to incorporate single-cell transcriptomics (scRNA-seq) data to improve TME predictions and even infer transcriptomic profiles of individual cell types. Several DL methods have already been developed to address batch correction, normalisation, imputation, dimensionality reduction and cell annotation for scRNA-seq cancer data [145,146,147]. However, these studies are still experimental and require further effort and validation to be clinically applicable [148].

The new frontiers

An exciting new approach for studying the TME is spatial transcriptomics, which allows quantification of gene expression in individual cells or regions while maintaining their positional representation, thus capturing the spatial heterogeneity of gene expression at high resolution [149, 150]. Given the complexity of these data, DL approaches are well suited for their analysis and interpretation. For example, by integrating histopathology images and spatial transcriptomics, DL can predict localised gene expression from tissue slides, as demonstrated by ST-Net, a neural network capable of predicting the expression of clinically relevant genes in breast cancer using tissue spots from H&E slides [73]. As the cost of spatial transcriptomics decreases, it is expected that more translational applications of DL will arise, for example utilising spatial transcriptomics for improved prognosis prediction, subtype classification and a refined understanding of tumour heterogeneity [151].

In addition, the gut microbiome (metagenome) is an emerging field that has been shown to play an important role in cancer treatment efficacy and outcomes [152, 153]. As more multi-omics datasets (genomics, transcriptomics, proteomics, microbiomics) are generated, annotated and made available, we speculate that integrative analysis across these data types will help map the omics profile of each individual patient to the metagenome, which will unlock effective and exciting new treatment options.

Lastly, pharmacogenomics, the prediction of drug responses and mechanisms of action from genomic characteristics, is an important and exciting area of precision oncology where DL methods have significant potential [154]. The increasing availability of public omics data has facilitated the recent growth of DL applications in cancer pharmacogenomics [155,156,157]. The most common applications include therapy response and resistance (e.g. Dr.VAE [158] and CDRscan [74]), drug combination synergy (e.g. DeepSynergy [75] and Jiang et al. [76]), drug repositioning (e.g. deepDR [77]) and drug-target interactions (e.g. DeepDTI [78]). As pharmacogenomics is a highly translational field, we expect many such DL models to be applied in the clinical setting in the future.

Challenges and limitations: the road to clinical implementation

This review provides an overview of exciting potential DL applications in oncology. However, there are several challenges to the widespread implementation of DL in clinical practice. Here, we discuss challenges and limitations of DL in clinical oncology and provide our perspective for future improvements.

Data variability

Data variability is a major challenge for applying DL to oncology. For example, in immunohistochemistry, each laboratory may produce staining of different intensity and quality, and it is currently unclear how DL systems would deal with this inter- and intra-laboratory variability. For transcriptomic data, one of the principal difficulties is establishing the exact processing applied to generate a sequence library and processed dataset. Even properties as basic as ‘the list of human genes’ are not settled: multiple authorities publish and regularly update lists of genes and observed spliceforms, so any analysis should specify both the source and the version of the gene model used. Additionally, there is a large range of data transformations (log, linear, etc.) and data normalisations (FPKM, TMM, TPM), with implementations in multiple programming languages, resulting in a combinatorially large number of possible processing paths that should theoretically return the same results, but with no formal process to ensure that this assumption holds.
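A small worked example makes the point: even two standard normalisations of the same counts give different values, as the NumPy sketch below shows (the counts and gene lengths are invented).

```python
import numpy as np

# Sketch of two common RNA-seq normalisations, illustrating how different
# processing paths change the numbers a downstream model sees.
counts = np.array([[100, 300, 600]], dtype=float)   # reads per gene (1 sample)
lengths = np.array([1000.0, 2000.0, 3000.0])        # gene lengths in bases

# FPKM: normalise by library size first, then by gene length.
fpkm = counts / (counts.sum(axis=1, keepdims=True) / 1e6) / (lengths / 1e3)

# TPM: normalise by gene length first, then rescale to one million.
rate = counts / lengths
tpm = rate / rate.sum(axis=1, keepdims=True) * 1e6

# FPKM and TPM rank these genes identically, but their values differ,
# so a model trained on one scale cannot safely be applied to the other.
print(fpkm, tpm)
```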

Paucity of public phenotypically characterised datasets

One challenge of implementing DL in clinical practice is the need for large phenotypically characterised datasets that enable the development and training of DL models with good generalisation performance. High-quality cancer datasets that have undergone omics profiling are difficult to acquire in the clinical setting due to cost, sample availability and quality. In addition, clinical tumour samples can be small and are typically stored as FFPE blocks, resulting in degraded RNA and crosslinked DNA not suitable for comprehensive molecular profiling. To overcome this, explainability methods, such as SHAP, could be applied to current DL models developed in the research setting to identify the most salient features and design targeted profiling workflows suitable for clinical samples. This way, DL models could still capture the complexity and possible non-linear gene relationships, but be retrained to make clinical predictions using only the selected salient features. Multimodal DL models coupled with explainability could also be explored, given their potential to use features in one modality to complement missing data in another. Transfer learning can also mitigate the requirement for large datasets by pre-training DL models on data from other domains. In practice, however, large datasets with thousands of samples per class are still needed for accurate predictions in the clinic, as patient outcomes are complex and there is clinical heterogeneity between patients, including responses, treatment courses, comorbidities and other lifestyle factors that may impact prognosis and survival. As more data are routinely generated and clinical information is centrally collected in digital health databases, we expect to see more DL models developed for treatment response prediction as well as general prognosis prediction. More interestingly, the ability of DL models to continue learning from new training samples and become more accurate, i.e. active learning, can significantly reduce the time pathologists spend annotating histopathology training data. For example, a histopathology-based DL model by Saltz et al. required pathologists to annotate only a few training images at a time, stopping the manual annotation process once the model’s performance was satisfactory [20].

Lastly, the clinical data accompanying a sample usually do not capture all the complexities of the sample and its phenotype and can be prone to incompleteness, inconsistencies and errors. A potential strategy to address this issue is to design DL models that are less reliant on, or independent from, clinical annotations; for example, the MesoNet model was able to detect prognostically meaningful regions from H&E images without any pathologist-derived annotations [65].

AI explainability and uncertainty

Finally, for DL to be implemented and accepted in the clinic, the models need to be designed to complement and enhance clinical workflows. For human experts to effectively utilise these models, they need to be not only explainable, but also capable of estimating the uncertainty in their predictions.

Over the last 5 years, research into explainable AI has accelerated. For DL to obtain regulatory approval and be used as a diagnostic tool, comprehensive studies of the biological relevance of explainability are imperative. In medical imaging, this entails validating DL-identified clinically relevant regions against pathology review and, in some cases, cross-validating them with genomic features [46]. In genomics, this entails validating DL-identified genetic features against those identified by conventional bioinformatics methods, for example confirming that the most discriminatory genes in predicting tissue types, as identified by SHAP, were also identified by pairwise differential expression analysis using edgeR [159], or showing that patient-specific molecular interaction networks produced when predicting the metastasis status of breast cancer were not only linked to the benign/malignant phenotype, but also indicative of tumour progression and therapeutic targets [19].

Furthermore, a DL model’s ability to produce an ‘I don’t know’ output when uncertain about its predictions is critical. Most DL applications covered in this review are point-estimate methods, i.e. the predictions are simply the best guess with the highest probability. In critical circumstances, overconfident predictions, e.g. confidently reporting a cancer primary site when the underlying certainty is only 40%, can result in inaccurate diagnoses or inappropriate cancer management decisions. Furthermore, when uncertainty estimates are too high, companion diagnostic tools should be able to abstain from making predictions and ask for medical experts’ opinions [160]. Probabilistic DL methods capable of quantifying prediction uncertainty, such as Bayesian DL [161], are great candidates to address these issues and have recently begun to be applied to cancer diagnosis tasks [162,163,164]. We expect probabilistic models to become mainstream in oncology in the near future.
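As an illustration of uncertainty-aware prediction, the sketch below uses Monte Carlo dropout, a common practical approximation to Bayesian DL; the abstention threshold and model sizes are illustrative assumptions.

```python
import torch
from torch import nn

# Sketch of Monte Carlo dropout: keeping dropout active at test time and
# sampling many stochastic forward passes yields a distribution over
# predictions whose spread estimates the model's uncertainty.
model = nn.Sequential(
    nn.Linear(2000, 64), nn.ReLU(), nn.Dropout(p=0.5),
    nn.Linear(64, 2),
)
model.train()                         # keep dropout stochastic at test time

x = torch.randn(1, 2000)              # one patient sample (synthetic)
with torch.no_grad():
    samples = torch.stack(
        [torch.softmax(model(x), dim=1) for _ in range(100)])

mean_prob = samples.mean(dim=0).squeeze()      # averaged class probabilities
confidence = mean_prob.max().item()

if confidence < 0.8:                  # abstention threshold (assumed)
    print("I don't know - refer to a medical expert")
else:
    print(f"Predicted class {mean_prob.argmax().item()} "
          f"with confidence {confidence:.2f}")
```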

Conclusions

In summary, DL has the potential to dramatically transform cancer care and bring it a step closer to the promise of precision oncology. In an era where genomics is being implemented into health delivery and health data are becoming increasingly digitised, it is anticipated that artificial intelligence and DL will be used in the development, validation and implementation of decision support tools to facilitate precision oncology. In this review, we showcased a number of promising applications of DL in various areas of oncology, including digital histopathology, molecular subtyping, cancer diagnosis, prognostication, histological inference of genomic characteristics, the tumour microenvironment and emerging frontiers such as spatial transcriptomics and pharmacogenomics. As the research matures, the future of applied DL in oncology will likely focus on the integration of medical images and omics data using multimodal learning that can identify biologically meaningful biomarkers. Excitingly, the combination of multimodal learning and explainability can reveal novel insights. Important prerequisites for the widespread adoption of DL in the clinical setting are phenotypically rich data for training models and clinical validation of the biological relevance of DL-generated insights. We expect that as new technologies such as single-cell sequencing, spatial transcriptomics and multiplexed imaging become more accessible, more effort will be dedicated to improving both the quantity and quality of the labelling/annotation of medical data. Finally, for DL to be accepted in routine patient care, clinical validation of explainable DL methods will play a vital role.