Introduction

In the past decade, advances in the study of genetic disease and in precision oncology have increased the demand for predictive assays that enable the selection and stratification of patients for treatment [1]. The great diversity of signaling and transcriptional networks mediating the cross talk between healthy, diseased, stromal, and immune cells complicates the development of functionally relevant biomarkers based on a single gene or protein.

Unexpectedly, the completion of the human genome sequence did not translate into a burst of new drugs. Instead, the pharmaceutical industry reported declining output, measured as the number of new drugs approved, despite growing investment in drug research and development [2, 3]. In contrast, machine learning (ML) and network and systems biology are producing impactful discoveries and are now starting to be seamlessly integrated into the biomedical discovery pipeline [4].

A major ambition of medical artificial intelligence (AI) lies in translating patient data into successful therapies. Machine learning models face particular challenges in biomedicine, such as the size of the training data set, data input conversion problems, limited transferability, overfitting, failure to account for confounders, and many more [5,6,7]. They may require new infrastructure and may render recently established workflows obsolete. On the other hand, deep neural network (DNN) approaches may offer distinct benefits. Such opportunities for deep learning (DL) in biomedicine include scalability, the handling of extreme data heterogeneity, transfer learning [8], and, where desired, the option of not depending on supervised (labeled) data at all [9].

The goal of this work is to show progress in ML in digital health and to exemplify needs, trends, and requirements for AI and ML in precision medicine. Digital image recognition, single-cell analysis, and virtual screens demonstrate the breadth and power of ML in biomedicine (Fig. 1).

Fig. 1 Machine learning applications using big data in precision health

Enabling Synergies Between Artificial Intelligence and Digital Pathology

Advances in pattern recognition and image processing have enabled synergies between AI technology and modern pathology [10, 11•]. In particular, DL architectures such as deep convolutional neural networks have achieved unprecedented performance in image classification and game-playing tasks [13,14,15,16]. The term “digital pathology” is used to refer to advanced slide-scanning techniques combined with AI-based approaches for the detection, segmentation, scoring, and diagnosis of digitized whole-slide images [17].
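To make the convolutional building block concrete, the following minimal sketch chains the three operations a single CNN layer typically applies to an image: a 2D convolution, a ReLU non-linearity, and max pooling. It is written in plain Python for transparency; the toy tile, the hand-set edge kernel, and the function names are illustrative assumptions. In a real network the kernel weights are learned from data, there are many channels and layers, and a deep learning framework would be used.

```python
from typing import List

Matrix = List[List[float]]


def conv2d_valid(image: Matrix, kernel: Matrix) -> Matrix:
    """Slide `kernel` over `image` without padding (stride 1).

    As in most deep learning frameworks, this is cross-correlation:
    the kernel is not flipped.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            acc = 0.0
            for di in range(kh):
                for dj in range(kw):
                    acc += image[i + di][j + dj] * kernel[di][dj]
            row.append(acc)
        out.append(row)
    return out


def relu(fmap: Matrix) -> Matrix:
    """Element-wise non-linearity: negative responses are set to zero."""
    return [[max(0.0, v) for v in row] for row in fmap]


def max_pool2x2(fmap: Matrix) -> Matrix:
    """Downsample by keeping the maximum of each non-overlapping 2x2 window."""
    return [
        [max(fmap[i][j], fmap[i][j + 1], fmap[i + 1][j], fmap[i + 1][j + 1])
         for j in range(0, len(fmap[0]) - 1, 2)]
        for i in range(0, len(fmap) - 1, 2)
    ]


# Hypothetical 6x6 grayscale tile with a bright vertical stripe in columns 2-3.
tile = [[0, 0, 1, 1, 0, 0] for _ in range(6)]
# Hand-set vertical-edge detector (a trained CNN would learn such weights).
edge_kernel = [[1, 0, -1],
               [1, 0, -1],
               [1, 0, -1]]

feature_map = max_pool2x2(relu(conv2d_valid(tile, edge_kernel)))
print(feature_map)
```

The pooled feature map responds only where the stripe's right-hand edge lies, which is the kind of local pattern detector that deeper layers combine into tissue-level features.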

In pathology, quantifying and standardizing clinical outcomes remains a challenge. Accurate grading, staging, classification, and quantification of treatment response by computer-assisted technologies are important recent initiatives [12, 18]. Neural network algorithms perform well when either large amounts of input data or high-quality training sets are available. A digital archive of more than 100,000 clinical images of skin disease fulfilled these prerequisites, and a deep convolutional neural network trained on it classified skin lesions at a level comparable with current quality standards in pathology [19]. Because such image-based analysis is intuitive, a mechanistic understanding of the convolutional layers is not necessary, and the approach could be transferred to patient-facing mobile phone platforms to enhance early detection and cancer prevention [20,21,22]. In the future, specific DNN modules are expected to replace selected steps of the traditional pathology workflow. Across computational image-recognition tasks, DL already shows particularly strong performance in segmenting nuclei, epithelia, or tubules; in quantifying immune infiltration by classifying lymphocytes; in characterizing the cell cycle and quantifying mitoses; and in grading tumors. Over time, the transition toward the digital pathology lab will lead to more accurate drug response prediction and prognosis of the underlying disease [23].
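Segmentation performance, for example for nuclei, is usually scored by comparing a predicted mask with an annotated reference mask. One widely used overlap metric is the Dice coefficient, 2|A ∩ B| / (|A| + |B|). The sketch below computes it for binary masks in plain Python; the masks are made-up toy data, and the choice of Dice (rather than, say, intersection-over-union) is an assumption for illustration, not necessarily the evaluation used in the cited studies.

```python
from typing import Sequence


def dice_coefficient(pred: Sequence[Sequence[int]],
                     truth: Sequence[Sequence[int]]) -> float:
    """Dice = 2|A ∩ B| / (|A| + |B|) for a predicted mask A and reference mask B.

    Returns 1.0 when both masks are empty (perfect agreement on "no object").
    """
    intersection = 0
    total = 0
    for p_row, t_row in zip(pred, truth):
        for p, t in zip(p_row, t_row):
            intersection += 1 if (p and t) else 0
            total += (1 if p else 0) + (1 if t else 0)
    if total == 0:
        return 1.0
    return 2.0 * intersection / total


# Hypothetical 4x4 nucleus masks: 1 = nucleus pixel, 0 = background.
reference = [[0, 1, 1, 0],
             [0, 1, 1, 0],
             [0, 0, 0, 0],
             [0, 0, 0, 0]]
predicted = [[0, 1, 1, 0],
             [0, 1, 0, 0],
             [0, 0, 0, 0],
             [0, 0, 0, 0]]

print(round(dice_coefficient(predicted, reference), 3))
```

Here the prediction misses one of four nucleus pixels, so the score is 2·3 / (3 + 4) = 6/7.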

Digital Healthcare and Clinical Health Records

ML can learn from almost any data type, even unstructured medical text such as patient records, medical notes, prescriptions, audio interview transcripts, or pathology and radiology reports. Future day-to-day applications will embrace ML methods to organize the growing volume of scientific literature, facilitating access to it and the extraction of meaningful knowledge [24]. In the clinic, ML can harness the potential of electronic health records to accurately predict medical events [25]. By implementing a ranking function in the content network, one can overcome the heterogeneity of clinical or healthcare provider–specific electronic health records that is inherent to current medical practice around the world [26].
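The ranking function of [26] is not specified here, so the following sketch swaps in a classic, simple alternative to illustrate how free-text records from different sources can be ranked for relevance: TF-IDF scoring of notes against query terms. The notes, note identifiers, and the smoothed IDF variant are hypothetical choices for illustration only.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Tuple


def tokenize(text: str) -> List[str]:
    """Lower-case and split on non-alphanumeric characters."""
    return [tok for tok in re.split(r"[^a-z0-9]+", text.lower()) if tok]


def rank_notes(query: str, notes: Dict[str, str]) -> List[Tuple[str, float]]:
    """Rank notes by the summed TF-IDF weight of the query terms they contain."""
    tokenized = {note_id: tokenize(text) for note_id, text in notes.items()}
    n_docs = len(tokenized)
    # Document frequency: in how many notes does each term occur at least once?
    df = Counter(term for toks in tokenized.values() for term in set(toks))
    scores = []
    for note_id, toks in tokenized.items():
        tf = Counter(toks)
        score = 0.0
        for term in set(tokenize(query)):
            if tf[term]:
                # +1 keeps terms present in every note from scoring zero.
                idf = math.log(n_docs / df[term]) + 1.0
                score += (tf[term] / len(toks)) * idf
        scores.append((note_id, score))
    # Highest score first; ties broken by note id for deterministic output.
    return sorted(scores, key=lambda item: (-item[1], item[0]))


# Hypothetical, de-identified free-text snippets from different providers.
notes = {
    "note_a": "Pt reports chest pain; troponin elevated. Suspect NSTEMI.",
    "note_b": "Follow-up for type 2 diabetes, HbA1c improved.",
    "note_c": "Chest pain resolved after nitroglycerin.",
}
for note_id, score in rank_notes("chest pain troponin", notes):
    print(note_id, round(score, 3))
```

Because the scoring depends only on token statistics, it tolerates differences in note layout between providers; learned representations used in practice go well beyond this bag-of-words view.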

Multi-omics Integration

A defined goal of precision medicine is to predict the best treatment strategy for each patient. Drug responses combined with genomic, epigenomic, transcriptomic, proteomic, and metabolomic profiling data enable accurate prediction of network responses to perturbation. Using multi-omics data, including somatic copy number alterations, somatic exome mutations, methylomes, and transcriptomes of 1000 cell lines, ML can be used in a modeling exercise to identify genomic features predictive of drug response [27]. Top-performing methods exploit ML, integrate multiple profiling data sets, and enhance scoring with regression models to predict drug sensitivities [28,29,30]. Given the convoluted, non-linear relationships between transcriptomic, epigenomic, and metabolic functions, future ML applications will be challenged to resolve intricate multi-omics patterns [31]. Precision oncology has been showcased using patient-derived cancer cell lines [32]. Such bench-to-bedside models can provide real-time drug response predictions and often create massive knowledge banks amenable to ML analysis. In the future, the ability to screen patient-derived avatars will inform about resistance mechanisms and facilitate evidence-based medicine, even for complex traits [33].
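A minimal sketch of the regression step, assuming a simple "early integration" scheme: each omics layer contributes features that are z-scored, concatenated per cell line, and passed to a ridge regression that predicts a drug sensitivity score. The feature values and names are hypothetical, the target is synthetic and noise-free, and ridge is chosen purely for illustration; the cited top-performing methods use their own, more elaborate models. In practice a library estimator such as scikit-learn's `Ridge` would replace the hand-written solver.

```python
from typing import List

Matrix = List[List[float]]


def standardize_columns(X: Matrix) -> Matrix:
    """Z-score each feature so omics layers measured on different scales are comparable."""
    n = len(X)
    out_cols = []
    for col in zip(*X):
        mean = sum(col) / n
        sd = (sum((v - mean) ** 2 for v in col) / n) ** 0.5 or 1.0  # guard constant columns
        out_cols.append([(v - mean) / sd for v in col])
    return [list(row) for row in zip(*out_cols)]


def solve(A: Matrix, b: List[float]) -> List[float]:
    """Gaussian elimination with partial pivoting for a small dense system A x = b."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[pivot] = M[pivot], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def ridge_fit(X: Matrix, y: List[float], lam: float) -> List[float]:
    """Closed-form ridge, w = (X^T X + lam * I)^-1 X^T y, for centered X and y (no intercept)."""
    p = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) + (lam if i == j else 0.0)
            for j in range(p)] for i in range(p)]
    Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    return solve(XtX, Xty)


# Hypothetical per-cell-line features, one per omics layer.
expression = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]    # e.g. target-gene mRNA level
methylation = [0.2, 0.9, 0.1, 0.7, 0.4, 0.8]   # e.g. promoter methylation level
copy_number = [0.0, 1.0, 1.0, 0.0, 1.0, 0.0]   # e.g. amplification flag

# Early integration: concatenate the layers into one standardized row per cell line.
X = standardize_columns([list(r) for r in zip(expression, methylation, copy_number)])
# Synthetic, centered sensitivity score (illustration only, not real data).
y = [2.0 * row[0] - 1.0 * row[1] for row in X]

weights = ridge_fit(X, y, lam=0.1)
print([round(w, 2) for w in weights])
```

The small penalty `lam` slightly shrinks the recovered weights (close to 2, -1, and 0 here); with thousands of correlated omics features, this shrinkage is what keeps the fit stable.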

Machine Detection of Resistance Signatures

Somatic alterations in cancer frequently escape recognition by the endogenous immune system, creating resistance [34]. Even though excellent efficacy and some complete remissions have been seen with immune checkpoint inhibitors in a limited number of melanoma patients, some of whom may be regarded as cured, many malignancies show resistance or lack a durable response to these agents. Predicting tumor responses to immune checkpoint blockade remains a major challenge and an active field of research fueled by systems biology and AI approaches [18].

Deciphering Epigenomic Networks

Epigenomic profiling of oncogenic networks can accurately predict regulome function, epigenomic-transcriptomic cooperation, and disease progression [35]. At the same time, epigenetic modifications on chromatin, DNA, and RNA are complex and often context-specific, making their mechanistic understanding challenging. Elastic net is a shrinkage method that combines ridge and lasso regularization to prevent overfitting; it can handle ultra-high-dimensional regression and is well suited to epigenomic data [36]. Using such methods, metabolic and epigenomic data have been used to establish biomarkers and to build predictive clocks of aging [37, 38]. Enhanced by ML methods, epigenetic marks, including promoter methylation, are used as a continuous readout of transcriptional accessibility and of the molecular processes that guide development, tissue maintenance, disease states, and eventually aging. Given progress in multiplex barcoding, new data challenges in epigenomics are close at hand. Frontiers include the processing and machine integration of sequencing and chromatin accessibility information derived from the transcriptome and epigenome of the same cell [39•].
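The following minimal sketch fits an elastic net by cyclic coordinate descent, minimizing (1/2n)·||y − Xw||² + λ·(α·||w||₁ + (1 − α)/2·||w||²), the parameterization used by the glmnet package; α = 1 gives the lasso and α = 0 gives ridge. The tiny, orthogonal "CpG site" design and the age-like target are hypothetical and chosen so the result is easy to follow; X and y are assumed centered, and real epigenetic clocks are fitted on thousands of sites with cross-validated penalties.

```python
from typing import List


def soft_threshold(value: float, threshold: float) -> float:
    """Lasso step: shrink toward zero and set small values exactly to zero."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def elastic_net(X: List[List[float]], y: List[float], lam: float, alpha: float,
                n_iter: int = 200) -> List[float]:
    """Cyclic coordinate descent for
        (1 / 2n) * ||y - X w||^2 + lam * (alpha * ||w||_1 + (1 - alpha) / 2 * ||w||_2^2).
    Assumes centered X and y, so no intercept is fitted.
    """
    n, p = len(X), len(X[0])
    w = [0.0] * p
    residual = y[:]  # r = y - X w, with w = 0 initially
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            denom = col_sq[j] + lam * (1 - alpha)
            if denom == 0:
                continue  # all-zero feature with a pure lasso penalty: leave at zero
            # Correlation of feature j with the residual after adding back its own contribution.
            rho = sum(X[i][j] * (residual[i] + X[i][j] * w[j]) for i in range(n)) / n
            new_wj = soft_threshold(rho, lam * alpha) / denom
            delta = new_wj - w[j]
            if delta:
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                w[j] = new_wj
    return w


# Hypothetical centered methylation features for three CpG sites across four samples.
X = [[1, 1, 1],
     [1, -1, -1],
     [-1, 1, -1],
     [-1, -1, 1]]
# Centered age-like target: strongly driven by site 0, weakly by site 2.
y = [3.1, 2.9, -3.1, -2.9]

w = elastic_net(X, y, lam=0.5, alpha=0.5)
print([round(v, 3) for v in w])
```

Ordinary least squares would return weights of 3 and 0.1 for sites 0 and 2; the elastic net shrinks the strong weight to about 2.2 and sets the weak one exactly to zero, which is how it selects a sparse set of informative sites.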

Visualizing and Exploring Cellular Heterogeneity at Single-Cell Resolution

In single-cell biology, ML and DL are frequently used to investigate the diversity and complexity of cell populations. In cancer, single-cell methods provide a view of heterogeneity that captures the impact of the diverse cell states and types in the tumor microenvironment. Further, cancer is a dynamic and highly heterogeneous disease composed of a mix of clones with distinct genotypes, which pushes bulk sequencing methods to their limits. Profiling of copy numbers, transcripts, or chromatin accessibility together with cluster analysis can uncover differences even in seemingly homogeneous tissues and resolve subclonal complexity. Dimensionality reduction and clustering are typical ML techniques used to visualize single-cell transcriptomics (scRNA-Seq) data. In particular, Louvain community detection is a robust clustering algorithm for high-dimensional data such as scRNA-Seq matrices. The Human Cell Atlas [40], whose primary goal is to discover, establish, and catalogue cell populations ab initio, creates unsupervised maps that serve as a resource for subsequent disease-directed studies. In addition, deep network approaches can predict the cell cycle, disease progression, and perturbation responses [41•, 42•, 43,44,45].
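Louvain community detection greedily maximizes the modularity Q of a graph partition; in scRNA-Seq workflows this graph is commonly a k-nearest-neighbour graph of cells built in a reduced (for example PCA) space. The sketch below is a deliberately naive, pure-Python version of Louvain's first (local-moving) phase on a hypothetical toy graph: it recomputes Q from scratch instead of using Louvain's incremental gain formula and omits the graph-aggregation phase, so it illustrates the idea rather than a production implementation.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Edge = Tuple[int, int]


def modularity(edges: List[Edge], community: Dict[int, int]) -> float:
    """Modularity of an unweighted, undirected partition:
    Q = sum over communities c of [L_c / m - (d_c / 2m)^2],
    with L_c = edges inside c, d_c = total degree of c, m = number of edges.
    """
    m = len(edges)
    inside = defaultdict(int)
    degree_sum = defaultdict(int)
    for u, v in edges:
        degree_sum[community[u]] += 1
        degree_sum[community[v]] += 1
        if community[u] == community[v]:
            inside[community[u]] += 1
    return sum(inside[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum)


def louvain_local_moving(edges: List[Edge]) -> Dict[int, int]:
    """Naive local-moving phase: start with one community per node, then move
    each node to the neighbouring community that most increases Q, and repeat
    until no move improves Q."""
    neighbours = defaultdict(set)
    for u, v in edges:
        neighbours[u].add(v)
        neighbours[v].add(u)
    community = {node: node for node in neighbours}
    improved = True
    while improved:
        improved = False
        for node in sorted(neighbours):
            original = community[node]
            best_comm, best_q = original, modularity(edges, community)
            for candidate in {community[nb] for nb in neighbours[node]}:
                if candidate == original:
                    continue
                community[node] = candidate  # tentatively move the node
                q = modularity(edges, community)
                community[node] = original  # undo the tentative move
                if q > best_q + 1e-12:
                    best_comm, best_q = candidate, q
            if best_comm != original:
                community[node] = best_comm
                improved = True
    return community


# Toy cell-similarity graph: two tight groups of three cells joined by one edge.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
clusters = louvain_local_moving(edges)
print(clusters, round(modularity(edges, clusters), 3))
```

Each accepted move strictly increases Q, so the loop terminates; on this graph it recovers the two groups of cells as separate clusters.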

Spatial transcriptomics (spRNA-Seq) combines the benefits of traditional histopathology with single-cell gene expression profiling. The ability to connect the spatial organization of molecules in cells and tissues with their gene expression state enables the mapping of specific disease pathology [46, 47]. ML can decode molecular proximities from sequencing information and construct images of gene transcripts at subcellular resolution [48].
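The sketch below illustrates only the final rendering step of such an analysis. It assumes that transcript coordinates are already known (for example from an imaging- or barcode-based platform) and bins the detections of one gene into a 2D count image; it does not reconstruct positions from sequencing data as in [48]. The detection records, micrometre units, and bin size are hypothetical.

```python
import math
from typing import Iterable, List, Tuple

# One decoded transcript detection: (x, y, gene); coordinates assumed in micrometres.
Detection = Tuple[float, float, str]


def transcript_image(detections: Iterable[Detection], gene: str,
                     width: float, height: float, bin_size: float) -> List[List[int]]:
    """Count detections of `gene` per square bin, giving a gene-specific image.

    Rows index y and columns index x; detections outside
    [0, width) x [0, height) are ignored.
    """
    n_cols = math.ceil(width / bin_size)
    n_rows = math.ceil(height / bin_size)
    image = [[0] * n_cols for _ in range(n_rows)]
    for x, y, name in detections:
        if name != gene or not (0 <= x < width and 0 <= y < height):
            continue
        image[int(y // bin_size)][int(x // bin_size)] += 1
    return image


# Hypothetical detections on a 4 x 4 micrometre field of view.
detections = [
    (0.5, 0.5, "CD3E"), (1.2, 0.4, "CD3E"), (1.7, 0.9, "CD3E"),
    (3.1, 3.5, "KRT8"), (3.6, 3.2, "CD3E"),
]
for row in transcript_image(detections, "CD3E", width=4.0, height=4.0, bin_size=2.0):
    print(row)
```

Smaller bins give sharper images at the cost of sparser counts per pixel, which is one reason subcellular resolution requires dense transcript capture.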

Artificial Intelligence in Chemical Informatics and Drug Discovery

Chemical informatics can predict novel drug targets, quantify absorption, distribution, metabolism, and excretion (ADME) and toxicology, match drugs with targets and biological activities, model physicochemical properties, accelerate data mining, predict biological targets for compounds on a large scale, design new chemicals and syntheses [49], and analyze large virtual chemical spaces [50]. This new paradigm enables medicinal chemists to process billions of molecules in virtual screens [51, 52]. By tightly integrating database knowledge, AI, and lab automation, it is possible to accelerate the drug discovery pipeline and to select structures that can be prepared on automated systems and made available for biological testing, allowing timely hypothesis testing and validation.
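A common first pass in a ligand-based virtual screen is to rank a compound library by fingerprint similarity to a known active. The minimal sketch below represents each binary fingerprint as the set of its "on" bit indices and ranks compounds by the Tanimoto coefficient. The bit patterns, compound names, and threshold are hypothetical; in practice fingerprints (for example Morgan/ECFP) would be computed with a cheminformatics toolkit such as RDKit, and billion-compound screens rely on specialized indexing rather than this brute-force loop.

```python
from typing import Dict, FrozenSet, List, Tuple

Fingerprint = FrozenSet[int]  # indices of the bits set to 1


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto (Jaccard) similarity of two binary fingerprints: |A ∩ B| / |A ∪ B|."""
    union = len(a | b)
    return len(a & b) / union if union else 1.0


def screen(query: Fingerprint, library: Dict[str, Fingerprint],
           threshold: float = 0.5) -> List[Tuple[str, float]]:
    """Return library compounds at or above `threshold`, most similar first."""
    scored = [(name, tanimoto(query, fp)) for name, fp in library.items()]
    hits = [(name, s) for name, s in scored if s >= threshold]
    return sorted(hits, key=lambda hit: (-hit[1], hit[0]))


# Hypothetical fingerprints for a known active (query) and a small library.
query = frozenset({1, 4, 7, 9, 12})
library = {
    "cmpd_001": frozenset({1, 4, 7, 9, 13}),
    "cmpd_002": frozenset({2, 3, 5}),
    "cmpd_003": frozenset({1, 4, 7, 9, 12, 15}),
}
for name, score in screen(query, library):
    print(name, round(score, 2))
```

The similarity threshold trades recall for the number of compounds that must subsequently be synthesized or purchased and tested.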

Computational analyses of drug-perturbation assays can predict the activities of compounds on seemingly unrelated biological processes [53]. ML can provide insight into drug mechanisms, create correlative bridges between disjoint nodes, establish biomarkers, repurpose existing drugs, optimize drug candidates, design clinical trials, and even recruit patients for clinical trials. Image-based drug fingerprints have been shown to enable biological activity prediction for drug discovery, even when a chemical library and its high-content imaging screen were repurposed. The predictions delivered by the resulting computational models extended far beyond the intended target of the original compound screen [54•].
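The models in [54•] were trained on features from high-content images; the sketch below swaps in a much simpler k-nearest-neighbour classifier to illustrate the underlying assumption that compounds with similar morphological profiles tend to share biological activity. Each compound is summarized by a hypothetical vector of image-derived features, and a query compound receives the majority activity label of its k most cosine-similar reference compounds. All feature values, labels, and the choice k = 3 are made up.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def predict_activity(query: Sequence[float],
                     reference: Dict[str, Tuple[List[float], str]],
                     k: int = 3) -> str:
    """Majority label among the k reference compounds whose image-derived
    profiles are most similar to the query profile."""
    ranked = sorted(reference.values(),
                    key=lambda item: cosine(query, item[0]), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical per-compound profiles (e.g. nuclear size, texture, intensity)
# with activity labels from an unrelated assay.
reference = {
    "ref_a": ([1.0, 0.1, 0.0], "active"),
    "ref_b": ([0.9, 0.2, 0.1], "active"),
    "ref_c": ([0.0, 1.0, 0.9], "inactive"),
    "ref_d": ([0.1, 0.9, 1.0], "inactive"),
    "ref_e": ([0.8, 0.0, 0.2], "active"),
}
print(predict_activity([0.95, 0.15, 0.05], reference))
print(predict_activity([0.05, 0.95, 0.9], reference))
```

Because the image profiles are recorded once, the same reference set can be reused to predict activities for assays the original screen never targeted.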

Conclusion

Biomedical research on genomic signatures, image processing, and drug discovery has rapidly adopted big data opportunities and new learning-based technologies. From traditional approaches relying on leads from nature to brute-force screening using robotics, and following the introduction of several other disruptive technologies, artificial intelligence marks yet another pivotal moment toward a rationalized, data-driven process in healthcare and the pharmaceutical industry. Machine intelligence and deep networks are changing our approach to medical bioinformatics at an unprecedented speed. As a result, decision-making in precision medicine will shift from algorithm-centric to data-centric insight.