1 Introduction

As of January 27, the cumulative number of patients infected with SARS-CoV-2 worldwide had exceeded 100 million, and the total number of deaths had reached 2 million. There are still 30 million confirmed patients who have not received specific drugs for treatment. There are two main strategies for finding effective drugs against SARS-CoV-2 quickly: de novo drug design and drug repurposing. De novo drug design is the process of identifying new drugs or drug combinations starting from the structure of the receptor protein. Drug repurposing is the process of identifying new therapeutic uses for existing drugs or drug combinations. The following sections describe how machine learning assists de novo drug design and drug repurposing, and review progress in discovering therapeutic drugs and drug combinations against COVID-19.

2 De Novo Drug Design

De novo drug discovery currently faces major challenges, such as long research and development cycles, high cost, and a limited experimental success rate. In recent years, drug research and development in the pharmaceutical industry has been slowing year by year. Employing machine learning methods to mine the properties and activities of compounds, however, can save both time and cost. The characteristics of a drug compound can be represented by its molecular fingerprint. Molecular fingerprints include static fingerprints and dynamically generated fingerprints; the latter are inferred automatically during training by machine learning approaches. Specifically, representation learning employs neural network-based approaches to learn embeddings of compound features directly.
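The idea of a static fingerprint can be illustrated with a deliberately simplified sketch. The function below hashes short character fragments of a SMILES string into a fixed number of bits and compares two molecules with the Tanimoto coefficient. It is a toy stand-in only: real static fingerprints hash atom environments of the molecular graph rather than raw characters, and the names `toy_fingerprint`, `n_bits`, and `max_len` are assumptions made for this example.

```python
import hashlib


def toy_fingerprint(smiles, n_bits=64, max_len=3):
    """Hash every substring of length 1..max_len of a SMILES string to a bit index.

    Toy illustration only: real fingerprints operate on the molecular graph,
    not on the characters of the SMILES string.
    """
    bits = set()
    for size in range(1, max_len + 1):
        for start in range(len(smiles) - size + 1):
            fragment = smiles[start:start + size]
            digest = hashlib.sha256(fragment.encode("utf-8")).digest()
            bits.add(int.from_bytes(digest[:4], "big") % n_bits)
    return bits


def tanimoto(bits_a, bits_b):
    """Tanimoto (Jaccard) similarity between two sets of 'on' bits."""
    if not bits_a and not bits_b:
        return 1.0
    return len(bits_a & bits_b) / len(bits_a | bits_b)


ethanol = toy_fingerprint("CCO")
propanol = toy_fingerprint("CCCO")
benzene = toy_fingerprint("c1ccccc1")

print("ethanol vs propanol:", round(tanimoto(ethanol, propanol), 3))
print("ethanol vs benzene: ", round(tanimoto(ethanol, benzene), 3))
```

Because the bit vector has a fixed length, unrelated fragments can collide on the same bit. Learned fingerprints replace the fixed hash with parameters fitted during training.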

Lusci et al. [1] employed the UG-RNN (undirected graph recursive neural network) model to encode molecular structures as vectors of equal length, which were then passed to a fully connected neural network. Duvenaud et al. [2] used a GCN (graph convolutional network) model to learn molecular fingerprints directly from molecular structures. An algorithm widely used for generating new compounds is the variational autoencoder (VAE) [3], in which an encoder maps the compounds’ feature space to a latent space and a decoder maps latent vectors back to a representation of the characteristics of the original compounds.
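Two ingredients of a VAE can be written down without any deep learning library: sampling a latent vector by reparameterization, and the closed-form KL divergence that pulls the encoder's Gaussian toward a standard normal prior. The sketch below assumes a diagonal Gaussian encoder output given as `mu` and `log_var`; the encoder and decoder networks themselves are omitted, and the function names are chosen for this example.

```python
import math
import random


def reparameterize(mu, log_var, rng):
    """Sample z = mu + sigma * eps with eps ~ N(0, 1), one value per latent dimension."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]


def kl_to_standard_normal(mu, log_var):
    """Closed-form KL( N(mu, diag(exp(log_var))) || N(0, I) )."""
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv
                     for m, lv in zip(mu, log_var))


rng = random.Random(0)          # fixed seed so the sketch is reproducible
mu = [0.0, 1.0]                 # hypothetical encoder means
log_var = [0.0, 0.0]            # hypothetical encoder log-variances

z = reparameterize(mu, log_var, rng)
print("latent sample:", z)
print("KL term:", kl_to_standard_normal(mu, log_var))
```

During training, the KL term is added to a reconstruction loss computed from the decoder output. New compounds are then generated by decoding vectors sampled from the prior.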

The strategy of combining VAE and GAN (generative adversarial network) has gained rising attention for generating new compounds. Besides VAE, the RNN (recurrent neural network) is another practical way to design compounds, as it is well suited to learning the probability distribution of the feature space. Jin et al. proposed the GCPN model [4], which constructs molecular graphs by combining a graph convolutional neural network with a reinforcement learning framework. In addition, GCPN integrates a GAN to minimize the bias between the generated distribution and the original distribution. GraphAF [5] is a compound generation model that combines an autoregressive model with flow-based generation. Experimental results showed that GraphAF can generate 68% chemically valid molecules without any prior chemical knowledge. MoFlow [6] is a flow-based model that generates molecular graphs with a validity guarantee by linking two probability distributions of the adjacency matrix. Jin et al. [7] proposed a generative model that borrows the molecular pair approach to generate a set of molecular rationales (molecular substructures); a neural network-based approach then combines the rationales to design molecules that simultaneously satisfy multiple objectives. Bung et al. [8] proposed a generative model, used as a pre-trained model, that learns the distribution of the physical and chemical characteristics of compounds and can be used to identify chemical frameworks targeting SARS-CoV-2 3CLpro. Zhavoronkov et al. [9] employed deep learning-based methods consisting of autoencoders (AE), GAN, and reinforcement learning to identify small molecules that can inhibit the SARS-CoV-2 3CL protease.
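Flow-based generators such as those cited above rest on one identity: if an invertible map turns a simple latent variable into data, the exact data likelihood follows from the change-of-variables formula. The sketch below shows this for the simplest possible case, a one-dimensional affine map over a standard normal latent. The class name `AffineFlow` is an assumption for illustration and is not the architecture of GraphAF or MoFlow.

```python
import math


def standard_normal_log_pdf(z):
    """Log-density of N(0, 1) at z."""
    return -0.5 * (z * z + math.log(2.0 * math.pi))


class AffineFlow:
    """Invertible map x = scale * z + shift with z ~ N(0, 1)."""

    def __init__(self, scale, shift):
        if scale == 0:
            raise ValueError("scale must be non-zero for the map to be invertible")
        self.scale = scale
        self.shift = shift

    def forward(self, z):
        """Generation direction: latent -> data."""
        return self.scale * z + self.shift

    def inverse(self, x):
        """Inference direction: data -> latent."""
        return (x - self.shift) / self.scale

    def log_prob(self, x):
        """Change of variables: log p(x) = log p_z(f^-1(x)) - log|scale|."""
        return standard_normal_log_pdf(self.inverse(x)) - math.log(abs(self.scale))


flow = AffineFlow(scale=2.0, shift=1.0)
print("sample from z = 0.3:", flow.forward(0.3))
print("log p(x = 1.0):", flow.log_prob(1.0))
```

Practical flows stack many such invertible layers with learned parameters, and graph-based variants apply them to atom and bond tensors, but the likelihood is computed with the same identity.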

Computer-aided drug design relies mainly on molecular docking and network pharmacology. To date, a considerable body of work has applied network pharmacology and molecular docking to traditional Chinese medicine (TCM) in the fight against COVID-19 [10, 11]. The results indicate that TCM compounds may exert a therapeutic effect either by acting directly on the new coronavirus or indirectly through anti-inflammatory and immune-regulating activity. Ren et al. [12] employed data-driven approaches to derive TCM prescriptions as potential treatments for pestilence from an analysis of classical prescriptions. In that study, Mpro (3CL hydrolase) and ACE2 (angiotensin-converting enzyme 2) were chosen as the key docking targets, and Gancao (licorice), Huangqin (Scutellaria), Dahuang (rhubarb), and Chaihu (Bupleurum) were found to contain the most compounds with potential activity against these targets. Yan et al. [13] employed network pharmacology and molecular docking to explore the potential targets, signaling pathways, and biological functions of Lianhua in treating COVID-19, drawing on a combination of six widely used medicine databases.
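The core bookkeeping step of a network pharmacology analysis can be sketched in a few lines: link herbs to their compounds, compounds to their protein targets, and count how many compounds of each herb hit at least one disease-related target. All identifiers below (`herb_A`, `c1`, `T1`, and so on) are hypothetical placeholders, not data from the cited studies, and the scoring rule is an assumption chosen for simplicity.

```python
# Hypothetical herb -> compound and compound -> target mappings.
herb_compounds = {
    "herb_A": {"c1", "c2"},
    "herb_B": {"c3"},
}
compound_targets = {
    "c1": {"T1"},
    "c2": {"T2", "T3"},
    "c3": {"T4"},
}
disease_targets = {"T1", "T2"}  # hypothetical disease-related proteins


def rank_herbs(herb_compounds, compound_targets, disease_targets):
    """Rank herbs by the number of their compounds hitting any disease target."""
    scores = {}
    for herb, compounds in herb_compounds.items():
        scores[herb] = sum(
            1 for compound in compounds
            if compound_targets.get(compound, set()) & disease_targets
        )
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


for herb, hits in rank_herbs(herb_compounds, compound_targets, disease_targets):
    print(herb, hits)
```

In a real study the mappings would come from curated databases, and the candidate compounds would then be passed to molecular docking against the chosen targets.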

3 Drug Repurposing

As described above, de novo design can yield a rich set of candidate drugs, but its cost and time requirements are often unaffordable. By contrast, the known mechanism of action and pharmacokinetics of existing drugs can serve as prior knowledge for a specific domain. Once a new potential effect of a known drug is discovered, the drug can be used more effectively and safely, without having to start from scratch. The time and economic cost of developing “old drugs for a new use” are therefore much smaller. Drug repurposing is a plausible and highly promising strategy that has attracted growing attention from governments and pharmaceutical companies because it saves time and cost. AI-assisted drug repositioning reduces time and economic costs even further. The workflow of drug repurposing is described in Fig. 1.

Fig. 1 The workflow of computational drug repurposing studies

Drug repurposing can benefit from new computational methods for detecting relationships among various types of biological entities, such as genes, proteins, diseases, and drugs. Such methods are well suited to identifying alternative therapeutic indications for existing drugs.

Since the outbreak of the COVID-19 epidemic, drug repurposing has become the most widely used approach for identifying therapeutic drugs or potential drug combinations, owing to the long cycle of de novo drug design. Several drugs, such as chloroquine phosphate and remdesivir, have been evaluated for their therapeutic effect on COVID-19. Compared with traditional drug discovery, drug repurposing offers high efficiency and low cost, which is particularly advantageous in a pandemic such as COVID-19.

Baricitinib is a drug used to treat rheumatoid arthritis. After the outbreak of COVID-19, BenevolentAI used a drug knowledge graph to identify AAK1 as a regulator of SARS-CoV-2 infection and found that baricitinib can inhibit AAK1 activity without obvious side effects. Guney et al. [14] quantified the therapeutic effects of drugs and predicted new drug-disease associations by using biological information to measure the interplay between drug targets and diseases. Zeng et al. [15] proposed an integrative deep network that builds a large network containing multiple types of relationships, collected from PubMed and the DrugBank database, and embeds each entity as a vector representation. The results identified 41 repurposable drugs (including dexamethasone, indomethacin, niclosamide, and toremifene) as potential therapeutics against SARS-CoV-2. The big picture of applying machine learning methods to drug repurposing is shown in Fig. 2.

Fig. 2 The schema of employing machine learning approaches in studying drug repurposing
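The interplay between drug targets and disease proteins measured by Guney et al. [14] can be illustrated with a simple network proximity score: for each drug target, take the shortest-path distance to the nearest disease protein in an interaction network, then average over the targets. The sketch below is a minimal version under stated assumptions: the toy graph and node names are invented, and the normalization against random node sets that a full analysis would need is omitted.

```python
from collections import deque


def shortest_path_lengths(graph, source):
    """Breadth-first search distances from source in an unweighted graph."""
    distances = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor not in distances:
                distances[neighbor] = distances[node] + 1
                queue.append(neighbor)
    return distances


def closest_distance(graph, drug_targets, disease_proteins):
    """Average, over drug targets, of the distance to the nearest disease protein."""
    total = 0
    for target in drug_targets:
        distances = shortest_path_lengths(graph, target)
        reachable = [distances[p] for p in disease_proteins if p in distances]
        if not reachable:
            raise ValueError("no disease protein reachable from " + repr(target))
        total += min(reachable)
    return total / len(drug_targets)


# Hypothetical undirected interaction network given as an adjacency list.
toy_network = {
    "A": ["B"],
    "B": ["A", "C", "E"],
    "C": ["B", "D"],
    "D": ["C"],
    "E": ["B"],
}

print(closest_distance(toy_network, ["A", "E"], ["C", "D"]))
```

A smaller score means the drug's targets sit closer to the disease module, which is the intuition behind ranking repurposing candidates by proximity.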

By constructing a large computational biology network of drugs, genes, and diseases to measure the interaction between targets and biomolecules, Chen et al. [16] treated drug combination as another form of drug repurposing. Beck et al. [17] developed the MT–DTI (molecule transformer–drug target interaction) model, which is pre-trained on drug–target interactions. MT–DTI predicts the affinity between compounds and target proteins in order to identify commercially available drugs that could act on the viral proteins of SARS-CoV-2. The results highlighted atazanavir, an antiretroviral drug used to treat and prevent human immunodeficiency virus (HIV) infection, as a candidate. Kim et al. [18] identified potential associations between drugs and diseases by applying several machine learning methods (logistic regression, random forest, and SVM) to self-defined similarity metrics (drug–drug similarities and disease–disease similarities). Hooshmand et al. [19] mined possible therapeutic drugs against SARS-CoV-2 by analyzing the chemical structures of small-molecule drugs.

DeepPurpose [20] is a deep learning toolkit for drug–target interaction (DTI) prediction that integrates encoders for drug molecules and protein amino acid sequences. Belyaeva et al. [21] proposed a causal framework that uses multiple data modalities to build a causal network whose nodes relate to COVID-19 and aging. The method integrates transcriptomic, proteomic, and other omics data to identify targets located in the pathway of COVID-19 infection.

3.1 Drug Repurposing with Real-World Data

Research based on real-world data can take advantage of massive datasets that reflect the actual process of diagnosis and treatment and the health status of patients in real situations. Because traditional statistical methods are limited in handling such large volumes of data, deep learning can be employed to mine valuable signals from massive real-world data. Liu et al. [22] built a drug repurposing framework that estimates the effect of a single drug while taking into account the feature space of existing known drugs. Given a cohort of patients, candidate drugs are extracted, and for each drug the patients are divided into an intervention group and a control group. Confounders and disease progression in the two groups are estimated, an LSTM combined with an attention mechanism is used to correct the bias, and finally drugs with a therapeutic effect are identified.
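One common way to correct confounding bias between an intervention group and a control group is inverse probability weighting (IPW), where each patient is weighted by the inverse of the estimated probability of receiving the treatment they actually received. The sketch below assumes these propensity scores are already available as inputs; in a framework like the one described above they would come from a learned model. Whether [22] uses exactly this estimator is not stated here, so the code should be read as an illustration of the general idea.

```python
def ipw_average_treatment_effect(treated, outcomes, propensity):
    """Horvitz-Thompson style IPW estimate of the average treatment effect.

    treated    -- 1 if the patient received the drug, 0 otherwise
    outcomes   -- observed outcome per patient (e.g. 1 = improved)
    propensity -- estimated probability of receiving the drug, strictly in (0, 1)
    """
    if not (len(treated) == len(outcomes) == len(propensity)):
        raise ValueError("all inputs must have the same length")
    n = len(treated)
    treated_mean = 0.0
    control_mean = 0.0
    for t, y, p in zip(treated, outcomes, propensity):
        if not 0.0 < p < 1.0:
            raise ValueError("propensity scores must lie strictly between 0 and 1")
        treated_mean += t * y / p
        control_mean += (1 - t) * y / (1.0 - p)
    return (treated_mean - control_mean) / n


# Hypothetical cohort of four patients with equal propensity to be treated.
effect = ipw_average_treatment_effect(
    treated=[1, 0, 1, 0],
    outcomes=[1, 0, 1, 0],
    propensity=[0.5, 0.5, 0.5, 0.5],
)
print("estimated effect:", effect)
```

A positive estimate suggests the drug improves the outcome relative to control after reweighting; drugs can then be ranked by this estimate across many emulated comparisons.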

4 Drug Repurposing of Traditional Chinese Medicine

Compared with the aforementioned studies on repurposing small-molecule drugs, only a few studies have used machine learning to find TCM treatments for COVID-19 [23]. Wang et al. [24] used an ontology-based side-effect prediction framework (OSPF) integrating neural network-based methods to evaluate the TCM prescriptions officially recommended by China’s health ministry for the treatment of COVID-19. The results showed that QFPD-T, HSBD-F, PMSP, GCT-CJ, SF-ZSY, and HSYF-F can be regarded as potential therapeutic treatments. Liao et al. [25] used deep learning to mine the relationship between patients’ facial features and prescriptions, and proposed convolutional neural networks that generate TCM prescriptions from a patient’s face image. Guo et al. [26] applied unsupervised hierarchical clustering to TCM, grouping compounds into modules with similar therapeutic functions. The method is intended to investigate the polypharmacology of TCM, help clarify its mechanism of action, and open new possibilities for disease treatment. Weng et al. [27] proposed a framework for automated medical knowledge graph construction based on semantic analysis, which performs semantic reasoning automatically over the graph. The resulting TCM prescriptions can be recommended at diagnosis based on clinical symptoms.
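Hierarchical clustering of the kind used by Guo et al. [26] can be sketched as a bottom-up merge procedure: start with every compound in its own cluster and repeatedly merge the two closest clusters. The version below uses single linkage and Euclidean distance on made-up two-dimensional feature vectors. The linkage criterion, the features, and the function name `single_linkage` are assumptions for illustration, not details taken from [26].

```python
import math


def single_linkage(points, n_clusters):
    """Agglomerative clustering: merge the two closest clusters until n_clusters remain."""
    if not 1 <= n_clusters <= len(points):
        raise ValueError("n_clusters must be between 1 and the number of points")
    clusters = [[index] for index in range(len(points))]
    while len(clusters) > n_clusters:
        best = None  # (distance, i, j) of the closest pair of clusters
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single linkage: distance between the closest members.
                distance = min(
                    math.dist(points[a], points[b])
                    for a in clusters[i] for b in clusters[j]
                )
                if best is None or distance < best[0]:
                    best = (distance, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [sorted(cluster) for cluster in clusters]


# Hypothetical compound feature vectors forming two well-separated groups.
compound_features = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0)]
print(single_linkage(compound_features, n_clusters=2))
```

Each resulting cluster plays the role of a module of compounds with similar profiles; stopping the merges at different levels gives coarser or finer modules.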

Table 1 gives an overview of computational drug repurposing studies, covering the adopted strategies, computational approaches, and main techniques.

Table 1 Summary of computational methods in studying drug repurposing

5 Summary of Machine Learning Applications in Drug Repurposing

Machine learning methods play a vital role in drug repurposing. Traditional machine learning methods, such as logistic regression, random forest, support vector machine, KNN, and RotatE [15, 18, 29], were mainly used in the early stage. In recent years, deep learning methods have shown greater power in identifying repurposable drugs, including RNN [1], GCN [2], CNN [25], GNN [7, 28], LSTM [22, 30], VAE [4], and Transformer methods [17]. In addition, deep learning approaches can extract more informative molecular features and map molecular structures to latent spaces. Flow-based models, which transform one feature distribution into another, have also received increasing attention [5, 6].
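Of the methods listed above, RotatE has a particularly compact scoring rule: entities are embedded as complex vectors, each relation is a per-dimension rotation, and a triple (head, relation, tail) is plausible when rotating the head lands close to the tail. The sketch below computes that distance with plain Python complex numbers. Summing the per-dimension moduli is one possible choice of norm, and the embeddings shown are made-up values rather than trained parameters.

```python
import cmath
import math


def rotate_distance(head, phases, tail):
    """Distance || head * exp(i * phase) - tail ||, summed over embedding dimensions.

    head, tail -- lists of complex numbers (entity embeddings)
    phases     -- list of rotation angles in radians (relation embedding)
    """
    if not (len(head) == len(phases) == len(tail)):
        raise ValueError("all embeddings must have the same dimension")
    return sum(
        abs(h * cmath.exp(1j * theta) - t)
        for h, theta, t in zip(head, phases, tail)
    )


# Hypothetical 2-dimensional embeddings for a (drug, treats, disease) triple.
drug = [1 + 0j, 0 + 1j]
treats = [math.pi / 2, math.pi]        # rotate by 90 and 180 degrees
matching_disease = [0 + 1j, 0 - 1j]    # exactly the rotated drug vector
other_disease = [1 + 0j, 0 + 1j]

print("matching triple:", rotate_distance(drug, treats, matching_disease))
print("other triple:   ", rotate_distance(drug, treats, other_disease))
```

Candidate drug-disease pairs can be ranked by this distance, with smaller values indicating more plausible links in the knowledge graph.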

6 Conclusion

Machine learning plays an important role in drug repurposing; especially since the emergence of COVID-19, scientists around the world have used machine learning-based approaches to identify effective drugs. Some problems remain, such as the black-box nature of deep learning, which makes it hard to explain the rationale behind predicted repurposable drugs. It is necessary to develop interpretable deep learning and causal learning alongside traditional drug discovery experiments. A further challenge, shared with machine learning in drug development generally, is how to better characterize molecules and their conformational changes so that molecular features can be extracted more effectively. By developing machine learning methods, we can accelerate drug discovery and improve human health in ways that were not possible before.