1 Introduction

As part of the traditional culture of the Chinese nation, traditional Chinese medicine (TCM) plays a significant role in health care [39]. Its unique and effective treatment methods have received increasing attention. Formulas consisting of multiple herbs are the major form of TCM clinical treatment, embodying the ideas of TCM syndrome differentiation and formula compatibility; they are also key to the intelligent and modern development of TCM. The efficacy of a formula refers to its effect on preventing and treating diseases under the condition of herbal interaction. There are many relationships between herbs (TCM-HRs, the representation of herbal relations), such as mutual reinforcement, mutual restraint, mutual suppression, and mutual assistance. The properties of a herb (TCM-HPs, the representation of herbal properties) are the internal factors that affect the efficacy of a formula, including toxicity, the four characters (cold, hot, warm, cool) plus neutral, the five tastes (sour, bitter, sweet, pungent, and salty), and the twelve channel tropisms (lung, pericardium, heart, large intestine, triple energizers, small intestine, stomach, gallbladder, bladder, spleen, liver, kidney), 23 descriptions in total. However, the specific relationship between TCM-HPs and the efficacy of TCM is complex and unclear in practice, which poses challenges to the exploration of its inherent rules. Thus, if TCM is to develop intelligently as we hope, it is necessary to model the matching patterns behind the herbs [47].

The strategy for researching the relationship between formulas and efficacy has changed in the past few decades. Research cannot be limited to a single herb; the formula must also be considered as a whole, since the multi-dimensional properties of TCM determine the efficacy of a prescription. Subsequently, some scholars proposed medicinal combination models of herbs [14, 35], revealing the relationship between herb combinations and their efficacy from the overall perspective of TCM. Current research mainly uses traditional machine learning algorithms [24]. But the relationships between formulas and efficacies are extremely complex, and traditional machine learning algorithms are ill-suited to mining such deep rules. Data-driven deep learning algorithms [18] have developed significantly and are widely used in the medical domain [1, 8]. Since formulas are the carrier of TCM knowledge and information, determining how to apply deep learning to mining formulas becomes a novel topic.

The first step of text modeling is the digital representation of features. As a core technology in the natural language processing (NLP) field [3], word embedding maps words from text into a vector space, so that unstructured text is represented by structured vectors. Pre-trained language models (PLMs) are the most important method for generating word embeddings. Among them, the neural-network-based Word2Vec [26] has achieved great success. It greatly alleviates the problem of feature sparsity and improves the semantic expressiveness of embedding vectors, laying a good foundation for the application and development of deep learning text mining models [42]. In this research, we propose an improved pre-training method to learn the distributed representation of TCM-HRs and employ mathematical ideas to initialize the representation of TCM-HPs, which may be significant for the further development of TCM.

However, there are obvious differences between formulas and public natural language texts. For example, formulas have a weakly sequential written form: a herb at the front of a formula can be related to a herb at the end rather than to its neighbors, whereas the words in a natural language sentence follow grammatical rules. As Gururangan [9] noted, different tasks call for corresponding domain-adaptive pre-trained representations. For this purpose, we designed a model based on CBOW and cross features for TCM pre-training.

Formula-efficacy prediction is a classification task in artificial intelligence. In recent years, deep learning has achieved good performance in classification tasks such as face recognition [10, 41], audio classification [11, 22], email spam filtering [7], and text generation [40], and, in the TCM field, tongue image [15] and symptom classification [37]. However, there are few studies on the mapping relationship between formulas and efficacy, for three main reasons: (1) low-resource settings, where the limited training data is not enough to support training the huge number of parameters of a deep learning model; (2) complex theory, so that applying a model directly may produce bad results; and (3) the scarcity of researchers trained in both TCM and computer science.

In this research, the experimental data is crawled from the Internet. We obtain 31,114 unsupervised formulas in total, one multi-class dataset with 1036 formulas, and one multi-label dataset with 1723 formulas. In order to predict the efficacy of formulas, we propose TCM2Vec, comprising a feature-extraction pre-training model, FMh2v, integrated with TCM theory for better representation of the relationships between herbs, and a detached embedding network inside the deep learning network for prediction. Meanwhile, the herbal representation obtained by pre-training can be fine-tuned while predicting the efficacy of formulas. This research provides a new research direction for TCM feature extraction and contributes to the study of formula-assisted decision-making and other related work. Our contributions mainly lie in the following aspects:

  • The ideas of NLP research are applied to the study of TCM. Integrating TCM theory, we propose a pre-training model based on CBOW and cross features. Trained on unsupervised data, this model can extract TCM relationship features and obtain a reliable initial representation of TCM.

  • We employ a mathematical method to simulate the real-world distribution of herbal properties and fine-tune it by sparse activation during the training of the prediction model.

  • We construct a deep learning model with a detached encoder to predict formula-efficacy relations, while fine-tuning the pre-trained herbal vectors on supervised data.

The remainder of this research is structured as follows. Section 2 reviews related work. Section 3 presents the overall proposed framework. In Section 4, we evaluate the experimental results of our research and provide comparisons with existing techniques, followed by conclusions and suggestions for future research directions in Section 5.

2 Related work

2.1 Feature representation

Feature representation is the foundation and an important part of text data mining. Traditional feature methods are mainly based on manual extraction and generally suffer from high dimensionality and poor correlation, such as one-hot encoding [34] and TF-IDF [32]. Bengio [2] first proposed learning the distributed representation of words while predicting the next word in a sequence to overcome the curse of dimensionality. Following this idea, Mikolov [27] extended the simple feed-forward neural network to a recurrent neural network to capture longer-distance dependencies. However, both models still resemble probabilistic language models. Subsequently, Mikolov proposed the Word2Vec model [26], which includes the CBOW and Skip-Gram training modes. CBOW predicts the center word of a context window using a simple logistic regression classifier over the words in the window; Skip-Gram uses the same architecture but predicts the context words from the center word. More recently, in view of its superior representation learning ability, BERT has become a commonly used pre-training method in various fields.
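As an illustration, both training modes are available in off-the-shelf toolkits. A minimal sketch using gensim, treating each formula as a "sentence" of herb tokens (the herb names below are hypothetical placeholders, not our training data):

```python
from gensim.models import Word2Vec

# Each formula is treated as a "sentence" whose tokens are herb names
# (placeholder examples only).
formulas = [
    ["ginseng", "licorice", "ginger"],
    ["pinellia", "ginger", "jujube"],
]

cbow = Word2Vec(formulas, vector_size=100, window=5, sg=0, min_count=1)      # CBOW mode
skipgram = Word2Vec(formulas, vector_size=100, window=5, sg=1, min_count=1)  # Skip-Gram mode
print(cbow.wv["ginseng"][:5])  # first five dimensions of the learned embedding
```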

There are few studies on the distributed characterization of herbal medicines, but the research approach has changed recently, as herb representation moves toward model-based training. Li [23] first proposed a distributed representation model for TCM and compared N-Gram and CBOW models experimentally. Deng [5] proposed a herb vector training (QM-BP) model based on a multi-layer feed-forward neural network, which took herb-efficacy pairs as training samples. Their experiments showed that the quantified herb values obtained from the correlation between TCM properties and efficacy were more accurate and had better representation performance. Inspired by the above ideas and combined with TCM theory, we develop a feature learning method for herbs based on cross features. During training, the model not only observes the individual features in the window, but also attends to the information brought by the fusion of pairs of features.

2.2 Activation function

Sparsity refers to expressing most of the original signal with a linear combination of few active components. The nonlinear activation function is a simple and efficient sparsification method. Saturating activation functions include the Sigmoid [16] and Tanh [30] functions; they are smooth and easy to differentiate, but the mapped vectors are dense. The unsaturated activation function ReLU initially appeared to solve the non-zero-centering and vanishing-gradient problems of the saturating type [38]. As a piecewise function, it has the power to sparsify vectors. Pathirage [31] used ReLU to constrain the information content of a latent representation to keep it low-dimensional. Then, to avoid dead neurons in ReLU, researchers proposed PReLU [29], RReLU [43], ELU [4], and other parameterized piecewise functions, which perform well in many tasks.
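To make the sparsity contrast concrete, a small sketch (PyTorch assumed) counting the exact zeros produced by a saturating versus an unsaturated activation on random inputs:

```python
import torch

x = torch.randn(10_000)
# ReLU clips every negative input to exactly zero, yielding a sparse output...
print((torch.relu(x) == 0).float().mean().item())   # ~0.5
# ...while Tanh squashes values but almost never outputs an exact zero (dense).
print((torch.tanh(x) == 0).float().mean().item())   # ~0.0
```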

Activation functions have also been used in the TCM field. To recommend effective TCM formulas for diseases where TCM has an advantage, Zhou [46] proposed a deep learning network (FordNet) with ReLU as the activation function of the convolution layers. The deep learning models with cloud computing proposed by Zhang [45] used Tanh and ReLU activation functions for diagnosing spleen and stomach diseases. The medicinal properties of herbs are discrete characteristics. We initialize them by a mathematical method and use an unsaturated activation function to sparsify them so that they fit the true distribution during training.

2.3 Predicting model

Traditional text classification models are based on machine learning algorithms such as SVM [36] and naive Bayes [25], while deep learning algorithms are widely used because they learn features automatically. In 2014, Kim [19] first applied a Convolutional Neural Network (CNN) to English text classification, extracting text features with one-dimensional convolution kernels and key information with max pooling. However, a conventional CNN can only learn local relations between minimal semantic units. To capture long-term textual dependencies, Johnson [17] proposed DPCNN, a low-complexity, word-level deep convolutional network. To better capture the global information of text, researchers turned to Recurrent Neural Network (RNN) models, whose recurrent memory units give the network memory. Hu [13] experimentally analyzed the performance of TextRNN on Chinese text classification, showing that accuracy could be improved from 88.60% (THUCTC) to 94.62% (TextRNN) when combined with Word2Vec. CNN and RNN are two popular deep learning models at present, but neither is perfect, so some researchers mix the two to combine their strengths and minimize their weaknesses. Liu [21] proposed the RCNN model, which adopts a bidirectional recurrent structure to obtain global text information; it reduces the noise introduced by traditional window-based neural networks and retains word-order information to the maximum extent when learning text representations, then uses a max-pooling layer to obtain the key features.
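For orientation, a minimal sketch of the RCNN idea (PyTorch assumed; the original model also applies a non-linear projection before pooling, omitted here for brevity):

```python
import torch
import torch.nn as nn

class RCNN(nn.Module):
    """Sketch of the RCNN idea [21]: a bidirectional recurrent layer supplies
    global context, concatenated with the word embedding, then max-pooling
    picks out the key features."""

    def __init__(self, vocab_size: int, dim: int, hidden: int, n_classes: int):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, dim)
        self.rnn = nn.LSTM(dim, hidden, bidirectional=True, batch_first=True)
        self.fc = nn.Linear(dim + 2 * hidden, n_classes)

    def forward(self, ids):                  # ids: (batch, seq_len)
        e = self.emb(ids)
        ctx, _ = self.rnn(e)                 # left/right context for each position
        feats = torch.cat([e, ctx], dim=-1)  # word embedding + bidirectional context
        return self.fc(feats.max(dim=1).values)   # max-over-time pooling
```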

With the development of artificial intelligence, many Chinese scholars have studied the classification of Chinese medicine cases with deep learning models. To explore the correlation between tongue diagnosis and formulas, Hu [15] constructed a two-channel CNN model that trains on tongue images to match the formulas corresponding to different tongue diagnoses, and Song [37] classified TCM cases with Text-CNN and LSTM. It must be recognized that deep learning networks have great prospects in the TCM field, even though their studies and applications are currently few.

3 Methodology

The workflow of TCM2Vec is illustrated in Fig. 1. As shown in the illustration, our framework consists of two components: one is TCM-HRs pre-training on the unlabeled dataset, and the other is formula-efficacy prediction and fine-tuning on the small labeled datasets; their details are introduced in Sections 3.1 and 3.3, respectively. The initialization of TCM-HPs is introduced in Section 3.2.

Fig. 1 Workflow of TCM2Vec

3.1 Initialization of TCM-HRs

The main advantage of PLMs is automatic feature extraction under data-driven conditions, without manual feature engineering in advance. Although there are many excellent models in academic research, the appropriate choice should be made according to the characteristics of the data. Compared with public corpora such as Wikipedia and BooksCorpus, our usable data is scarce, and the texts are short, whereas most popular models [6, 12, 28] are designed for long text. In this research, we therefore choose the context-based CBOW framework as the benchmark model.

In order to obtain better representations of herb relationship characteristics, we integrate TCM theoretical knowledge into the pre-training model. Herb pair theory [44] is the basis of formula compatibility, and its composition is affected by many factors. The relationship between the herbs in a pair can be synergistic or antagonistic. Reasonable compatibility can enhance the efficacy of herbs and reduce the adverse reactions of some herbs. For example, in the pair of pinellia ternata and ginger, ginger not only enhances the stomach-warming effect of pinellia ternata, reducing nausea and cough, but also restricts its toxicity. Combinations that violate the rules, however, may inhibit efficacy or produce toxicity: in the pair of ginseng and resveratrol, ginseng strengthens the contractility of the heart, while resveratrol has antihypertensive properties that act against ginseng. Such pairwise interactions can be modeled with the cross-term of the factorization machine (FM) [33]:

$$ y_{FM}(x) = w_0 + \sum_{i=1}^{n} w_i x_i + \sum_{i=1}^{n} \sum_{j=i+1}^{n} \left\langle v_i, v_j \right\rangle x_i x_j $$
(1)
$$ \left\langle v_i, v_j \right\rangle = \sum_{f=1}^{k} v_{i,f} \cdot v_{j,f} $$
(2)

Combining this with herb pairs, a feature-interaction network layer based on the FM cross-term [33] is added to the basic CBOW model (Fig. 2, left) to learn the cross relations of herbs (Eq. 3), and the result is used to predict the score of the central word. The TCM pre-training model is named FMh2v (Fig. 2, right), and its loss is calculated as shown in Eq. 4:

$$ H^R\left(X_C\right) = E\left(X_C\right) + \sum_{i=1}^{c} \sum_{j=i+1}^{c} \left\langle v_i, v_j \right\rangle E\left(x_i\right) E\left(x_j\right) $$
(3)
$$ Loss = -\log \sigma\left(E\left(X_{tar}\right) \cdot H^R\left(X_C\right)\right) - \sum_{x_j \in W_{neg}} \log \sigma\left(-E\left(X_{tar}\right) \cdot E\left(x_j\right)\right) $$
(4)

where X_C represents the context of the target word, c is the window size, H^R(·) denotes the latent-space representation, v_i is a latent vector, E(·) is the mapping function of the embedding layer, σ is the sigmoid function, and W_neg is the set of negative samples for one input.
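For concreteness, a minimal PyTorch sketch of FMh2v under simplifying assumptions: the herb embeddings double as the FM latent vectors, so the pairwise term of Eq. 3 reduces to the standard FM identity; all names and sizes are illustrative, not the exact implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FMh2v(nn.Module):
    """Sketch of FMh2v: CBOW plus an FM-style cross term (Eqs. 3 and 4)."""

    def __init__(self, vocab_size: int, dim: int):
        super().__init__()
        self.emb_in = nn.Embedding(vocab_size, dim)   # E(.), context herb embeddings
        self.emb_out = nn.Embedding(vocab_size, dim)  # output embeddings for scoring

    def context_repr(self, ctx):                      # ctx: (batch, c) herb ids
        e = self.emb_in(ctx)                          # (batch, c, dim)
        first_order = e.sum(dim=1)                    # E(X_C)
        # FM identity: sum over i<j of e_i * e_j = 0.5 * ((sum e)^2 - sum of e^2)
        cross = 0.5 * (first_order ** 2 - (e ** 2).sum(dim=1))
        return first_order + cross                    # H^R(X_C), Eq. 3

    def forward(self, ctx, target, negatives):        # target: (batch,), negatives: (batch, k)
        h = self.context_repr(ctx)                    # (batch, dim)
        pos = (self.emb_out(target) * h).sum(-1)      # score of the true center herb
        neg = torch.bmm(self.emb_out(negatives), h.unsqueeze(-1)).squeeze(-1)
        # negative-sampling loss, Eq. 4
        return -(F.logsigmoid(pos) + F.logsigmoid(-neg).sum(-1)).mean()
```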

Fig. 2 CBOW (left) and FMh2v (right)

3.2 Initialization of TCM-HPs

Each herb has its own specific 23-dimensional herbal properties (toxicity, the four characters plus neutral, the five tastes, and the twelve channel tropisms) in the real world; for example, dendrobium is sweet in taste, slightly cold in nature, and attributive to the stomach and kidney meridians. However, it is very difficult to collect and process the medicinal information of all Chinese medicines manually. In this research, we use mathematical ideas to fit the distribution characteristics of TCM properties. According to the central limit theorem [20], under appropriate conditions the mean of a large number of independent random variables converges to a normal distribution after proper standardization. Each dimension of the intrinsic medicinal features is an independent variable. Assuming that the number of herbs appearing in the corpus is close to the number of all Chinese medicines present in nature, each dimension of the intrinsic features follows a normal distribution. The realization of each of the 23 property dimensions is shown in Eqs. 5 and 6:

$$ h^p = \left\{ m_1, m_2, \dots, m_{23} \right\} $$
(5)
$$ m_i \sim N\left(\mu_i, \sigma_i^2\right) $$
(6)

where h^p denotes the herbal property representation, m_i is the value of the i-th dimension, μ_i is the mean of the property's normal distribution in the i-th dimension, and σ_i² is its variance.
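A short sketch of this initialization (PyTorch assumed; the per-dimension means and standard deviations are free hyperparameters here, not values from the paper):

```python
import torch

def init_tcm_hps(num_herbs: int, mu: torch.Tensor, sigma: torch.Tensor) -> torch.Tensor:
    """Draw each of the 23 property dimensions i.i.d. from N(mu_i, sigma_i^2) (Eqs. 5-6).

    mu and sigma are length-23 tensors of per-dimension means and standard deviations.
    """
    return mu + sigma * torch.randn(num_herbs, 23)

h_p = init_tcm_hps(5000, mu=torch.zeros(23), sigma=torch.ones(23))
```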

3.3 Formula-efficacy relations predicting

In this section, we train a fine-tuning model with a detached encoder layer on the formula-efficacy classification task. Better classification performance of the fine-tuned model indicates that the embedding-layer parameters have better representation performance.

3.3.1 Detached encoder

The prediction model is built with the initialized vector representations obtained in Sections 3.1 and 3.2. The two types of TCM feature vectors call for different treatment during network learning. With the iterative learning and updating of the network, the v^R of each medicine should be a continuous dense vector implying the relationships between herbs, while the v^P vector should tend to be sparse, representing discrete characteristics. For example, as Table 1 shows, a herb has the four characters, five tastes, channel tropism, and toxicity in TCM theory, but not every dimension of these natural characteristics is present in a given herb. One-hot-style coding is not conducive to calculating the relationships between herbs; meanwhile, some herbs exhibit a natural characteristic to different degrees, such as "slightly cold", so it is very difficult to code manually.

Table 1 Examples of herb about natural characteristic

Therefore, we design a detached encoder that provides different mapping spaces. In Fig. 1, TCM-HRs-Encoder is the relational feature encoder, initialized with FMh2v; TCM-HPs-Encoder is the property encoder, initialized with the normal distribution of herbal properties. Both encoders are composed of fully connected neurons, but TCM-HPs-Encoder contains an additional RReLU sparse activation layer. The two vectors are then stacked to obtain the final herbal representation (Eq. 9, where h^R denotes the initialized relationship representation):

$$ v^P(x) = \mathrm{RReLU}\left(\text{TCM-HPs-Encoder}\left(h^P\right)\right) $$
(7)
$$ v^R(x) = \text{TCM-HRs-Encoder}\left(h^R\right) $$
(8)
$$ v^{herb} = S\left(v^R, v^P\right) $$
(9)
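A minimal PyTorch sketch of the detached encoder, assuming the stacking function S in Eq. 9 is concatenation and that both encoders are single fully connected layers (the hidden size is illustrative):

```python
import torch
import torch.nn as nn

class DetachedEncoder(nn.Module):
    """Sketch of the detached encoder (Eqs. 7-9): separate mapping spaces for
    relation features (h^R) and property features (h^P)."""

    def __init__(self, rel_dim: int, prop_dim: int = 23, hidden: int = 64):
        super().__init__()
        self.hr_enc = nn.Linear(rel_dim, hidden)   # TCM-HRs-Encoder, dense mapping
        self.hp_enc = nn.Sequential(               # TCM-HPs-Encoder with sparse activation
            nn.Linear(prop_dim, hidden),
            nn.RReLU(0.1, 0.5),                    # alpha in (0.1, 0.5), Section 4.5
        )

    def forward(self, h_r, h_p):
        v_r = self.hr_enc(h_r)                     # Eq. 8: dense relation vector v^R
        v_p = self.hp_enc(h_p)                     # Eq. 7: sparsified property vector v^P
        return torch.cat([v_r, v_p], dim=-1)       # Eq. 9: S assumed to be concatenation
```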

3.3.2 Feature extractor and classifier

Feature extraction plays a key role in deep learning research. In this paper, we use four different feature learning techniques to test the performance of our detached encoder: a convolutional network (ConvNet), a recurrent neural network (RecNet), an LSTM with a max-pooling layer (RCnnNet), and a wide-and-deep convolutional network (WideDeepNet); parameter details are given in Section 4.2. After feature extraction, a fully connected layer outputs the prediction result. We define the training objective on multi-class data as Eq. 10 and on multi-label data as Eq. 11, where y_i denotes the true label and \( {\hat{y}}_i \) the predicted score.

$$ \mathcal{L}=-{\sum}_i{y}_i\mathit{\log}{\hat{y}}_i $$
(10)
$$ \mathcal{L} = -\sum_i \left[ y_i \log \frac{1}{1+e^{-\hat{y}_i}} + \left(1-y_i\right) \log \frac{e^{-\hat{y}_i}}{1+e^{-\hat{y}_i}} \right] $$
(11)
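In a framework such as PyTorch, these two objectives correspond to the standard built-in criteria (a sketch; the batch and label counts are placeholders):

```python
import torch
import torch.nn as nn

multiclass_loss = nn.CrossEntropyLoss()    # Eq. 10: softmax cross-entropy, one class per formula (Fd-2)
multilabel_loss = nn.BCEWithLogitsLoss()   # Eq. 11: per-label sigmoid cross-entropy (Fd-3)

logits = torch.randn(4, 23)                          # batch of 4 formulas, 23 candidate labels
targets = torch.randint(0, 2, (4, 23)).float()       # multi-hot efficacy labels
print(multilabel_loss(logits, targets).item())
```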

4 Data and experiment

In Section 4.1, we introduce the experimental data for pre-training and prediction. We then present our experimental results in four parts: the FMh2v pre-training model, detached encoder performance, an analysis of the differences between feature extractors, and results with different sparse activation functions. Since word embeddings cannot be evaluated directly, this research takes Accuracy (ACC) and the F1 value (F1) as evaluation indexes, where F1 involves Precision (PRE) and Recall (REC). Better classification results from the network indicate that the embedding has better feature representation ability.

4.1 Data

TCM medical records are widely produced by doctors during treatment, but it is difficult to extract formulas from the records' descriptive natural language due to the low degree of digitalization. Clinics are another good way to acquire large-scale examples, but much of that data is not publicly available. Meanwhile, considering the huge manual effort of collecting resources from ancient books or textbooks, we turned to Internet resources, which include massive digital material.

We treat TCM formulas as objects and efficacy as labels, and collect formula data from Chinese medicine websites (http://www.zhongyaofangji.com/index.html and https://db.yaozh.com/), but the formulas from the websites cannot be used directly. Their online presentation includes not only the herb composition but also dosage, processing methods, units, and other information. Since this research works at the level of herbal components, such unnecessary information is cleaned away. To obtain higher-quality data, we also prepared a medicine alias library to unify herbs that denote the same substance under different names. Ultimately, we cleaned and formalized three datasets: 31,114 unlabeled formulas (Formula data one, Fd-1), 1036 multi-class formulas (Formula data two, Fd-2), and 1723 multi-label formulas (Formula data three, Fd-3). Fd-2 has 11 classes in total: 'Huo Xue' (活血), 'Jie Biao' (解表), 'Gu Se' (固涩), 'Qu Shi Li Shui' (祛湿利水), 'Wen Li' (温里), 'Xie Xia' (泻下), 'Xi Feng' (熄风), 'He Jie' (和解), 'Li Qi' (理气), 'Yong Yang' (痈疡), and 'Xiao Dao' (消导); each sample has exactly one class. In Fd-3, each formula has at least one label, and 23 different labels appear in total, e.g., 'Bu Yang' (补阳), 'Jing Chan' (经产), 'Qu Feng' (祛风), 'He Jie' (和解), 'Xie Huo' (泻火). Word segmentation is unnecessary when processing formula texts: after cleaning, each sample already consists of minimal semantic units, i.e., herbs (Table 2).
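A sketch of the alias normalization step (alias_map is an assumed {alias: canonical} dictionary built from our alias library; the example names are illustrative):

```python
def normalize_formula(raw_herbs, alias_map):
    """Replace alias herb names with canonical ones and drop duplicates."""
    seen, herbs = set(), []
    for h in raw_herbs:
        h = alias_map.get(h, h)      # map alias -> canonical name if known
        if h not in seen:
            seen.add(h)
            herbs.append(h)
    return herbs

# e.g. both names below denote licorice (甘草); the second is an alias.
print(normalize_formula(["甘草", "国老"], {"国老": "甘草"}))  # ['甘草']
```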

Table 2 Partial experimental data

4.2 FMh2v pre-training model

In the FMh2v pre-training part, we employ ConvNet as the feature extractor, designing kernels with three window sizes (2, 3, 4) to learn information from different perspectives; the classification results of the network are then used to compare no pre-training (None), Word2Vec, BERT, and our FMh2v. The experimental parameters of the pre-training model are shown in Table 3.
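A sketch of this extractor (PyTorch assumed; 128 kernels per window size, giving the 384 kernels mentioned in Section 4.4):

```python
import torch
import torch.nn as nn

class ConvNet(nn.Module):
    """Sketch of the ConvNet extractor: parallel 1-D convolutions with window
    sizes 2, 3, and 4 over herb vectors, followed by max-over-time pooling."""

    def __init__(self, dim: int, n_classes: int, n_kernels: int = 128):
        super().__init__()
        self.convs = nn.ModuleList(nn.Conv1d(dim, n_kernels, k) for k in (2, 3, 4))
        self.fc = nn.Linear(3 * n_kernels, n_classes)

    def forward(self, x):                  # x: (batch, seq_len, dim) herb vectors
        x = x.transpose(1, 2)              # Conv1d expects (batch, dim, seq_len)
        pooled = [conv(x).relu().max(dim=-1).values for conv in self.convs]
        return self.fc(torch.cat(pooled, dim=-1))
```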

Table 3 Parameter settings of pre-training

According to the experimental results shown in Table 4, the FMh2v pre-training model proposed in this paper achieves an improvement in classification performance on the labelled formula data compared with no pre-training, CBOW, and BERT.

Table 4 Comparative analysis for FMh2v and other pre-training method

4.3 Detached encoder

Now we have the herbal relationship representation initialized by FMh2v and the property representation initialized by the normal distribution. On this basis, we built several types of feature extractors to show the encoder's applicability: besides the ConvNet mentioned above, RecNet, RCnnNet, and WideDeepNet are used as well; detailed parameter settings are given in Table 5.

Table 5 Parameter settings

We use ACC and F1 on Fd-2 to evaluate the effect of feature extraction in Table 6, where S stands for the fine-tuning model with only the relationship representation, and D denotes our detached encoder in the prediction model. As shown in Table 6, the classification performance of all models improves when the detached encoder is added. In the prediction model with ConvNet, the D group's ACC rises by 0.96% and F1 by 1.16% compared with the S group's; with RecNet as the feature extractor, ACC increases from 51.92% to 64.42% and F1 from 43.45% to 60.34%; with RCnnNet, ACC also rises by 0.96% and F1 by 0.77%; and in the wide-and-deep fine-tuning model, ACC and F1 increase by 7.69% and 7.47%, respectively.

Table 6 Detached encoder(D) compared with single encoder(S) in Fd-2

We adopt ConvNet, RCnnNet, and WideDeepNet as base models when fine-tuning the embedding and predicting multi-label data. In Table 7, we can see a large improvement in each experiment's F1 value, and ACC rises markedly by 2.96%, 4.74%, and 10.8%, respectively.

Table 7 Detached encoder(D) compared with single encoder(S) in Fd-3

4.4 Effect of feature extractor

In this section, we analyze the influence of the different feature extractor types on the fine-tuning network. We observe that ConvNet achieves the best performance on both Fd-2 and Fd-3, whereas WideDeepNet performs far worse than ConvNet. Technically, both are convolutional neural networks, whose convolution process resembles the extraction of N-gram information. ConvNet contains only one convolutional layer with 384 kernels, whereas WideDeepNet has 500 kernels; given our two small-scale datasets, the data is not enough to support learning such a huge number of parameters.

RecNet has only one layer of bidirectional recurrent units, which may suffer from vanishing or exploding gradients; moreover, formula compositions are short, so information may be lost during training. These are probably the reasons why RecNet has the lowest performance. Although RCnnNet contains a bidirectional LSTM layer, it also has a max-pooling layer that can effectively filter noisy features and improve the fitting ability of the model, leading to better performance than RecNet on Fd-2.

4.5 Effect of sparse activation function

In this section, we use the prediction framework based on the detached encoder as the benchmark model and compare the effects of different activation functions in TCM-HPs-Encoder to obtain better fine-tuning results. PReLU, ReLU, and ELU are also unsaturated nonlinear activation functions, which may help to sparsify the distribution of input features (Fig. 3). α is a hyper-parameter in PReLU (α = 0.1), RReLU (α ∈ (0.1, 0.5)), and ELU (α = 0.1). As can be seen from Tables 8 and 9, RReLU achieves good performance with ConvNet_D, RCnnNet_D, and WideDeepNet_D on Fd-3. Although RReLU is not always the best choice by the evaluation indexes, its parameter α with a controllable range can effectively avoid dead neurons and removes the inconvenience of a fixed parameter setting.
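In PyTorch terms, the compared settings correspond to the following (a sketch of the configuration only, not the full encoder):

```python
import torch.nn as nn

# Candidate sparse activations for TCM-HPs-Encoder (alpha settings from this section)
activations = {
    "ReLU":  nn.ReLU(),
    "PReLU": nn.PReLU(init=0.1),    # learnable alpha, initialized to 0.1
    "RReLU": nn.RReLU(0.1, 0.5),    # alpha sampled uniformly from (0.1, 0.5) during training
    "ELU":   nn.ELU(alpha=0.1),
}
```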

Fig. 3 Activation graphs

Table 8 Activation function ablation experiment in Fd-2
Table 9 Activation function ablation experiment in Fd-3

5 Conclusion and future work

The representation of TCM plays an important role in learning internal knowledge with deep learning models. In this research, we propose a detached feature-extraction deep learning approach, TCM2Vec, which includes FMh2v, an unsupervised herb pre-training model combined with TCM theory, and a detached encoder for improving the applicability of deep learning methods in the TCM field. FMh2v obtains a herbal relationship representation that may carry herbal interaction information. We then employ a normal distribution to match the medicinal property features on the basis of a large unsupervised formula dataset. Finally, deep learning models with the detached encoder are built to predict formula-efficacy relations and fine-tune the initial representations. Our experiments reveal that a pre-training model incorporating cross features for the TCM field can provide significant benefits. Meanwhile, our findings suggest that it may be valuable to tailor learning models to a specific field using domain- or task-relevant theories, as our detached encoder does. However, relative to public data, formula data is a low-resource setting, and we removed information such as dosage and preparation method, which limits the expressiveness of the representation. In the future, we will focus on more detailed research: to enhance the depth of the vector expression, information such as processing methods and the status or dose of herbs will be considered, and in deep learning model applications, we will design and develop professional models for TCM.