In an open environment, knowledge graphs must, on the one hand, constantly integrate new knowledge from the open Internet to expand the coverage of the existing knowledge graph; on the other hand, to improve the effectiveness of knowledge graph applications, multiple knowledge graphs, or richer semantic information within a single knowledge graph, need to be integrated.
As shown in Fig. 1, from the perspective of KG construction, multi-source knowledge fusion can be divided into two categories. The first is updating existing KGs, also known as open source knowledge fusion (Section 3.1); this kind of fusion is mainly aimed at large-scale Internet data and studies how to extract useful knowledge from massive fragmented data and integrate it into an existing KG. The second is multi-knowledge-graph fusion (Section 3.2), which merges multiple knowledge graphs into one large knowledge graph by identifying their equivalent instances, equivalent classes, and equivalent attributes; for this reason, entity alignment is generally considered the main task of knowledge fusion. The goal of both kinds of research is to update or construct a new KG. From the perspective of KG application, multi-source knowledge fusion can likewise be divided into two categories. One is information fusion within a knowledge graph (Section 3.3), which refers to exploiting information beyond the knowledge graph's triple structure in the application process to enhance application performance. The other is the fusion of multi-modal knowledge (Section 3.4); KGs have become very important in applications such as intelligent search and recommendation, intelligent Q&A and dialogue systems, and visual decision support. Both kinds of research aim to improve application quality by better mining the information of multiple knowledge graphs.
Table 1 Comparison of results on entity alignment

Open source knowledge fusion
Massive text, audio, and video data on the Internet are important knowledge sources for building KGs. Open source knowledge fusion mainly refers to the real-time fusion of newly added knowledge: it integrates the KG-related information contained in Internet texts from various data sources and in various forms, extracts new entities and new relationships, and adds them to the original knowledge graph. This kind of integration complements and expands the original knowledge graph, so open source knowledge fusion can be regarded as one stage of the knowledge graph construction process, or equivalently as knowledge graph updating.
Because Internet knowledge is multi-source, heterogeneous, and of uneven quality, knowledge evaluation and verification are indispensable steps in open source knowledge fusion. Knowledge evaluation judges the authenticity of knowledge and integrates the validated knowledge with the existing knowledge in the graph, improving the reliability and confidence of the fused knowledge. So far, research on open source knowledge fusion has focused on two aspects: knowledge evaluation and verification, and entity linking.
There are three traditional methods for knowledge evaluation and verification: the Bayesian model [4, 5], the D-S evidence theory [6,7,8], and the fuzzy set theory [9, 10]. With the development of machine learning, knowledge evaluation and verification methods based on graph models [11,12,13,14] have also been developed in recent years.
The basic principle of the Bayesian model is as follows: given a prior probability for the knowledge to be evaluated, the conditional probabilities observed in the data sources are used to obtain a posterior probability, and the correct knowledge is selected according to the maximum a posteriori criterion. In practice, the prior probability of knowledge is often very difficult to know in advance, so the Bayesian model has inherent limitations.

The D-S evidence theory is a generalization of the Bayesian method. It does not require prior probabilities, can express "uncertainty" well, and uses "interval estimation" instead of "point estimation" to describe uncertain information; it can therefore be used to resolve conflicts in multi-source knowledge fusion. Both the D-S evidence theory and the Bayesian model rest on the hypothesis that knowledge from different sources is mutually independent, and when there is a serious conflict among knowledge sources they often reach contradictory conclusions. In addition, the time complexity of the D-S evidence theory can grow exponentially, making it unsuitable for large-scale knowledge evaluation and verification.

Models based on the fuzzy set theory can handle both inaccurate and uncertain information, but they require fuzzy rules and membership functions to be designed from experience; the stability and robustness of the evaluation results are hard to guarantee, and such models are not well suited to multi-source heterogeneous knowledge evaluation.

Knowledge evaluation based on graph models fits a prior model using knowledge from an existing knowledge base so as to assign a probability to each piece of knowledge; it can also be cast as a link prediction problem whose predictions guide the quality evaluation of knowledge acquired from data sources. These methods can reduce erroneous knowledge to a certain extent and improve its reliability and confidence. However, open-domain knowledge keeps growing in scale and evolves dynamically, so future work should consider the time dimension of knowledge and large-scale knowledge evaluation.
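A minimal illustration of the Bayesian evaluation principle described above, written in Python (a sketch only; the candidate facts, priors, and likelihoods are hypothetical):

import numpy as np

# Conflicting candidate facts extracted from different sources,
# e.g. three reported birth years for the same person.
candidates = ["1879", "1897", "1878"]

# Hypothetical prior probability of each candidate being correct.
prior = np.array([0.5, 0.3, 0.2])

# Hypothetical likelihoods P(observations | candidate is true), e.g. derived
# from the estimated accuracy of the sources that reported each value.
likelihood = np.array([0.9, 0.4, 0.2])

# Posterior via Bayes' rule, normalized over the candidate set.
posterior = prior * likelihood
posterior /= posterior.sum()

# Maximum a posteriori criterion: accept the most probable candidate.
print(candidates[int(np.argmax(posterior))], posterior.round(3))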
From the point of view of entity linking, the research results on open source knowledge fusion are discussed separately in the next three subsections, so they are not introduced in detail here.
Multi-knowledge graph fusion
People construct different knowledge graphs from different information sources. How to fuse and represent multiple knowledge graphs is of great significance for establishing a unified large-scale knowledge graph. Because the information sources of different knowledge graphs differ (they may be domain-specific or general-purpose knowledge graphs), their knowledge description systems also differ: semantically identical entities may have different expressions in different knowledge graphs, while entities with the same name may represent different things. Multi-knowledge-graph fusion is therefore not a simple merging of knowledge graphs; it requires discovering equivalent instances, equivalent attributes, or equivalent classes among knowledge graphs and determining which entities and relationships from different knowledge graphs should be aligned.
Entity alignment is an important component of multi-source knowledge fusion technology. Aligned entities can be used to transfer knowledge across multiple knowledge graphs and facilitate the construction of cross-lingual knowledge graphs and knowledge reasoning. Considering the multi-type relations in knowledge graphs, [15] proposed a knowledge graph embedding and entity alignment algorithm based on representation learning. They select alignment-task-driven representative relations based on pre-aligned entity pairs; with the help of the selected relations, they embed cross-network entities into a common space by modeling the head/tail entities and the corresponding context vectors. For the entity alignment task, pre-aligned entities are used to transmit context information across knowledge graphs; in this way, entity embedding and alignment are solved simultaneously in a unified framework. Extensive experiments on two multi-lingual knowledge graphs demonstrate the effectiveness of the model. [16] proposed a multi-source, multi-knowledge-base entity alignment algorithm based on network semantic labels. The core of the algorithm is to align entities across knowledge graphs by computing the semantic similarity between entity pairs. The alignment process integrates entity description information, including unstructured text keywords, semantic tags, and category tags: the similarity of the three features is first computed separately, and then combined into an overall similarity:
$$ SIM\left({E}_1,{E}_2\right)={\omega}_1\times SIM\left({TP}_1,{TP}_2\right)+{\omega}_2\times SIM\left({C}_1,{C}_2\right)+{\omega}_3\times SIM\left({S}_1,{S}_2\right) $$
(1)
Among them, SIM(TP1, TP2), SIM(C1, C2), and SIM(S1, S2) respectively represent the semantic similarity based on attribute tags, the semantic similarity based on class-tag matching, and the semantic similarity of unstructured text keywords. When the calculated value is greater than a certain threshold, the entity pair with the greatest similarity is taken as the output of the alignment result and is considered to have the same semantic orientation. Sun et al. [17] proposed a new joint knowledge embedding method to achieve entity alignment. The model consists of three parts: knowledge embedding, joint embedding, and iterative alignment. TransE [18] and PTransE (Path-based TransE) [19] are used to learn the entities and relationships of the different knowledge graphs separately to obtain the knowledge embeddings; PTransE was proposed because TransE ignores the important multi-step path information in the knowledge graph and does not model complex relations well. The joint embedding maps all individual knowledge embeddings into one semantic space; three joint embedding variants are used: a translation-based model, a linear transformation model, and a parameter-sharing model. Iterative alignment discovers more aligned entities by adding newly aligned entities to the seed set and updating the joint embedding. The objective function consists of three parts:

$$ \mathcal{L}={\mathcal{L}}_K+{\mathcal{L}}_J+{\mathcal{L}}_I $$

(2)
where \( {\mathcal{L}}_K \), \( {\mathcal{L}}_J \), and \( {\mathcal{L}}_I \) denote the score functions of the knowledge embedding, joint embedding, and iterative alignment components, respectively. Similarly, JAPE [20] uses attribute and text description information to enhance the learned representations of instances, and uses joint representation learning to directly embed the entities and relationships of different knowledge graphs into a unified vector space.
Zhong et al. [21] proposed CoLink, a general unsupervised framework for the UIL (User Identity Linkage) problem. CoLink employs a co-training algorithm that trains two independent models, an attribute-based model and a relationship-based model, and makes them reinforce each other iteratively in an unsupervised way. The attribute-based model predicts linked user pairs by considering only user attributes, and can use any classification algorithm. Sequence-to-sequence learning is a very effective implementation of the attribute-based model; it handles the challenge of attribute alignment well by treating it as a machine translation problem. The network consists of a sequence encoder and a sequence decoder, both using a deep Long Short-Term Memory (LSTM) architecture. Traditional classification algorithms such as Support Vector Machines (SVM) can also be employed in the attribute-based model.
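The co-training loop at the heart of CoLink can be sketched as follows (a schematic reconstruction, not the authors' implementation; the stub class stands in for the attribute-based and relationship-based models):

class StubModel:
    """Stand-in for the attribute-based or relationship-based model."""
    def __init__(self, proposals):
        self.proposals = list(proposals)
    def fit(self, linked_pairs):
        pass  # a real model would retrain on the currently linked pairs
    def predict_confident(self):
        # Return one high-confidence user pair per round, if any remain.
        return {self.proposals.pop()} if self.proposals else set()

def co_train(seed_pairs, attr_model, rel_model, rounds=5):
    """The two models alternately propose linked pairs for each other."""
    linked = set(seed_pairs)
    for _ in range(rounds):
        attr_model.fit(linked)
        linked |= attr_model.predict_confident()  # attribute-based proposals
        rel_model.fit(linked)
        linked |= rel_model.predict_confident()   # relationship-based proposals
    return linked

print(co_train({("u1", "v1")}, StubModel([("u2", "v2")]), StubModel([("u3", "v3")])))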
Trsedya et al. [22] proposed an entity alignment method between knowledge graphs based on attribute embeddings. The framework consists of three components: predicate alignment, embedding learning, and entity alignment, as shown in Fig. 2. In the predicate alignment module, the two KGs are merged into one KG by renaming potentially aligned predicates: by computing the similarity of predicate names (the last part of the URI), potentially aligned predicate pairs are found and renamed with a unified naming format. For example, the predicate pair "dbp:bornIn" and "yago:wasBornIn" would both be renamed ":bornIn". The embedding learning module includes structure embedding, attribute character embedding, and joint embedding learning. The structural embedding model is built on top of TransE; unlike TransE, it pays more attention to aligned triples, that is, triples containing aligned predicates, which it achieves by adding weights. The objective function of structural embedding is:
$$ {\mathcal{L}}_{SE}={\sum}_{t_r\in {T}_r}{\sum}_{t_r^{\prime}\in {T}_r^{\prime }}\max \left(0,\gamma +\alpha \left(f\left({t}_r\right)-f\left({t}_r^{\prime}\right)\right)\right) $$
(3)
$$ {T}_r=\left\{<h,r,t>|<h,r,t>\in G\right\} $$
(4)
$$ {T}_r^{\prime }=\left\{<{h}^{\prime },r,t>|{h}^{\prime}\in E\right\}\cup \left\{<h,r,{t}^{\prime }>|{t}^{\prime}\in E\right\} $$
(5)
$$ f\left({t}_r\right)=\left\Vert h+r-t\right\Vert $$
(6)
$$ \alpha =\frac{count(r)}{\left|T\right|} $$
(7)
where count(r) is the number of occurrences of relation r, and ∣T∣ is the total number of triples in the merged KG G1 − 2. Attribute character embedding also follows the idea of TransE. Unlike structure embedding, attributes with the same meaning may be represented differently in different KGs; hence, Trsedya et al. [22] used a compositional function to encode attribute values. Three compositional functions are considered: the sum compositional function, the LSTM-based compositional function, and the N-gram-based compositional function. The objective function of attribute character embedding is:
$$ {\mathcal{L}}_{CE}={\sum}_{t_a\in {T}_a}{\sum}_{t_a^{\prime}\in {T}_a^{\prime }}\max \left(0,{\gamma}_e+\alpha \left(f\left({t}_a\right)-f\left({t}_a^{\prime}\right)\right)\right) $$
(8)
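To make the compositional functions concrete, the following Python sketch shows a sum composition and an N-gram-style composition over a hypothetical character embedding table; the exact formulations in [22] differ in details such as n-gram weighting:

import numpy as np

rng = np.random.default_rng(0)
DIM = 16
# Hypothetical character embedding table: one random vector per character.
char_emb = {c: rng.normal(size=DIM) for c in "abcdefghijklmnopqrstuvwxyz0123456789 "}

def sum_compose(value):
    """Sum compositional function: add the embeddings of all characters."""
    return np.sum([char_emb[c] for c in value.lower() if c in char_emb], axis=0)

def ngram_compose(value, n_max=3):
    """N-gram-style composition: average the embeddings of all n-grams,
    where an n-gram embedding is the sum of its character vectors."""
    value = value.lower()
    grams = [value[i:i + n] for n in range(1, n_max + 1)
             for i in range(len(value) - n + 1)]
    vecs = [np.sum([char_emb[c] for c in g if c in char_emb], axis=0) for g in grams]
    return np.mean(vecs, axis=0)

print(np.allclose(sum_compose("Barack Obama"), sum_compose("barack obama")))  # True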
Joint learning uses the attribute character embedding to guide the structure embedding so that both are trained in the same vector space. The objective function of joint learning is:
$$ {\mathcal{L}}_{SIM}={\sum}_{h\in {G}_1\cup {G}_2}\left[1-\frac{{\mathrm{h}}_{se}\cdot {\mathrm{h}}_{ce}}{{\left\Vert {\mathrm{h}}_{se}\right\Vert}_2{\left\Vert {\mathrm{h}}_{ce}\right\Vert}_2}\right] $$
(9)
The overall objective function of the model is:
$$ \mathcal{L}={\mathcal{L}}_{SE}+{\mathcal{L}}_{CE}+{\mathcal{L}}_{SIM} $$
(10)
After the joint learning of structure embedding and attribute character embedding, similar entities from different KGs have similar embeddings, so a potential entity pair <h1, hmap> can be obtained by computing:
$$ {\mathrm{h}}_{\mathrm{map}}={\mathrm{argmax}}_{{\mathrm{h}}_2\in {\mathrm{G}}_2}\frac{{\mathrm{h}}_1\cdot {\mathrm{h}}_2}{{\left\Vert {\mathrm{h}}_1\right\Vert}_2{\left\Vert {\mathrm{h}}_2\right\Vert}_2} $$
(11)
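Equation (11) amounts to a nearest-neighbor search under cosine similarity. A minimal sketch, with random toy embeddings standing in for the learned ones:

import numpy as np

def align_entity(h1, g2_embs, threshold=0.9):
    """Return the index of the entity in G2 most similar to h1 (Eq. 11),
    or None if even the best cosine similarity falls below the threshold."""
    h1n = h1 / np.linalg.norm(h1)
    g2n = g2_embs / np.linalg.norm(g2_embs, axis=1, keepdims=True)
    scores = g2n @ h1n                      # cosine similarities against all of G2
    best = int(np.argmax(scores))
    return (best if scores[best] > threshold else None, float(scores[best]))

rng = np.random.default_rng(1)
g2_embs = rng.normal(size=(5, 8))            # toy embeddings of five G2 entities
h1 = g2_embs[3] + 0.01 * rng.normal(size=8)  # near-duplicate of entity 3
print(align_entity(h1, g2_embs))             # -> (3, ~1.0)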
EnAli [23] is an unsupervised method for matching entities across two or more heterogeneous data sources; research on multi-source heterogeneous data is important in many fields. For large data sources, aligning all triples of multiple sources is costly. EnAli employs a generative probabilistic model that incorporates heterogeneous entity attributes via exponential-family distributions, handles missing values, and uses a locality-sensitive hashing (LSH) scheme to reduce the number of candidate tuples and speed up the alignment process; it is highly accurate and efficient even without any ground-truth tuples. EnAli consists of four components: candidate tuple generation (LSH is employed to block entities from the N data sources; a generic blocking sketch follows this paragraph), similarity computation, parameter learning, and decision making. EnAli models both discrete and continuous similarities with a wide range of exponential-family probability distributions over the similarity values of matched and unmatched entity tuples; this is an important extension for handling the heterogeneous attribute types (string, numeric, set, distribution, etc.) that occur in entity alignment tasks. Wang et al. [24] proposed a method of enriching the entities in an ontology with external definitions and context information, and the additional information is used for ontology alignment. Different domains usually have different sentiment expressions, and a general sentiment classifier does not suit all domains; training a domain-specific sentiment classifier for each target domain also faces the problem that labeled data in the target domain is usually insufficient, and annotating enough samples is costly and time-consuming. Multi-source sentiment knowledge fusion can effectively improve the performance of sentiment classification and reduce the dependence on labeled data. Wu et al. [25] constructed a unified fusion framework that trains a domain-specific sentiment classifier for the target domain by fusing sentiment knowledge from multiple sources.
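The following is a generic MinHash/LSH blocking sketch in the spirit of EnAli's candidate tuple generation; the hash scheme and parameters are illustrative assumptions, not the paper's exact construction:

import zlib

def minhash_signature(tokens, seeds):
    # One min-hash per seed; crc32 over "seed:token" gives a cheap, deterministic hash.
    return [min(zlib.crc32(f"{s}:{t}".encode()) for t in tokens) for s in seeds]

def lsh_blocks(entities, n_hashes=8, n_bands=4):
    """Entities whose signatures collide in any band become candidate tuples,
    so similar entities are compared without an all-pairs scan."""
    rows = n_hashes // n_bands
    buckets = {}
    for eid, tokens in entities.items():
        sig = minhash_signature(tokens, range(n_hashes))
        for b in range(n_bands):
            key = (b, tuple(sig[b * rows:(b + 1) * rows]))
            buckets.setdefault(key, set()).add(eid)
    return {frozenset(ids) for ids in buckets.values() if len(ids) > 1}

entities = {"e1": {"new", "york", "city"}, "e2": {"new", "york"}, "e3": {"paris"}}
print(lsh_blocks(entities))  # e1/e2 share most tokens, so they likely collide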
Other studies include the following. Wang et al. [26] proposed taking text data into account in representation learning: word2vec [27, 28] is used to learn word representations from Wikipedia text, TransE [18] is used to learn knowledge representations in the knowledge base, and the link information in the Wikipedia text (the correspondence between anchor text and entities) is used to make the textual word representation of an entity as close as possible to its entity representation in the knowledge base, thus realizing representation learning that fuses text and knowledge base; Zhong et al. [29] used similar ideas to fuse entity description information. Sun et al. [30] summarized the current status of entity alignment algorithms in the field of geographical knowledge bases from the three aspects of similarity measurement, similarity combination, and consistency judgment, summarized the evaluation process for alignment results, and proposed a basic definition and general framework for entity alignment in a geographical knowledge graph. Guo et al. [31] proposed recurrent skipping networks for entity alignment (RSN4EA), which leverages biased random walk (RW) sampling to generate long paths across knowledge graphs and models the paths with a novel recurrent skipping network (RSN). RSN combines a traditional RNN with residual learning, and with only a few additional parameters it greatly improves convergence speed and performance.
Information fusion within knowledge graph
Most existing knowledge graph application models only use the triple structure of the knowledge graph; information about entities and relations, category information, and other knowledge-related information is not effectively utilized. There are two main lines of research on information fusion within knowledge graphs. One considers entity types, entity description information, and the relationships between entities in entity alignment research; the other incorporates the rich internal information of the knowledge graph into representation learning to obtain better knowledge representations.
Zhong et al. [29] performed entity alignment based on entity description information without relying on Wikipedia anchor text. Inspired by the joint embedding framework of [26], the best embedding is learned by minimizing the following loss function:
$$ \mathcal{L}\left(\left\{{e}_i\right\},\left\{{r}_j\right\},\left\{{w}_l\right\}\right)={\mathcal{L}}_K+{\mathcal{L}}_T+{\mathcal{L}}_A $$
(12)
where \( {\mathcal{L}}_K \), \( {\mathcal{L}}_T \), and \( {\mathcal{L}}_A \) are the loss functions of the knowledge model, the text model, and the alignment model, respectively. [29] focuses only on the loss function \( {\mathcal{L}}_A \) of the new alignment model; the knowledge-model loss \( {\mathcal{L}}_K \) and the text-model loss \( {\mathcal{L}}_T \) are the same as their counterparts in [26].
Guan et al. [32] proposed a self-learning and embedding-based entity alignment method (SEEA), which iteratively searches for semantically matching entity pairs and makes full use of the semantic information contained in entity attributes (see Fig. 3 for an illustration). The knowledge graph is formalized as G = (E, A, V, R, AT, RT), where E = E1 ∪ E2 is the entity set and E1 and E2 are the two sets of entities to be aligned. A, V, and R represent the set of attributes, the set of attribute values, and the set of relations, respectively; AT ⊆ E × A × V is the set of attribute triples, and RT ⊆ E1 × R × E2 is the set of relation triples between the entity sets E1 and E2. The input to the SEEA model is a knowledge graph, and the model comprises two sub-modules: knowledge graph embedding and entity alignment. Knowledge graph embedding includes relation triple learning and attribute triple learning. The self-learning mechanism feeds the results of entity alignment back into KG embedding: SEEA iteratively uses the results of the previous iteration, i.e., the newly learned relation triples, to update the embeddings of entities, attributes, and attribute values in the next iteration (a schematic of this loop is sketched below).
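A schematic of SEEA's self-learning loop in Python (the embed and align callables are hypothetical placeholders for the paper's embedding and alignment sub-modules):

def seea_iterate(kg, embed, align, max_rounds=10):
    """Alternate KG embedding and entity alignment, feeding newly aligned
    pairs back into the next round of embedding training."""
    aligned = set()
    for _ in range(max_rounds):
        model = embed(kg, aligned)           # relation + attribute triple learning
        new_pairs = align(model) - aligned   # semantically matching entity pairs
        if not new_pairs:                    # stop when no new pairs are found
            break
        aligned |= new_pairs                 # feedback into the next iteration
    return aligned

# Toy usage with stub components (purely illustrative).
print(seea_iterate(kg=None,
                   embed=lambda kg, pairs: None,
                   align=lambda model: {("e1", "f1"), ("e2", "f2")}))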
Yang et al. [33] proposed Text-Associated DeepWalk (TADW), which incorporates text information. Within a matrix factorization framework, TADW introduces text features into network representation learning as a supplement to network structure information. Similarly, CANE [34] (Context-Aware Network Embedding) is a context-aware embedding method. A node v has two kinds of embedding: a structure-based embedding vs and a text-based embedding vt (which may be context-free or context-aware); the two are concatenated to obtain v = vs ⊕ vt. CANE maximizes the following objective over the edges:
$$ L={\sum}_{e\in E}\left({L}_s(e)+{L}_t(e)\right) $$
(13)
where Ls(e) is a structure-based objective function and Lt(e) is a text-based objective function. Context-free embedding means that the embedding of a node is fixed and does not change with its context; context-aware embedding means that CANE learns different embeddings for a node depending on its context.
Zhang et al. [35] proposed a recommendation system based on Collaborative Knowledge Base Embedding (CKE), as shown in Fig. 4. They introduced structured knowledge, text knowledge, image knowledge, and other knowledge graph information to improve the quality of the recommendation system. Structured knowledge is encoded with TransR [36] to obtain vector representations of entities, while text knowledge and image knowledge use Stacked De-noising Auto-encoders (SDAE) [37] and Stacked Convolutional Auto-encoders (SCAE), respectively, to obtain vector representations with strong generalization ability.
Kristiadi et al. [38] considered the semantic information carried by the literal values attached to entities in knowledge graphs, and proposed a new representation learning mechanism, LiteralE (see Fig. 5). Its improvement strategy is to incorporate the literal information Ii or Ij of the entities through a transformation function g(∙) before scoring the vector representations of the entities,
where g(∙) can be a linear transformation
$$ {g}_{lin}\left({e}_i,{I}_i\right)={W}^T\left[{e}_i,{I}_i\right] $$
(14)
a non-linear transformation
$$ {g}_{nonlin}\left({e}_i,{I}_i\right)=h\left({W}^T\left[{e}_i,{I}_i\right]\right) $$
(15)
or a simple MLP
$$ {g}_{MLP}\left({e}_i,{I}_i\right)=h\left({W}_2^Th\left({W}_1^T\left[{e}_i,{I}_i\right]\right)\right) $$
(16)
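A small numpy sketch of the three transformation functions of Eqs. (14)-(16); the weight matrices are random placeholders and the nonlinearity h is assumed to be tanh:

import numpy as np

rng = np.random.default_rng(0)
d_e, d_l = 8, 3                           # entity and literal dimensions (assumed)
W1 = rng.normal(size=(d_e + d_l, d_e))
W2 = rng.normal(size=(d_e, d_e))

def h(x):
    return np.tanh(x)                     # assumed nonlinearity

def g_lin(e_i, l_i):                      # Eq. (14): linear transformation
    return W1.T @ np.concatenate([e_i, l_i])

def g_nonlin(e_i, l_i):                   # Eq. (15): nonlinear transformation
    return h(W1.T @ np.concatenate([e_i, l_i]))

def g_mlp(e_i, l_i):                      # Eq. (16): simple two-layer MLP
    return h(W2.T @ h(W1.T @ np.concatenate([e_i, l_i])))

e_i, l_i = rng.normal(size=d_e), rng.normal(size=d_l)
print(g_lin(e_i, l_i).shape, g_mlp(e_i, l_i).shape)  # (8,) (8,)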
Xie et al. [39] considered that the entity description information provided in Freebase and other knowledge bases can help knowledge representation learning achieve better results. Their representation learning model, DKRL (description-embodied knowledge representation learning), first converts the entity description text into an entity representation using CBOW [27, 28] or a CNN [40, 41], and then uses that representation in TransE's objective function. The CBOW encoder extracts a keyword set covering the main concepts of the entity from the description text, selects the first k keywords as input, and simply sums their word vectors to form the text representation:
$$ {e}_d={x}_1+{x}_2+\dots +{x}_k $$
(17)
where xi denotes the embedding of the i-th word in the keyword set of entity e. The Convolutional Neural Network (CNN) encoder consists of five layers; its input is the whole description of a specific entity, and its output is the description-based representation of that entity. Within this model, CBOW differs slightly from the CNN: the former ignores the word order of the text, while the latter takes word order into account.
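The CBOW encoder of Eq. (17) thus reduces to summing keyword embeddings; a toy sketch with a hypothetical vocabulary and a random embedding table:

import numpy as np

rng = np.random.default_rng(0)
vocab = {"physicist": 0, "relativity": 1, "german": 2, "nobel": 3}
W = rng.normal(size=(len(vocab), 16))     # hypothetical word embedding table

def cbow_encode(keywords):
    """Description embedding e_d = x_1 + ... + x_k over the top-k keywords."""
    return np.sum([W[vocab[w]] for w in keywords if w in vocab], axis=0)

print(cbow_encode(["physicist", "relativity", "nobel"]).shape)  # (16,)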
TransC [42] is a knowledge graph embedding model that distinguishes concepts from instances. It encodes each concept in the knowledge graph as a sphere and each instance as a vector in the same semantic space, and it expresses relations through the spatial inclusion of points within spheres and of spheres within spheres. This representation naturally captures the transitivity of the isA relations. The relative positions between instances and concepts, and between two concepts, are described by the instanceOf and subClassOf relations, respectively: the instanceOf relation indicates whether an instance lies inside the sphere representing a concept, and the subClassOf relation indicates the relative position of two concepts. Four possible relative positions are proposed:
As shown in Fig. 6, m is the radius of a sphere, d is the distance between the centers of two spheres, and si and sj denote the spheres representing concepts i and j, respectively. Figures 6(a)-(d) show the four possible relative positions of si and sj. For instanceOf and subClassOf there is a clever design that preserves the transitivity of the isA relation: instanceOf-subClassOf transitivity is embodied by
$$ \left(i,{r}_e,{c}_1\right)\in {S}_e\wedge \left({c}_1,{r}_c,{c}_2\right)\in {S}_c\to \left(i,{r}_e,{c}_2\right)\in {S}_e $$
(18)
while subClassOf-subClassOf is embodied by
$$ \left({c}_1,{r}_c,{c}_2\right)\in {S}_c\wedge \left({c}_2,{r}_c,{c}_3\right)\in {S}_c\to \left({c}_1,{r}_c,{c}_3\right)\in {S}_c $$
(19)
where (i, re, c) denotes an instanceOf triple and (ci, rc, cj) a subClassOf triple. There are three main types of triples: instanceOf triples, subClassOf triples, and relational triples.
The loss function of instanceOf triples is defined as:
$$ {f}_e\left(i,c\right)={\left\Vert i-p\right\Vert}_2-m $$
(20)
Using ζ and ζ′ to denote a positive triple and a negative triple, the margin-based ranking loss for instanceOf triples is:
$$ {\mathcal{L}}_e={\sum}_{\zeta \in {S}_e}{\sum}_{\zeta^{\prime}\in {S}_e^{\prime }}{\left[{\gamma}_e+{f}_e\left(\zeta \right)-{f}_e\left({\zeta}^{\prime}\right)\right]}_{+} $$
(21)
where [x]+ ≜ max(0, x) and γe is the margin separating positive triples and negative triples.
Similarly, there are ranking losses \( {\mathcal{L}}_c \) for subClassOf triples and \( {\mathcal{L}}_l \) for relational triples. The overall loss function is the linear combination of these three:
$$ \mathcal{L}={\mathcal{L}}_e+{\mathcal{L}}_c+{\mathcal{L}}_l $$
(22)
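A toy numpy sketch of the instanceOf score of Eq. (20) and the margin-based ranking loss of Eq. (21); the embeddings, radius, and margin are illustrative:

import numpy as np

def f_instanceof(i_vec, center, radius):
    """Eq. (20): negative score means the instance lies inside the concept sphere."""
    return np.linalg.norm(i_vec - center) - radius

def ranking_loss(pos_scores, neg_scores, gamma):
    """Eq. (21): [gamma + f(pos) - f(neg)]_+ summed over triple pairs."""
    return sum(max(0.0, gamma + fp - fn) for fp in pos_scores for fn in neg_scores)

center, radius = np.zeros(2), 1.0
inside, outside = np.array([0.3, 0.4]), np.array([1.5, 2.0])
print(f_instanceof(inside, center, radius))   # -0.5: the triple holds
print(f_instanceof(outside, center, radius))  #  1.5: the triple is violated
print(ranking_loss([f_instanceof(inside, center, radius)],
                   [f_instanceof(outside, center, radius)], gamma=1.0))  # 0.0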
Other related studies include adding logical rules [31, 43,44,45], entity types, and descriptive text information to knowledge representation learning [46,47,48,49,50,51], and considering relationship paths in the knowledge graph [52,53,54]; see Table 2.
Table 2 Comparison of various research models

Multi-modal knowledge fusion
Data in different industries come from a wide range of sources and in a variety of forms, each of which can be regarded as a modality, such as text, images, video, and audio; different modalities carry knowledge at different levels of representation. Multi-source knowledge emphasizes the diversity of data sources, while multi-modal knowledge fusion enables agents to perceive and understand real application scenarios more deeply and better support industrial applications. Studying feature representation and learning methods for different modalities makes the cooperative representation of multi-modal data possible. To overcome the influence of structural differences on multi-modal representation, it is necessary to study embedding methods for multi-modal information together with its internal and external knowledge, and to establish a deep feature learning and association representation model supported by cognitive data, so that information from different modalities, such as language and vision, can be projected into a common subspace, achieving multi-modal co-representation at the knowledge level and supporting knowledge acquisition based on multi-modal fusion [55].
Zhang et al. [56] proposed seamlessly integrating multiple data sources with a Bi-GRU (Gated Recurrent Unit) architecture (Fig. 7). The model treats the four inputs as a sequence {s1, s2, s3, s4} and uses a Bi-GRU layer to learn their interdependencies. All hidden units {h1, h2, h3, h4} are then concatenated into a new vector representation to preserve their differences and fed to the final fully connected layer.
The vector representation of a user is:
$$ {v}_u=W\left[{h}_1\oplus {h}_2\oplus {h}_3\oplus {h}_4\right]+{b}_c $$
(23)
$$ {h}_i={f}_{BiGRU}\left({s}_i\right) $$
(24)
A Bi-RNN is used to obtain the document representation: the forward and backward hidden layers each produce a hidden representation, the two are fused, and a self-attention mechanism then automatically assigns weights to the different inputs. User nickname, self-introduction, education information, work information, and individualized labels are treated as user metadata; after concatenating all the metadata elements, they are fed through a Bi-RNN layer and an attention layer to obtain the metadata representation. The network representation employs LINE [57].
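A minimal PyTorch sketch of this fusion step (Eqs. (23)-(24)); all dimensions are assumed, and the four source representations would come from the encoders described above:

import torch
import torch.nn as nn

class MultiSourceFusion(nn.Module):
    """Treat four source vectors as a length-4 sequence, run a Bi-GRU over
    them, concatenate the hidden states, and project to the user vector."""
    def __init__(self, d_in=64, d_hid=32, d_out=64):
        super().__init__()
        self.bigru = nn.GRU(d_in, d_hid, bidirectional=True, batch_first=True)
        self.fc = nn.Linear(4 * 2 * d_hid, d_out)   # h1..h4, forward+backward

    def forward(self, sources):          # sources: (batch, 4, d_in)
        h, _ = self.bigru(sources)       # h: (batch, 4, 2 * d_hid)
        return self.fc(h.flatten(1))     # Eq. (23): W[h1 ⊕ h2 ⊕ h3 ⊕ h4] + b

model = MultiSourceFusion()
print(model(torch.randn(2, 4, 64)).shape)  # torch.Size([2, 64])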
RBMs (Restricted Boltzmann Machines) [58] can effectively model the distribution of binary-valued data, and Boltzmann machine models and their extensions to exponential family distributions [59] have been applied successfully in many applications. The Multimodal Deep Boltzmann Machine [60] can be used to learn text and image features separately [61, 62] and then combine the two features into a new feature vector that serves as the input to an SVM (Support Vector Machine) classifier. The model integrates cross-modal features to build a fused representation.
The DCPR (Deep Context-aware POI Recommendation) [63] model is a point-of-interest (POI) recommendation model based on deep context awareness. It uses an LSTM to learn latent user representations and a CNN to generate latent representations from reviews; an end-to-end deep model takes POI attributes, user preferences, sequential check-in momentum, and so on into account. When researching the impact of events and investor sentiment on stock price trends, Zhang et al. [64] extracted events from online news, extracted users' emotions from social media, and fused the multi-source heterogeneous data by constructing tensors.
Visual appearance score, appearance mixture type, and deformation are three important information sources for human pose estimation. [65] proposed building a multi-source deep model to extract non-linear representations from these different information sources; with the deep model, the global, high-order human body articulation patterns in these sources are extracted for pose estimation. A direct method is to mix information sources with different statistical characteristics in the first hidden layer, as shown in Fig. 8(a), but this method has its limitations. Another method, shown in Fig. 8(b), constructs a high-level feature representation of each information source with two layers, and then uses another two layers to fuse the high-level representations of the different sources for pose estimation. Auto-encoders and RBMs [58] are two common components of unsupervised deep learning algorithms, and similar approaches have been used in representation learning research based on deep models [66,67,68,69,70].