Multiplex network embedding for implicit sentiment analysis

As one of the most active research directions in natural language processing, sentiment analysis has received continuous and extensive attention. Unlike explicit sentiment analysis, where sentiment words indicate polarity, implicit sentiment analysis is more challenging because sentiment words are absent, which makes traditional sentiment analysis methods inadequate for judging the polarity of implicit sentiment. This paper treats sentiment analysis as a special sign link prediction problem, which differs from traditional text-based methods. In particular, the proposed scheme performs word graph-based text-level information embedding and heterogeneous social network information embedding (i.e., user social relationship network embedding and user-entity sentiment network embedding), learns highly nonlinear representations of network nodes, uses an early fusion method to combine the strengths of these two types of embedding, optimizes all parameters simultaneously, and creates enhanced context representations that better capture implicit sentiment polarity. The proposed method is evaluated on a real-world dataset for the implicit sentiment link prediction task. The experimental results demonstrate that the proposed method outperforms state-of-the-art schemes, including LINE, node2vec, and SDNE, by 20.2%, 19.8%, and 14.0%, respectively, in accuracy, and achieves gains of at least 14% in AUROC. For sentiment analysis, the proposed method achieves an AUROC of 80.6% and an accuracy of 78.3%, which is at least 31% better than other models. This work can provide useful guidance for implicit sentiment analysis.


Introduction
The unstructured text generated by users who share feelings and express attitudes on social media reflects users' real-life behaviors and forms sentiment links between these users and entities (e.g., commodities, movies or friends). These sentiment links contain a large amount of useful information, which has been successfully applied in many tasks, such as intention mining [1] and personalized recommendation systems [2,3].
However, on online social network platforms, the content published by users usually consists of short sentences, clauses or phrases, and because people express their feelings or opinions in various ways, they may use either explicit or implicit emotional words. In this case, traditional sentiment classification algorithms are inadequate for judging the polarity of the sentiment relationship between users and entities. According to the expression of subjective and objective emotion in texts, Liu [4] divides emotion into subjective opinion and fact-implied opinion: the former is a subjective statement of emotional tendency or opinion, while the latter is an implicit expression of emotion through an objective statement. Liao et al. [5] define an implicit sentiment sentence as "a language segment that expresses subjective emotions but does not contain explicit affective words".
Most current user sentiment analysis literature focuses on making a sentiment decision according to users' text descriptions. However, on online social media (e.g., Sina Weibo, Twitter), most people express their attitudes towards different entities with very limited words, and explicit sentiment polarity words are often missing. In such cases, the sentiment polarity is likely to be hidden and not obvious from the short text. Furthermore, different semantic contexts can result in different sentiment polarity. In these situations, traditional sentiment analysis methods often fail to retrieve users' hidden real attitudes, so it is critical to capture implicit sentiment, especially from the limited content published by users. Unfortunately, there is very little prior work focusing on implicit sentiment analysis for online social media. Compared with explicit sentiment analysis, where sentiment words indicate polarity, the implicit sentiment analysis task is more difficult mainly for the following reasons:
1. From the perspective of linguistics and wording, different semantic contexts can result in different sentiment polarity, and the implicit text does not contain sentiment words. These two factors make it challenging for bag-of-words-based text representation methods to express the semantics of sentences effectively, and they complicate the acquisition of semantic features. In this case, it is unrealistic to extract the user's sentiment towards the entity with traditional sentiment classifiers.
2. Existing methods [12][13][14] focus on extracting features from discourse information and use the structural relationships in text data, such as word co-occurrence, syntactic relationships and context relationships, to construct graphs and train classifiers to predict sentiment polarity. However, users' social structure and user-entity-aware information, which exert a significant impact on the formation of sentiment links in social networks, are often ignored.
3. Network embedding methods aim to learn low-dimensional latent representations of the nodes in a network [20][21][22][23][24][25][26], and these methods have achieved strong performance in the link prediction task. However, most of them only predict whether a link exists between a node pair and cannot determine whether that link is positive or negative. These methods therefore cannot be directly applied to sentiment analysis.
Text semantic information and social network information are important sources for learning node representations in the task of implicit sentiment analysis. However, no existing research analyzes the hidden sentiment polarity that users express towards entities with very limited words in online social networks (such as Sina Weibo and Twitter). To the best of our knowledge, this is the first work that models the sentiment polarity relationship as sentiment links between users and entities and proposes a novel multiplex network embedding framework, based on word graph-based text-level information and heterogeneous social network information (user social relationship network and user-entity sentiment network), to extract low-dimensional embedding representations of nodes and then predict the sign of implicit sentiment links. The proposed MEISP model adopts deep autoencoders to map the user social structure, sentiment network and text structure information into a low-dimensional vector space; in this process, the highly nonlinear representations of nodes from these three networks are preserved. The learned node representations are fused by concatenation aggregation and nonlinear fusion for sentiment link prediction. First, because the user social relationship network embedding and the sentiment network embedding are equally important to the final representation, this paper aggregates these two embeddings into a final social network-level embedding through a concatenation function. Then, an early fusion method is used to fuse the text-based information and the social network-based information. This nonlinear fusion of social information and text content information forms a powerful network embedding representation learning framework. Furthermore, the joint optimization of these components enhances the learned representations of network nodes and addresses the problem of implicit sentiment analysis.
As far as we know, this is the first work that studies both social network-based and text-based multiplex network embedding for implicit sentiment mining and analysis. The main contributions of this work are summarized below:
• This paper formulates and elaborates the problem of sentiment analysis in social networks, that is, finding the sentiment tendency hidden in user comments.
• The implicit sentiment analysis problem is formally defined as a sentiment sign link prediction problem, which considers the comprehensive context at both the text level and the social network relation level. Based on this, a holistic sentiment polarity analysis scheme is designed to reveal users' real attitudes hidden in short texts.
The rest of this paper is organized as follows. Section "Related works" reviews the related works from the two aspects of sentiment analysis and network embedding. Section "Problem formulation" introduces the proposed formal definition of the implicit sentiment analysis problem. Section "Multiplex network embedding" proposes the MEISP model and shows how multiplex information network embedding is handled to perform implicit sentiment link prediction. Section "Experiment" carries out a variety of experiments and compares the proposed method with some well-known network embedding methods and sentiment analysis algorithms.

Related works
Since this work aims to propose an advanced network embedding framework for implicit sentiment analysis, this section summarizes related works from two aspects: sentiment analysis and network embedding.

Sentiment analysis
Sentiment analysis is one of the most important tasks in the field of natural language processing. Existing text-based studies of sentiment classification can be grouped into two main categories: lexicon-based and corpus-based approaches. Lexicon-based approaches use the sentiment polarity recorded in a sentiment dictionary to calculate the sentiment polarity of each sentence or document. Corpus-based approaches [6][7][8] treat sentiment classification as a special case of text categorization, using machine learning methods to extract features from texts and feed them into a classifier to predict sentiment. Nowadays, more people express their attitudes towards different entities in online social networks, forming user-to-entity sentiment links. These sentiment links carry positive or negative semantics. Most current user sentiment analysis literature focuses on making a positive, neutral, or negative sentiment decision according to users' text descriptions, and increasing attention has been paid to the task of user-entity sentiment analysis. For example, Li et al. [9] integrate named entity information and sentiment-level information to form label sequences and adopt an approach based on graphical models. Gan et al. [10] propose a self-attention-based hierarchical dilated convolutional neural network for multi-entity sentiment analysis. Ding et al. [11] design an entity-level sentiment analysis tool consisting of sentiment classification and entity recognition, which can classify issue comments into <sentiment, entity> tuples. However, these entity sentiment analysis methods rely mainly on text content, and the texts they handle all contain explicit sentiment words.
Compared with explicit sentiment analysis, the lack of sentiment words in sentences makes the expression of sentiment more euphemistic, and traditional sentiment analysis methods often fail to retrieve users' hidden real attitudes, which makes implicit sentiment analysis more challenging.
Existing work on implicit sentiment analysis mainly focuses on identifying implicit sentiment sentences. To capture implicit structures, recent studies show promising results by learning word embeddings with neural language models, such as the word2vec [41] model. The word vectors obtained by these methods have better representation and reasoning ability in the semantic space and can be used as the input of various deep learning models, such as LSTM and CNN models. Kauter et al. [12] propose a fine-grained method to identify explicit and implicit sentiment in financial news and reviews. Liao et al. [5] propose a multi-level semantic fusion method based on representation learning, which learns and fuses features at three levels (i.e., words, sentences and documents) to recognize factual implicit sentiment sentences; however, this model is limited by grammatical structure. The bidirectional long short-term memory model [13] can effectively capture the semantics of long dependent sequences and, by introducing an attention mechanism [14,15], automatically assign different weights to the words in a sentence. Wei et al. [16] propose an implicit sentiment classification model based on a multi-polarization orthogonal attention mechanism, which effectively models the differences in attention between vocabulary and specific sentiment polarities and improves the performance of implicit sentiment classification. Zuo et al. [17] propose a context-based heterogeneous graph convolution network model, which first regards the context of the whole document as a heterogeneous graph to maintain its dependency structure and then employs a graph convolution network to obtain the features of implicit sentiment sentences and their context.
However, existing implicit sentiment analysis often faces three major challenges: (1) sentiment words are lacking; (2) the words are relatively objective and neutral; and (3) word order and co-occurrence relationships are neglected. Traditional bag-of-words-based models can neither accurately capture the meaning of words and the structure of the text, nor effectively represent the semantics of a sentence. In addition, words are organized into phrases, sentences and so on to express users' opinions, so the positions of words in a sentence have a great impact on a detailed understanding of the text.
A graph-based text representation is one effective way to address the above problems. In the graph, nodes represent features and edges represent the relationships between different nodes. Although there are various graph-based text representation models, such as word graphs and n-gram graphs [18], this paper uses a word co-occurrence graph to represent the relationships between words in short text content, in order to better capture the inherent characteristics of text in social media.
Representing text in a graph structure that considers all of this information has achieved excellent performance in sentiment classification and has attracted considerable attention in recent years. For example, Bijari et al. [19] propose to represent sentences as graphs and use Node2vec embedding as the feature learning algorithm to extract text features. Zhao et al. [20] design a bidirectional attention mechanism with position encoding to capture aspect-specific representations. Gui et al. [21] propose to build a heterogeneous network by constructing the relationships among words, users and products.
On social networks, users with similar hobbies and ideas often form personalized social communities. For example, Xiao et al. [22] and Zhao et al. [23] both consider user structure and model the social relationships of users. In addition, some sentiment prediction work also considers user-product-aware information. For example, the user product neural network (UPNN) [24], user product attention (UPA) [25], user product deep memory network (UPDMN) [26] and hierarchical user attention and product attention neural network (HUAPA) [27] have achieved excellent performance in sentiment analysis by considering the internal and external connections of users and products.
These studies demonstrate that representing text as a graph, modeling user social structure, and modeling the connections between users and entities are all effective for sentiment analysis and opinion mining. They inspire this work to take both the text-level and the social network-level aspects into account in the proposed framework.

Network embedding
The purpose of network embedding is to map a high-dimensional, sparse vector space into a low-dimensional, dense one. The learned features can be used in machine learning tasks such as classification, regression and clustering. Network embedding methods mainly include methods based on matrix eigenvector calculation and methods based on neural networks. Locally linear embedding (LLE) [28] and Laplacian Eigenmaps (LE) [29] are conventional methods based on matrix eigenvector calculation. Nowadays, neural network-based embedding methods such as DeepWalk [30], which is inspired by the neural network language model Word2vec [31], achieve excellent performance and have been widely used. LINE [32] preserves both the first-order and the second-order proximity between nodes and can be used on large networks. Node2vec [33] designs a biased random walk procedure to learn a mapping of nodes that maximizes the likelihood of preserving the network neighborhoods of nodes. SDNE [34] uses an autoencoder to capture network structures and learn representations.
The studies above propose general network embedding models and methods. In addition, some literature focuses on embedding representation algorithms for different entities. For example, Zhang et al. [35] propose a new optimization model to capture the hidden relationships between item content features for cold-start and content-based recommendation. He et al. [36] propose the bipartite graph neural network (BGNN), which consists of two central operations, inter-domain message passing and intra-domain alignment, to learn the embedding representations of user and item nodes. Wang et al. [37] propose the SHINE model, which uses a deep neural network structure to learn low-dimensional representations of nodes and predict sentiment links between users and celebrities. Yuan et al. [38] propose to fuse information from multiple networks to learn low-dimensional node embeddings for user behavior classification.
The above studies on network embedding mainly focus on recommender systems and user behavior analysis. The proposed scheme differs from these studies in two aspects. First, this work uses network embedding to address a different problem. In particular, this paper models the implicit sentiment analysis problem from the angle of a special link prediction task and explores user-entity modeling to find the implicit sentiment in social network comments. Second, this different problem raises new challenges that are not addressed in existing studies. In particular, implicit sentiment analysis requires the embedding approach to capture rich information from different networks, such as the user social network, the user-entity sentiment network, and the word graph built from text-level comments. These networks are highly heterogeneous. To address these challenges, a novel multiplex network embedding method called MEISP is proposed in this work. Specifically, a deep neural network architecture is introduced to embed these heterogeneous networks, and an early fusion method based on a multilayer perceptron is adopted to fuse the text information and the social network information. Furthermore, the proposed scheme can extract highly nonlinear representations of nodes while preserving the structure of the original networks.

Problem formulation
To reveal the implicit sentiment polarity of short texts on social media, this paper explores a novel angle that integrates user profiles and latent topics from text with social network structure, which differs from existing text-based methods. Specifically, rather than focusing on word-word relationships, the text embedding module processes text into graphs and represents each text as a graph of vectors. By jointly optimizing the first-order and second-order proximity in the graph, the proposed text embedding module can effectively capture non-consecutive and long-distance semantics. Furthermore, inspired by sociological theory, we consider that social network information also plays an important role in understanding the context and predicting latent sentiment links. Therefore, beyond text embedding, this paper also integrates embedded information from social network structures. These implicit structures can significantly enrich the input information for sentiment analysis, especially for the short texts that dominate online social networks. In the subsections below, the input and output of the sentiment link prediction task are defined.
Input. When predicting user sentiments in social networks, it is very complex to build a joint graph model that integrates user relationships and users' sentiments towards entities. This work addresses this challenge by introducing multiplex network embedding to learn a low-dimensional vector representation of the nodes in each graph. Through feature learning, the model can pay more attention to the important nodes and relationships while ignoring the unimportant ones. To extract the node embeddings, this paper constructs three kinds of graph networks [39]: (a) a word graph (W-W), (b) a user social relationship network (U-U), and (c) a user-entity sentiment relationship polarity network (U-E-P).

Word graph
The occurrence, ordering, and positioning of words and the relationships between different parts of a text play the most important role in determining the sentiment polarity of a sentence. Therefore, a first fundamental question is how to select an appropriate graph-based text representation model to reveal the innate and essential information of a text graph. As far as we know, the word co-occurrence graph is an effective way to represent the relationship between one word and another in social media comments. This paper defines the text graph as $G_T = (W, E)$, where each word in a sentence is considered a node. In the graph $G_T$, two words have a co-occurrence relationship if they appear together within a window of $k$ words. Window sizes from 2 to 10 generally appear appropriate according to different experiments; this paper uses a word graph with a window size of 5.
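The construction of the word co-occurrence graph can be sketched as follows. This is a minimal illustration, not the implementation used in this paper: the function name `build_cooccurrence_graph`, the whitespace tokenization, and the use of co-occurrence counts as edge weights are all assumptions made for the example.

```python
from collections import defaultdict


def build_cooccurrence_graph(tokens, window=5):
    """Return an undirected word graph as {(w_a, w_b): count} with w_a < w_b.

    Two words are linked when they fall inside the same sliding window of
    `window` consecutive tokens. Using counts as edge weights is an assumption
    of this sketch; the text only states that an edge exists.
    """
    edges = defaultdict(int)
    for i, left in enumerate(tokens):
        # Pair the current token with the next (window - 1) tokens.
        for right in tokens[i + 1 : i + window]:
            if left != right:  # skip self-loops for repeated words
                edges[tuple(sorted((left, right)))] += 1
    return dict(edges)


# Hypothetical short comment without explicit sentiment words.
tokens = "the battery died after two days of normal use".split()
graph = build_cooccurrence_graph(tokens, window=5)
nodes = sorted({w for pair in graph for w in pair})
```

With a window of 5, "the" is linked to "battery", "died", "after" and "two", but not to "days", which lies outside the window.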

User social relation network
A user's social connections measure the local relationships of this user and have an indirect impact on his/her behaviors. Therefore, based on such connections, users can be categorized into different communities, which is of great significance for analyzing the dissemination of social network information. In this paper, the user structure graph is denoted as $G_R = (U, R)$, where $U$ is the set of users and $R = \{r_{ij} \mid i, j \in U\}$ is the set of user connections; $r_{ij}$ takes the value 1 or 0, denoting whether user $i$ follows user $j$ or not.

Sentiment relationship network
In social networks, a large number of users express their attitudes towards entities, and this topological relationship forms the sentiment network. The sentiment network is denoted as $G_P = (U, V, P)$, where $U$ and $V$ represent the user set and the entity set contained in social media content, respectively, and $P = \{p_{ij} \mid i \in U, j \in V\}$ is the set of sentiment links between user and entity node pairs. Each $p_{ij}$ takes the value $+1$, $-1$ or $0$, denoting that the sentiment polarity between user $i$ and entity $j$ is positive, negative or unobserved, respectively.
Output. Taking $G_R$, $G_P$ and $G_T$ as inputs, the task is to learn low-dimensional node representations through a well-designed multi-network fusion method and then to predict the sentiment relations $\hat{G}_P$ between users and entities hidden in user comments.
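The two social inputs can be stored as plain adjacency matrices. The sketch below uses hypothetical toy data (three users, two entities); the identifiers `users`, `entities`, `follows` and `sentiments` are illustrative and do not come from the dataset used in this paper.

```python
import numpy as np

# Hypothetical toy data: 3 users, 2 entities.
users = ["u0", "u1", "u2"]
entities = ["e0", "e1"]
follows = [("u0", "u1"), ("u2", "u0")]             # (follower, followee)
sentiments = [("u0", "e0", +1), ("u1", "e1", -1)]  # (user, entity, polarity)

u_idx = {u: k for k, u in enumerate(users)}
e_idx = {e: k for k, e in enumerate(entities)}

# R[i, j] = 1 if user i follows user j, else 0 (directed relation).
R = np.zeros((len(users), len(users)), dtype=np.int8)
for a, b in follows:
    R[u_idx[a], u_idx[b]] = 1

# P[i, j] in {+1, -1, 0}: positive, negative, or unobserved polarity.
P = np.zeros((len(users), len(entities)), dtype=np.int8)
for u, e, s in sentiments:
    P[u_idx[u], e_idx[e]] = s
```

Each row of `R` or `P` is the sparse adjacency vector of one user; the zero entries of `P` are the unobserved links whose sign is to be predicted.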

Multiplex network embedding
In this section, we first present the overall framework of the MEISP model, and then introduce in detail how node representations are extracted from the three networks and how the model is learned.

Framework
The proposed method consists of two main modules: a word graph-based text representation module and a heterogeneous social graph network embedding module.
To capture the implicit text features hidden in user-published content, this paper uses network embedding methods to learn low-dimensional latent representations of the nodes in the network. For text representation, this paper first processes text into graphs and represents each text as a graph of vectors. Then, feature learning (the representation learning stage) is applied to capture non-consecutive and long-distance semantics and to determine the internal features of the word graph-based text representation.
In addition to text content, the topological relationship between users and entities forms the explicit sentiment network. To capture such information, rather than directly adopting the adjacency matrix of the user-entity sentiment network, which is too large and sparse for further processing, this paper divides the original social network into a user relationship structure network and a user-entity sentiment network, and learns their node embeddings separately through two distinct autoencoders. A concatenation operation is then used to aggregate these embeddings into the final heterogeneous social network representation, since concatenation preserves more information from the two types of heterogeneous information networks.
Once the word graph-based text embedding and the heterogeneous social graph network embedding are obtained, multiple nonlinear layers are stacked to learn better representations of the data. In this way, the word graph-based text modeling complements the heterogeneous social network modeling, and the two modules are integrated seamlessly. The following subsections introduce the proposed MEISP framework in detail. Figure 1 shows the workflow of the proposed framework.

Word graph-based text representation
Node embedding is the process of vectorizing the nodes in a graph. During representation learning, the model can focus more on the important nodes and edges and ignore the unimportant ones, so node embedding can reveal the essential information of a graph. In the past decades, shallow networks have been used to learn network embedding representations, and some high-performance models have been proposed. Inspired by the recent success of deep learning, SDNE [34], a semi-supervised deep model with multiple layers of nonlinear functions, has been proposed. The composition of multiple nonlinear layers can map data to a highly nonlinear space, and by jointly optimizing the first-order and second-order proximity in the semi-supervised deep model, this method preserves the local and global network structure and is robust to sparse networks. Figure 2 shows the deep neural network for text embedding. Next, this paper discusses the workflow of feature learning for graph-based text content.
Word co-occurrence is an effective way to represent text as a word graph. For a given text $T$, this paper defines the text as a graph $G_T = (W, E)$, where $W = \{w_1, w_2, \dots, w_n\}$ denotes the set of vertices and $w_i$ is the $i$-th vertex in the text. If two words appear within a context window of size $k$, they are considered to have a word-word relation; $E = \{e_{ij} \mid i \in W, j \in W\}$ denotes the set of edges between vertex pairs $(w_i, w_j)$. In this study, $k$ is set to 5.
In this model, the autoencoder is an unsupervised model consisting of an encoder and a decoder. The encoder maps the input data to the representation space through multiple nonlinear operations, while the decoder maps the embeddings from the low-dimensional latent space to the reconstruction space through multiple nonlinear functions.
For the $i$-th vertex, the adjacency vector $s_i$ (the $i$-th row of the adjacency matrix) is used as input; each $s_i$ contains the neighborhood structure information of vertex $i$, and the representation $y_i^{(l)}$ of each hidden layer $l$ is computed as in SDNE [34]. After obtaining $y_i^{(l)}$, the output $\hat{x}_i$ is obtained by reversing the computation of the encoder.
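The encoder and decoder passes can be illustrated with a small NumPy sketch. It assumes the usual SDNE-style layer rule (an affine map followed by a sigmoid) and uses made-up layer sizes and random, untrained weights; the helper names `init_layers` and `forward` are hypothetical.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def init_layers(sizes, rng):
    """Create (W, b) pairs for consecutive layer sizes."""
    return [
        (rng.normal(0.0, 0.1, size=(n_out, n_in)), np.zeros(n_out))
        for n_in, n_out in zip(sizes[:-1], sizes[1:])
    ]


def forward(x, layers):
    """Apply y = sigmoid(W y_prev + b) layer by layer; return all activations."""
    activations = [x]
    for W, b in layers:
        activations.append(sigmoid(W @ activations[-1] + b))
    return activations


rng = np.random.default_rng(0)
n_nodes, hidden, embed = 6, 4, 2
encoder = init_layers([n_nodes, hidden, embed], rng)
decoder = init_layers([embed, hidden, n_nodes], rng)  # mirrors the encoder

s_i = np.array([0, 1, 1, 0, 0, 1], dtype=float)  # neighbours of vertex i
y_i = forward(s_i, encoder)[-1]      # low-dimensional embedding of vertex i
x_hat_i = forward(y_i, decoder)[-1]  # reconstruction of s_i
```

Training would adjust the weights so that `x_hat_i` approaches `s_i`; here the weights are random, so only the shapes and data flow are meaningful.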
The goal of the autoencoder is to minimize the reconstruction error between the output and the input, so that vertices with similar structure obtain similar embedding vectors. Because the adjacency input is sparse, a weighted loss function with a higher penalty coefficient for non-zero elements is needed. In other words, the global network structure is preserved by reconstructing the second-order proximity between vertices. In addition, to make two connected nodes close in the embedding space, the first-order similarity is also defined. In the second-order loss, each element is weighted by $b_{i,j}$: if $s_{i,j} = 0$, then $b_{i,j} = 1$; otherwise $b_{i,j}$ is set to a penalty coefficient greater than 1 ($\beta$ in SDNE [34]). This parameter controls the penalty on non-zero elements.
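For reference, the SDNE model [34] on which this module builds writes the two terms as shown below. This is a sketch under the assumption that the text module keeps the SDNE form unchanged: $\hat{x}_i$ is the reconstruction of the input $x_i$ (here the adjacency vector $s_i$), $y_i$ is the embedding of vertex $i$, $b_i$ collects the weights $b_{i,j}$, and the loss names $\mathcal{L}_{2nd}$ and $\mathcal{L}_{1st}$ follow SDNE rather than this paper.

```latex
\begin{aligned}
\mathcal{L}_{2nd} &= \sum_{i=1}^{n} \left\| (\hat{x}_i - x_i) \odot b_i \right\|_2^2, \\
\mathcal{L}_{1st} &= \sum_{i,j=1}^{n} s_{i,j} \left\| y_i - y_j \right\|_2^2, \\
b_{i,j} &= \begin{cases} 1, & s_{i,j} = 0, \\ \beta > 1, & \text{otherwise.} \end{cases}
\end{aligned}
```

The second-order term penalizes errors on observed links more heavily than errors on zero entries, while the first-order term pulls the embeddings of connected vertices together.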
The semi-supervised learning method preserves the first-order and the second-order similarity at the same time. The objective function minimized in the joint optimization combines the second-order loss with the first-order loss, and a weighting parameter ($\alpha$ in SDNE [34]) controls the first-order loss. The representation of the word graph network is denoted as $h_i^{T} = y_i^{(L)}$. Note that not all words contribute equally to the sentiment of a text; hence, this paper learns a unified embedding by assigning different importance weights to different words. To calculate the weight coefficients, the word representation $h_i^{T}$ is used as the input of a multi-layer perceptron to obtain the node representation $z_i^{T}$, and $z_w$ is the context word matrix. The final representation of the vertex $v_i$ is obtained by applying these weights.
Fig. 1 Framework of the MEISP model
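The importance weighting of words can be illustrated with a common attention-style pooling. This is only a sketch: the exact formula is not stated in the text here, so the tanh layer, the softmax normalization, and the treatment of the context $z_w$ as a single vector are assumptions, as are the function name `attention_pool` and all dimensions.

```python
import numpy as np


def attention_pool(H, W, b, z_w):
    """Weight word embeddings by importance and sum them.

    H   : (n_words, d) word-node embeddings h_i from the text autoencoder
    W, b: parameters of a one-layer MLP producing z_i = tanh(W h_i + b)
    z_w : (d_att,) context vector that scores each z_i (assumed shape)
    """
    Z = np.tanh(H @ W.T + b)          # (n_words, d_att)
    scores = Z @ z_w                  # one relevance score per word
    scores = scores - scores.max()    # stabilise the softmax
    weights = np.exp(scores) / np.exp(scores).sum()
    return weights, weights @ H       # importance weights, pooled text vector


rng = np.random.default_rng(1)
H = rng.normal(size=(5, 3))   # 5 words, 3-dimensional embeddings
W = rng.normal(size=(4, 3))
b = np.zeros(4)
z_w = rng.normal(size=4)
weights, text_vec = attention_pool(H, W, b, z_w)
```

The weights are positive and sum to one, so the pooled vector is a convex combination of the word embeddings in which more relevant words count more.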

Sentiment relationship network
For the given sentiment relationship network $G_P = (U, V, P)$, where $U$ and $V$ are the user and entity node sets respectively, the sentiment polarity adjacency vector of a node is denoted as $x_P$. This vector contains the global sentiment information of user $i$ towards the entities $j$. However, in the sentiment prediction task, the sparsity of $x_P$ means that it cannot be used directly as the sentiment relation. Network embedding methods, which can learn rich network structure and semantic information, have recently attracted a lot of attention. Motivated by the excellent performance of deep autoencoders in capturing highly nonlinear network structures, this paper introduces a deep autoencoder to learn the sentiment relationship network embedding. In the sentiment relationship autoencoder, the encoder compresses the input into a latent representation, and the decoder reconstructs the output from that latent representation. Both the encoder and the decoder are neural networks with multiple nonlinear layers. Figure 3 shows the architecture of the autoencoder for user social relation and user-entity sentiment network embedding. Given the adjacency vector $x_{Pi}$ as input, the hidden representation of each layer $l$ is:

$$x_{Pi}^{l} = \sigma\left(W_P^{l}\, x_{Pi}^{l-1} + b_P^{l}\right), \quad l = 1, 2, \dots, L_P,$$

where $x_{Pi}^{0} = x_{Pi}$, $W_P^{l}$ and $b_P^{l}$ are the weight and bias of layer $l$ in the sentiment deep architecture, and $\sigma(\cdot)$ is the nonlinear activation function. The output of layer $L_P/2$, $x_{Pi}^{L_P/2}$, denotes the sentiment polarity network embedding representation, and the output of layer $L_P$, $x_{Pi}^{L_P}$, denotes the sentiment reconstruction vector $\hat{x}_{Pi}$. In the process of sentiment polarity network embedding and data reconstruction, the basic goal is to minimize the reconstruction loss between the input and the output. Therefore, the reconstruction loss term of the sentiment polarity deep autoencoder is:

$$\mathcal{L}_P = \sum_{i} \left\| \left(x_{Pi} - \hat{x}_{Pi}\right) \odot l_{Pi} \right\|_2^2,$$

where $x_{Pi} - \hat{x}_{Pi}$ measures the distance between $x_{Pi}$ and $\hat{x}_{Pi}$, $\odot$ is the Hadamard product, and $l_{Pi}$, with entries $l^{P}_{ij}$, is the reconstruction weight vector of the sentiment links.
In other words, if there is a sentiment connection between user node $u_i$ and entity node $v_j$, whether positive or negative, the weight $l^{P}_{ij}$ is set to a constant greater than 1; otherwise $l^{P}_{ij} = 1$. The reconstruction weight vector thus imposes a larger penalty on the reconstruction error of the non-zero elements of the input $x_{Pi}$ than on that of the zero elements.
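The weighted reconstruction loss described above can be sketched in a few lines of NumPy. The squared-error form and the value of the penalty `alpha` are assumptions chosen for illustration, and the function name `weighted_reconstruction_loss` is hypothetical.

```python
import numpy as np


def weighted_reconstruction_loss(x, x_hat, alpha=10.0):
    """Squared error in which observed (non-zero) inputs are weighted by alpha > 1.

    x     : (n, m) signed adjacency rows with entries in {+1, -1, 0}
    x_hat : (n, m) autoencoder reconstructions
    alpha : penalty for observed links (value chosen for illustration)
    """
    weights = np.where(x != 0, alpha, 1.0)  # reconstruction weight vector
    # Element-wise multiplication implements the Hadamard product.
    return float(np.sum(((x - x_hat) * weights) ** 2))


x = np.array([[1.0, 0.0, -1.0],
              [0.0, 0.0, 1.0]])
x_hat = np.array([[0.5, 0.1, -1.0],
                  [0.0, 0.2, 0.0]])
loss = weighted_reconstruction_loss(x, x_hat, alpha=2.0)
```

Because positive and negative links are both non-zero, they receive the same extra penalty, while errors on unobserved entries are weighted by 1. The same function applies unchanged to the 0/1 rows of the user social relation network.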

User social relation network
For the given user social relation network G R = (U, R) , this paper collects the social relation among user pairs u i , u j . The relationship of each user pair can take a value of 1 or 0, indicating whether user i follows user j . Similar to sentiment network embedding, this paper also employs deep neural network to embed user social relation. Given the adjacency matrix x Ri as input, the hidden representations of each layer l are: where W R and b R are the weight and bias in the social relation deep architecture, respectively. (⋅) is the nonlinear activation function. The output of L R∕ Ri denotes the user social relation network embedding representation, and the output of L R layer x L R Ri denotes social relation reconstruction vector x Ri .
The loss function of the social relation embedding has the same form as the reconstruction loss term of the sentiment polarity network. The social relation loss is built from the difference $\hat{x}_R - x_R$, which measures the distance between $\hat{x}_R$ and $x_R$, combined through the Hadamard product $\odot$ with $l^R_{ij}$, the reconstruction weight vector of the user social relation link. That is, if there is a link between $u_i$ and $u_j$, this paper sets the weight $l^R_{ij} = \gamma > 1$; otherwise $l^R_{ij} = 1$. The reconstruction weight vector thus imposes a larger penalty on the reconstruction error of the non-zero elements of the input $x_{Ri}$ than on that of the zero elements.
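The layer recursion and the loss described in this section follow the pattern of SDNE-style deep autoencoders. One plausible way to write them is sketched below for the social relation network; it assumes a squared $\ell_2$ norm, and the symbols $\sigma$, $\gamma$, $\mathcal{L}_R$ and $n$ (the number of user nodes) are notation chosen here for illustration. Under the same assumptions, the sentiment network has the same form with subscript $P$ in place of $R$.

```latex
\begin{aligned}
x_{Ri}^{1} &= \sigma\left(W_R^{1} x_{Ri} + b_R^{1}\right), \\
x_{Ri}^{l} &= \sigma\left(W_R^{l} x_{Ri}^{l-1} + b_R^{l}\right), \quad l = 2, \dots, L_R, \\
\mathcal{L}_R &= \sum_{i=1}^{n} \left\lVert \left(\hat{x}_{Ri} - x_{Ri}\right) \odot l_i^{R} \right\rVert_2^2,
\qquad
l_{ij}^{R} =
\begin{cases}
\gamma > 1, & \text{if } u_i \text{ and } u_j \text{ are linked}, \\
1, & \text{otherwise}.
\end{cases}
\end{aligned}
```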

Optimization and sentiment prediction
Our work studies heterogeneous social network-based and text-based multiplex network embedding for implicit sentiment mining and analysis. To combine the strengths of word graph-based text representation and heterogeneous social graph network modeling, an intuitive approach is to fuse the above information by concatenation. However, the embedding representation obtained by concatenation alone is not effective for link prediction. In contrast, an early fusion model can optimize all parameters simultaneously, so that the word graph-based text modeling and the heterogeneous social network modeling interact more closely. Therefore, this paper uses the following method to fuse the three kinds of network information.
First, for a given user/item node $i$, the social relation network embedding and the sentiment network embedding are equally important to the final representation. Therefore, the proposed method first aggregates these two embeddings into a node embedding $S_i$ through a concatenation function, which gives the social network-based embedding. Then, to fuse the text-based information and the social network-based information, this paper integrates the two modules with an early fusion model: $S_i$ and $T_i$ are used as the input of a multi-layer perceptron, and the representation of each hidden layer $k$ is computed in turn with that layer's activation function, where $L$ is the number of hidden layers. The final embedding representation is the output of layer $L$. Through this nonlinear fusion, heterogeneous social network modeling and word graph-based text modeling are combined.

In particular, for a given user node $i$ and entity node $j$, the inner product is used to predict the sentiment $p_{ij}$. The objective function then combines the reconstruction losses of the three types of deep autoencoders with the supervised loss between the predicted sentiment link polarity and the ground truth, weighted by the balancing parameters $\lambda_1$, $\lambda_2$, $\lambda_3$ and $\lambda_4$. The first three terms in Eq. (16) are the reconstruction loss terms of the sentiment autoencoder, the user social relationship autoencoder, and the text autoencoder, respectively. The fourth term is the supervised loss term, which penalizes the divergence between the predicted sentiment and the ground truth. The fifth term is the regularization term, which prevents over-fitting.
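The fusion and prediction steps can be sketched in a few lines of plain Python. This is an illustration under stated assumptions rather than the MEISP implementation: the tanh activation, the layer sizes, the random initialization, the availability of a text embedding for every node, and the mapping of the four balancing weights to the loss terms are all assumed, and every function name is hypothetical.

```python
import math
import random


def dense(vec, weights, bias):
    """One fully connected layer with a tanh activation (an assumed choice)."""
    return [math.tanh(sum(w * v for w, v in zip(row, vec)) + b)
            for row, b in zip(weights, bias)]


def init_layer(rng, n_in, n_out):
    """Small random weights and zero biases for a layer of shape n_out x n_in."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def early_fusion(social_emb, sentiment_emb, text_emb, layers):
    """Concatenate S_i = [social ; sentiment] with T_i, then apply the MLP layers."""
    s_i = social_emb + sentiment_emb   # list concatenation gives S_i
    h = s_i + text_emb                 # fused input [S_i ; T_i]
    for weights, bias in layers:
        h = dense(h, weights, bias)
    return h


def predict_sentiment(user_vec, entity_vec):
    """Inner product of the fused user and entity representations."""
    return sum(u * v for u, v in zip(user_vec, entity_vec))


def total_loss(l_sent, l_social, l_text, l_sup, l_reg, lam=(1.0, 1.0, 20.0, 0.01)):
    """Weighted sum of the five loss terms; the term-to-weight mapping is assumed."""
    lam1, lam2, lam3, lam4 = lam
    return l_sent + lam1 * l_social + lam2 * l_text + lam3 * l_sup + lam4 * l_reg


rng = random.Random(0)
layers = [init_layer(rng, 6, 4), init_layer(rng, 4, 3)]  # two hidden layers
user = early_fusion([0.1, 0.2], [0.3, 0.4], [0.5, 0.6], layers)
entity = early_fusion([0.6, 0.5], [0.4, 0.3], [0.2, 0.1], layers)
p_ij = predict_sentiment(user, entity)
```

In a trained model all layers and the three autoencoders would be optimized jointly against `total_loss`; that joint optimization is what distinguishes early fusion from concatenating embeddings that were learned separately.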

Experiment
In this section, several experiments are conducted to examine different aspects of the proposed MEISP method in comparison with the baselines and state-of-the-art sentiment analysis methods. All experiments are run on a cluster with Intel Xeon 2.30 GHz processors, 256 GB of RAM and Nvidia Quadro P620 GPUs. The operating system and software platforms are Ubuntu 16.04, TensorFlow 1.14 and Python 3.5.

Database
To verify the performance of the proposed model on social media, this paper collected Weibo data for the experiments. To remove noise, texts with fewer than 10 words are first filtered out. Second, useless symbols such as spaces and emoticons are removed, so that only the words and numbers in each sentence are kept. Then Jieba is used to segment the text, and the stop words are removed. Finally, the resulting text content is transformed into a word graph network for feature learning. After this processing, the resulting Weibo towards Movies dataset contains users' sentiment towards entities, with 8735 texts, 1983 users, 1357 entities and 32,732 social links. Then, through the entity-level sentiment extraction method, this paper establishes the labeled sentiment dataset, which consists of users' sentiment relations and social relations. We consider this amount of data sufficient to extract valuable information from the complex network data and to complement the text information for online opinion mining and sentiment analysis, and therefore sufficient to support the experimental conclusions.
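The preprocessing steps can be sketched as a small pipeline. This is a minimal illustration, not the authors' code: the co-occurrence window size, the choice to count length after cleaning and segmentation, and all function names are assumptions. The default tokenizer splits on whitespace so that the example runs without extra dependencies; for Weibo text a Chinese segmenter such as Jieba's `jieba.lcut` could be passed as `tokenize` instead.

```python
import re
from collections import Counter


def clean_text(text):
    """Replace symbols, emoticons and punctuation with spaces; keep words and digits."""
    return re.sub(r"\W+", " ", text.replace("_", " ")).strip()


def preprocess(texts, tokenize=str.split, stop_words=frozenset(), min_tokens=10):
    """Clean and segment each text, drop short texts, then remove stop words."""
    docs = []
    for text in texts:
        tokens = tokenize(clean_text(text))
        if len(tokens) < min_tokens:
            continue  # texts with fewer than min_tokens words are treated as noise
        docs.append([t for t in tokens if t not in stop_words])
    return docs


def build_word_graph(docs, window=2):
    """Count undirected co-occurrences of distinct words within a sliding window."""
    edges = Counter()
    for tokens in docs:
        for i, word in enumerate(tokens):
            for other in tokens[i + 1:i + window + 1]:
                if other != word:
                    edges[tuple(sorted((word, other)))] += 1
    return edges


texts = [
    "the plot was slow, but the ending made up for it!! :)",
    "too short :(",
]
docs = preprocess(texts, stop_words={"the", "was", "but", "for", "it"})
graph = build_word_graph(docs, window=2)
```

The resulting `graph` maps each word pair to its co-occurrence count and can serve as a weighted adjacency structure for the word graph embedding.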

Parameters setting and sensitivity
For the word graph-based text encoder, the reconstruction weight of the non-zero elements is set to 10 and the weight of the first-order term to 0.05. In addition, this paper designs a 4-layer encoder architecture for both the social relationship autoencoder and the sentiment autoencoder, with trade-off parameter $\lambda_4 = 0.01$. Based on these settings, this paper evaluates the embedding dimension, the reconstruction weight of the sentiment and social autoencoders, and the balancing parameters.

Embedding layer dimension, reconstruction weight γ of social network autoencoders
Specifically, this paper first fixes the balancing parameters $\lambda_1 = \lambda_2 = 1$ and $\lambda_3 = 20$, and tests the embedding layer dimension and the reconstruction weight ($l^R = l^P = \gamma$). Figure 4a shows the influence of the embedding layer dimension and the reconstruction weight of the social autoencoders on the accuracy of the model, from which we obtain the following results. First, as the embedding dimension increases from 10 to 100, the performance improves, because a larger embedding dimension allows more useful information to be encoded by the autoencoder. However, as the embedding dimension increases from 100 to 500, noise may be introduced and the performance starts to decline. Second, this paper sets $\gamma = 1$, $\gamma = 10$, $\gamma = 20$ and $\gamma = 30$ in turn to test the influence of different reconstruction weights of the non-zero elements on the performance. The accuracy is best when $\gamma = 20$: when $\gamma = 1$ or $\gamma = 10$, zero and non-zero elements are not sufficiently discriminated in the MEISP reconstruction stage, and when $\gamma = 30$ the performance decreases because a large $\gamma$ makes MEISP insensitive to the dissimilarity between nodes.

Balancing parameters
Second, fixing the reconstruction weights of the non-zero elements of the social and sentiment autoencoders at $l^R = l^P = \gamma = 20$ and the sentiment and social embedding dimensionality at $d = 128$, this paper sets $\lambda_1$ and $\lambda_2$ to 0 or 1 and varies the value of $\lambda_3$ to study the performance. The values of $\lambda_1$ and $\lambda_2$ indicate whether the word graph embedding module and the social relation embedding module are included in the MEISP model. From Fig. 4b we make the following observations. (1) When $\lambda_1 = 1$ or $\lambda_2 = 1$, the model performs relatively better because the social relation information or the word co-occurrence graph information, respectively, is added to the sentiment network. When $\lambda_1 = 1$ and $\lambda_2 = 1$, the performance of MEISP improves further. Ablation experiments in section "Network embedding analysis" further verify the accuracy of sentiment link prediction for the different network embedding models. (2) The performance improves at first as $\lambda_3$ increases from 1 to 20, because MEISP focuses more on sentiment link prediction errors. However, when $\lambda_3$ keeps increasing (e.g., $\lambda_3 = 30$), the performance starts to drop slowly, because an overly large $\lambda_3$ breaks the trade-off among the loss terms in the objective function.

Different number of hidden layers of early fusion
After obtaining the embedding representations of the text-based information and the social network-based information, this paper integrates these two kinds of information with the early fusion method. The generalization ability of early fusion improves as the number of hidden layers increases. However, deeper models are harder to optimize, which may also degrade performance. This paper therefore tests the link prediction performance under different numbers of early fusion hidden layers.
As can be seen from Table 1, when there are only input and output layers, the AUROC and Micro-F1 values of sentiment prediction are relatively weak. As the number of hidden layers increases, the performance of the model gradually improves. This indicates that a deeper architecture can effectively improve sentiment analysis performance and enhance the generalization ability of the model. However, as the number of hidden layers increases, the training time on the CPU also increases, and the model may even tend to over-fit. Specifically, with one hidden layer in the early fusion module, one epoch takes 22.1 s, while with three hidden layers it takes about three times as long.

Network embedding analysis
The purpose of multiplex network embedding is to learn low-dimensional vector representations of the different nodes and then to reveal implicit sentiment polarity via a similarity measurement function. The MEISP model contains word co-occurrence relation graph embedding (W-W), social relationship network embedding (U-U) and user-entity sentiment network embedding (U-E-P). In this section, an ablation study is conducted to verify the effects of the different components on sentiment analysis performance. Table 2 shows the different model types.
$\mathrm{MEISP}_N$ is set as the baseline model, which contains only the user-entity sentiment polarity network for sentiment prediction. On the basis of the $\mathrm{MEISP}_N$ model, word co-occurrence graph embedding and social relation graph embedding are added separately, and the resulting models are denoted $\mathrm{MEISP}_W$ and $\mathrm{MEISP}_R$, respectively. First, we randomly hide 50% of the edges of the user-entity sentiment network and construct a balanced test set in which the number of positive links equals the number of negative links. The remaining network is then used to train the node representations with the different models. The precision, recall and F1 values of the experimental results are shown in Table 3.
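The edge-hiding protocol can be sketched as follows. The function name and the handling of surplus hidden edges (they are simply discarded so that the test set stays balanced) are assumptions made for illustration; the text does not say how the balance is enforced.

```python
import random


def split_signed_edges(edges, hide_ratio=0.5, seed=0):
    """Hide a fraction of signed edges and build a balanced test set from them.

    edges: list of (user, entity, sign) tuples with sign in {+1, -1}.
    Returns (train_edges, test_edges). Hidden edges of the majority sign that
    do not fit into the balanced test set are discarded (an assumed choice).
    """
    rng = random.Random(seed)
    shuffled = edges[:]
    rng.shuffle(shuffled)
    n_hidden = int(len(shuffled) * hide_ratio)
    hidden, train = shuffled[:n_hidden], shuffled[n_hidden:]
    pos = [e for e in hidden if e[2] > 0]
    neg = [e for e in hidden if e[2] < 0]
    k = min(len(pos), len(neg))
    return train, pos[:k] + neg[:k]


# Toy signed network: 40 user-entity links, roughly one third negative.
edges = [("u%d" % i, "e%d" % (i % 5), 1 if i % 3 else -1) for i in range(40)]
train, test = split_signed_edges(edges, hide_ratio=0.5, seed=0)
```

Only `train` would be used to learn the node representations; `test` is held back for computing precision, recall and F1.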
From Table 3, we make the following observations. First, adding the U-U link information network and the W-W information network to the $\mathrm{MEISP}_N$ model gradually improves the accuracy of sign link prediction. Compared with the $\mathrm{MEISP}_N$ model, the $\mathrm{MEISP}_W$ model increases the precision, recall and F1 values by 7.63%, 2.81% and 4.75%, respectively, and the $\mathrm{MEISP}_R$ model increases them by 10.40%, 7.62% and 8.95%, respectively. This indicates that introducing user relationships and word graph information improves the performance of sentiment link prediction, and suggests that the approach is broadly applicable. Second, the full MEISP model introduces the U-E-P sentiment polarity network, the U-U link information network and the W-W graph information network to embed and fuse multiple networks for sentiment link prediction. Compared with the $\mathrm{MEISP}_N$ model, MEISP improves the precision, recall and F1 values by 15.40%, 10.56% and 11.67%, respectively. Third, compared with the $\mathrm{MEISP}_W$ and $\mathrm{MEISP}_R$ models, the MEISP model also significantly improves the precision, recall and F1 values.
In summary, these comparative experiments fully verify that the MEISP model proposed in this paper has significantly improved the sentiment analysis of Weibo by adopting user relationships and word co-occurrence information.

Sentiment link prediction
To verify the embedding quality of the MEISP model, we use link prediction as the downstream task and compare MEISP with the baseline embedding methods LINE [32], node2vec [33] and SDNE [34].
For Weibo datasets, the training set and the testing set are given, and we randomly split the training set into actual training and validation data with a ratio of 9:1. The hyperparameters are tuned according to the performance on the validation set. Empirically, we set the learning rate as 0.01 with Adam [40] optimizer and the dropout rate as 0.5. We have done two groups of experiments. First, we randomly take 50% links of sentiment network out as the test set, and the remaining network is used to train network embedding. Then, the accuracy and AUROC are adopted as evaluation metrics. The experimental results are shown in Table 4.
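For reference, the two evaluation metrics can be computed as in the sketch below. It assumes labels of +1 and -1 and real-valued prediction scores (for example inner products); the threshold of 0 and the function names are illustrative choices, and AUROC is computed through its pairwise-ranking (Mann-Whitney) formulation rather than by tracing the ROC curve.

```python
def accuracy(labels, scores, threshold=0.0):
    """Fraction of links whose thresholded score matches the +1/-1 label."""
    correct = sum((s > threshold) == (y > 0) for y, s in zip(labels, scores))
    return correct / len(labels)


def auroc(labels, scores):
    """Probability that a random positive link outscores a random negative one.

    Ties count as one half (the Mann-Whitney U formulation of AUROC).
    """
    pos = [s for y, s in zip(labels, scores) if y > 0]
    neg = [s for y, s in zip(labels, scores) if y <= 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


labels = [1, 1, -1, -1]
scores = [0.9, 0.2, 0.4, -0.7]
acc = accuracy(labels, scores)
auc = auroc(labels, scores)
```

Accuracy depends on the chosen threshold, whereas AUROC only depends on how the scores rank positive links against negative ones, which is why the two metrics are reported together.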
It can be seen from Table 4 that the proposed MEISP model achieves an accuracy of 82.9% and an AUROC of 81.6%. In terms of accuracy, MEISP outperforms LINE and node2vec by 20.2% and 19.8%, respectively, and is about 14% higher than SDNE, a network embedding model that also uses a deep autoencoder. MEISP also achieves gains of at least 14% on AUROC compared with the other three schemes.
Second, we randomly take 50-100% of the sentiment network links out as the test set to further verify the sentiment analysis results in the case of sparse network. The final results are shown in Fig. 5.
As can be seen from Fig. 5, as the share of removed sentiment links in the test set increases from 50% to 100%, the AUROC values of all models decrease. For the MEISP model, the AUROC values are 81.6% and 75.45% when 50% and 100% of the sentiment links are removed, respectively, a decrease of about 6%. For the LINE, node2vec and SDNE models, however, the AUROC values decrease by 21%, 15% and 12%, respectively. This demonstrates that the MEISP model is more robust than the other models when the network is sparse. The above two experiments show that, first, compared with the LINE, node2vec and SDNE embedding methods, the proposed MEISP model obtains better embedding representations. Second, deep autoencoder structure-based multiplex network embedding is effective and more suitable for modeling the sentiment analysis problem from the link prediction angle.

Fig. 5 Accuracy and AUROC on link prediction tasks

Cold start scenario
When new user or entity nodes have just been added to the network, it is difficult to judge the hidden sentiment relationship between them and the existing nodes. This is the cold start problem. The most intuitive solution is to use the information from the target network to learn the embeddings of the new nodes. For example, in the sentiment link prediction problem, such a model uses only the information from the user-entity sentiment link network. However, because a newly arrived node has little interaction with the existing target network, the resulting sentiment link prediction performance is not satisfactory. The proposed scheme not only considers the interaction between the newly added node and the original network, but also makes full use of the side information between heterogeneous nodes.
In this section, we study the performance of MEISP when new nodes join the existing target network. We construct a test set of new texts, and the user social relations and user-entity sentiment relationships contained in the new texts are not used in the training process. The results for all sentiment links and for newly added sentiment links are shown in Table 5.
For MEISP, the Micro-F1 of new sentiment link prediction decreases by 1.6% compared with the result for all sentiment links, whereas for LINE, node2vec and SDNE it decreases by 11.9%, 10.7% and 6.7%, respectively. MEISP therefore still performs better in the cold start scenario, because the proposed multiplex network embedding method can mine information from both the social network and the text, which makes the information available for sentiment relationship analysis more complete.
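One way to build such a held-out set is sketched below. It assumes that new nodes are identified by id and that every link touching a new node is withheld from training; the paper instead holds out the relations contained in new texts, so this node-based split and the function name are illustrative simplifications.

```python
def cold_start_split(edges, new_nodes):
    """Separate the signed edges that touch any newly added node.

    edges: iterable of (user, entity, sign); new_nodes: set of node ids.
    Returns (train_edges, new_edges); no training edge involves a new node.
    """
    train, new = [], []
    for user, entity, sign in edges:
        if user in new_nodes or entity in new_nodes:
            new.append((user, entity, sign))
        else:
            train.append((user, entity, sign))
    return train, new


edges = [("u1", "e1", 1), ("u2", "e1", -1), ("u3", "e2", 1)]
train_edges, new_edges = cold_start_split(edges, new_nodes={"u3"})
```

Evaluating separately on `new_edges` and on all held-out edges gives the two result columns that a cold start comparison needs.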

Sentiment analysis accuracy
At present, many sentiment analysis methods are proposed from the feature extraction angle and word embedding representation angle and most of them have achieved excellent performance. In this section, we compare MEISP model with a number of existing methods as listed below.

Traditional machine learning methods
TextFeature extracts text features including word and character n-grams, sentiment lexicon features, etc., and then trains an SVM classifier.
Trigram trains a support vector machine (SVM) classifier with unigrams, bi-grams and trigrams as features.

Deep learning method
Text-based sentiment analysis models
SVM + word2vec trains word embeddings with word2vec and uses an SVM as the classifier.
UPNN is the user product neural network [24], which incorporates user and product information using a CNN.

Link prediction-based sentiment analysis models
MEISP is our proposed sentiment classification method, which uses a graph-based text representation module and heterogeneous social network embedding modules to reveal implicit sentiment links.
In this part of the experiment, balanced data is selected to reduce bias toward a particular category, which would otherwise affect the sentiment analysis accuracy, and the optimal parameter setting is adopted for all models. The results are shown in Table 6. As can be seen from Table 6, first, compared with the traditional sentiment analysis methods Trigram and TextFeature, SVM + word2vec and SSWE + SVM are more effective owing to their deep learning-based word representations. Second, our method significantly outperforms the UPNN model, which is designed specifically for sentiment classification. The main difference between MEISP and UPNN lies in how the word representations are derived. UPNN incorporates user- and product-level information into a convolutional neural network and uses word embeddings trained only with context information from the text content. MEISP generates word representations from the constructed multiplex network embedding model, which inherently considers word occurrence, word position and the relationships between different components of the text, as well as the heterogeneous social network information (user social relation network, user-entity sentiment network) during representation learning.

Error analysis
For the given user-entity sentiment relationship network $G_P = (U, V, P)$, the sentiment polarity adjacency vector $x_P$ contains the global sentiment information of user $i$ toward entity $j$. However, in the sentiment prediction task, $x_P$ is too sparse to be used directly as the sentiment relation. The proposed MEISP model can capture both word graph-based text information and heterogeneous social graph network information. We analyzed the sentiment labels predicted in section "Experiment". In the confusion matrix, we observed cases in which our model linked users and entities with the wrong sentiment polarity. This happens because (1) when the context is short, text information is scarce, or (2) when the heterogeneous social networks are sparse, it is difficult to infer the sentiment of the user toward the entity. In such cases the accuracy of every model is low. As contextual information and heterogeneous social network information increase, the accuracy of all models rises, indicating that text information and heterogeneous network information are positively correlated with model accuracy.

Conclusions
In this paper, sentiment analysis is formulated as a special sign link prediction problem, which is addressed by combining user social link information, user-entity sentiment polarity information and word graph-based text-level information through heterogeneous graph network embedding. In particular, a novel multi-network embedding method called MEISP is proposed for more effective and efficient embedding. The proposed work differs from current sentiment classification research, which mainly determines the sentiment polarity of text from grammatical and semantic rules. It also differs from current multi-network embedding methods in that it considers both text information and social network information. Furthermore, it extracts highly nonlinear node representations and text structure information while preserving the structure of the original networks. Extensive experiments are conducted to evaluate the performance of the proposed scheme. The experimental results show that MEISP is competitive against multiple baselines and demonstrate the effectiveness of using user link information, user-entity sentiment polarity information and word graph-based text-level information to represent nodes in social networks and to predict the polarity of sentiment sign links. Future work can focus on employing other graph-based representation methods to extract hidden characteristics of a network, exploiting the attribute information of edges to enrich the initial features of the network, and employing other innate features and information in social media to enhance sentiment analysis techniques.

Conflict of interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Availability of data and material (data transparency) The data cannot be shared at this time as the data also forms part of an ongoing study.
Code availability (software application or custom code) As the current research is still in progress, we decided not to share the code for the time being.
Ethics approval This article does not contain any studies with human participants or animals performed by any of the authors.

Consent for publication Not applicable.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.