Introduction

The aspect-based sentiment analysis (ABSA) task aims to determine the sentiment tendency toward a particular aspect in a sentence. For example, in the sentence “The environment of this restaurant is dirty, but the food is delicious.”, there are two aspects, “environment” and “food”, and the user expresses negative and positive sentiments toward them, respectively. Thus, ABSA can precisely judge the sentiment tendency toward a specific aspect, rather than simply judging the sentiment polarity of a whole sentence.

In recent studies, various neural network models such as Recurrent Neural Networks (RNN) [1] and Convolutional Neural Networks (CNN) [2, 3] have been widely used in aspect-based sentiment analysis. With the popularity of the attention mechanism, more researchers [4,5,6,7] have combined attention mechanisms with neural networks to reflect the degree of influence of each word on the aspect. However, the attention mechanism is susceptible to noise in the sentence and may mistakenly focus on irrelevant words, and sequential models that capture contextual semantic relationships easily lose long-distance information.

Compared with sequential models, graph convolutional networks (GCN) [8] can handle complex data structures and propagate information to the global level. Because of these advantages, more researchers [9,10,11,12] have combined dependency trees with GCN to make better use of the syntactic dependencies in sentences. However, the following problems arise when using dependency trees in ABSA tasks: (1) since the dependency tree is created automatically by external tools, the resulting dependencies can be inaccurate and uncontrollable; (2) dependency trees are insensitive to domain-specific datasets [13]. To address these problems, researchers [13,14,15,16,17,18] have made improvements and enhancements based on dependency trees to solve ABSA tasks. However, these methods only consider the dependencies between words and fail to consider other important information.

Information on the context, semantics, part of speech, and sentiment knowledge among words is essential when constructing a GCN for aspect-based sentiment analysis. However, only a limited number of researchers model multiple kinds of hidden information via GCN for this task. For example, Yao et al. [19] proposed a text graph-based neural network (TextGCN), in which a text graph is first constructed based on the sequential contextual relationships between words. In [20], syntax and knowledge are combined via a GCN. More researchers [21, 22] focus on building graph structures based on rich contextual information, such as semantic and syntactic contextual information. Effective graph learning is not only important in the ABSA field; many scholars have also done meaningful work [23, 24] on graph learning in terms of multiple kernel graph-based clustering (MKGC). It can be seen that researchers have paid more and more attention to effective graph learning.

Inspired by these studies, we model multiple latent information graph structures via GCN for aspect-based sentiment analysis. In the text graph construction module, we integrate three different perspectives, statistics, semantics, and part of speech, to build the graph structure. The learned graph structure can fully extract the latent information in the text. In the ABSA module, we design a matrix fusion-based GCN for better context encoding. In the ABSA field, most researchers directly combine the output of one type of neural network with the attention mechanism; they do not adequately combine sequential neural networks, GCN and attention mechanisms. Therefore, we first use a sequence model and the multi-head self-attention mechanism to generate the feature representation of the context. Second, the text graph and the context feature matrix are fed into the GCN. The aspect features that aggregate the information of adjacent nodes in the graph structure are obtained through multiple GCN layers. Then we calculate the attention matrix from the aspect features and the context features obtained from the sequence model. The filter matrix is obtained from the output of the GCN through the aspect irrelevant information filter layer. Finally, the aspect-related information is enhanced using a matrix fusion layer.

Our contributions are as follows:

  • We propose a graph convolution network model with multiple latent information for ABSA tasks. Our model considers the statistics, semantics, and part of speech within a sentence. We combine the graph structure obtained by the statistical method and the graph structure obtained by semantic similarity. Then the final graph structure is enhanced by part of speech rules.

  • We use a matrix fusion-based GCN over the graph structure. First, the attention matrix is obtained by combining the aspect feature output of GCN, the text feature output of the sequence model LSTM and the attention mechanism. Then, we use the information filter layer after GCN to filter the irrelevant information of the aspect. Finally, we combine the attention matrix and the filter matrix to get the final matrix.

  • Extensive experiments are conducted on four benchmark datasets to illustrate the effectiveness of our model for the ABSA task.

Related work

Aspect-based sentiment analysis is a fine-grained sentiment analysis task. Recently, most research on aspect-based sentiment analysis (ABSA) uses neural networks and attention to associate aspects with their context and capture the semantic information hidden in sentences. For instance, Wang et al. [1] and Ma et al. [6] propose models that directly compute attention between aspects and sentences. Tang et al. [4] apply multiple levels of attention in their model. Chen et al. [5] add a weighted memory mechanism on top of multi-layer attention. Fan et al. [7] propose a model that learns a representation containing sentence and aspect-related information, integrates it into a multi-granularity sentence modeling process, and finally obtains a comprehensive sentence representation. Huang and Carley [2] propose a novel parameterized convolutional neural network for aspect-level sentiment classification, using parameterized filters and parameterized gates on a CNN to incorporate aspect information. Tan et al. [25] introduce a dual attention network to recognize conflicting opinions. However, these studies ignore the syntactic dependencies between words in sentences, which may lead to ambiguity in identifying the polarity of specific targets.

Fig. 1 The overall architecture of the MFLGCN. The upper part is built for the text graph structure. It includes graph structures for statistics, semantics, and parts of speech. The lower part is the aspect-based sentiment analysis part. It contains the GCN and the matrix fusion layer

In order to solve this problem, dependency trees have been introduced into the ABSA task. Dependency trees can capture dependencies between words and enhance the connection between an aspect and its related words. Many researchers use graph convolutional networks (GCN) to model over dependency trees. Sun et al. [10] present a convolution-over-dependency-trees model which combines Bi-directional Long Short-Term Memory (Bi-LSTM) and GCN. Zhang et al. [9] propose a model that combines the attention mechanism and GCN on dependency trees. Xiao et al. [12] improve the GCN and combine it with a multi-head attention mechanism for aspect-based sentiment analysis. Although dependency trees perform well in ABSA tasks, they also have some defects: methods based on dependency trees are inaccurate and insensitive to domain-specific datasets. In order to alleviate these problems, researchers have enhanced the graph structure information built on dependency trees. Chen et al. [15] propose a model that combines dependency trees with latent graphs generated by self-attention networks. Wang et al. [16] utilize reshaping and pruning methods to make ordinary dependency trees focus on the target aspect. Zhou et al. [20] employ a new Syntax- and Knowledge-based Graph Convolutional Network (SK-GCN) model for aspect-level sentiment classification. Tang et al. [17] use dual graph convolutional networks to improve the insensitivity of dependency syntax trees to online reviews.

In this paper, the text graph structure is generated by combining different methods. The statistic-based graph considers the co-occurrence frequency between words in the text. The semantic-based graph considers the rich semantic relationships of the context and builds a good graph structure on datasets from different domains. The part of speech rule-based graph enriches the relationships of specific aspects and strengthens the importance of aspects in the graph. Based on the constructed graph structure, we combine the aspect feature output of the GCN, the text feature output of the sequence model and the attention mechanism to extract the attention matrix. We also use the GCN and an aspect information filter layer to extract the aspect-related information matrix. Finally, we integrate the attention matrix, which considers the importance between the aspect and other words, and the matrix filtering out aspect-irrelevant information to obtain the final representation.

Fig. 2 Semantic-based graph construction model

Our model

Figure 1 gives an overview of our model. It contains the graph construction part and the aspect-based sentiment analysis part. In the graph construction part, we first use statistical methods to calculate the co-occurrence between words and use it as the basis for forming each edge in the statistic-based graph \(G_1\). Second, we construct the semantic-based graph \(G_2\) using the semantic similarity between words as the basis. Third, we keep the edges common to \(G_1\) and \(G_2\) and delete the rest to obtain a fused graph. Finally, based on the part of speech rules, we process the nodes connected to the aspect and the further edges attached to those nodes: edges that conform to the defined part of speech rules are added and those that do not conform are deleted, thus obtaining the graph \(G_3\). In the aspect-based sentiment analysis part, we first utilize a Bi-LSTM to extract hidden contextual representations \(H^t\). These hidden representations \(H^t\) are then fed into the GCN as features of the graph nodes, yielding the vector representations \(H^l\) that aggregate relevant neighborhood information. We use the multi-head self-attention mechanism to link words more effectively and obtain \(H^m\). The attention matrix \(M_1\) is obtained by computing attention between the aspect vector and \(H^m\). The aspect-related information matrix \(M_2\) is obtained by passing \(H^l\) and the aspect vector through the information filter layer. Finally, the matrix fusion of \(M_1\) and \(M_2\) is performed to obtain the final text representation.

Construction of the text graph

Statistic-based graph

The statistic-based graph integrates the co-occurrence information between words. We combine the sliding window strategy and point-wise mutual information (PMI) to express the degree of association between words. The basis for the existence of edges between nodes in the graph can be formulated as

$$\begin{aligned}{} & {} T_{\textrm{st }}\left( w_{i}, w_{j}\right) =\log \frac{p\left( w_{i}, w_{j}\right) }{p\left( w_{i}\right) p\left( w_{j}\right) } \end{aligned}$$
(1)
$$\begin{aligned}{} & {} p\left( w_{i}, w_{j}\right) =\frac{N_{\left( w_{i}, w_{j}\right) }}{N_{\textrm{total}}} \end{aligned}$$
(2)
$$\begin{aligned}{} & {} p\left( w_{i}\right) =\frac{N_{\left( w_{i}\right) }}{N_{\textrm{total}}} \end{aligned}$$
(3)
$$\begin{aligned}{} & {} p\left( w_{j}\right) =\frac{N_{\left( w_{j}\right) }}{N_{\textrm{total}}}. \end{aligned}$$
(4)

\(p(w_i,w_j)\) is the co-occurrence probability of \(w_i\) and \(w_j\). \(N_{(w_i,w_j)}\) is the number of sliding windows that contain both \(w_i\) and \(w_j\), and \(N_{\textrm{total}}\) is the total number of sliding windows over the whole dataset. \(N_{(w_i)}\) is the number of sliding windows over the whole dataset in which the word \(w_i\) occurs. The weight between the word nodes \(w_i\) and \(w_j\) is defined as

$$\begin{aligned} M_{\textrm{st}_{ij}} = \left\{ \begin{array}{ll}1,&{}\quad T_{\textrm{st}}\left( w_{i}, w_{j}\right) >\text {threshold}_{\textrm{st}} \\ 1, &{}\quad i=j \\ 0, &{}\quad T_{\textrm{st}}\left( w_{i}, w_{j}\right) <\text {threshold}_{\textrm{st}} \end{array}\right. \end{aligned}$$
(5)

where \(M_{\textrm{st}}\in {{\mathbb {R}}}^{n{\times }n}\) is the adjacency matrix representation of the statistic-based graph. \(M_{\textrm{st}_{ij}}\) represents whether the \(i\textrm{th}\) node is connected to the \(j\textrm{th}\) node in the graph. The \(\hbox {threshold}_{\textrm{st}}\) is the standard to judge whether nodes are connected.
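To make the construction concrete, the following is a minimal sketch of how a statistic-based adjacency matrix could be built from Eqs. (1)–(5). For brevity it accumulates the window counts over a single token sequence, whereas the paper gathers them over the whole dataset; the function name and the default threshold are illustrative, not the paper's settings.

```python
import math
from collections import Counter
from itertools import combinations

def build_statistic_graph(tokens, window_size=5, threshold_st=0.0):
    """Build the statistic-based adjacency matrix M_st from PMI scores (Eqs. 1-5)."""
    n = len(tokens)
    windows = [tokens[i:i + window_size] for i in range(max(1, n - window_size + 1))]
    n_total = len(windows)

    word_count = Counter()   # number of windows containing word w_i
    pair_count = Counter()   # number of windows containing both w_i and w_j
    for win in windows:
        uniq = set(win)
        word_count.update(uniq)
        pair_count.update(combinations(sorted(uniq), 2))

    m_st = [[1 if i == j else 0 for j in range(n)] for i in range(n)]  # self-loops (i = j)
    for i, j in combinations(range(n), 2):
        wi, wj = tokens[i], tokens[j]
        n_ij = pair_count[tuple(sorted((wi, wj)))]
        if n_ij == 0:
            continue                                  # never co-occur in a window
        pmi = math.log((n_ij / n_total)
                       / ((word_count[wi] / n_total) * (word_count[wj] / n_total)))
        if pmi > threshold_st:                        # Eq. (5): keep edge if PMI is high enough
            m_st[i][j] = m_st[j][i] = 1
    return m_st
```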

Semantic-based graph

BERT can be fine-tuned and act as a feature extractor according to the needs of the task, yielding high-quality word vector representations. LSTM can capture contextual semantic relationships. We use the BERT-LSTM model to obtain the word vector representation of the text (shown in Fig. 2) and then calculate the cosine similarity between words to construct the edges in the semantic graph. The weight of the semantic relationship between words is calculated as follows:

$$\begin{aligned} T_{\textrm{se}}\left( w_{i}, w_{j}\right) =\cos \langle v_{i}, v_{j}\rangle =\frac{v_{i} \cdot v_{j}}{\left\| v_{i}\right\| \times \left\| v_{j}\right\| } \end{aligned}$$
(6)

where \(v_i\) and \(v_j\) are the vectors of words \(w_i\) and \(w_j\), respectively. The weight between the word nodes \(w_i\) and \(w_j\) is defined as

$$\begin{aligned} M_{\textrm{se}_{ij}} = \left\{ \begin{array}{ll}1,&{}\quad T_{\textrm{se}}\left( w_{i}, w_{j}\right) >\text {threshold}_{\textrm{se}} \\ 1,&{}\quad i=j \\ 0,&{}\quad T_{\textrm{se}}\left( w_{i}, w_{j}\right) <\text {threshold}_{\textrm{se}} \end{array}\right. \end{aligned}$$
(7)

where \(M_{\textrm{se}}\in {{\mathbb {R}}}^{n{\times }n}\) is the adjacency matrix representation of the semantic-based graph. \(M_{\textrm{se}_{ij}}\) represents whether the \(i\textrm{th}\) node is connected to the \(j\textrm{th}\) node in the graph. The \(\hbox {threshold}_{\textrm{se}}\) is the standard to judge whether nodes are connected.
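A possible implementation of Eqs. (6)–(7) is sketched below, assuming the per-word vectors \(v_i\) have already been produced by the BERT-LSTM encoder of Fig. 2; the threshold value is a placeholder, not the one used in the paper.

```python
import torch
import torch.nn.functional as F

def build_semantic_graph(word_vectors, threshold_se=0.5):
    """Build the semantic-based adjacency matrix M_se via cosine similarity (Eqs. 6-7).

    word_vectors: tensor of shape (n, d), e.g. the per-token outputs of the
    BERT-LSTM encoder described in the text."""
    v = F.normalize(word_vectors, p=2, dim=-1)   # unit-normalise each word vector
    sim = v @ v.t()                              # pairwise cosine similarities, Eq. (6)
    m_se = (sim > threshold_se).long()           # Eq. (7): keep edges above the threshold
    m_se.fill_diagonal_(1)                       # self-loops (i = j case)
    return m_se
```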

Graph enhancement

In order to integrate statistical and semantic information simultaneously, we build a graph that contains the common edges of the semantic-based graph and statistic-based graph. The weight between the word nodes \(w_i\) and \(w_j\) is defined as

$$\begin{aligned} M_{\textrm{fusion}_{i j}} = \left\{ \begin{array}{ll} 1,&{}\quad M_{\textrm{st}_{i j}}=1\quad \text {and}\quad M_{\textrm{se}_{i j}}=1 \\ 1,&{}\quad i=j \\ 0,&{}\quad M_{\textrm{st}_{i j}}=0 \quad \text {or}\quad M_{\textrm{se}_{i j}}=0 \end{array}\right. \end{aligned}$$
(8)

where \(M_{\textrm{fusion}}\in {{\mathbb {R}}}^{n{\times }n}\) is the adjacency fusion matrix representation of statistic-based graph and semantic-based graph. In order to enrich the information associated with aspect nodes in the graph, we use part of speech rules to enhance the graph structure, as shown in Fig. 3.

Fig. 3
figure 3

Part of speech rules

In the graph structure, if the part of speech of a node \(w_{(i+1)}\) directly connected to the aspect node \(w_i\) is an adverb or verb, it enters the next stage of part of speech judgment: if the part of speech of a node \(w_{(i+2)}\) directly connected to \(w_{(i+1)}\) is an adjective, an edge is added between \(w_i\) and \(w_{(i+2)}\).
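As an illustration, the sketch below fuses the two adjacency matrices according to Eq. (8) and then applies the rule of Fig. 3 to add aspect-adjective edges. It assumes Universal-Dependencies-style coarse POS tags (ADV, VERB, ADJ) and, for simplicity, only shows the edge-addition half of the rule; the deletion of non-conforming edges described above is omitted.

```python
def fuse_and_enhance(m_st, m_se, pos_tags, aspect_idx):
    """Fuse the two graphs (Eq. 8) and apply the part-of-speech rule of Fig. 3.

    m_st, m_se : n x n adjacency structures with 0/1 entries
    pos_tags   : coarse POS tag per token (e.g. "ADV", "VERB", "ADJ")
    aspect_idx : 0-indexed positions of the aspect tokens
    """
    n = len(pos_tags)
    # Eq. (8): keep only the edges shared by both graphs, plus self-loops.
    m = [[1 if (i == j or (m_st[i][j] and m_se[i][j])) else 0 for j in range(n)]
         for i in range(n)]

    # POS rule: aspect -- (adverb | verb) -- adjective  =>  add aspect--adjective edge.
    for a in aspect_idx:
        for k in range(n):
            if m[a][k] and pos_tags[k] in {"ADV", "VERB"}:
                for t in range(n):
                    if m[k][t] and pos_tags[t] == "ADJ":
                        m[a][t] = m[t][a] = 1
    return m
```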

Overall, the graph structure we build has the following properties: (1) it captures the co-occurrence between words; (2) it integrates rich contextual semantic relations; (3) it considers the relationship between aspects and other words at the part of speech level.

Contextualized word representation

A sentence \(s=\{w_1^t,w_2^t,w_3^t,\ldots ,w_n^t\}\) containing an aspect \(a=\{w_1^a,w_2^a,w_3^a,\ldots ,w_{m+1}^a\}\) is given, where n is the length of the sentence and m is the length of the aspect. We use a Bi-LSTM encoder to capture contextual semantic relationships and obtain the contextualized word representations \(H^t=\{h_1^t,h_2^t,h_3^t,\ldots ,h_\beta ^a,h_{\beta +1}^a,h_{\beta +2}^a,\ldots ,h_{\beta +m}^a,\ldots ,h_n^t\}\) \(\in {{\mathbb {R}}}^{n{\times }{d_h}}\), where \(d_h\) is the dimension of the hidden state vectors.
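A minimal PyTorch sketch of this encoder is given below, using the 300-dimensional embeddings and hidden size mentioned in the implementation details; note that, being bidirectional, it returns per-token vectors of size 2 × hidden_dim (the forward and backward states concatenated).

```python
import torch
import torch.nn as nn

class ContextEncoder(nn.Module):
    """Minimal Bi-LSTM encoder producing the contextualized representations H^t."""
    def __init__(self, embed_dim=300, hidden_dim=300):
        super().__init__()
        self.bilstm = nn.LSTM(embed_dim, hidden_dim,
                              batch_first=True, bidirectional=True)

    def forward(self, word_embeddings):          # (batch, n, embed_dim)
        h_t, _ = self.bilstm(word_embeddings)    # (batch, n, 2 * hidden_dim)
        return h_t
```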

Fig. 4 Calculation process of MHSA

Multi-head self-attention (MHSA)

Multi-head self-attention (MHSA) is an attention mechanism that operates over several subspaces in parallel. Compared with single-head self-attention, MHSA can extract richer semantic features, as shown in Fig. 4. First, we initialize Q, K and V by setting each of them equal to the input \(H^t\). Different weight matrices W are then used to map Q, K and V into different subspaces. The subspace mapping is given by the following formula:

$$\begin{aligned} {\left\{ \begin{array}{ll}{Q_i}=QW_i^Q \\ {K_i}=KW_i^K&{}(i=1,2,3,\ldots ,n)\\ {V_i}=VW_i^V\end{array}\right. } \end{aligned}$$
(9)

where \(W_i^Q\), \(W_i^K\) and \(W_i^V\) are the learnable parameter matrices of the i-th head and n is the number of heads. In each subspace, the output is calculated by the attention function with K, Q and V as inputs. The attention is calculated by the following formula:

$$\begin{aligned} {\text {Attention}}(Q, K, V)={\text {softmax}}\left( {\hbox {tanh}\left( W_{m} \cdot [ K ; Q ]\right) }\right) V\nonumber \\ \end{aligned}$$
(10)

Formula (10) is essentially a weighted calculation over V. The weights are obtained by a softmax function, inside which the tanh function is used to calculate the correlation between K and Q. \(W_m\) is a learnable weight matrix. After obtaining the output of each subspace, the outputs are concatenated to obtain the final output:

$$\begin{aligned} \text {head} _{i}= & {} \text {Attention}\left( Q_i, K_i, V_i\right) \left( i=1,2,3,\ldots ,n\right) \end{aligned}$$
(11)
$$\begin{aligned} H^m= & {} MHSA(Q,K,V)=\text {Concat}\left( \hbox {head}_1,\ldots ,\hbox {head}_n\right) \nonumber \\ \end{aligned}$$
(12)

where \(\hbox {head}_i\) is the output of the i-th attention head and \(H^m=\{h_1^m,h_2^m,h_3^m,\ldots ,h_n^m \}\in {{\mathbb {R}}}^{n{\times }{d_e}}\) is the final output of the MHSA, where \(d_e\) is the dimension of the MHSA output.
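The following sketch shows one way to read Eqs. (9)–(12): each head scores a (key, query) pair with \(\tanh(W_m[k;q])\), normalises the scores over the keys with softmax, and weights V accordingly. The exact shape of \(W_m\) is not specified in the text, so mapping the concatenation to a scalar score per pair is an assumption of this sketch.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MHSA(nn.Module):
    """Multi-head self-attention with the concatenation-based score of Eq. (10)."""
    def __init__(self, d_model, n_heads=8):
        super().__init__()
        assert d_model % n_heads == 0
        self.n_heads, self.d_k = n_heads, d_model // n_heads
        self.w_q = nn.Linear(d_model, d_model)
        self.w_k = nn.Linear(d_model, d_model)
        self.w_v = nn.Linear(d_model, d_model)
        self.w_m = nn.Linear(2 * self.d_k, 1)   # W_m of Eq. (10), one score per (k, q) pair

    def forward(self, h_t):                      # h_t: (batch, n, d_model)
        b, n, _ = h_t.shape
        def split(x):                            # -> (batch, heads, n, d_k), Eq. (9)
            return x.view(b, n, self.n_heads, self.d_k).transpose(1, 2)
        q, k, v = split(self.w_q(h_t)), split(self.w_k(h_t)), split(self.w_v(h_t))

        # Pairwise concatenation [k_j ; q_i] for every (query i, key j) pair.
        q_exp = q.unsqueeze(3).expand(b, self.n_heads, n, n, self.d_k)
        k_exp = k.unsqueeze(2).expand(b, self.n_heads, n, n, self.d_k)
        scores = torch.tanh(self.w_m(torch.cat([k_exp, q_exp], dim=-1))).squeeze(-1)
        attn = F.softmax(scores, dim=-1)         # normalise over the keys
        heads = attn @ v                         # weighted sum of V, Eq. (11)
        return heads.transpose(1, 2).reshape(b, n, -1)   # Concat(head_1..head_n) = H^m, Eq. (12)
```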

Graph convolution network (GCN)

A graph convolutional network (GCN) [8] uses convolution operations to encode graph-structured data; its output is a node representation that aggregates the information around each node. We use a GCN to model our constructed text graph structure incorporating multiple latent information. The adjacency matrix of the graph and the feature matrix of the graph nodes are first passed into the GCN. After multi-layer training, the obtained node feature representations aggregate the surrounding information. Finally, a mask layer is used to extract the aspect-specific representation. A graph \(G=\{V,A\}\) is given, where V is the set of nodes in the graph and \(A\in {{\mathbb {R}}}^{n{\times }n}\) is the adjacency matrix of the graph, with n being the number of nodes. \(h_i^l\) is the hidden state representation of node i at layer l. The node hidden state representation is updated by

$$\begin{aligned} h_{i}^{l}=\sigma \left( \sum _{j=1}^{n} A_{i j} W^{l} h_{j}^{l-1}+b^{l}\right) \end{aligned}$$
(13)

where \(W^l\) is a learnable weight matrix, \(b^l\) is a bias term, and \(\sigma \) is a nonlinear function (e.g., ReLU). \(A_{ij}\) represents whether the \(i\textrm{th}\) node is connected to the \(j\textrm{th}\) node in the graph: \(A_{ij}= 1\) if node i is connected to node j, otherwise \(A_{ij}= 0\).

In order to make the GCN learn aspect-specific representations, we apply an aspect mask to the output of the GCN \(H^l=\{h_1^l,h_2^l,h_3^l,\ldots ,h_n^l\}\). The aspect \(a=\{w_\beta ^\alpha ,w_{\beta +1}^\alpha , w_{\beta +2}^\alpha ,\ldots ,w_{\beta +m}^\alpha \}\) is given. The output weight is calculated as follows:

$$\begin{aligned} o_{i}= {\left\{ \begin{array}{ll}1-\frac{\beta -i}{n}, &{}\quad 1 \le i<\beta \\ 0, &{} \quad \beta \le i \le \beta +m \\ 1-\frac{i-\beta -m}{n}, &{}\quad \beta +m<i \le n\end{array}\right. } \end{aligned}$$
(14)

where \(o_i\) is the weight of the \(h_i^l\). The weight and GCN output are calculated as follows:

$$\begin{aligned} h_i^{lo}=o_ih_i^l \end{aligned}$$
(15)

where \(h_i^{lo}\in {{\mathbb {R}}^d}\) is a hidden representation of the \(i\textrm{th}\) node and \(H^{lo}=\{h_1^{lo},h_2^{lo},h_3^{lo},\ldots ,h_n^{lo}\}\) denotes hidden representations.
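A compact PyTorch sketch of Eqs. (13)–(15) follows: one GCN layer and the weights of Eq. (14), with the 1-indexed positions of the formulas converted to 0-indexed tensor indices. Applying `position_weights(n, beta, m).unsqueeze(-1) * h_l` to the GCN output then yields the weighted representations \(h_i^{lo}\) of Eq. (15).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GCNLayer(nn.Module):
    """One GCN layer implementing Eq. (13): h_i^l = ReLU(sum_j A_ij W h_j^{l-1} + b)."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim)   # W^l and b^l

    def forward(self, h, adj):                      # h: (batch, n, in_dim), adj: (batch, n, n)
        return F.relu(adj @ self.linear(h))         # aggregate neighbours, then nonlinearity

def position_weights(n, beta, m):
    """Weights o_i of Eq. (14) for an aspect spanning 1-indexed positions beta .. beta+m."""
    o = torch.zeros(n)
    for i in range(1, n + 1):
        if i < beta:
            o[i - 1] = 1 - (beta - i) / n
        elif i <= beta + m:
            o[i - 1] = 0.0
        else:
            o[i - 1] = 1 - (i - beta - m) / n
    return o
```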

Aspect-aware attention

The GCN aggregates the information of the nodes around the aspect, which enriches the aspect representation. In order to fully associate the aspect with other words, we combine the aspect feature output of the GCN aggregation extracted by the masking layer, the text feature output of the sequence neural network model LSTM and the attention mechanism. This operation fuses the rich features of the aspect hidden in the text graph structure with the text features learned by the sequence model for the attention calculation, so that the vital information in the aspect is fully integrated into the resulting attention matrix. We extract the aspect representation \(a^{lo}=\{0,\ldots ,h_\beta ^{lo},h_{\beta +1}^{lo},h_{\beta +2}^{lo},\ldots ,h_{\beta +m}^{lo},\ldots ,0\}\) from the sentence representation output by the GCN, and then compute attention with the sentence feature representation \(H^m\) by

$$\begin{aligned} \alpha _{t}= & {} \sum _{i=\beta }^{\beta +m} h_{t}^{m^{T}} h_{i}^{l o} \end{aligned}$$
(16)
$$\begin{aligned} M_{1}= & {} \sum _{t=1}^{n} \frac{\exp \left( \alpha _{t}\right) h_{t}^{m}}{\sum _{i=1}^{n} \exp \left( \alpha _{i}\right) } \end{aligned}$$
(17)

where \(M_1\) is the output of aspect-aware attention. We use the dot product to calculate the semantic similarity between aspect and other words.
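A direct translation of Eqs. (16)–(17) could look like the following sketch, where the aspect rows of \(H^{lo}\) are compared with every context word via dot products and the resulting softmax weights pool \(H^m\) into \(M_1\); the 1-indexed position \(\beta\) and length m follow the notation above.

```python
import torch
import torch.nn.functional as F

def aspect_aware_attention(h_m, h_lo, beta, m):
    """Aspect-aware attention of Eqs. (16)-(17).

    h_m  : (n, d) output of the MHSA layer
    h_lo : (n, d) masked/weighted GCN output
    beta, m : 1-indexed start position and length of the aspect"""
    aspect = h_lo[beta - 1: beta + m]        # rows h_beta^{lo} .. h_{beta+m}^{lo}
    alpha = (h_m @ aspect.t()).sum(dim=-1)   # Eq. (16): sum of dot products per context word
    weights = F.softmax(alpha, dim=0)        # softmax over the n context words
    return (weights.unsqueeze(-1) * h_m).sum(dim=0)   # Eq. (17): weighted sum gives M_1
```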

Aspect irrelevant information filter layer (AIIFL)

The attention mechanism may introduce noise (irrelevant information) into the ABSA task, which may cause the model to capture irrelevant sentiment information and thus reduce the accuracy of the analysis. In order to alleviate this problem, we design an aspect irrelevant information filter layer (AIIFL). The calculation formula is as follows:

$$\begin{aligned} h_i^f= & {} f_ih_i^{lo} \end{aligned}$$
(18)
$$\begin{aligned} f_{i}= & {} \tanh \left( W_{s} \cdot h_{i}^{l o}+W_{a} \cdot a^{l o}+b_{f}\right) \end{aligned}$$
(19)

where \(W_s\) and \(W_a\) are weight matrices and \(b_f\) is a bias term. \(M_2=\{h_1^f,h_2^f,h_3^f,\ldots ,h_n^f\}\) is the output matrix of the filter layer.
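A minimal sketch of the filter layer is shown below. Eq. (19) uses the aspect representation \(a^{lo}\) as a whole; here it is assumed to be pooled into a single vector (e.g., averaged over the aspect positions), which is an interpretation rather than something the text specifies, and the gate \(f_i\) is applied elementwise.

```python
import torch
import torch.nn as nn

class AIIFL(nn.Module):
    """Aspect irrelevant information filter layer (Eqs. 18-19)."""
    def __init__(self, d):
        super().__init__()
        self.w_s = nn.Linear(d, d, bias=False)   # W_s
        self.w_a = nn.Linear(d, d, bias=False)   # W_a
        self.b_f = nn.Parameter(torch.zeros(d))  # b_f

    def forward(self, h_lo, a_lo):               # h_lo: (n, d), a_lo: (d,) pooled aspect vector
        f = torch.tanh(self.w_s(h_lo) + self.w_a(a_lo) + self.b_f)   # gate per word, Eq. (19)
        return f * h_lo                                              # rows of M_2, Eq. (18)
```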

Matrix fusion layer (MFL)

In order to effectively integrate the attention matrix \(M_1\) and the aspect-related information matrix \(M_2\), we design a matrix fusion layer to improve the feature representation of sentences. The fusion formulas are as follows:

$$\begin{aligned} M_{1}^{\prime }= & {} {\text {softmax}}\left( M_{1} W_{1}\left( M_{2}\right) ^{T}\right) M_{2} \end{aligned}$$
(20)
$$\begin{aligned} M_{2}^{\prime }= & {} {\text {softmax}}\left( M_{2} W_{2}\left( M_{1}\right) ^{T}\right) M_{1} \end{aligned}$$
(21)
$$\begin{aligned} M= & {} \left[ M_1^{'},M_2^{'}\right] \end{aligned}$$
(22)

where \(W_1\) and \(W_2\) are learnable weight matrices.

Finally, the obtained representation M is input to a linear layer and the sentiment probability distribution is assigned using the softmax function:

$$\begin{aligned} p(e) ={\text {softmax}}(W_pM+b_p) \end{aligned}$$
(23)

where \(W_p\) and \(b_p\) are the learnable weight matrix and the bias term, respectively.
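The fusion and classification steps of Eqs. (20)–(23) might be implemented as sketched below. Since Eq. (23) maps M to a single probability distribution, the rows of the fused matrices are mean-pooled before the linear layer; this pooling, and treating \(M_1\) and \(M_2\) as generic row matrices, are assumptions made for the sketch.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MatrixFusionLayer(nn.Module):
    """Matrix fusion layer and classifier of Eqs. (20)-(23)."""
    def __init__(self, d, n_classes=3):
        super().__init__()
        self.w1 = nn.Linear(d, d, bias=False)        # W_1
        self.w2 = nn.Linear(d, d, bias=False)        # W_2
        self.classifier = nn.Linear(2 * d, n_classes)  # W_p, b_p

    def forward(self, m1, m2):                       # m1: (n1, d), m2: (n2, d)
        m1p = F.softmax(self.w1(m1) @ m2.t(), dim=-1) @ m2   # Eq. (20)
        m2p = F.softmax(self.w2(m2) @ m1.t(), dim=-1) @ m1   # Eq. (21)
        m = torch.cat([m1p.mean(dim=0), m2p.mean(dim=0)], dim=-1)  # pool rows, then Eq. (22)
        return F.softmax(self.classifier(m), dim=-1)          # Eq. (23): p(e)
```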

Loss function

The loss function of MFLGCN uses cross entropy with L2-regularization:

$$\begin{aligned} \text {Loss} =-\sum _{(d, e) \in D} \log p(e)+\lambda \Vert \theta \Vert _{2} \end{aligned}$$
(24)

where D is the training dataset, e is the true label, and p(e) is the probability the model assigns to label e. \(\theta \) represents all trainable parameters, and \(\lambda \) is the coefficient of the regularization term.
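For completeness, a sketch of Eq. (24) in PyTorch, taking the probability distribution p(e) of Eq. (23) as input; the squared L2 norm is used for the penalty, as is common in practice, and \(\lambda\) is set to the 0.00001 reported in the implementation details.

```python
import torch
import torch.nn.functional as F

def mflgcn_loss(probs, labels, parameters, lam=1e-5):
    """Cross-entropy plus L2 regularization (Eq. 24).

    probs      : (batch, n_classes) predicted distributions p(e) from Eq. (23)
    labels     : (batch,) gold sentiment labels
    parameters : iterable of model parameters, e.g. model.parameters()"""
    ce = F.nll_loss(torch.log(probs + 1e-12), labels)           # -log p(e), averaged over batch
    l2 = sum(p.pow(2).sum() for p in parameters if p.requires_grad)  # squared L2 penalty
    return ce + lam * l2
```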

Experiments

Datasets

We conduct experiments on four public datasets in different domains: Twitter [26], Restaurant, Laptop [27] and MAMS [28]. These datasets have three sentiment polarities: positive, negative, and neutral. The details of the experimental datasets are shown in Table 1.

Table 1 The details of the experimental datasets

Evaluation metrics

We use accuracy and F1-score to evaluate the performance of each model. The evaluation metrics are defined as follows:

$$\begin{aligned}{} & {} \hbox {Accuracy} =\frac{\left( \hbox {TN}+\hbox {TP}\right) }{\left( \hbox {TN}+\hbox {TP}+\hbox {FN}+\hbox {FP}\right) } \end{aligned}$$
(25)
$$\begin{aligned}{} & {} \hbox {Recall} =\frac{\hbox {TP}}{\left( \hbox {FN}+\hbox {TP}\right) } \end{aligned}$$
(26)
$$\begin{aligned}{} & {} \hbox {Precision} =\frac{\hbox {TP}}{\left( \hbox {FP}+\hbox {TP}\right) } \end{aligned}$$
(27)
$$\begin{aligned}{} & {} F\hbox {-measure} =\frac{2\left( \hbox {Precision}*\hbox {Recall}\right) }{\left( \hbox {Precision}+\hbox {Recall}\right) } \end{aligned}$$
(28)

where TP, TN, FP and FN denote true positive, true negative, false positive and false negative, respectively.

Implementation details

For our text graph construction, we set the number of words in the sliding window to 5. In the semantic-based graph, the initial word embeddings are obtained from pre-trained BERT, with a dimension of 768. Our experiments use 300-dimensional GloVe vectors to initialize the word embeddings. The Bi-LSTM hidden size is set to 300. We set the coefficient of the L2 regularization term to 0.00001 and the batch size to 32. To alleviate overfitting, we apply dropout at a rate of 0.5. We set the number of GCN layers to 2 and the number of multi-head self-attention heads to 8. The Adam optimizer is used.

Table 2 Comparison of different experimental results on public datasets

Comparison with the state-of-the-art

Comparison models

We use ASGCN and AEGCN as our main baseline models, and we also compare our proposed model (MFLGCN) with the following methods:

  • ATAE-LSTM [1] combines aspect embedding and an attention mechanism for ABSA.

  • MEMNET [4] employs multi-hop attention to represent the features of the context.

  • IAN [6] uses the combination of LSTM and interactive attention mechanism to express the context and aspect.

  • RAM [5] designs a model combining multiple attention and memory networks to learn the sentence representation.

  • GCAE [29] utilizes gating units to combine the outputs of two convolutional layers of a CNN.

  • MGAN [7] proposes a multi-grained attention mechanism to capture the relationship between context and aspect.

  • AOA [2] obtains the corresponding representation of the context and aspect through the interactive learning of attention.

  • TD-GAT [11] proposes a graph attention network over the dependency tree to solve ABSA tasks.

  • ASGCN [9] proposes a model that combines the attention mechanism and GCN on the dependency tree.

  • kumaGCN [14] designs a latent graph structure to capture aspect representations with syntactic information.

  • BiGCN [13] proposes a novel architecture that convolves over hierarchical syntactic and lexical graphs.

  • AEGCN [12] utilizes a variety of attention mechanisms and GCN for ABSA on the dependency tree structure.

Table 3 Experimental results of ablation study

Experimental results

Table 2 shows the comparison results on four benchmark datasets, which demonstrate that the proposed MFLGCN model outperforms all comparison models. Accuracy and F1-score are used to evaluate these models. Compared with traditional sequential models combined with attention mechanisms, such as ATAE-LSTM, MEMNET, IAN, RAM, GCAE, MGAN, and AOA, graph convolutional networks combined with dependency trees, such as ASGCN and AEGCN, make use of the rich dependencies between words and can avoid the noise introduced by the attention mechanism, so their performance is improved. However, the methods based on dependency trees are inaccurate and insensitive to domain-specific datasets. Compared with methods that directly use dependency trees, GCNs combined with improved dependency tree structures, such as kumaGCN and BiGCN, achieve higher performance than ASGCN and AEGCN. In order to model multiple latent information, we combine statistics, semantics, and part of speech to construct the graph structure and build our sentiment classification model. Compared with the previous best models, our proposed MFLGCN achieves the best accuracy on all datasets and obtains the best macro-averaged F1-score on the Lap14, Rest14 and MAMS datasets. In order to show the performance of our model more vividly and concretely, we average the accuracy of the sequence models (ATAE-LSTM, MEMNET, IAN), the dependency tree-based GCN models (ASGCN-DT, ASGCN-DG, AEGCN) and our model (MFLGCN) on the four datasets. The results are shown in Fig. 5, where green represents the sequence models, orange represents the dependency tree-based models and blue represents our model. From Fig. 5, we can see that the proposed model obtains higher accuracy than the other models.

Fig. 5 Comparison of average accuracy of different models

Ablation study

To further investigate the impact of each component of the model on the experimental results, we make some ablation experiments. The results are shown in Table 3.

In order to prove the effectiveness of the part of speech graph structure enhancement, the MFLGCN w/o GE experiment is set up, in which the graph enhancement (GE) is removed from the graph construction stage. The results show that performance on all four datasets decreases after removing GE: compared with the complete model, the accuracy drops by 0.9%, 0.19%, 0.07% and 0.66%, respectively. The attention matrix \(M_1\), obtained by combining the sequence model, the GCN and the attention mechanism, makes the feature representation of the aspect more relevant to the context. The aspect-related information matrix \(M_2\), obtained by combining the GCN output with the aspect irrelevant information filter layer (AIIFL), not only aggregates the rich information around the aspect nodes, but also alleviates the impact of noise in the sentence on aspect feature generation. In order to show the effectiveness of the matrix fusion layer in fusing the attention matrix and the information filtering matrix, we set up the experiment w/o MFL_\(M_1\), in which \(M_1\) is ablated and only \(M_2\) is used, and the experiment w/o MFL_\(M_2\), in which \(M_2\) is ablated and only \(M_1\) is used. Compared with the complete model, when the attention matrix \(M_1\) is ablated, the accuracy drops by 2.3%, 1.31%, 1.9% and 1.05% on the four datasets, respectively; when the aspect-related information matrix \(M_2\) is ablated, the accuracy drops by 1.03%, 1.12%, 0.93% and 0.81%, respectively. Based on these results, we can find that the interaction of GE, Attention, and AIIFL is significant in our model.

Fig. 6 Component performance comparison

Table 4 Experimental results of different graph structures

From Fig. 6, we can see that different components have different effects on the performance of the model. Ranked from high to low, the components that influence the model most are Attention, AIIFL, and GE.

Impact of different graph structures

In order to verify the effectiveness of our graph structure, we replace the dependency tree with our graph in ASGCN and AEGCN, and we also run experiments using the dependency tree in our model. Table 4 shows the results of the different graph structures.

As can be seen from Table 4, when the baseline models (ASGCN and AEGCN) use our graph, the performance is higher than when they use the dependency tree. On the Lap14, Rest14 and MAMS datasets, the performance of the ASGCN model is improved by 0.06%, 0.39% and 0.23%, respectively. On the Twitter dataset, our graph is inferior to the dependency tree. After analysis, the reasons are as follows: (1) the ASGCN model was originally designed for the dependency tree structure, so its data processing and model components are better suited to the dependency tree and less adaptable to our graph structure; (2) Twitter consists of online social media comments, whose grammatical structure and verbal expressions are relatively irregular. On the Twitter, Lap14, Rest14 and MAMS datasets, the performance of the AEGCN model is improved by 0.05%, 0.02%, 1.1% and 0.15%, respectively. We also evaluate our model with the dependency tree. The experimental results show that our graph combined with our model achieves better performance: on the Twitter, Lap14, Rest14 and MAMS datasets, our model is improved by 1.07%, 0.26%, 1.05% and 1.02%, respectively. These comparative experiments show that the accuracy of sentiment analysis is improved by using our graph structure.

Impact of GCN layer number

To investigate the impact of the number of GCN layers, we set the number of layers from one to five and evaluate our model on the four datasets.

Fig. 7 Impact of GCN layer number on the Lap14

Fig. 8 Impact of GCN layer number on the Rest14

Fig. 9 Impact of GCN layer number on the Twitter

Fig. 10 Impact of GCN layer number on the MAMS

From Figs. 7, 8, 9 and 10, we can see that as the number of GCN layers increases, the performance of the model first increases and then decreases. Thus, the performance does not always improve with more layers, because a large number of layers makes the model hard to train and introduces more parameters, resulting in a less generalizable model. To avoid these problems, a two-layer GCN is used to train the model.

Conclusion

In this paper, we propose a matrix fusion-based graph convolutional network (MFLGCN) over multiple latent information graph structures to solve aspect-based sentiment analysis tasks. The learned graph structure combines semantic, statistical, and part of speech information to incorporate more latent information. MFLGCN can generate efficient and informative word encodings. Experiments show that our graph structure leads to a more effective node representation, and comprehensive experiments illustrate the effectiveness of our model: it outperforms the baseline models on four public datasets, Twitter, Lap14, Rest14, and MAMS. In addition, we perform ablation experiments to prove the indispensability and effectiveness of each component of our model.