Introduction

The aspect-based sentiment analysis (ABSA) task aims to determine the sentiment tendency toward a particular aspect in a sentence. For example, in the sentence “The environment of this restaurant is dirty, but the food is delicious.”, there are two aspects, “environment” and “food”, and the user expresses negative and positive sentiments toward them, respectively. Thus, ABSA can precisely judge the sentiment tendency toward a specific aspect, rather than simply judging the sentiment polarity of a whole sentence.

In recent studies, various neural network models such as Recurrent Neural Networks (RNN) [1] and Convolutional Neural Networks (CNN) [2, 3] have been widely used in aspect-based sentiment analysis. With the popularity of the attention mechanism, more researchers [4,5,6,7] have combined attention mechanisms with neural networks to reflect the degree of influence of each word on the aspect. However, the attention mechanism is susceptible to noise in the sentence and may mistakenly focus on irrelevant words, and sequential models that capture contextual semantic relationships easily lose long-distance information.

Compared with sequential models, graph convolutional networks (GCN) [8] can handle complex data structures and propagate information to the global level. Because of these advantages, more researchers [9,10,11,12] have combined dependency trees with GCN to make better use of the syntactic dependencies in sentences. However, the following problems arise when using dependency trees in ABSA tasks: (1) since the dependency tree is created automatically by external tools, the resulting dependencies can be inaccurate and uncontrollable; (2) dependency trees are insensitive to domain-specific datasets [13]. To address these problems, researchers [13,14,15,16,17,18] have made improvements and enhancements based on dependency trees to solve ABSA tasks. However, these methods only consider the dependencies between words and fail to consider other important information.

Information on the context, semantics, part of speech, and sentiment knowledge among words is essential when constructing a GCN for aspect-based sentiment analysis. However, only a limited number of researchers model multiple kinds of hidden information via GCN for this task. For example, Yao et al. [19] proposed a text graph-based neural network (TextGCN), in which a text graph is first constructed based on the sequential contextual relationships between words. In [20], syntax and knowledge are combined via a GCN. More researchers [21, 22] focus on building graph structures based on rich contextual information, such as semantic and syntactic contextual information. Effective graph learning is not only important in the ABSA field; many scholars have also done meaningful work [23, 24] on graph learning in terms of multiple kernel graph-based clustering (MKGC). It can be seen that researchers have paid more and more attention to effective graph learning.

Inspired by these studies, we model multiple latent information graph structures via GCN for aspect-based sentiment analysis. In the text graph construction module, we integrate three different perspectives, statistics, semantics, and part of speech, to build the graph structure. The learned graph structure can fully extract the latent information in the text. In the ABSA module, we design a matrix fusion-based GCN for better context encoding. In the ABSA field, most researchers directly combine the output of one type of neural network with the attention mechanism; they do not adequately combine sequential neural networks, GCN and attention mechanisms. Therefore, we first use a sequence model and the multi-head self-attention mechanism to generate the feature representation of the context. Second, the text graph and the context feature matrix are fed into the GCN. The aspect features that aggregate the information of adjacent nodes in the graph structure are obtained through multiple GCN layers. Then we calculate the attention matrix from the aspect features and the context features obtained from the sequence model. The filter matrix is obtained from the output of the GCN through the aspect irrelevant information filter layer. Finally, the aspect-related information is enhanced using a matrix fusion layer.

Our contributions are as follows:

  • We propose a graph convolution network model with multiple latent information for ABSA tasks. Our model considers the statistics, semantics, and part of speech within a sentence. We combine the graph structure obtained by the statistical method and the graph structure obtained by semantic similarity. Then the final graph structure is enhanced by part of speech rules.

  • We use a matrix fusion-based GCN over the graph structure. First, the attention matrix is obtained by combining the aspect feature output of GCN, the text feature output of the sequence model LSTM and the attention mechanism. Then, we use the information filter layer after GCN to filter the irrelevant information of the aspect. Finally, we combine the attention matrix and the filter matrix to get the final matrix.

  • Extensive experiments are conducted on four benchmark datasets to illustrate the effectiveness of our model for the ABSA task.

Related work

Aspect-based sentiment analysis is a fine-grained sentiment analysis task. Recently, most research on aspect-based sentiment analysis (ABSA) uses neural networks and attention to associate aspects with their context and capture the semantic information hidden in sentences. For instance, Wang et al. [1] and Ma et al. [6] propose models that directly compute attention between aspects and sentences. Tang et al. [4] apply multiple levels of attention in their model. Chen et al. [5] add a weighted memory mechanism on top of multi-layer attention. Fan et al. [7] propose a model that learns a representation containing sentence and aspect-related information, integrates it into a multi-granularity sentence modeling process, and finally obtains a comprehensive sentence representation. Huang and Carley [2] propose a novel parameterized convolutional neural network for aspect-level sentiment classification, using parameterized filters and parameterized gates on a CNN to incorporate aspect information. Tan et al. [25] introduce a dual attention network to recognize conflicting opinions. However, these studies ignore the syntactic dependencies between words in sentences, which may lead to ambiguity in identifying the polarity of specific targets.

Fig. 1 The overall architecture of the MFLGCN. The upper part is built for the text graph structure. It includes graph structures for statistics, semantics, and parts of speech. The lower part is the aspect-based sentiment analysis part. It contains the GCN and the matrix fusion layer

In order to solve this problem, dependency trees have been introduced into the ABSA task. Dependency trees can capture dependencies between words and enhance the connection between an aspect and its related words. Many researchers use graph convolutional networks (GCN) to model over dependency trees. Sun et al. [10] present a convolution-over-dependency-trees model which combines Bi-directional Long Short-Term Memory (Bi-LSTM) and GCN. Zhang et al. [9] propose a model that combines the attention mechanism and GCN on dependency trees. Xiao et al. [12] improve the GCN and combine it with a multi-head attention mechanism for aspect-based sentiment analysis. Although dependency trees perform well in ABSA tasks, they also have some defects: methods based on dependency trees are inaccurate and insensitive to domain-specific datasets. In order to alleviate these problems, researchers have enhanced the graph structure information built on dependency trees. Chen et al. [15] propose a model that combines dependency trees with latent graphs generated by self-attention networks. Wang et al. [16] utilize reshaping and pruning methods to make ordinary dependency trees focus on the target aspect. Zhou et al. [20] employ a new Syntax- and Knowledge-based Graph Convolutional Network (SK-GCN) model for aspect-level sentiment classification. Tang et al. [17] use dual graph convolutional networks to improve the insensitivity of dependency syntax trees to online reviews.

In this paper, the text graph structure is generated by combining different methods. The statistic-based graph considers the co-occurrence frequency between words in the text. The semantic-based graph considers the rich semantic relationships of the context and builds a good graph structure on datasets from different domains. The part of speech rule-based graph enriches the relationships of specific aspects and strengthens the importance of aspects in the graph. Based on the constructed graph structure, we combine the aspect feature output of the GCN, the text feature output of the sequence model and the attention mechanism to extract the attention matrix. We also use the GCN and an aspect information filter layer to extract the aspect-related information matrix. Finally, we integrate the attention matrix, which considers the importance between the aspect and other words, and the matrix filtering out aspect-irrelevant information to obtain the final representation.

Fig. 2 Semantic-based graph construction model

Our model

Figure 1 gives an overview of our model. It contains the graph construction part and the aspect-based sentiment analysis part. In the graph construction part, we first use statistical methods to calculate the co-occurrence between words and use it as the basis for forming each edge in the statistic-based graph \(G_1\). Second, we construct the semantic-based graph \(G_2\) using the semantic similarity between words as the basis. Third, we keep the edges common to \(G_1\) and \(G_2\) and delete the rest to obtain a fused graph. Finally, based on the part of speech rules, we process the nodes connected to the aspect and the further edges attached to those nodes: edges that conform to the defined part of speech rules are added and those that do not conform are deleted, thus obtaining the graph \(G_3\). In the aspect-based sentiment analysis part, we first utilize a Bi-LSTM to extract hidden contextual representations \(H^t\). These hidden representations \(H^t\) are then fed into the GCN as features of the graph nodes, yielding the vector representations \(H^l\) that aggregate relevant neighborhood information. We use the multi-head self-attention mechanism to link words more effectively and obtain \(H^m\). The attention matrix \(M_1\) is obtained by computing attention between the aspect vector and \(H^m\). The aspect-related information matrix \(M_2\) is obtained by passing \(H^l\) and the aspect vector through the information filter layer. Finally, the matrix fusion of \(M_1\) and \(M_2\) is performed to obtain the final text representation.

Construction of the text graph

Statistic-based graph

The statistic-based graph integrates the co-occurrence information between words. We combine the sliding window strategy and point-wise mutual information (PMI) to express the degree of association between words. The basis for the existence of edges between nodes in the graph can be formulated as

$$\begin{aligned}{} & {} T_{\textrm{st }}\left( w_{i}, w_{j}\right) =\log \frac{p\left( w_{i}, w_{j}\right) }{p\left( w_{i}\right) p\left( w_{j}\right) } \end{aligned}$$
(1)
$$\begin{aligned}{} & {} p\left( w_{i}, w_{j}\right) =\frac{N_{\left( w_{i}, w_{j}\right) }}{N_{\textrm{total}}} \end{aligned}$$
(2)
$$\begin{aligned}{} & {} p\left( w_{i}\right) =\frac{N_{\left( w_{i}\right) }}{N_{\textrm{total}}} \end{aligned}$$
(3)
$$\begin{aligned}{} & {} p\left( w_{j}\right) =\frac{N_{\left( w_{j}\right) }}{N_{\textrm{total}}}. \end{aligned}$$
(4)

\(p(w_i,w_j)\) is the co-occurrence probability of \(w_i\) and \(w_j\). \(N_{(w_i,w_j)}\) is the number of sliding windows that contain both \(w_i\) and \(w_j\), and \(N_{\textrm{total}}\) is the total number of sliding windows over the whole dataset. \(N_{(w_i)}\) is the number of sliding windows over the whole dataset in which the word \(w_i\) occurs. The weight between the word nodes \(w_i\) and \(w_j\) is defined as

$$\begin{aligned} M_{\textrm{st}_{ij}} = \left\{ \begin{array}{ll}1,&{}\quad T_{\textrm{st}}\left( w_{i}, w_{j}\right) >\text {threshold}_{\textrm{st}} \\ 1, &{}\quad i=j \\ 0, &{}\quad T_{\textrm{st}}\left( w_{i}, w_{j}\right) <\text {threshold}_{\textrm{st}} \end{array}\right. \end{aligned}$$
(5)

where \(M_{\textrm{st}}\in {{\mathbb {R}}}^{n{\times }n}\) is the adjacency matrix representation of the statistic-based graph. \(M_{\textrm{st}_{ij}}\) represents whether the \(i\textrm{th}\) node is connected to the \(j\textrm{th}\) node in the graph. The \(\hbox {threshold}_{\textrm{st}}\) is the standard to judge whether nodes are connected.
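To make the construction concrete, the following is a minimal sketch of how a statistic-based adjacency matrix could be built from Eqs. (1)–(5). For brevity it accumulates the window counts over a single token sequence, whereas the paper gathers them over the whole dataset; the function name and the default threshold are illustrative, not the paper's settings.

```python
import math
from collections import Counter
from itertools import combinations

def build_statistic_graph(tokens, window_size=5, threshold_st=0.0):
    """Build the statistic-based adjacency matrix M_st from PMI scores (Eqs. 1-5)."""
    n = len(tokens)
    windows = [tokens[i:i + window_size] for i in range(max(1, n - window_size + 1))]
    n_total = len(windows)

    word_count = Counter()   # number of windows containing word w_i
    pair_count = Counter()   # number of windows containing both w_i and w_j
    for win in windows:
        uniq = set(win)
        word_count.update(uniq)
        pair_count.update(combinations(sorted(uniq), 2))

    m_st = [[1 if i == j else 0 for j in range(n)] for i in range(n)]  # self-loops (i = j)
    for i, j in combinations(range(n), 2):
        wi, wj = tokens[i], tokens[j]
        n_ij = pair_count[tuple(sorted((wi, wj)))]
        if n_ij == 0:
            continue                                  # never co-occur in a window
        pmi = math.log((n_ij / n_total)
                       / ((word_count[wi] / n_total) * (word_count[wj] / n_total)))
        if pmi > threshold_st:                        # Eq. (5): keep edge if PMI is high enough
            m_st[i][j] = m_st[j][i] = 1
    return m_st
```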

Semantic-based graph

BERT can be fine-tuned and act as a feature extractor according to the needs of the task, yielding high-quality word vector representations. LSTM can capture contextual semantic relationships. We use the BERT-LSTM model to obtain the word vector representation of the text (shown in Fig. 2) and then calculate the cosine similarity between words to construct the edges in the semantic graph. The weight of the semantic relationship between words is calculated as follows:

$$\begin{aligned} T_{\textrm{se}}\left( w_{i}, w_{j}\right) =\cos \langle v_{i}, v_{j}\rangle =\frac{v_{i} \cdot v_{j}}{\left\| v_{i}\right\| \times \left\| v_{j}\right\| } \end{aligned}$$
(6)

where \(v_i\) and \(v_j\) are the vectors of words \(w_i\) and \(w_j\), respectively. The weight between the word nodes \(w_i\) and \(w_j\) is defined as

$$\begin{aligned} M_{\textrm{se}_{ij}} = \left\{ \begin{array}{ll}1,&{}\quad T_{\textrm{se}}\left( w_{i}, w_{j}\right) >\text {threshold}_{\textrm{se}} \\ 1,&{}\quad i=j \\ 0,&{}\quad T_{\textrm{se}}\left( w_{i}, w_{j}\right) <\text {threshold}_{\textrm{se}} \end{array}\right. \end{aligned}$$
(7)

where \(M_{\textrm{se}}\in {{\mathbb {R}}}^{n{\times }n}\) is the adjacency matrix representation of the semantic-based graph. \(M_{\textrm{se}_{ij}}\) represents whether the \(i\textrm{th}\) node is connected to the \(j\textrm{th}\) node in the graph. The \(\hbox {threshold}_{\textrm{se}}\) is the standard to judge whether nodes are connected.
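A possible implementation of Eqs. (6)–(7) is sketched below, assuming the per-word vectors \(v_i\) have already been produced by the BERT-LSTM encoder of Fig. 2; the threshold value is a placeholder, not the one used in the paper.

```python
import torch
import torch.nn.functional as F

def build_semantic_graph(word_vectors, threshold_se=0.5):
    """Build the semantic-based adjacency matrix M_se via cosine similarity (Eqs. 6-7).

    word_vectors: tensor of shape (n, d), e.g. the per-token outputs of the
    BERT-LSTM encoder described in the text."""
    v = F.normalize(word_vectors, p=2, dim=-1)   # unit-normalise each word vector
    sim = v @ v.t()                              # pairwise cosine similarities, Eq. (6)
    m_se = (sim > threshold_se).long()           # Eq. (7): keep edges above the threshold
    m_se.fill_diagonal_(1)                       # self-loops (i = j case)
    return m_se
```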

Graph enhancement

In order to integrate statistical and semantic information simultaneously, we build a graph that contains the common edges of the semantic-based graph and statistic-based graph. The weight between the word nodes \(w_i\) and \(w_j\) is defined as

$$\begin{aligned} M_{\textrm{fusion}_{i j}} = \left\{ \begin{array}{ll} 1,&{}\quad M_{\textrm{st}_{i j}}=1\quad \text {and}\quad M_{\textrm{se}_{i j}}=1 \\ 1,&{}\quad i=j \\ 0,&{}\quad M_{\textrm{st}_{i j}}=0 \quad \text {or}\quad M_{\textrm{se}_{i j}}=0 \end{array}\right. \end{aligned}$$
(8)

where \(M_{\textrm{fusion}}\in {{\mathbb {R}}}^{n{\times }n}\) is the adjacency fusion matrix representation of statistic-based graph and semantic-based graph. In order to enrich the information associated with aspect nodes in the graph, we use part of speech rules to enhance the graph structure, as shown in Fig. 3.

Fig. 3
figure 3

Part of speech rules

In the graph structure, if the part of speech of a node \(w_{(i+1)}\) directly connected to the aspect node \(w_i\) is an adverb or verb, it enters the next stage of part of speech judgment: if the part of speech of a node \(w_{(i+2)}\) directly connected to \(w_{(i+1)}\) is an adjective, an edge is added between \(w_i\) and \(w_{(i+2)}\).
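As an illustration, the sketch below fuses the two adjacency matrices according to Eq. (8) and then applies the rule of Fig. 3 to add aspect-adjective edges. It assumes Universal-Dependencies-style coarse POS tags (ADV, VERB, ADJ) and, for simplicity, only shows the edge-addition half of the rule; the deletion of non-conforming edges described above is omitted.

```python
def fuse_and_enhance(m_st, m_se, pos_tags, aspect_idx):
    """Fuse the two graphs (Eq. 8) and apply the part-of-speech rule of Fig. 3.

    m_st, m_se : n x n adjacency structures with 0/1 entries
    pos_tags   : coarse POS tag per token (e.g. "ADV", "VERB", "ADJ")
    aspect_idx : 0-indexed positions of the aspect tokens
    """
    n = len(pos_tags)
    # Eq. (8): keep only the edges shared by both graphs, plus self-loops.
    m = [[1 if (i == j or (m_st[i][j] and m_se[i][j])) else 0 for j in range(n)]
         for i in range(n)]

    # POS rule: aspect -- (adverb | verb) -- adjective  =>  add aspect--adjective edge.
    for a in aspect_idx:
        for k in range(n):
            if m[a][k] and pos_tags[k] in {"ADV", "VERB"}:
                for t in range(n):
                    if m[k][t] and pos_tags[t] == "ADJ":
                        m[a][t] = m[t][a] = 1
    return m
```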

Overall, the graph structure we build has the following properties: (1) it captures the co-occurrence between words; (2) it integrates rich contextual semantic relations; (3) it considers the relationship between aspects and other words at the part of speech level.

Contextualized word representation

A sentence \(s=\{w_1^t,w_2^t,w_3^t,\ldots ,w_n^t\}\) containing an aspect \(a=\{w_1^a,w_2^a,w_3^a,\ldots ,w_{m+1}^a\}\) is given, where n is the length of the sentence and m is the length of the aspect. We use a Bi-LSTM encoder to capture contextual semantic relationships and obtain the contextualized word representations \(H^t=\{h_1^t,h_2^t,h_3^t,\ldots ,h_\beta ^a,h_{\beta +1}^a,h_{\beta +2}^a,\ldots ,h_{\beta +m}^a,\ldots ,h_n^t\}\) \(\in {{\mathbb {R}}}^{n{\times }{d_h}}\), where \(d_h\) is the dimension of the hidden state vectors.
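A minimal PyTorch sketch of this encoder is given below, using the 300-dimensional embeddings and hidden size mentioned in the implementation details; note that, being bidirectional, it returns per-token vectors of size 2 × hidden_dim (the forward and backward states concatenated).

```python
import torch
import torch.nn as nn

class ContextEncoder(nn.Module):
    """Minimal Bi-LSTM encoder producing the contextualized representations H^t."""
    def __init__(self, embed_dim=300, hidden_dim=300):
        super().__init__()
        self.bilstm = nn.LSTM(embed_dim, hidden_dim,
                              batch_first=True, bidirectional=True)

    def forward(self, word_embeddings):          # (batch, n, embed_dim)
        h_t, _ = self.bilstm(word_embeddings)    # (batch, n, 2 * hidden_dim)
        return h_t
```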

Fig. 4 Calculation process of MHSA

Multi-head self-attention (MHSA)

Multi-head self-attention (MHSA) is an attention mechanism that operates over several subspaces in parallel. Compared with single-head self-attention, MHSA can extract richer semantic features, as shown in Fig. 4. First, we initialize Q, K and V by setting each of them equal to the input \(H^t\). Different weight matrices W are then used to map Q, K and V into different subspaces. The subspace mapping is given by the following formula:

$$\begin{aligned} {\left\{ \begin{array}{ll}{Q_i}=QW_i^Q \\ {K_i}=KW_i^K&{}(i=1,2,3,\ldots ,n)\\ {V_i}=VW_i^V\end{array}\right. } \end{aligned}$$
(9)

where \(W_i^Q\), \(W_i^K\) and \(W_i^V\) are the learnable parameter matrices of the i-th head and n is the number of heads. In each subspace, the output is calculated by the attention function with K, Q and V as inputs. The attention is calculated by the following formula:

$$\begin{aligned} {\text {Attention}}(Q, K, V)={\text {softmax}}\left( {\hbox {tanh}\left( W_{m} \cdot [ K ; Q ]\right) }\right) V\nonumber \\ \end{aligned}$$
(10)

Formula (10) is essentially a weighted calculation over V. The weights are obtained by a softmax function, inside which the tanh function is used to calculate the correlation between K and Q. \(W_m\) is a learnable weight matrix. After obtaining the output of each subspace, the outputs are concatenated to obtain the final output:

$$\begin{aligned} \text {head} _{i}= & {} \text {Attention}\left( Q_i, K_i, V_i\right) \left( i=1,2,3,\ldots ,n\right) \end{aligned}$$
(11)
$$\begin{aligned} H^m= & {} MHSA(Q,K,V)=\text {Concat}\left( \hbox {head}_1,\ldots ,\hbox {head}_n\right) \nonumber \\ \end{aligned}$$
(12)

where \(\hbox {head}_i\) is the output of the i-th attention head and \(H^m=\{h_1^m,h_2^m,h_3^m,\ldots ,h_n^m \}\in {{\mathbb {R}}}^{n{\times }{d_e}}\) is the final output of the MHSA, where \(d_e\) is the dimension of the MHSA output.
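The following sketch shows one way to read Eqs. (9)–(12): each head scores a (key, query) pair with \(\tanh(W_m[k;q])\), normalises the scores over the keys with softmax, and weights V accordingly. The exact shape of \(W_m\) is not specified in the text, so mapping the concatenation to a scalar score per pair is an assumption of this sketch.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MHSA(nn.Module):
    """Multi-head self-attention with the concatenation-based score of Eq. (10)."""
    def __init__(self, d_model, n_heads=8):
        super().__init__()
        assert d_model % n_heads == 0
        self.n_heads, self.d_k = n_heads, d_model // n_heads
        self.w_q = nn.Linear(d_model, d_model)
        self.w_k = nn.Linear(d_model, d_model)
        self.w_v = nn.Linear(d_model, d_model)
        self.w_m = nn.Linear(2 * self.d_k, 1)   # W_m of Eq. (10), one score per (k, q) pair

    def forward(self, h_t):                      # h_t: (batch, n, d_model)
        b, n, _ = h_t.shape
        def split(x):                            # -> (batch, heads, n, d_k), Eq. (9)
            return x.view(b, n, self.n_heads, self.d_k).transpose(1, 2)
        q, k, v = split(self.w_q(h_t)), split(self.w_k(h_t)), split(self.w_v(h_t))

        # Pairwise concatenation [k_j ; q_i] for every (query i, key j) pair.
        q_exp = q.unsqueeze(3).expand(b, self.n_heads, n, n, self.d_k)
        k_exp = k.unsqueeze(2).expand(b, self.n_heads, n, n, self.d_k)
        scores = torch.tanh(self.w_m(torch.cat([k_exp, q_exp], dim=-1))).squeeze(-1)
        attn = F.softmax(scores, dim=-1)         # normalise over the keys
        heads = attn @ v                         # weighted sum of V, Eq. (11)
        return heads.transpose(1, 2).reshape(b, n, -1)   # Concat(head_1..head_n) = H^m, Eq. (12)
```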

Graph convolution network (GCN)

A graph convolutional network (GCN) [8] uses convolution operations to encode graph-structured data; its output is a node representation that aggregates the information around each node. We use a GCN to model our constructed text graph structure incorporating multiple latent information. The adjacency matrix of the graph and the feature matrix of the graph nodes are first passed into the GCN. After multi-layer training, the obtained node feature representations aggregate the surrounding information. Finally, a mask layer is used to extract the aspect-specific representation. A graph \(G=\{V,A\}\) is given, where V is the set of nodes in the graph and \(A\in {{\mathbb {R}}}^{n{\times }n}\) is the adjacency matrix of the graph, with n being the number of nodes. \(h_i^l\) is the hidden state representation of node i at layer l. The node hidden state representation is updated by

$$\begin{aligned} h_{i}^{l}=\sigma \left( \sum _{j=1}^{n} A_{i j} W^{l} h_{j}^{l-1}+b^{l}\right) \end{aligned}$$
(13)

where \(W^l\) is a learnable weight matrix, \(b^l\) is a bias term, and \(\sigma \) is a nonlinear function (e.g., ReLU). \(A_{ij}\) represents whether the \(i\textrm{th}\) node is connected to the \(j\textrm{th}\) node in the graph: \(A_{ij}= 1\) if node i is connected to node j, otherwise \(A_{ij}= 0\).

In order to make the GCN learn aspect-specific representations, we apply an aspect mask to the output of the GCN \(H^l=\{h_1^l,h_2^l,h_3^l,\ldots ,h_n^l\}\). The aspect \(a=\{w_\beta ^\alpha ,w_{\beta +1}^\alpha , w_{\beta +2}^\alpha ,\ldots ,w_{\beta +m}^\alpha \}\) is given. The output weight is calculated as follows:

$$\begin{aligned} o_{i}= {\left\{ \begin{array}{ll}1-\frac{\beta -i}{n}, &{}\quad 1 \le i<\beta \\ 0, &{} \quad \beta \le i \le \beta +m \\ 1-\frac{i-\beta -m}{n}, &{}\quad \beta +m<i \le n\end{array}\right. } \end{aligned}$$
(14)

where \(o_i\) is the weight of the \(h_i^l\). The weight and GCN output are calculated as follows:

$$\begin{aligned} h_i^{lo}=o_ih_i^l \end{aligned}$$
(15)

where \(h_i^{lo}\in {{\mathbb {R}}^d}\) is a hidden representation of the \(i\textrm{th}\) node and \(H^{lo}=\{h_1^{lo},h_2^{lo},h_3^{lo},\ldots ,h_n^{lo}\}\) denotes hidden representations.
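A compact PyTorch sketch of Eqs. (13)–(15) follows: one GCN layer and the weights of Eq. (14), with the 1-indexed positions of the formulas converted to 0-indexed tensor indices. Applying `position_weights(n, beta, m).unsqueeze(-1) * h_l` to the GCN output then yields the weighted representations \(h_i^{lo}\) of Eq. (15).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GCNLayer(nn.Module):
    """One GCN layer implementing Eq. (13): h_i^l = ReLU(sum_j A_ij W h_j^{l-1} + b)."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim)   # W^l and b^l

    def forward(self, h, adj):                      # h: (batch, n, in_dim), adj: (batch, n, n)
        return F.relu(adj @ self.linear(h))         # aggregate neighbours, then nonlinearity

def position_weights(n, beta, m):
    """Weights o_i of Eq. (14) for an aspect spanning 1-indexed positions beta .. beta+m."""
    o = torch.zeros(n)
    for i in range(1, n + 1):
        if i < beta:
            o[i - 1] = 1 - (beta - i) / n
        elif i <= beta + m:
            o[i - 1] = 0.0
        else:
            o[i - 1] = 1 - (i - beta - m) / n
    return o
```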

Aspect-aware attention

The GCN aggregates the information of the nodes around the aspect, which enriches the aspect representation. In order to fully associate the aspect with other words, we combine the aspect feature output of the GCN aggregation extracted by the masking layer, the text feature output of the sequence neural network model LSTM and the attention mechanism. This operation fuses the rich features of the aspect hidden in the text graph structure with the text features learned by the sequence model for the attention calculation, so that the vital information in the aspect is fully integrated into the resulting attention matrix. We extract the aspect representation \(a^{lo}=\{0,\ldots ,h_\beta ^{lo},h_{\beta +1}^{lo},h_{\beta +2}^{lo},\ldots ,h_{\beta +m}^{lo},\ldots ,0\}\) from the sentence representation output by the GCN, and then compute attention with the sentence feature representation \(H^m\) by

$$\begin{aligned} \alpha _{t}= & {} \sum _{i=\beta }^{\beta +m} h_{t}^{m^{T}} h_{i}^{l o} \end{aligned}$$
(16)
$$\begin{aligned} M_{1}= & {} \sum _{t=1}^{n} \frac{\exp \left( \alpha _{t}\right) h_{t}^{m}}{\sum _{i=1}^{n} \exp \left( \alpha _{i}\right) } \end{aligned}$$
(17)

where \(M_1\) is the output of aspect-aware attention. We use the dot product to calculate the semantic similarity between aspect and other words.
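A direct translation of Eqs. (16)–(17) could look like the following sketch, where the aspect rows of \(H^{lo}\) are compared with every context word via dot products and the resulting softmax weights pool \(H^m\) into \(M_1\); the 1-indexed position \(\beta\) and length m follow the notation above.

```python
import torch
import torch.nn.functional as F

def aspect_aware_attention(h_m, h_lo, beta, m):
    """Aspect-aware attention of Eqs. (16)-(17).

    h_m  : (n, d) output of the MHSA layer
    h_lo : (n, d) masked/weighted GCN output
    beta, m : 1-indexed start position and length of the aspect"""
    aspect = h_lo[beta - 1: beta + m]        # rows h_beta^{lo} .. h_{beta+m}^{lo}
    alpha = (h_m @ aspect.t()).sum(dim=-1)   # Eq. (16): sum of dot products per context word
    weights = F.softmax(alpha, dim=0)        # softmax over the n context words
    return (weights.unsqueeze(-1) * h_m).sum(dim=0)   # Eq. (17): weighted sum gives M_1
```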

Aspect irrelevant information filter layer (AIIFL)

The attention mechanism may introduce noise (irrelevant information) into the ABSA task, which may cause the model to capture irrelevant sentiment information and thus reduce the accuracy of the analysis. In order to alleviate this problem, we design an aspect irrelevant information filter layer (AIIFL). The calculation formula is as follows:

$$\begin{aligned} h_i^f= & {} f_ih_i^{lo} \end{aligned}$$
(18)
$$\begin{aligned} f_{i}= & {} \tanh \left( W_{s} \cdot h_{i}^{l o}+W_{a} \cdot a^{l o}+b_{f}\right) \end{aligned}$$
(19)

where \(W_s\) and \(W_a\) are weight matrices and \(b_f\) is a bias term. \(M_2=\{h_1^f,h_2^f,h_3^f,\ldots ,h_n^f\}\) is the output matrix of the filter layer.
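A minimal sketch of the filter layer is shown below. Eq. (19) uses the aspect representation \(a^{lo}\) as a whole; here it is assumed to be pooled into a single vector (e.g., averaged over the aspect positions), which is an interpretation rather than something the text specifies, and the gate \(f_i\) is applied elementwise.

```python
import torch
import torch.nn as nn

class AIIFL(nn.Module):
    """Aspect irrelevant information filter layer (Eqs. 18-19)."""
    def __init__(self, d):
        super().__init__()
        self.w_s = nn.Linear(d, d, bias=False)   # W_s
        self.w_a = nn.Linear(d, d, bias=False)   # W_a
        self.b_f = nn.Parameter(torch.zeros(d))  # b_f

    def forward(self, h_lo, a_lo):               # h_lo: (n, d), a_lo: (d,) pooled aspect vector
        f = torch.tanh(self.w_s(h_lo) + self.w_a(a_lo) + self.b_f)   # gate per word, Eq. (19)
        return f * h_lo                                              # rows of M_2, Eq. (18)
```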

Matrix fusion layer (MFL)

In order to effectively integrate the attention matrix \(M_1\) and the aspect-related information matrix \(M_2\), we design a matrix fusion layer to improve the feature representation of sentences. The fusion formulas are as follows:

$$\begin{aligned} M_{1}^{\prime }= & {} {\text {softmax}}\left( M_{1} W_{1}\left( M_{2}\right) ^{T}\right) M_{2} \end{aligned}$$
(20)
$$\begin{aligned} M_{2}^{\prime }= & {} {\text {softmax}}\left( M_{2} W_{2}\left( M_{1}\right) ^{T}\right) M_{1} \end{aligned}$$
(21)
$$\begin{aligned} M= & {} \left[ M_1^{'},M_2^{'}\right] \end{aligned}$$
(22)

where \(W_1\) and \(W_2\) are learnable weight matrices.

Finally, the obtained representation M is input to a linear layer and the sentiment probability distribution is assigned using the softmax function:

$$\begin{aligned} p(e) ={\text {softmax}}(W_pM+b_p) \end{aligned}$$
(23)

where \(W_p\) and \(b_p\) are the learnable weight matrix and the bias term, respectively.
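The fusion and classification steps of Eqs. (20)–(23) might be implemented as sketched below. Since Eq. (23) maps M to a single probability distribution, the rows of the fused matrices are mean-pooled before the linear layer; this pooling, and treating \(M_1\) and \(M_2\) as generic row matrices, are assumptions made for the sketch.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MatrixFusionLayer(nn.Module):
    """Matrix fusion layer and classifier of Eqs. (20)-(23)."""
    def __init__(self, d, n_classes=3):
        super().__init__()
        self.w1 = nn.Linear(d, d, bias=False)        # W_1
        self.w2 = nn.Linear(d, d, bias=False)        # W_2
        self.classifier = nn.Linear(2 * d, n_classes)  # W_p, b_p

    def forward(self, m1, m2):                       # m1: (n1, d), m2: (n2, d)
        m1p = F.softmax(self.w1(m1) @ m2.t(), dim=-1) @ m2   # Eq. (20)
        m2p = F.softmax(self.w2(m2) @ m1.t(), dim=-1) @ m1   # Eq. (21)
        m = torch.cat([m1p.mean(dim=0), m2p.mean(dim=0)], dim=-1)  # pool rows, then Eq. (22)
        return F.softmax(self.classifier(m), dim=-1)          # Eq. (23): p(e)
```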

Loss function

The loss function of MFLGCN uses cross entropy with L2-regularization:

$$\begin{aligned} \text {Loss} =-\sum _{(d, e) \in D} \log p(e)+\lambda \Vert \theta \Vert _{2} \end{aligned}$$
(24)

where D is the training dataset, e is the true label, and p(e) is the probability the model assigns to label e. \(\theta \) represents all trainable parameters, and \(\lambda \) is the coefficient of the regularization term.
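For completeness, a sketch of Eq. (24) in PyTorch, taking the probability distribution p(e) of Eq. (23) as input; the squared L2 norm is used for the penalty, as is common in practice, and \(\lambda\) is set to the 0.00001 reported in the implementation details.

```python
import torch
import torch.nn.functional as F

def mflgcn_loss(probs, labels, parameters, lam=1e-5):
    """Cross-entropy plus L2 regularization (Eq. 24).

    probs      : (batch, n_classes) predicted distributions p(e) from Eq. (23)
    labels     : (batch,) gold sentiment labels
    parameters : iterable of model parameters, e.g. model.parameters()"""
    ce = F.nll_loss(torch.log(probs + 1e-12), labels)           # -log p(e), averaged over batch
    l2 = sum(p.pow(2).sum() for p in parameters if p.requires_grad)  # squared L2 penalty
    return ce + lam * l2
```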

Experiments

Datasets

We conduct experiments on four public datasets in different domains: Twitter [26], Restaurant, Laptop [27] and MAMS [28]. These datasets have three sentiment polarities: positive, negative, and neutral. The details of the experimental datasets are shown in Table 1.

Table 1 The details of the experimental datasets

Evaluation metrics

We use accuracy and F1-score to evaluate the performance of each model. The evaluation metrics are defined as follows:

$$\begin{aligned}{} & {} \hbox {Accuracy} =\frac{\left( \hbox {TN}+\hbox {TP}\right) }{\left( \hbox {TN}+\hbox {TP}+\hbox {FN}+\hbox {FP}\right) } \end{aligned}$$
(25)
$$\begin{aligned}{} & {} \hbox {Recall} =\frac{\hbox {TP}}{\left( \hbox {FN}+\hbox {TP}\right) } \end{aligned}$$
(26)
$$\begin{aligned}{} & {} \hbox {Precision} =\frac{\hbox {TP}}{\left( \hbox {FP}+\hbox {TP}\right) } \end{aligned}$$
(27)
$$\begin{aligned}{} & {} F\hbox {-measure} =\frac{2\left( \hbox {Precision}*\hbox {Recall}\right) }{\left( \hbox {Precision}+\hbox {Recall}\right) } \end{aligned}$$
(28)

where TP, TN, FP and FN denote true positive, true negative, false positive and false negative, respectively.

Implementation details

For our text graph construction, we set the number of words in the sliding window to 5. In the semantic-based graph, the initial word embeddings are obtained from pre-trained BERT, with a dimension of 768. Our experiments use 300-dimensional GloVe vectors to initialize the word embeddings. The Bi-LSTM hidden size is set to 300. We set the coefficient of the L2 regularization term to 0.00001 and the batch size to 32. To alleviate overfitting, we apply dropout at a rate of 0.5. We set the number of GCN layers to 2 and the number of multi-head self-attention heads to 8. The Adam optimizer is used.

Table 2 Comparison of different experimental results on public datasets

Comparison with the state-of-the-art

Comparison models

We use ASGCN and AEGCN as our main baseline models, and we also compare our proposed model (MFLGCN) with the following methods:

  • ATAE-LSTM [1] combines aspect embedding and an attention mechanism for ABSA.

  • MEMNET [4] employs multi-hop attention to represent the features of the context.

  • IAN [6] uses the combination of LSTM and interactive attention mechanism to express the context and aspect.

  • RAM [5] designs a model combining multiple attention and memory networks to learn the sentence representation.

  • GCAE [29] utilizes gating units to combine the outputs of two convolutional layers of a CNN.

  • MGAN [7] proposes a multi-grained attention mechanism to capture the relationship between context and aspect.

  • AOA [2] obtains the corresponding representation of the context and aspect through the interactive learning of attention.

  • TD-GAT [11] proposes a graph attention network over the dependency tree to solve ABSA tasks.

  • ASGCN [9] proposes a model that combines the attention mechanism and GCN on the dependency tree.

  • kumaGCN [14] designs a latent graph structure to capture aspect representations with syntactic information.

  • BiGCN [13] proposes a novel architecture that convolves over hierarchical syntactic and lexical graphs.

  • AEGCN [12] utilizes a variety of attention mechanisms and GCN for ABSA on the dependency tree structure.

Table 3 Experimental results of ablation study

Experimental results

Table 2 shows the comparison results on four benchmark datasets, which demonstrate that the proposed MFLGCN model outperforms all comparison models. Accuracy and F1-score are used to evaluate these models. Compared with traditional sequential models combined with attention mechanisms, such as ATAE-LSTM, MEMNET, IAN, RAM, GCAE, MGAN, and AOA, graph convolutional networks combined with dependency trees, such as ASGCN and AEGCN, make use of the rich dependencies between words and can avoid the noise introduced by the attention mechanism, so their performance is improved. However, the methods based on dependency trees are inaccurate and insensitive to domain-specific datasets. Compared with methods that directly use dependency trees, GCNs combined with improved dependency tree structures, such as kumaGCN and BiGCN, achieve higher performance than ASGCN and AEGCN. In order to model multiple latent information, we combine statistics, semantics, and part of speech to construct the graph structure and build our sentiment classification model. Compared with the previous best models, our proposed MFLGCN achieves the best accuracy on all datasets and obtains the best macro-averaged F1-score on the Lap14, Rest14 and MAMS datasets. In order to show the performance of our model more vividly and concretely, we average the accuracy of the sequence models (ATAE-LSTM, MEMNET, IAN), the dependency tree-based GCN models (ASGCN-DT, ASGCN-DG, AEGCN) and our model (MFLGCN) on the four datasets. The results are shown in Fig. 5, where green represents the sequence models, orange represents the dependency tree-based models and blue represents our model. From Fig. 5, we can see that the proposed model obtains higher accuracy than the other models.

Fig. 5 Comparison of average accuracy of different models

Ablation study

To further investigate the impact of each component of the model on the experimental results, we make some ablation experiments. The results are shown in Table 3.

In order to prove the effectiveness of the part of speech graph structure enhancement, the MFLGCN w/o GE experiment is set up, in which the graph enhancement (GE) is removed from the graph construction stage. The results show that performance on all four datasets decreases after removing GE: compared with the complete model, the accuracy drops by 0.9%, 0.19%, 0.07% and 0.66%, respectively. The attention matrix \(M_1\), obtained by combining the sequence model, the GCN and the attention mechanism, makes the feature representation of the aspect more relevant to the context. The aspect-related information matrix \(M_2\), obtained by combining the GCN output with the aspect irrelevant information filter layer (AIIFL), not only aggregates the rich information around the aspect nodes, but also alleviates the impact of noise in the sentence on aspect feature generation. In order to show the effectiveness of the matrix fusion layer in fusing the attention matrix and the information filtering matrix, we set up the experiment w/o MFL_\(M_1\), in which \(M_1\) is ablated and only \(M_2\) is used, and the experiment w/o MFL_\(M_2\), in which \(M_2\) is ablated and only \(M_1\) is used. Compared with the complete model, when the attention matrix \(M_1\) is ablated, the accuracy drops by 2.3%, 1.31%, 1.9% and 1.05% on the four datasets, respectively; when the aspect-related information matrix \(M_2\) is ablated, the accuracy drops by 1.03%, 1.12%, 0.93% and 0.81%, respectively. Based on these results, we can find that the interaction of GE, Attention, and AIIFL is significant in our model.

Fig. 6 Component performance comparison

Table 4 Experimental results of different graph structures

From Fig. 6, we can see that different components have different effects on the performance of the model. Ranked from high to low, the components that influence the model most are Attention, AIIFL, and GE.

Impact of different graph structures

In order to verify the effectiveness of our graph structure, we replace the dependency tree with our graph in ASGCN and AEGCN, and we also run experiments using the dependency tree in our model. Table 4 shows the results of the different graph structures.

As can be seen from Table 4, when the baseline models (ASGCN and AEGCN) use our graph, the performance is higher than when they use the dependency tree. On the Lap14, Rest14 and MAMS datasets, the performance of the ASGCN model is improved by 0.06%, 0.39% and 0.23%, respectively. On the Twitter dataset, our graph is inferior to the dependency tree. After analysis, the reasons are as follows: (1) the ASGCN model was originally designed for the dependency tree structure, so its data processing and model components are better suited to the dependency tree and less adaptable to our graph structure; (2) Twitter consists of online social media comments, whose grammatical structure and verbal expressions are relatively irregular. On the Twitter, Lap14, Rest14 and MAMS datasets, the performance of the AEGCN model is improved by 0.05%, 0.02%, 1.1% and 0.15%, respectively. We also evaluate our model with the dependency tree. The experimental results show that our graph combined with our model achieves better performance: on the Twitter, Lap14, Rest14 and MAMS datasets, our model is improved by 1.07%, 0.26%, 1.05% and 1.02%, respectively. These comparative experiments show that the accuracy of sentiment analysis is improved by using our graph structure.

Impact of GCN layer number

To investigate the impact of the number of GCN layers, we set the number of layers from one to five and evaluate our model on the four datasets.

Fig. 7 Impact of GCN layer number on the Lap14

Fig. 8 Impact of GCN layer number on the Rest14

Fig. 9 Impact of GCN layer number on the Twitter

Fig. 10 Impact of GCN layer number on the MAMS

From Figs. 7, 8, 9 and 10, we can see that as the number of GCN layers increases, the performance of the model first increases and then decreases. Thus, the performance does not always improve with more layers, because a large number of layers makes the model hard to train and introduces more parameters, resulting in a less generalizable model. To avoid these problems, a two-layer GCN is used to train the model.

Conclusion

In this paper, we propose a matrix fusion-based graph convolutional network (MFLGCN) over multiple latent information graph structures to solve aspect-based sentiment analysis tasks. The learned graph structure combines semantic, statistical, and part of speech information to incorporate more latent information. MFLGCN can generate efficient and informative word encodings. Experiments show that our graph structure leads to a more effective node representation, and comprehensive experiments illustrate the effectiveness of our model: it outperforms the baseline models on four public datasets, Twitter, Lap14, Rest14, and MAMS. In addition, we perform ablation experiments to prove the indispensability and effectiveness of each component of our model.