1 Introduction

With the rapid proliferation of electronic devices, an enormous quantity of comments on specific aspects has emerged on the Internet. To mine opinion information from these comments, aspect-based sentiment analysis (ABSA) has attracted considerable interest [1]. ABSA has two main subtasks, Aspect Sentiment Classification (ASC) [2, 3] and Aspect Term Extraction (ATE) [4, 5]. In this study, we focus on ASC, which aims to infer the sentiment expressed toward particular aspects of a sentence. For instance, in “The roasted pancake on this snack street is truly captivating, but the environment here is undeniably a bit unpleasant.”, the aspect terms “roasted pancake” and “environment” carry two contrasting sentiments. Sentence-level analysis can only yield a single overall sentiment, whereas ABSA can accurately identify the praise for “roasted pancake” and the complaint about “environment”. ABSA is therefore more effective at identifying the sentiment polarity of particular aspects within a sentence [6].

For ABSA, early studies train classifiers with manual rules and handcrafted features. Vo et al. [7] integrate distributed word representations and sentiment lexicon information for context representation, and use neural pooling functions for feature extraction. However, owing to the tedium of manual feature engineering, the performance of such approaches quickly plateaus. To address this, later studies employ neural networks to avoid manual rule design and to acquire contextual representations automatically. Dong et al. [8] introduce an adaptive recurrent neural network (RNN) for target-dependent sentiment classification on Twitter. Tang et al. [9] model the relations between context words and aspects with Long Short-Term Memory (LSTM) and describe how different attention mechanisms can further improve performance. Since then, many studies have used attention mechanisms to model the relationships between aspects and opinion words [10, 11].

Despite the effectiveness of attention-based approaches, they still have limitations in handling the syntactic dependencies between aspects and opinions. In the sentence shown in Fig. 1, “should be” is a verb phrase, and “more” acts as a modifier before “friendly”. Under this modification, “friendly” no longer conveys its literal positive meaning. Sentence structure and syntactic dependencies are therefore crucial for determining positional and modifier relationships, and they ultimately influence the semantics and sentiment of the terms. If attention does not take full advantage of these syntactic dependencies, the attention weights may be computed incorrectly, and the weights tend to be sparsely distributed in the presence of long-distance dependencies. Both issues can harm the identification of sentiment polarity.

Fig. 1 Syntactic dependency tree obtained through dependency parsing

Considering that simple attention-based methods are unable to make full use of the dependencies within sentences, some researchers utilize syntactic dependency trees to obtain richer structural and syntactic information. Figure 1 shows a syntactic dependency tree obtained through dependency parsing, where words such as “staff” and “should” are represented as nodes and the dependencies between words are expressed by directed edges.
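As an illustration, a dependency tree like the one in Fig. 1 can be produced with an off-the-shelf parser. The sketch below uses spaCy purely as an example; the paper does not state which parser was used for Fig. 1, and the example sentence is a hypothetical paraphrase of the one discussed above.

```python
# Minimal sketch: obtaining a syntactic dependency tree with spaCy.
# Assumptions: spaCy and the "en_core_web_sm" model are installed; the example
# sentence is hypothetical and only approximates the one shown in Fig. 1.
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("The staff should be a bit more friendly.")

# Each token is a node; the (head -> child) pairs are the directed edges.
for token in doc:
    print(f"{token.text:10s} --{token.dep_:>10s}--> head: {token.head.text}")
```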

To exploit the syntactic information within dependency trees, Nguyen et al. [12] encode the nodes of the entire dependency tree bottom-up with an RNN to extract aspect and sentiment features from sentences. However, they build node embeddings by averaging word vectors with identical weights for different nodes, which makes it difficult to capture more complex semantics. He et al. [13] therefore compute distances between interior nodes to decay the attention weights, assigning different weights to different nodes, and fuse contextual information with aspect terms through a bidirectional LSTM (Bi-LSTM). As graph-structured data, however, dependency trees can be exploited more effectively for capturing semantic knowledge with graph neural networks (GNNs), so most recent work on dependency trees is GNN-based [14, 15]. Among these works, Zhang et al. [16] leverage syntactic dependencies and aspect sentiment knowledge through graph convolution operations on dependency trees. Sun et al. [17] first learn sentence feature representations with a Bi-LSTM and then enhance the embeddings with a graph convolutional network (GCN) that operates directly on phrase dependency structures. However, these works treat all neighboring nodes in the graph equally and lack an effective mechanism to differentiate their importance. In other words, they only consider knowledge from immediate neighbors and overlook associations with long-distance nodes. As a result, such models may struggle to capture all syntactic dependencies among words.

In this paper, we propose a novel Syntactic Dependency Graph Convolutional Network (SD-GCN) for the ABSA task, which models syntactic dependencies to overcome the aforementioned problems. SD-GCN employs dependency parsing to model syntactic dependencies and to build syntactic dependency graphs, thereby exploiting the full potential of syntactic dependency relations. The structured representation obtained through syntactic dependency modeling helps to narrow the gap between aspects and opinions, which makes it easier for SD-GCN to capture their long-distance syntactic dependencies. Specifically, we obtain the semantic information and contextual word representations of the sentence with a BERT encoder. Next, biaffine attention is used to build the syntactic dependency graphs among words. Finally, we apply a GCN over these syntactic dependencies to yield enhanced aspect features for aspect sentiment prediction.

The main contributions are as follows:

  • (1) To effectively exploit the sentence–word interactions in ABSA, we propose SD-GCN to model syntactic dependencies and enhance aspect sentiment.

  • (2) We employ biaffine attention to model the syntactic dependencies between aspects and opinions, constructing syntactic dependency graphs. This allows the model to learn both the semantic relationships of aspects and the overall semantic meaning of a sentence.

  • (3) We use a GCN to process the syntactic dependencies, which improves the capability to capture long-distance syntactic dependencies and to accurately capture the structural information and semantic knowledge of sentences. This allows features related to aspects and opinions to be effectively integrated and learned, consequently enhancing the aspect features.

  • (4) The results of experiments carried out on four datasets prove the effectiveness of our SD-GCN.

The rest of the paper is organized as follows: Sect. 2 introduces relevant research on ABSA. Section 3 describes the proposed SD-GCN. Section 4 presents the experiments together with an analysis of the results. Section 5 concludes the paper, and Sect. 6 discusses limitations and future work.

2 Related Work

Natural language processing encompasses various tasks, and text classification (TC) has long been one of its prominent research areas [18, 19]. ABSA is a fine-grained form of TC. Unlike sentence-level analysis, which considers the sentiment tendency of the text as a whole [20, 21], ABSA performs sentiment analysis on the different aspect terms of a sentence. Previous ABSA studies relied heavily on handcrafted features and manually defined language rules to determine the affective polarity of particular aspects. Kiritchenko et al. [22] employed a supervised learning approach to examine aspects and categories in customer reviews for identifying emotional tendencies toward aspect terms. Jiang et al. [23] incorporated syntactic features and context information to train classifiers.

Neural networks have evolved significantly, leading to their widespread application in various domains [24, 25] and, in particular, in ABSA, where they have brought substantial improvements [26]. Lakkaraju et al. [27] presented a hierarchical learning framework that used RNNs to model aspects and opinions. Wang et al. [28] combined Conditional Random Fields (CRF) with an RNN as a joint model for aspect sentiment judgment.

Recently, research on neural networks for ABSA has mainly focused on attention-based architectures. Wu et al. [10] incorporated a residual mechanism into a convolutional neural network (CNN), which alleviates the loss of raw information in attention mechanisms. LSTM addresses the limitations of traditional RNNs in capturing and remembering long-term dependencies in sequential data. Wang et al. [29] presented an LSTM with a variant of the attention mechanism that concatenates aspect vectors with sentence hidden representations to calculate attention weights. Huang et al. [11] proposed an attention-over-attention method that directly captures the interplay between aspect terms and contextual phrases. Song et al. [30] proposed an attentional encoder network with multiple attention mechanisms to extract hidden states and contextual interactions among targets. Wang et al. [31] introduced the Multi-Attention Method Network (MAMN), which builds word embeddings with a pre-trained model, applies various attention mechanisms at internal and external levels, and finally uses a feature-focused attention mechanism to enhance sentiment identification. Ayetiran et al. [32] proposed an attention-fused CNN-BiLSTM approach that extracts high-level symbolic features and contextual representations of text by jointly learning from additional document-level sentiment data. However, the aforementioned attention-based models overlook the influence of sentence structure and syntactic dependency information. Without this additional semantic information, they may incorrectly identify the sentiment orientation toward specific aspects.

An emerging trend is to utilize dependency trees, because syntactic information can establish links between aspects and their corresponding opinions. Among the studies that use dependency trees, GCNs built on dependency trees have achieved promising results in ABSA. Compared with CNNs, GCNs [33] are better suited to graph-structured data and have been employed for various language tasks, such as semantic role labeling [34], machine translation [35], and relation extraction [36]. Zhang et al. [16] first applied GCNs to ABSA, obtaining aspect features by performing multilayer graph convolution on a sentence's dependency tree and applying an aspect-specific masking layer. Sun et al. [17] presented a method that convolves over dependency trees, employing a Bi-LSTM to learn feature representations and graph convolution to process the dependency trees of sentences. Liang et al. [37] extracted aspect sentiment by constructing graphs on dependency trees and integrating sentiment knowledge from SenticNet with contextual representations learned by an LSTM. Tian et al. [38] constructed a graph from the dependency tree and applied attention mechanisms to weight the edges of the graph.

Apart from GCN-based approaches, some methods use Graph Attention Networks (GAT) to process dependency trees. Wang et al. [39] extended GAT into a relational graph attention network (R-GAT) and used it to reshape and prune ordinary dependency parse trees into a new aspect-oriented dependency tree structure. Ke et al. [40] combined syntactic dependencies with graph attention, encoding dependency paths to obtain aspect-oriented syntactic representations; they also redesigned the attention layer and used hierarchical attention to weight and aggregate contextual terms. However, these studies did not fully utilize syntactic dependencies, nor did they explore how to model syntactic relationships more effectively.

The aforementioned existing works can be broadly categorized into the following three groups:

  • (1) Attention-based approaches (e.g., Wang et al. [29], Song et al. [30]) introduce fine-grained attention that allows the model to focus on crucial aspects. Their disadvantage is the difficulty of exploiting syntactic dependencies.

  • (2) Syntactic dependency-based approaches (e.g., Zhang et al. [16], Liang et al. [37]) capture textual structure through syntactic dependencies, which allows a better understanding of the relationships among words. However, they disregard the varying importance of different neighboring nodes, and capturing syntactic dependencies more adequately remains a challenge.

  • (3) Methods that combine syntactic dependencies with attention mechanisms (e.g., Wang et al. [39], Ke et al. [40]) unite the advantages of fine-grained focus with syntactic dependency modeling. However, integrating multiple mechanisms increases computational complexity.

3 Proposed Methodology

The workflow of SD-GCN is shown in Fig. 2. SD-GCN takes sentence-aspect pairs as inputs. The BERT encoder is first used to obtain rich contextual information, and the resulting word vectors are then reduced in dimensionality. Next, the biaffine attention module models the relationships among word pairs in the sentence, learning both the semantic relationships of aspects and the overall semantic meaning of the sentence. This modeling yields syntactic dependency graphs that contain rich word-pair relations. Subsequently, GCNs process the syntactic dependency graphs: aspects and opinions are aggregated by the GCNs, which capture the dependency relationships among words and integrate the contextual information of the entire graph. SD-GCN can therefore comprehensively understand the sentiment expressed by individual words and effectively extract aspect features. Finally, the obtained aspect features are aggregated at the output layer to predict the sentiment tendency of particular aspect terms.

Fig. 2 Workflow of the SD-GCN

3.1 Problem Definition

Given a set \(X = (S,A)\) consisting of a sentence and its aspects, where \(S = \{ w_1 ,w_2 , \ldots ,w_n \}\) denotes the sentence and \(A = \{ a_1 ,a_2 , \ldots ,a_m \}\) denotes the specific aspect terms in \(S\), the goal of ABSA is to predict the sentiment polarity \(Y \in \{ neutral,negative,positive\}\) of the specific aspects from \(S\) and \(A\).
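To make the notation concrete, a single instance of the task (drawing on the example sentence from the Introduction) could be laid out as below; this data layout is purely illustrative and not a prescribed input format.

```python
# Hypothetical ABSA instance in the notation of Sect. 3.1:
# S = sentence tokens, A = aspect terms, Y = gold polarity for each aspect.
example = {
    "S": "The roasted pancake is captivating but the environment is unpleasant .".split(),
    "A": ["roasted pancake", "environment"],
    "Y": ["positive", "negative"],
}
```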

3.2 Contextual Representation

To obtain vector representations of sentences with aspect terms, BERT-base-uncased is employed as the sentence encoder to produce hidden contextual representations. BERT is a language model built on the Transformer architecture [41]. Unlike traditional language models, BERT uses bidirectional encoders to learn context-dependent word embeddings, allowing it to better capture the relationships and syntactic information among words. BERT is composed of numerous stacked bidirectional Transformer encoder layers and uses residual connections and layer normalization to alleviate the vanishing-gradient problem and the long-term dependency problem of words, enabling it to capture bidirectional relations within sentences.

We expand the original input by constructing it as sentence-aspect pairs, which is a more suitable input for ABSA. To match the input form of BERT, we concatenate original text sequences with aspects separated by special tokens. The input of SD-GCN can be defined as:

$$ inputs = \{ [CLS],S,[SEP],A,[SEP]\} $$
(1)
$$ V = {\text{BERT}}(inputs) $$
(2)

where \([CLS]\) and \([SEP]\) are special tokens that mark the beginning and the separation/ending positions, respectively. \(S\) represents a given sentence and \(A\) represents its aspects. The output of the encoding layer is \(V = \{ v_1 ,v_2 ,...,v_n \} ,v_i \in {\mathbb{R}}^{d_{bert} }\), the hidden representation sequence produced by the last Transformer layer.
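As a point of reference, the encoding step of Eqs. (1)-(2) can be reproduced with a standard BERT implementation. The sketch below assumes the HuggingFace transformers library and a hypothetical example sentence; the paper does not prescribe a specific implementation.

```python
# Minimal sketch of the encoding step (Eqs. 1-2), assuming the HuggingFace
# "transformers" library; sentence and aspect below are hypothetical examples.
import torch
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
bert = BertModel.from_pretrained("bert-base-uncased")

sentence = "the staff should be a bit more friendly"
aspect = "staff"

# Given a sentence pair, the tokenizer inserts [CLS] ... [SEP] ... [SEP],
# matching inputs = {[CLS], S, [SEP], A, [SEP]}.
encoded = tokenizer(sentence, aspect, return_tensors="pt")

with torch.no_grad():
    outputs = bert(**encoded)

V = outputs.last_hidden_state   # shape: (1, seq_len, d_bert = 768)
```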

3.3 Syntactic Dependency Modeling

We employ biaffine attention to model the dependency relationships between aspects and opinions. Biaffine attention is a graph-based neural technique specifically designed for dependency parsing [42], and it has also been shown to perform well on named entity recognition [43]. Biaffine attention applies different dimension-reducing projections to the hidden vector of each word, producing two new vectors that represent the word as a dependent arc and as a dependency head, respectively. A biaffine transformation then scores the feature vectors of each dependent-head pair, producing an arc score matrix and a label score matrix. With this approach, the optimal parse tree can be selected directly from all possible arcs and labels.

In our study, we utilize the biaffine attention to model word-pair relationships in sentences to catch the syntactic dependencies of aspects and viewpoints in sentences. The modeling process can be formulated as:

$$ v_i^{head} = {\text{MLP}}_0 (v_i ) $$
(3)
$$ v_j^{dep} = {\text{MLP}}_1 (v_j ) $$
(4)

where \(v_i\) and \(v_j\) are hidden representations output by the BERT encoder, and \({\text{MLP}}\) denotes a multi-layer perceptron. \({\text{MLP}}_{0}\) processes the output vector of the head word and \({\text{MLP}}_1\) processes that of the dependent word. The purpose is to apply dimensionality reduction and a nonlinear transformation to each output vector, removing information irrelevant to the current decision before the biaffine transformation, which improves parsing speed and reduces the risk of overfitting. Next, the biaffine attention score is calculated with the following equations:

$$ Q_{i,j} = v_i^{head^T } U_1 v_j^{dep} + U_2 (v_i^{head} \oplus v_j^{dep} ) + b^{{\text{bam}}} $$
(5)
$$ r_{i,j,t} = \frac{{exp(Q_{i,j,t} )}}{{\sum_{k = 1}^m {exp(Q_{i,j,k} )} }} $$
(6)

where \(U_1\) is a learnable tensor of size \(d \times m \times d\), \(U_2\) is a learnable matrix of size \(2d \times m\), \(b^{{\text{bam}}}\) is a bias term, and \(\oplus\) denotes concatenation. By modeling the relationship between \(v_i^{head}\) and \(v_j^{dep}\), the score tensor \(Q \in {\mathbb{R}}^{n \times n \times m}\) is obtained. \(r_{i,j,t}\) indicates the probability of the t-th relationship type for the word pair \((v_i ,v_j )\), where \(m\) denotes the number of relationship types and is a hyper-parameter. The whole procedure from Eq. (3) to Eq. (6) can be summarized as:

$$ R = {\text{Biaffine}}({\text{MLP}}_0 (V),{\text{MLP}}_1 (V)) $$
(7)

where \(R \in {\mathbb{R}}^{n \times n \times m}\) denotes the probability distribution graphs obtained by modeling the syntactic dependency relationships of words. The procedure for modeling the syntactic dependencies of words is described in Algorithm 1.

Algorithm 1 The procedure of modeling syntactic dependencies for words
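For illustration, the scoring procedure of Eqs. (3)-(7) can be sketched as a small PyTorch module. The class name and hyper-parameter values below (hidden_dim, mlp_dim, num_rel) are assumptions for exposition, not the authors' released code.

```python
import torch
import torch.nn as nn

class BiaffineAttention(nn.Module):
    """Sketch of Eqs. (3)-(7): score every word pair over m relation types."""

    def __init__(self, hidden_dim=768, mlp_dim=300, num_rel=3):
        super().__init__()
        self.mlp_head = nn.Sequential(nn.Linear(hidden_dim, mlp_dim), nn.ReLU())  # MLP_0
        self.mlp_dep = nn.Sequential(nn.Linear(hidden_dim, mlp_dim), nn.ReLU())   # MLP_1
        self.U1 = nn.Parameter(torch.empty(mlp_dim, num_rel, mlp_dim))            # d x m x d tensor
        self.W2 = nn.Linear(2 * mlp_dim, num_rel)                                  # affine term U_2 plus bias b^bam
        nn.init.xavier_uniform_(self.U1)

    def forward(self, V):
        # V: (batch, n, hidden_dim) hidden representations from the BERT encoder
        head = self.mlp_head(V)                                   # (b, n, d)
        dep = self.mlp_dep(V)                                     # (b, n, d)

        # Bilinear term head_i^T U1 dep_j for every pair (i, j) and relation t
        bilinear = torch.einsum("bid,dtk,bjk->bijt", head, self.U1, dep)

        # Affine term on the concatenation of head_i and dep_j
        n = V.size(1)
        pair = torch.cat([head.unsqueeze(2).expand(-1, -1, n, -1),
                          dep.unsqueeze(1).expand(-1, n, -1, -1)], dim=-1)
        affine = self.W2(pair)                                    # (b, n, n, m)

        Q = bilinear + affine                                     # Eq. (5)
        R = torch.softmax(Q, dim=-1)                              # Eq. (6): distribution over relation types
        return R                                                  # (b, n, n, m)
```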

3.4 Graph Convolutional Operation

To capture the dependencies between aspects and opinions, we use a GCN to process the probability distribution relationships in the syntactic dependency graphs. A GCN is a model inspired by CNNs that is designed for processing graph-structured data. A graph consists of nodes and edges; for each node, the GCN performs a convolution over its neighbors to capture topological relationships and obtain a discriminative node representation.

For a graph \(G\) with \(k\) nodes, we can enumerate \(G\) to obtain its adjacency matrix \(A \in {\mathbb{R}}^{k \times k}\). Figure 3 illustrates an example of a GCN performing graph convolution operations. For ease of explanation, we define the state of node \(i\) in the l-th layer of the graph convolution as \(h_i^l\), where \(l \in [1,2,...,L]\); \(h_i^0\) is the initial state of node \(i\) and \(h_i^L\) is its final state. The graph convolution operation is then expressed as:

$$ h_i^{l + 1} = \sigma (\sum_{j = 1}^k {A_{ij} } W^l h_j^l + b^l ) $$
(8)

where \(\sigma\) is a nonlinear activation function, \(W^l\) is a weight matrix, and \(b^l\) is a bias term.

Fig. 3 An example of GCN performing graph convolution operations

For any given sentence, modeling the syntactic dependencies yields the probability distribution graphs \(R\), where each relation channel corresponds to an \(n \times n\) adjacency matrix \(r \in R\). We perform graph convolution operations on each node in the l-th layer of the t-th channel to update its state representation. By aggregating the information of neighboring nodes, an enhanced node representation with aspect sentiment features is obtained. The update proceeds as follows:

$$ h_{i,t}^{l + 1} = \sigma (\sum_{j = 1}^n {r_{i,j,t} } W_t^l h_{j,t}^l + b_t^l ) $$
(9)

where \(h_{j,t}^l\) is the state representation of node \(j\) in layer \(l\) and \(h_{i,t}^{l + 1}\) is the output of node \(i\) in layer \(l+1\). \(\sigma\) is a nonlinear activation function (e.g., ReLU), \(W_t^l\) is a linear transformation weight, and \(b_t^l\) is a bias term. All of these parameters belong to the t-th channel.

The final output of layer \(l\) on the t-th channel is:

$$ H_t^l = \{ h_{1,t}^l ,h_{2,t}^l ,...,h_{n,t}^l \} $$
(10)

After \(L\) layers of graph convolution, the GCN obtains the final feature representation. Since there are \(m\) channels in total, \(m\) such feature representations are obtained.
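The following PyTorch sketch illustrates the channel-wise graph convolution of Eq. (9); tensor shapes and parameter names are assumptions for illustration, not the authors' code.

```python
import torch
import torch.nn as nn

class MultiChannelGCNLayer(nn.Module):
    """Sketch of Eq. (9): one graph convolution per relation channel t."""

    def __init__(self, dim, num_rel):
        super().__init__()
        self.num_rel = num_rel
        self.weight = nn.Parameter(torch.empty(num_rel, dim, dim))  # W_t^l
        self.bias = nn.Parameter(torch.zeros(num_rel, dim))         # b_t^l
        nn.init.xavier_uniform_(self.weight)

    def forward(self, H, R):
        # H: (batch, num_rel, n, dim)  node states per channel
        # R: (batch, n, n, num_rel)    probability-distribution graphs from biaffine attention
        out = []
        for t in range(self.num_rel):
            r_t = R[..., t]                                  # (b, n, n) adjacency of channel t
            h_t = H[:, t]                                    # (b, n, dim)
            support = h_t @ self.weight[t]                   # W_t^l h_{j,t}^l
            agg = torch.bmm(r_t, support) + self.bias[t]     # sum_j r_{i,j,t} W_t^l h_{j,t}^l + b_t^l
            out.append(torch.relu(agg))                      # sigma(.)
        return torch.stack(out, dim=1)                       # (b, num_rel, n, dim)
```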

3.5 Sentiment Classification

We use an average pooling function to integrate the enhanced features generated by the GCN. We then mask out the representations of non-aspect words to retain only aspect-related information, thereby obtaining the feature representation of the specific aspects. This process is as follows:

$$ h_{a,t} = f(h_{a_1 ,t}^l ,h_{a_2 ,t}^l ,...,h_{a_p ,t}^l ) $$
(11)

where \(h_{a_p ,t}^l\) are the aspect word vectors and \(f( \cdot )\) is an average pooling function that enhances the aspect term representation over the outputs. We then use an average pooling function to aggregate the feature representations over the \(m\) channels and obtain the final aspect sentiment feature representation:

$$ h_a = f(h_{a,1} ,h_{a,2} ,...,h_{a,m} ) $$
(12)

Finally, we employ the softmax function to classify the aspect-level sentiment polarity. The output probability is calculated as follows:

$$ Y = {\text{softmax}}(W_{asc} h_a + b_{asc} ) $$
(13)

where \(W_{asc}\) and \(b_{asc}\) are a learnable weight matrix and bias term.
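A compact sketch of the aspect masking, pooling, and classification steps of Eqs. (11)-(13) is given below; names such as aspect_mask and classify_aspect are illustrative assumptions rather than the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def classify_aspect(H, aspect_mask, W_asc: nn.Linear):
    """
    H:           (batch, num_rel, n, dim)  node features from the last GCN layer
    aspect_mask: (batch, n)                1 at aspect-word positions, 0 elsewhere
    W_asc:       linear layer mapping dim -> number of polarities (3)
    """
    mask = aspect_mask.unsqueeze(1).unsqueeze(-1)             # (b, 1, n, 1)
    counts = aspect_mask.sum(dim=1).clamp(min=1).view(-1, 1, 1)

    # Eq. (11): average pooling over the aspect positions of each channel
    h_a_t = (H * mask).sum(dim=2) / counts                    # (b, num_rel, dim)

    # Eq. (12): average pooling over the m channels
    h_a = h_a_t.mean(dim=1)                                   # (b, dim)

    # Eq. (13): softmax classification of the sentiment polarity
    return F.softmax(W_asc(h_a), dim=-1)                      # (b, 3)
```

In practice, W_asc would be an nn.Linear(dim, 3) trained jointly with the rest of the network under a cross-entropy objective.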

4 Experiments

To verify the effectiveness of our SD-GCN for aspect sentiment classification, we carry out extensive experiments on multiple ABSA datasets. The experiments and result analysis are detailed below.

4.1 Datasets

Four ABSA datasets are used for the aspect emotional prediction, and details of these datasets are shown in Table 1.

Table 1 Summary of datasets

These datasets include a Twitter social media dataset [8], a restaurant review dataset and a laptop review dataset from SemEval 2014 [44], and a restaurant review dataset from SemEval 2015 [45]. Each dataset labels aspect sentiment with three polarities. The Twitter dataset consists of brief, informal texts characterized by casual language and relatively poor grammar. The two restaurant datasets contain customer comments on entities such as dishes, ambiance, and service, with a stronger emphasis on subjective feelings and emotions. The laptop dataset covers hardware, software, and performance in the computer domain and contains numerous technical terms and numeric text.

4.2 Implementation Details

We conduct the experiments on the following platform. OS: Ubuntu 20.04; CPU: Intel Core i9-10850K; GPU: GeForce RTX 3090. SD-GCN is implemented with the PyTorch 1.3.0 deep learning framework. The parameter settings of SD-GCN follow the study [16] and were fine-tuned based on our experiments. In addition, we used cross-validation to evaluate the performance of various parameter combinations and select the best ones. The hyper-parameters of SD-GCN are listed in Table 2.

Table 2 Parameters of SD-GCN

We train SD-GCN for 100 epochs with a batch size of 16 and evaluate the final model. For fairness, accuracy and macro-F1 scores are used as the evaluation metrics. Five runs with distinct random seeds were performed, and the reported results are their average.
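For reference, the two reported metrics can be computed as below; this is a generic sketch using scikit-learn, not the authors' evaluation script, and the label lists are placeholders.

```python
# Generic sketch for the reported metrics (accuracy and macro-F1),
# assuming scikit-learn; y_true / y_pred are hypothetical label lists.
from sklearn.metrics import accuracy_score, f1_score

y_true = [0, 1, 2, 1, 0]   # gold polarities: 0=neutral, 1=negative, 2=positive
y_pred = [0, 1, 1, 1, 0]   # model predictions

acc = accuracy_score(y_true, y_pred)
macro_f1 = f1_score(y_true, y_pred, average="macro")
print(f"Acc. = {acc:.4f}, Macro-F1 = {macro_f1:.4f}")
```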

4.3 Baselines

To comprehensively demonstrate the capability of our SD-GCN, we selected 16 representative aspect sentiment classification models as baselines. The details of the 16 comparison approaches are as follows:

  • LSTM [9]: utilized an LSTM to capture the links between aspects and their context, and used the final state vector to predict the polarity of particular aspects.

  • IAN [46]: proposed interactive attention networks, which use separate attention networks to model the interaction between targets and their context.

  • AOA [11]: proposed an attentional superposition approach that captured the relationship between aspect and contextual sentence by simulating the interaction between aspects and sentence, thus obtaining aspect-level affective polarity.

  • CNN-BiLSTM [32]: used a CNN to extract higher-level semantic features, then fed these features to a Bi-LSTM layer that captures the contextual features of the text, enabling joint learning over two levels of sentiment data.

  • ASGCN [16]: predicted sentiment by applying multilayer graph convolution over dependency trees together with an aspect-specific masking layer.

  • CDT [17]: utilized Bi-LSTM for acquiring the characteristic representations, then applied graph convolutional operations on the dependency trees to acquire aspect emotions.

  • Hete_GNNs [47]: proposed a heterogeneous graphical model that used syntactic tree information, word relations, and sentiment lexicon information to construct a unified framework to capture aspect sentiment polarity.

  • BiGCN [48]: constructed a conceptual hierarchy on lexical and syntactic graphs that allowed for the separate treatment of functionally distinct types of relations in the graph.

  • BERT4GCN [49]: combined grammatical order features in BERT with syntactic knowledge in the dependency graph to enhance the GCN using the output of BERT, and further combined the relative position information between words to make the GCN position-aware.

  • DualGCN [15]: combined syntactic knowledge with semantic information and captured feature representation using orthogonal and differential regularization to reduce the overlap of each word.

  • CRF-GCN [50]: utilized conditional random fields to extract opinion information, integrated it through graph convolutional networks, and predicted aspect-specific sentiment by computing a global vector representation of the nodes.

  • BERT [41]: adopted multi-layer self-attention to capture the bidirectional dependencies of the input.

  • SNBAN [51]: used dependency trees and additional multi-head attention to locate aspects and aspect-related words, improving the extraction of grammatical knowledge.

  • MTABSA [52]: combined ATE and ASC for joint training, correlating aspects with dependency information through multi-task learning, thereby strengthening the connection between them and improving the focus on aspects.

  • T-GCN + BERT [38]: proposed a type-aware GCN, which could combine dependencies and types to construct an input graph, then applied attention mechanisms to weight the edges in the graph, and finally used layer integration to synthesize different contextual information.

  • R-GAT + BERT [39]: used a GAT incorporating relational knowledge to process new dependency trees obtained by reshaping and pruning the original dependency trees.

4.4 Results

The results of SD-GCN and the other 16 methods on the four ABSA datasets are shown in Table 3. SD-GCN achieves satisfactory results on these datasets, obtaining the best results on three of them: Rest14, Rest15, and Twitter. We observe that the methods using GCN or GAT generally outperform the other models, which indicates that graph neural networks better account for the syntactic structure of sentences. In particular, by modeling syntactic dependencies, our SD-GCN significantly outperforms the other GCN-based models because it leverages the dependencies between aspects and sentiment words. Compared with R-GAT + BERT, SD-GCN directly models the syntactic dependencies of the original sentence, avoiding the potential information loss caused by reshaping and pruning ordinary dependency parse trees; this enables SD-GCN to comprehensively capture fine-grained linguistic nuances and syntactic dependencies. On the other hand, the models using BERT generally outperform those using other representation methods, which implies that BERT is better at capturing semantic information. By combining BERT and a GCN, SD-GCN better captures the syntactic dependencies between aspects and sentiment words and achieves outstanding performance on ABSA tasks.

Table 3 Performance comparison on four datasets

It is worth mentioning that our model performs worse than T-GCN + BERT on the Lap14 dataset, although it still outperforms the other baselines. Comparing Lap14 with the other three datasets, we find differences in data distribution: Lap14 contains many computer terms and numbers. Faced with these terms, SD-GCN is more likely to produce incorrect results through syntactic dependency modeling. In contrast, T-GCN + BERT uses attention mechanisms to weight and combine semantic knowledge and integrates the aspect sentiment information learned by the model, so it is less affected by this issue.

Another concern is that SD-GCN does not perform as well on the Twitter dataset as on the other datasets. We speculate that this is due to the informality of social media posts, where grammar rules are not always strictly followed; the resulting poor grammaticality of the Twitter dataset weakens the effectiveness of syntactic dependency modeling.

4.5 Ablation Experiments

To examine the effectiveness of each module in SD-GCN, we design ablation experiments with SD-GCN as the baseline. The details of the ablation tests are shown in Table 4.

Table 4 Ablation testing results

We generated new model variants for comparison by removing or changing modules in SD-GCN. Here, PD denotes removing the probability distribution in the biaffine attention module and directly using the logits tensor to construct the adjacency matrix; RT denotes ignoring the relationship types in the biaffine attention module, i.e., setting the number of defined relationship types to 1; and SA denotes replacing biaffine attention with self-attention. First, when PD is removed, both Acc. and F1 decrease, with the largest decline on the Rest14 dataset, where Acc. and F1 drop by 0.87% and 1.31%, respectively. This indicates that normalizing the logits tensor reduces errors in syntactic dependency parsing. Second, when RT is removed, the performance of SD-GCN also decreases, with the largest decline on the Rest15 dataset, where Acc. and F1 drop by 1.63% and 3.15%, respectively. This indicates that defining multiple relationship types enhances the performance of SD-GCN. Finally, performance drops significantly after replacing biaffine attention with self-attention, which indicates that biaffine attention models syntactic dependency relationships better and thus better captures the syntactic dependencies between aspects and opinions.

4.6 Impact of SD-GCN Layers

We investigated the performance of SD-GCN with 1-6 GCN layers on the four datasets to evaluate the impact of the number of layers. Figure 4 shows that SD-GCN achieves the best performance with 2 GCN layers. With a single layer, the model can only learn local node information and cannot propagate long-distance syntactic dependency information to global nodes. As the number of GCN layers increases beyond two, the model parameters and the redundant information grow, the training process becomes more challenging, and the accuracy decreases significantly.

Fig. 4 The influence of different SD-GCN layers

4.7 Case Study

We visualize the attention scores of several examples to further investigate the performance of SD-GCN on ABSA tasks. The visualization results are shown in Fig. 5.

Fig. 5 Attention visualization in six sentences

In Fig. 5a, the sentence contains the modal verb phrase “should be”, which may be overlooked by some models. Our SD-GCN, however, increases the attention weight on “should be” based on the syntactic dependencies and correctly predicts the polarity as “negative”. In Fig. 5b, SD-GCN correctly identifies the opinion words “fast” and “friendly”, predicting the aspect term “service” as “positive”. In Fig. 5c, the sentence contains two aspect terms, and our model predicts both correctly. In Fig. 5d, the sentence has two aspects with different polarities, “food” and “service”, and our model accurately identifies their corresponding opinions through syntactic dependencies. In Fig. 5e, SD-GCN correctly predicts the sentiment by considering the effects of the long-distance interjection “Woo” and the adjective “excited”. In Fig. 5f, SD-GCN relies on syntactic dependencies to identify the modifying role of “Biggest”, thereby accurately predicting the polarity as “negative”. These six examples demonstrate that SD-GCN can fully exploit the syntactic dependencies among words and correctly match specific aspects with their corresponding opinions.

5 Conclusion

In this study, we propose the SD-GCN model for the ABSA task. It efficiently models syntactic dependencies and integrates the syntactic and semantic information of sentences by utilizing graph convolutional networks. First, we apply BERT to obtain contextual representations. We then model the syntactic dependencies with biaffine attention and use a GCN to process these dependencies, acquiring enhanced features for determining the sentiment of specific aspects. Extensive experiments on four ABSA datasets validate the effectiveness of SD-GCN. We also conducted ablation tests, studied the effect of the number of GCN layers, and further investigated the performance of SD-GCN by visualizing attention scores.

6 Limitations and Future Scope

In this work, we have considered the dependencies inherent within texts, but we have not harnessed external domain-specific knowledge. This restricts the performance of SD-GCN in specific domains such as laptops or Twitter. This study also suffers from other limitations imposed by the datasets and the experimental environment. High-quality datasets are still lacking for ABSA, and the commonly used datasets were released by SemEval years ago, which constrains the applicability of the model in diverse scenarios and contexts. Moreover, owing to the restrictions of our experimental equipment, we were unable to effectively run and evaluate large models, which prevents us from making sufficient comparisons with them.

In future work, we intend to integrate domain-specific knowledge into the model through knowledge embedding, aiming to enhance its ability to understand and process information within a specific domain. We also plan to address the dataset issues by creating new datasets from diverse domains and in multiple languages. In addition, we will seek more powerful computing resources to comprehensively evaluate the performance difference between our method and large models.