1 Introduction

Sentiment Analysis (SA) [1], a sub-branch of Natural Language Processing (NLP), identifies the emotion or opinion expressed in a given text, speech, or other communication. Aspect-Based Sentiment Analysis (ABSA) [2] refines sentiment analysis to a finer granularity and provides crucial information for downstream NLP tasks. Aspect-Category Sentiment Analysis (ACSA) and Aspect-Term Sentiment Analysis (ATSA) form two sub-tasks of ABSA [3, 4]. ACSA predicts the \(sentiment polarity\) for a given \(aspect\) drawn from a predefined set of categories that may not appear explicitly in the sentence. For the sentence "The food is tasty but restaurant is untidy", ACSA can predict the \(sentiment polarity\) for the aspect category "eatable" even though that word is absent from the sentence. ATSA, on the other hand, predicts the \(sentiment polarity\) of an aspect term that occurs in the sentence itself, such as "The food" in the example above [5].

Sentiment polarity differs across independent aspects, and for an ABSA task the given aspect term is the pivotal point. Moreover, many words in a sentence contribute nothing to sentiment prediction for a given term. With "restaurant" as the given aspect, weakly associated words such as "food" and "tasty" are irrelevant to the prediction and can lead to inconsistent results.

Despite the excellent performance of Neural Networks (NN) [6,7,8] in research domains such as \(language translation\), \(paraphrase recognition\), \(question answering\) and \(text summarization\), they are still in their infancy for aspect sentiment classification (ASC). Some works on target-dependent classification benefit only from target information and ignore aspect information, which is crucial for ASC.

The original ASC approaches, built on pre-trained word embeddings and task-centric neural frameworks, face a bottleneck even when accuracy or F1-score improves. The bottleneck stems from a task-agnostic embedding layer initialized with Word2Vec [9,10,11] or GloVe [12], which provides only context-independent word-level features and is insufficient to capture intricate semantic relationships in the sentence. The limited size of the datasets available to train task-centric frameworks is addressed by a deep LSTM [13, 14] with a pre-trained word embedding layer.

Despite the promise of attention-based models [15,16,17], they can fail to capture the dependencies between \(context\) and \(aspect\) in a sequence, which leads the given aspect to mistakenly attend to syntactically irrelevant context words as \(descriptors\). In the example "Its model is ideal and function is excellent.", "excellent" is mistakenly taken as a descriptor of the aspect "model". Moreover, certain models do not fully exploit the syntactic structure but merely impose syntactic constraints on the attention weights.

Convolutional Neural Networks (CNN) [18,19,20] address some issues of attention-based mechanisms by treating features as continuous word spans and applying convolution operations over word sequences to predict the sentiment of an aspect, but they fail to capture the sentiment polarity conveyed by multiple words that are not consecutive. For the sentence "The workers should do a bit more work sincerely", a CNN-based model makes an incorrect prediction by taking "workers" as the aspect and "more work sincerely" as the descriptive phrase, which reverses the sentiment.

\(BERT\) [21,22,23] overcomes these bottlenecks by taking the entire sentence as input to compute token-level representations, whereas \(Word2Vec\)- or \(GloVe\)-based embeddings provide only a single \(context-independent\) representation per word. This paper investigates the modeling power of \(BERT\), a well-known Transformer-based pre-trained model, on ASC. The SSA-GRU-AE model is proposed to strengthen the classifier by attending to the important part of the sentence with respect to a particular aspect.

The principal contributions are:

  1. The SSA-GRU-AE algorithm is proposed to achieve ASC by examining different portions of the sentence when several aspects are considered.

  2. The input and aspect embeddings are generated by the pre-trained BERT model.

  3. The aspect, which plays a major role in this work, is employed in two ways: first, the \(aspect embedding\) vector is concatenated with the input embedding vector; second, it is combined with the hidden vectors of the sentence generated by the GRU to compute attention weights, which efficiently extracts the contextual semantic relationship between the aspect and the sentence.

  4. The sparse self-attention mechanism proposed in our method effectively filters out unimportant words in the sequence that do not contribute to sentiment analysis, and also learns sentiment-aware word embeddings by applying weights to the word embeddings of the input and aspect terms.

  5. An \({{\text{L}}}_{1}\)-regularizer applied to the attention weights ensures that only a minimal number of words influence the semantics and sentiment of the sequence.

  6. Results on ASC datasets reveal that the proposed model outperforms the \(state-of-the-art\) models discussed in the literature.

The remainder of this article is structured as follows: Sect. 2 discusses the related work, Sect. 3 presents the proposed model, Sect. 4 analyses the results, and Sect. 5 concludes the paper (Table 1).

Table 1 List of abbreviations

2 Related Work

The various models put forth in the literature for the ASC task are reviewed here. ASC is a fine-grained classification task within ABSA. Several modern approaches can identify the polarity of an entire sequence even when aspects are unavailable. Conventional approaches design \(bag-of-words\) and \(sentiment lexicon\) features to train an SVM classifier [24] for ABSA. Despite the intensive labor invested in feature engineering, the results depend heavily on the quality of the features.

With the advent of learned distributed representations, Neural Network (NN) approaches became popular for ASC. Classical models such as RecNN [25, 26], CNN [27, 28], RNN, LSTM [29,30,31], GRU [32], GCN [33] and Tree-LSTMs [34] have been applied to ASC. Tree-based LSTMs, which use the syntactic structure of sequences, proved effective for ASC but suffer from syntax parsing errors, which are common in resource-poor languages. LSTMs and GRUs have achieved great success in ASC. Liang et al. introduced a \(deep transition\) model called AGDT [35] with a GRU encoder that utilizes the aspect from scratch to improve feature selection and extraction. TD-LSTM and TC-LSTM [36] made remarkable progress on the ASC task by exploiting target information; however, the target vector in TC-LSTM is obtained by averaging the word vectors of the aspect, which is insufficient to capture its semantics and results in poor performance.

Attention or gating mechanisms have been added to several existing models to capture context-related features. Ma et al. [37] elucidated a hierarchical \(attention\) model that first attends to the aspect words, then reads the whole sentence, and finally integrates this information with external commonsense knowledge using Sentic-LSTM, which resolves the word-conflict problems of basic LSTM models. The position information of the aspect plays a key role in ASC, which [38] exploits by proposing HAPN, capable of learning the position information of the aspect and then fusing aspects and contexts to produce the final sentence representation. Zheng et al. [39] proposed a rotatory \(attention\) based neural network that associates the aspect with its left/right contexts using three LSTMs: one for the \(left context\), one for the \(target phrase\) and one for the \(right context\). Laddha et al. [40] introduced an \(attention\) based \(Bi-LSTM\) model that effectively captures the relation between multiple aspects and the context words while ignoring the effect of one \(aspect\) on another; finally, a CRF is used to model dependencies among output labels. Lin et al. proposed DSMN [41], which guides a multi-hop attention mechanism by computing the distance between an aspect and its context to capture aspect-aware context information.

Numerous existing models combine CNN with LSTM to capture context-related word-level information. Liu et al. [19] combined a regional \(CNN\) with a \(Bi-LSTM\) to obtain contextual information and the relationship between \(aspect\) and \(context\), together with a gating mechanism that improves the word vector representation and makes the model language independent. Zhang et al. [27] proposed CMA-MemNet to obtain semantic information from aspects and sentences: the convolution captures context-related information, while multi-head self-attention obtains semantic information. Nevertheless, CNN-based models inadequately determine sentiments conveyed by multiple non-consecutive words.

Graph Convolutional Network (GCN) models have been developed to capture the dependency relation between \(aspect\) and \(context\) more effectively, given the shortcomings of existing attention-based models. Zheng et al. [42] proposed ASGCN, which combines an LSTM with a multi-layer GCN to fully leverage the syntactic dependency structure within a sentence: the LSTM generates contextual word embeddings that respect word order, and the multi-layer GCN filters out unimportant words, leaving the important information to be fed into an attention-based LSTM that generates aspect-based features for sentiment prediction. Hou et al. [43] proposed SAGCN with self-attention, which enables interaction between an \(aspect\) and its \(opinion words\) even if the aspect term is far away from them, and then considers the connection between the target and its syntactic neighbors. Sun et al. [33] proposed a \(convolution over dependency tree\) model that uses a Bi-LSTM to capture the important features of the sentence, feeds them into a GCN, and then transfers information from the opinion words to the aspect words. Liang et al. [44] proposed an interactive \(multi task\) learning model with a new message-passing mechanism that uses a dependency-relation-embedded GCN to fully exploit syntactic knowledge for end-to-end ABSA. Wu et al. [45] introduced a GCN with \(attention\) that utilizes BERT to capture the relation between an \(aspect\) and its \(context\), where the attention controls information flow in the GCN.

The overall contextual scores are degraded by the failure of dependency-tree-based models to produce hidden vector representations tailored to the aspect. Veyseh et al. [46] elucidated a \(graph-based model\) with gate vectors that customize the hidden vectors towards the aspect terms, together with a \(dependency tree\) based mechanism that assigns an importance score to every word in the sequence.

Although existing GCN models process the entire dependency tree, which complicates optimization, only a small part of the \(dependency tree\) is actually needed for the ASC task. Wang et al. [47] reshaped and pruned the dependency tree to keep only its important part and to focus specifically on the target aspects. The pruned tree is then fed into an \(R-GAT\) to encode the \(dependency relations\) and to establish connections between the \(aspects\) and \(contexts\).

A few issues are observed in the existing models:

  • Despite the promise of attention-based models, they can fail to capture the dependencies between \(context\) and \(aspect\) in a sequence, which leads the given aspect to mistakenly attend to syntactically irrelevant context words as descriptors.

  • Certain models do not fully exploit the syntactic structure but merely impose syntactic constraints on the attention weights.

  • Large-scale corpus training improves neural network models. Manually labeling aspect targets to generate aspect-level training data is difficult.

  • As comments and other corpora with \(document-level\) sentiment labels are hard to obtain, gathering users' preferences about multiple aspect categories becomes infeasible.

Differentiating sentiment polarities at a fine-grained aspect level remains highly demanding despite the efficiency of all these methods. Hence, designing a powerful neural network that fully engages aspect information for ASC is vital. To address these issues, this work proposes a novel SSA-GRU-AE to classify sentiments with respect to aspects effectively. In particular, the sparse self-attention mechanism introduced in our work down-weights the unimportant words and emphasizes the words related to the aspect, helping the model outperform the existing models in the literature.

3 Proposed Work

This paper focuses on ASC for a given input sequence. A pre-trained BERT model computes contextualized word embedding vectors for the sentence and the aspect terms, which then form the input to the SSA-GRU-AE to classify aspect-level sentiments.

3.1 Input Layer

The given input sentence \({\text{S}}=\left\{{{\text{s}}}_{1},{{\text{s}}}_{2},{{\text{s}}}_{3},\dots ,{{\text{s}}}_{{\text{a}}},{{\text{s}}}_{{\text{a}}+1},{{\text{s}}}_{{\text{a}}+2},\dots {{\text{s}}}_{{\text{a}}+\left({\text{m}}-1\right)},\dots ,{{\text{s}}}_{{\text{N}}}\right\}\) of length \({\text{N}}\) with \({\text{m}}\) aspect words \(\left\{{{\text{s}}}_{{\text{a}}},{{\text{s}}}_{{\text{a}}+1},{{\text{s}}}_{{\text{a}}+2},\dots {{\text{s}}}_{{\text{a}}+({\text{m}}-1)}\right\}\) is recast into contextualized word embedding vectors using the \(pre-trained\) BERT model. The input to BERT for the given sentence \({\text{S}}\) takes the form \("[{\text{CLS}}]+{\text{sentence}}+[{\text{SEP}}]+{\text{aspect term}}+[{\text{SEP}}]"\), which makes the interactions between the sentence and the aspect term explicit. The sub-word embeddings generated by BERT are averaged (average pooling) to produce the final embedding vector \({\text{X}}\in {{\text{R}}}^{{\text{N}}\times {\text{d}}}\), where \({\text{d}}\) is the dimension of the BERT output. The vector \({\text{X}}\) is then represented as the vector sequences \(\left\{{{\text{w}}}_{1},{{\text{w}}}_{2},\dots {{\text{w}}}_{{\text{n}}}\right\}\) for the sentence and \(\left\{{{\text{v}}}_{{\text{a}}},{{\text{v}}}_{{\text{a}}+1},{{\text{v}}}_{{\text{a}}+2},\dots {{\text{v}}}_{{\text{a}}+({\text{m}}-1)}\right\}\) for the aspect terms, respectively.
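For concreteness, the following sketch shows one way to build this input format and perform word-level average pooling with the Hugging Face transformers library; the checkpoint name, the library choice, and the pooling details are our assumptions rather than the paper's implementation.

```python
import torch
from transformers import BertTokenizerFast, BertModel

# Sketch (not the authors' code): build "[CLS] sentence [SEP] aspect [SEP]"
# and average-pool sub-word embeddings back to word level.
tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
bert = BertModel.from_pretrained("bert-base-uncased")

sentence = "The food is tasty but restaurant is untidy"
aspect = "food"

# Passing two text segments yields exactly [CLS] sentence [SEP] aspect [SEP].
enc = tokenizer(sentence, aspect, return_tensors="pt")
with torch.no_grad():
    hidden = bert(**enc).last_hidden_state.squeeze(0)   # (num_tokens, d=768)

# Average the sub-word vectors of each original word in the sentence segment
# to obtain the word-level embedding matrix X of shape (N, d).
word_ids = enc.word_ids(batch_index=0)
seq_ids = enc.sequence_ids(batch_index=0)
word_vectors = {}
for tok_idx, (w_id, s_id) in enumerate(zip(word_ids, seq_ids)):
    if s_id == 0 and w_id is not None:       # keep only sentence-segment tokens
        word_vectors.setdefault(w_id, []).append(hidden[tok_idx])
X = torch.stack([torch.stack(v).mean(dim=0) for _, v in sorted(word_vectors.items())])
print(X.shape)  # (N, 768)
```

The same pooling applied to the aspect-segment tokens would yield the aspect embedding vectors used later in the model.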

3.2 Gated Recurrent Unit

A \(vanilla RNN\) suffers from the vanishing or exploding \(gradient\) problem on long sequences because the entire hidden state is overwritten at every time step \({\text{t}}\). The Gated Recurrent Unit (GRU) solves this issue by including two gates in its architecture: an \(update gate\) and a \(reset gate\). These gates discard unimportant information and retain only the important information. The reset gate combines the current input with the important part of the previous \(hidden state\) to produce a candidate \(hidden state\). The update gate determines how much information from the previous hidden state is carried into the final hidden state, allowing the network to retain long-term dependencies. The workflow of the GRU is illustrated in Fig. 1. The calculation process of the GRU is given in Eqs. (1) to (5):

$$ {\text{g}}_{{\text{r}}} = {\upsigma }\left( {{\text{W}}_{{{\text{ir}}}} \cdot {\text{x}}_{{\text{t}}} + {\text{W}}_{{{\text{hr}}}} \cdot {\text{h}}_{{{\text{t}} - 1}} } \right) $$
(1)
$$ {\text{r}} = {\text{tanh}}\left( {{\text{g}}_{{\text{r}}} \odot \left( {{\text{W}}_{{\text{h}}} \cdot {\text{h}}_{{{\text{t}} - 1}} } \right) + {\text{W}}_{{\text{x}}} \cdot {\text{x}}_{{\text{t}}} } \right) $$
(2)
$$ {\text{g}}_{{\text{u}}} = {\upsigma }\left( {{\text{W}}_{{{\text{iu}}}} \cdot {\text{x}}_{{\text{t}}} + {\text{W}}_{{{\text{hu}}}} \cdot {\text{h}}_{{{\text{t}} - 1}} } \right) $$
(3)
$$ {\text{u}} = {\text{g}}_{{\text{u}}} \odot {\text{h}}_{{{\text{t}} - 1}} $$
(4)
$$ {\text{h}}_{{\text{t}}} = {\text{r}} \odot \left( {1 - {\text{g}}_{{\text{u}}} } \right) + {\text{u}} $$
(5)
Fig.1
figure 1

Gated recurrent unit

The class labels used in the proposed work are \(\left\{{\text{positive}},\ {\text{negative}},\ {\text{neutral}}\right\}\).
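As a concrete reference for Eqs. (1) to (5), the following minimal NumPy sketch transcribes one GRU time step; the toy dimensions and the omission of bias terms follow the equations as written and are not the paper's code.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_step(x_t, h_prev, W_ir, W_hr, W_h, W_x, W_iu, W_hu):
    """One GRU time step following Eqs. (1)-(5) (bias terms omitted as in the text)."""
    g_r = sigmoid(W_ir @ x_t + W_hr @ h_prev)        # Eq. (1): reset gate
    r = np.tanh(g_r * (W_h @ h_prev) + W_x @ x_t)    # Eq. (2): candidate hidden state
    g_u = sigmoid(W_iu @ x_t + W_hu @ h_prev)        # Eq. (3): update gate
    u = g_u * h_prev                                  # Eq. (4): retained previous state
    h_t = r * (1.0 - g_u) + u                         # Eq. (5): new hidden state
    return h_t

# toy dimensions: input size 4, hidden size 3
rng = np.random.default_rng(0)
d_in, d = 4, 3
W_ir, W_iu, W_x = (rng.normal(size=(d, d_in)) for _ in range(3))
W_hr, W_hu, W_h = (rng.normal(size=(d, d)) for _ in range(3))
h = np.zeros(d)
for x in rng.normal(size=(5, d_in)):                  # run over a length-5 sequence
    h = gru_step(x, h, W_ir, W_hr, W_h, W_x, W_iu, W_hu)
print(h)
```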

3.3 Self-Attention Based GRU with Aspect Embedding

The plain GRU struggles to identify the key part of a sentence for the ASC task. To overcome this problem, this work proposes the novel SSA-GRU-AE, which integrates a sparse self-attention mechanism with the GRU to capture the relevant part of the sentence with respect to a given aspect. Aspect information plays a key role in classifying the polarity of a given sentence; to utilize it effectively, this work generates an embedding vector for each aspect using BERT. Here \({{\text{V}}}_{{{\text{a}}}_{{\text{i}}}}\in {\mathbb{R}}^{{{\text{d}}}_{{\text{a}}}}\) is the embedding of aspect \({\text{i}}\), where \({{\text{d}}}_{{\text{a}}}\) is the dimension of the aspect embedding. \({\text{H}}\in {\mathbb{R}}^{{\text{d}}\times {\text{N}}}\) is the matrix formed by the hidden vectors \(\left[ {{\text{h}}_{1} ,{\text{h}}_{2} ,{\text{h}}_{3} , \ldots ,{\text{h}}_{{\text{N}}} } \right]\) generated by the GRU, where \({\text{d}}\) is the size of the hidden layers and \({\text{N}}\) is the length of the given sentence. \({\text{e}}_{{\text{N}}} \in {\mathbb{R}}^{{\text{N}}}\) is a vector of ones. The self-attention mechanism produces an attention weight vector \({\upalpha }\) and a weighted hidden representation \({\text{r}}\), as specified in Eqs. (6) to (8).

$$ {\text{M}} = {\text{tanh}}\left( {\left[ {\begin{array}{*{20}c} {{\text{W}}_{{\text{h}}} {\text{H}}} \\ {{\text{W}}_{{\text{V}}} {\text{V}}_{{\text{a}}} \otimes {\text{e}}_{{\text{N}}} } \\ \end{array} } \right]} \right) $$
(6)
$$ {\upalpha } = {\text{softmax}}\left( {{\text{W}}^{{\text{T}}} {\text{M}}} \right) $$
(7)
$$ {\text{r}} = {\text{H}}{\upalpha }^{{\text{T}}} $$
(8)

where \({\text{ M}} \in {\mathbb{R}}^{{\left( {{\text{d}} + {\text{d}}_{{\text{a}}} } \right) \times {\text{N}}}}\), \({\upalpha } \in {\mathbb{R}}^{{\text{N}}}\) and \({\text{r}} \in {\mathbb{R}}^{{\text{d}}}\). \({\text{W}}_{{\text{h}}} \in {\mathbb{R}}^{{{\text{d}} \times {\text{d}}}}\), \({\text{W}}_{{\text{V}}} \in {\mathbb{R}}^{{{\text{d}}_{{\text{a}}} \times {\text{d}}_{{\text{a}}} }}\) and \({\text{W}} \in {\mathbb{R}}^{{{\text{d}} + {\text{d}}_{{\text{a}}} }}\) are weight parameters. \(\otimes\) is a concatenation operator that repeats \({\text{V}}_{{\text{a}}}\) \({\text{N}}\) times; that is, \({\text{W}}_{{\text{V}}} {\text{V}}_{{\text{a}}} \otimes {\text{e}}_{{\text{N}}}\) repeats the linearly transformed \({\text{V}}_{{\text{a}}}\) once for every word in the given sentence. The final sentence representation is given in Eq. (9),

$$ {\text{h}}^{*} = {\text{tanh}}\left( {{\text{W}}_{{\text{p}}} {\text{r}} + {\text{W}}_{{\text{x}}} {\text{h}}_{{\text{N}}} } \right) $$
(9)

where \({\text{h}}^{*} \in {\mathbb{R}}^{{\text{d}}}\), \({\text{W}}_{{\text{p}}}\) and \({\text{W}}_{{\text{x}}}\) are weight parameters.
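The computation in Eqs. (6) to (9) can be sketched directly in NumPy; the random toy parameters below are illustrative only, and the shapes follow the definitions given above.

```python
import numpy as np

def aspect_attention(H, v_a, W_h, W_v, w, W_p, W_x):
    """Aspect-conditioned attention over GRU hidden states, following Eqs. (6)-(9).

    H   : (d, N) matrix of hidden vectors [h_1, ..., h_N]
    v_a : (d_a,) aspect embedding
    """
    d, N = H.shape
    Va_rep = np.tile((W_v @ v_a)[:, None], (1, N))   # W_V V_a repeated N times (⊗ e_N)
    M = np.tanh(np.vstack([W_h @ H, Va_rep]))        # Eq. (6), shape (d + d_a, N)
    scores = w @ M                                   # W^T M, shape (N,)
    alpha = np.exp(scores - scores.max())
    alpha /= alpha.sum()                             # Eq. (7): softmax over the N words
    r = H @ alpha                                    # Eq. (8): attention-weighted vector
    h_star = np.tanh(W_p @ r + W_x @ H[:, -1])       # Eq. (9): h_N is the last hidden state
    return h_star, alpha

# toy usage with random parameters (shapes follow the "where" clause above)
d, d_a, N = 4, 3, 6
rng = np.random.default_rng(1)
h_star, alpha = aspect_attention(
    rng.normal(size=(d, N)), rng.normal(size=d_a),
    rng.normal(size=(d, d)), rng.normal(size=(d_a, d_a)),
    rng.normal(size=d + d_a), rng.normal(size=(d, d)), rng.normal(size=(d, d)))
print(alpha.round(3), h_star.shape)
```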

The self-attention mechanism captures the significant part of the sentence with respect to an aspect, and \({\text{h}}^{*}\) represents the feature of a particular sentence with respect to that aspect. A linear layer is then added to transform \({\text{h}}^{*}\) into a sentence vector \({\text{e}}\) whose length equals the number of classes \(\left| {\text{C}} \right|\). Finally, a softmax layer transforms \({\text{e}}\) into a conditional probability distribution, as specified in Eq. (10).

$$ {\text{y}} = {\text{softmax}}\left( {{\text{W}}_{{\text{s}}} {\text{h}}^{*} + {\text{b}}_{{\text{s}}} } \right) $$
(10)

where \({\text{W}}_{{\text{s}}}\) and \({\text{b}}_{{\text{s}}}\) are the weight and bias parameters of the \({\text{softmax}}\) layer. An \({\text{L}}_{1}\) regularizer is further applied to ensure that only a minimal number of words contribute to the semantics and sentiment of the sentence. The sparse self-attention mechanism is depicted in Eq. (11).

$$ \left| {\text{y}} \right|_{{{\text{L}}_{1} }} = \left| {{\text{softmax}}\left( {{\text{W}}_{{\text{s}}} {\text{h}}^{*} + {\text{b}}_{{\text{s}}} } \right)} \right| $$
(11)

The proposed sparse self-attention mechanism effectively removes the words that are unimportant for predicting the sentiment and identifies the key part of the sentence. After computing the important part of the sentence, a weighted summation is performed to predict the sentiment polarity, as specified in Eq. (12).

$$ {\text{d}}^{{\text{k}}} = {\text{y}}_{1} {\text{x}}_{1} + {\text{y}}_{2} {\text{x}}_{2} + \ldots + {\text{y}}_{{\text{n}}} {\text{x}}_{{\text{n}}} $$
(12)

where \({\text{x}}_{{\text{i}}}\) denotes the embedding of the \({\text{i}}^{{{\text{th}}}}\) word in the sentence and \({\text{n}}\) is the length of the sentence. Finally, the output of the sparse self-attention layer is obtained using Eq. (13).

$$ {\hat{\text{y}}} = {\text{softmax}}\left( {{\text{Wd}}^{{\text{k}}} + {\text{b}}} \right) $$
(13)

where \({\hat{\text{y}}}\) is the predicted sentiment polarity, a \(2 - {\text{D}}\) vector in which \(\left( {1,0} \right)\) and \(\left( {0,1} \right)\) denote the \(positive\) and \(negative\) labels respectively.
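A minimal sketch of Eqs. (10) to (13) is given below, under our reading that the vector \({\text{y}}\) of Eq. (10) supplies one weight per word and that the \({\text{L}}_{1}\) term of Eq. (11) is added to the training objective; the parameter shapes are assumptions consistent with those equations, not the paper's code.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def sparse_self_attention_output(h_star, X, W_s, b_s, W, b, l1_lambda=0.01):
    """One reading of Eqs. (10)-(13): word weights from h*, L1 sparsity, weighted sum.

    h_star : (d,)    aspect-aware sentence feature from Eq. (9)
    X      : (n, d)  embeddings x_1..x_n of the sentence words
    """
    y = softmax(W_s @ h_star + b_s)           # Eq. (10): one weight per word (length n)
    l1_term = l1_lambda * np.abs(y).sum()     # Eq. (11): L1 penalty added to the loss
    d_k = y @ X                               # Eq. (12): weighted sum of word embeddings
    y_hat = softmax(W @ d_k + b)              # Eq. (13): sentiment distribution over classes
    return y_hat, l1_term

# toy usage (n words, embedding size d, C classes)
n, d, C = 5, 4, 3
rng = np.random.default_rng(2)
probs, penalty = sparse_self_attention_output(
    rng.normal(size=d), rng.normal(size=(n, d)),
    rng.normal(size=(n, d)), rng.normal(size=n),
    rng.normal(size=(C, d)), rng.normal(size=C))
print(probs.round(3), round(penalty, 4))
```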

The model is trained using back propagation, where a \(cross entropy\) loss function minimizes the error between \({\text{y}}\) and \({\hat{\text{y}}}\) over all sentences, as specified in Eq. (14).

$$ {\text{loss}} = - \mathop \sum \limits_{{\text{i}}} \mathop \sum \limits_{{\text{j}}} {\text{y}}_{{\text{i}}}^{{\text{j}}} {\text{log}}\widehat{{{\text{y}}_{{\text{i}}} }}^{{\text{j}}} $$
(14)

where \({\text{i}}\) and \({\text{j}}\) are the indices of the sentence and the class respectively.

An Adagrad optimizer [48] is then adopted to train the model over mini-batches. This optimizer improves on SGD by adapting the learning rate per parameter: it applies larger updates to infrequently updated parameters and smaller updates to frequently updated ones. The workflow of the proposed SSA-GRU-AE is illustrated in Fig. 2.
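An illustrative training loop under these choices is sketched below; the paper reports a TensorFlow implementation, so the PyTorch code, the placeholder `model` and `train_loader`, and the learning rate are our assumptions.

```python
import torch
from torch import nn

# Illustrative sketch (not the authors' code): cross-entropy loss of Eq. (14)
# minimized with Adagrad over mini-batches; `model` and `train_loader` are placeholders.
def train(model, train_loader, num_epochs=10, lr=0.01):
    criterion = nn.CrossEntropyLoss()                 # cross-entropy of Eq. (14) on logits
    optimizer = torch.optim.Adagrad(model.parameters(), lr=lr)
    for epoch in range(num_epochs):
        for sentence_ids, aspect_ids, labels in train_loader:
            optimizer.zero_grad()
            logits = model(sentence_ids, aspect_ids)  # (batch, |C|) class scores
            loss = criterion(logits, labels)
            loss.backward()                           # back propagation
            optimizer.step()                          # Adagrad parameter update
    return model
```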

Fig.2
figure 2

Proposed SSA-GRU-AE model

4 Experiments

The proposed SSA-GRU-AE model performs the ASC task in which both the \(word\) and \(aspect\) embedding vectors are produced by pre-trained BERT, and the length of the attention weight vector equals the length of the input sentence. The hidden size \(\left( {{\text{dim}}_{{\text{h}}} } \right)\) of BERT is 768 and the number of transformer layers \(\left( {\text{L}} \right)\) is 12. The pre-trained BERT model initializes the word and aspect embedding vectors, while all other weight parameters are initialized by random sampling from the normal distribution \({\mathcal{N}}\left( {0;{ }0:0.2} \right)\). The SSA-GRU-AE is implemented with a total of 8 attention heads and 12 training layers. The proposed and all baseline models are implemented in TensorFlow, and the hyper-parameters applied are specified in Table 2.

Table 2 Hyper parameters for baseline models
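The explicitly stated settings can be gathered into a small configuration sketch such as the one below; any entry not quoted from the text above (such as the checkpoint name) is an illustrative placeholder, and Table 2 remains the authoritative source for the remaining hyper-parameters.

```python
# Configuration sketch assembled from the settings stated in Sect. 4.
# Entries marked "assumed" are illustrative placeholders, not values from the paper.
config = {
    "bert_checkpoint": "bert-base-uncased",  # assumed; the text only gives dim_h and L
    "bert_hidden_size": 768,                 # dim_h
    "bert_transformer_layers": 12,           # L
    "attention_heads": 8,
    "training_layers": 12,
    "optimizer": "Adagrad",
    "classes": ["positive", "negative", "neutral"],
    "runs_for_reporting": 3,                 # results averaged over 3 random initializations
}
print(config["bert_hidden_size"], config["attention_heads"])
```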

The results, averaged over 3 runs with random initialization and evaluated with \(Accuracy\) and \(F1 - score\), are shown in Table 3. A \({\text{Friedman}}\) test was performed on both the \(Accuracy\) and \(F1 - score\) values to check the competency of our model against the baseline models, as shown in Table 10.

Table 3 Evaluation metrics

4.1 Datasets

Five datasets \(\left( {{\text{twitter}},{\text{ LAP}}14,{\text{ REST}}14,{\text{ REST}}15{\text{ and REST}}16} \right)\) are employed to show that the proposed method surpasses the existing baseline models. The five datasets contain user reviews annotated with a set of aspects and their polarities, from which sentences with contradictory polarities or inexplicit aspects are removed. The purpose of the proposed work is to ascertain the polarity of a sentence with respect to an aspect. The dataset details are exhibited in Table 4.

Table 4 Statistical representation of dataset

4.2 Baselines and Experimental Setting

Ten sentiment classification models are implemented as baselines for comparison with the proposed model:

  1. SVM: a conventional classifier [24] trained on hand-crafted features such as bag-of-words and sentiment lexicons (see Sect. 2).

  2. LSTM: a standard LSTM [36] that models the sentence representation for sentiment prediction.

  3. Interactive Attention Network (IAN): the IAN model [53] uses two LSTMs, one for the target and one for the context, to extract information independently via an interactive attention mechanism, and then concatenates the two representations to classify the sentiment.

  4. Memory Network (MemNet): MemNet [54] combines an attention mechanism with explicit memory; its multi-hop attention mechanism helps to improve sentiment classification.

  5. AOA [55]: this model learns the aspect and sentence representations jointly and explicitly captures the interaction between them.

  6. ASGCN [42]: this model uses an LSTM to generate contextual information, a GCN to obtain aspect-specific features, a masking mechanism to remove non-aspect words, and another LSTM to predict the sentiment.

  7. SAGCN [43]: SAGCN uses a GCN over the dependency tree to find the correlation between the aspect and the sentence.

  8. IGCN [28]: a bidirectional gating mechanism in IGCN computes the relation between the \(aspect\) and its \(context\).

  9. DSMN [41]: the multi-hop attention is guided by a dynamically selected context memory, which integrates the \(aspect\) information with the memory network.

  10. CMA-MemNet [27]: this memory network extracts the rich semantic information between the \(aspect\) and the \(sentence\).

5 Experimental Results

In the experiments, \(accuracy\) and \(F1 - score\) are the metrics used to evaluate the performance of the proposed method. To evaluate the stability of the model, the method is run three times and the \(mean accuracy\) and \(standard deviation\) are reported in Tables 5, 6, 7, 8 and 9. The Friedman test verifies the significance of the differences between the proposed approach and the other approaches at a \(p - value\) of 0.05.
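The evaluation protocol can be reproduced along the following lines with scikit-learn and SciPy; the predictions and per-dataset scores in the sketch are placeholders rather than the paper's results, and macro-averaged F1 is our assumption.

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score
from scipy.stats import friedmanchisquare

# Per-run evaluation (placeholder predictions, not the paper's outputs).
y_true = np.array([0, 1, 2, 1, 0, 2])          # 0=positive, 1=negative, 2=neutral
y_pred = np.array([0, 1, 2, 0, 0, 2])
acc = accuracy_score(y_true, y_pred)
f1 = f1_score(y_true, y_pred, average="macro")  # macro-F1 assumed
print(f"accuracy={acc:.3f}, macro-F1={f1:.3f}")

# Friedman test across models: each list holds one model's score on the five datasets.
# The values below are placeholders purely to show the call.
model_a = [0.71, 0.75, 0.80, 0.79, 0.88]
model_b = [0.72, 0.77, 0.81, 0.80, 0.89]
model_c = [0.74, 0.79, 0.83, 0.82, 0.91]
stat, p_value = friedmanchisquare(model_a, model_b, model_c)
print(f"Friedman statistic={stat:.3f}, p={p_value:.3f}")   # compare p with 0.05
```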

Table 5 \(Accuracy\) and \(F1 - score\) on Twitter dataset
Table 6 \(Accuracy\) and \(F1 - score\) on Lap14 dataset.
Table 7 \(Accuracy\) and \(F1 - score\) on Rest14 dataset.
Table 8 \(Accuracy\) and \(F1 - score\) on Rest15 dataset.
Table 9 \(Accuracy\) and \(F1 - score \) on Rest16 dataset.

5.1 Discussion

The comparison of the proposed work with eighteen baseline models clearly shows that it outperforms the existing models in terms of \(accuracy\) and \(F1 - score\). The SSA-GRU-AE model consistently performs better than all baseline models on all five datasets \(\left( {twitter, LAP14, REST14, REST15\ and\ REST16} \right)\), whereas SVM and LSTM perform poorly across all five datasets, owing to the manual feature engineering in SVM and the lack of aspect information in LSTM.

The IAN and AOA models use attention mechanisms to solve the ASC task by attending to all aspect and context words, which helps them outperform both the SVM and the baseline LSTM models, and the experimental results show that both perform consistently well on all five datasets. The issue with the attention mechanism is that, when the dataset is noisy or contains multiple aspects, it falsely assigns high scores to irrelevant words; moreover, these attention-based models attend to all aspect words with different weights, which incorrectly guides the \(aspect term\) to focus on syntactically \(unrelated words\). These issues cause the AOA and IAN models to perform only moderately on all five datasets in terms of \(accuracy\) and \(F1-score\).

GCN-based models effectively capture both syntactic word dependencies and long-range word relations. They work well on datasets that are rich in syntactic information and have a good grammatical structure; in particular, all the GCN-based models perform well on the \(LAP14, REST15\) and \(REST16\) datasets and improve both \(accuracy\) and \(F1-score\). However, they fail on datasets with little grammatical information and weak syntactic signals: on both the Twitter and REST14 datasets these GCN models produce lower \(accuracy\) and \(F1 - score\). The results in Tables 5, 6, 7, 8, 9 and 10 clearly show that the proposed work outperforms the existing GCN-based models in terms of \(accuracy\) and \(F1-score\).

Table 10 Friedman’s test

Memory-network-based models perform consistently well on all five datasets in terms of \(accuracy\) and \(F1 - score\). The base memory network captures aspect-sequence modeling effectively but fails to capture context and sequence information, which lowers the performance of MemNet on all five datasets. CMA-MemNet and DSMN capture both kinds of information effectively, which helps them perform better on all five datasets and outperform the basic, \(attention\) and \(GCN\) models. Although these models perform well on all datasets, they still fail to recognize all aspects correctly, which leads to a performance loss.

The proposed sparse self-attention GRU with aspect embedding built on BERT outperforms all baseline models on all five datasets in terms of \({\text{accuracy}}\) and \({\text{F}}1 - {\text{score}}\). The BERT model used in our work effectively captures contextual information, which in turn helps capture semantic information. The sparse self-attention mechanism introduced in our work effectively removes unimportant words and retains only the words that are important to the sentence, and the \({\text{L}}1\)-regularizer applied to the attentions ensures that only a few words contribute to the sentiment of each sentence. This helps the proposed model perform better on datasets that are grammatically poor and noisy \(\left( {{\text{Twitter and REST}}14} \right)\). These advantages allow our model to outperform the existing baselines on all five datasets, and in particular to perform consistently well on the \({\text{Twitter and REST}}14\) datasets. Figures 3, 4, 5, 6 and 7 show the comparative results of all baseline models and the proposed model on the five datasets.

Fig. 3
figure 3

Result analysis on Twitter dataset

Fig. 4
figure 4

Result analysis on Lap14 dataset

Fig. 5
figure 5

Result analysis on Rest14 dataset

Fig. 6
figure 6

Result analysis on Rest15 dataset

Fig. 7
figure 7

Result analysis on Rest16 dataset

5.2 Ablation Study

An ablation study is carried out to examine the significance of each component of the proposed model. First, the pre-trained BERT model is replaced with GloVe and the resulting model is run on the five datasets. The proposed model without BERT performs worse in terms of \(accuracy\) and \(F1 - score\) than the model with BERT: it yields 2.9%, 2.7%, 1.6%, 1.4%, and 1.3% lower \(accuracy\) and 2.6%, 2.4%, 1.3%, 1.1% and 1% lower \(F1 - score\) on the \(Twitter, REST14, LAP14, REST15\) and \(REST16\) datasets respectively, and also falls behind some of the GCN and memory network models. The BERT model effectively captures the semantic relation between aspect and context, which improves the performance of the proposed model.

To identify the importance of the \(sparse self - attention\) mechanism, a model without it is designed and run on the five datasets. The proposed model without sparse self-attention performs worse in terms of \({\text{accuracy}}\) and \({\text{F}}1 - {\text{score}}\) than the full model: it yields 2.6%, 2.3%, 1.4%, 1.2%, and 1.1% lower \({\text{accuracy}}\) and 2.3%, 2.1%, 1.1%, 0.9% and 0.7% lower \({\text{F}}1 - {\text{score}}\) on the \({\text{Twitter}},{\text{ REST}}14,{\text{ LAP}}14,{\text{ REST}}15{\text{ and REST}}16\) datasets respectively. These experimental results prove that the proposed sparse self-attention mechanism improves the performance of the model: it effectively captures the importance of each context word with respect to the \(aspect\), removes the unimportant words from the sentence, and keeps only the important part, which yields the improvement over the variant without sparse self-attention.

5.3 Case Study

For instance, consider the following phrase: "\(\user2{Even if it^{\prime}s a good day}\), \(\user2{I don^{\prime}t feel it}\). \(\user2{I^{\prime}m really miserable}\)." The terms "\(miserable\)", "\(feel\)" and "\(don^{\prime}t\)" are far more significant for predicting the sentiment polarity of this sentence than the words "\(good\)" and "\(day\)". Many other words, including "the", "it" and "I'm", are unimportant. It is therefore crucial to build a model that can accurately reflect the significance of each word while remaining sparse enough that only a \(few words\) determine the sentiment label of the sentence. The self-attention layer of the proposed SSA-GRU-AE is used to determine the significance of each word in the sentence, and an L1 regularization is then applied to these weights to ensure that only a handful of words are needed to identify the sentiment polarity.

Consider the sentence "The meal is tasty, but the restaurant is untidy.": the proposed model predicts the \(sentiment polarity\) for the aspect "meal" as \(positive\) and for "restaurant" as \(negative\). The self-attention layer accurately links the \(aspect\) term "meal" with its \(context\) word "tasty" and the \(aspect\) term "restaurant" with its \(context\) word "untidy". The sparse nature of the \(self - attention\) mechanism helps retain only the important words of the sentence, such as "meal", "tasty", "restaurant" and "untidy", and discard unimportant words such as "The", "is" and "but". This ability to retain only the important words helps the ASC task to be carried out efficiently. The self-attention mechanism thus serves as a sparsification mechanism, a form of regularization that can enhance the quality of the model by reducing the noise within it. Owing to the sparse nature of the proposed SSA-GRU-AE, neurons whose output activations are very similar are combined, their biases adjusted, and the network rewired to reflect these changes. Such sparsification improves the efficiency of models in high-dimensional feature spaces: it reduces the complexity of the representation, since only a subset of dimensions is used at any given moment, and it decreases complexity further by nullifying specific subsets of the model parameters. As a result, many useless words in the sentence are excluded from the \(sentiment polarity\) prediction for a given \(aspect\). Because of these advantages, the proposed model identifies the \(aspect\) phrases and their corresponding \(context\) words within sentences, such as "meal" and "tasty" in the statement above.

Another example is "The laptop's model is ok, and its performance is great.". The proposed SSA-GRU-AE model correctly identifies the aspect term "model" with its context "ok" and predicts its sentiment polarity as neutral, while for the aspect term "performance" the context is identified as "great" and the sentiment polarity is predicted as positive. The sparse self-attention mechanism assigns high weights to the terms "laptop", "model", "ok", "performance" and "great" and low weights to the terms "The", "is", "its" and "and". This example shows the importance of sparsification in the self-attention mechanism for ASC tasks. The L1 regularizer applied to the attentions ensures that only a few words contribute to the \(sentiment\) of a sentence. When entire neurons or filters are eliminated, the principles of associativity and distributivity can be used to convert the sparsified structure into a more compact dense structure; by contrast, when arbitrary elements of a weight matrix are eliminated, the indices of the remaining non-zero items must be retained. Model sparsification alters the model's characteristics but does not modify the sparsity pattern observed across successive inferences or forward passes. This helps the proposed model perform better on datasets that are grammatically poor and noisy.

Consider the example "The workers should do more work truly.", with "workers" as the aspect and "more work truly" as its context phrase. Existing DNN models identify the aspect term "workers" and the context term "work truly" and predict the sentiment polarity as positive, yet the weight of the term "more" is important in this context. The sparse self-attention in the proposed model identifies the full context term "more work truly" and predicts the actual sentiment polarity, "negative". This elucidates the importance of the sparse nature of the self-attention mechanism and shows that the proposed model is able to capture context words with implicit meaning (Fig. 8).

Fig.8
figure 8

Case study example

6 Conclusion

SSA-GRU-AE is proposed to perform the ASC task and contains three parts: a BERT embedding layer, a GRU layer, and a sparse self-attention layer. The pre-trained BERT model effectively captures the contextual word embeddings of both the sentence and the aspect, as well as the relation between them. The sparse self-attention mechanism proposed in our work effectively captures the important part of the sentence with respect to the aspect. The experimental results on 5 datasets \(\left( {{\text{twitter}},{\text{ LAP}}14,{\text{ REST}}14,{\text{ REST}}15{\text{ and REST}}16} \right)\) prove that the proposed method outperforms the existing baseline models in terms of \({\text{accuracy}}\) and \({\text{F}}1 - {\text{score}}\). The ablation study and discussion further demonstrate the proficiency of the proposed model.