Introduction

Microblogs have become an essential channel for information dissemination and acquisition in online social media. Because microblogging is an important social medium, users' emotions play an important role in how information spreads on it. Users whose information flow omits positive information become increasingly depressed in social media, whereas those whose flow omits negative information become increasingly positive [1]. In addition, users' emotions in social media are highly contagious, as images, videos, and even words themselves can cause changes in users' emotions. Therefore, analyzing whether microblogs contain users' subjective emotions and which sentiment polarity they express is important for studying the mechanism and dynamics of information dissemination in microblogs, predicting trends of unexpected events [2], and even forecasting the stock market [3].

The existing microblog sentiment analysis methods can be mainly classified into knowledge-based methods, machine learning methods based on feature classification, and deep learning methods. The knowledge-based method first constructs a knowledge base for microblog sentiment analysis, including a sentiment lexicon [4], a phrase lexicon and an emoji lexicon [5], a syntactic dependency rule base, and a domain ontology base, and then uses the knowledge base to aggregate and calculate the sentiment of microblogs. The knowledge-based method is simple and suitable for large-scale and multi-domain application scenarios. Still, it relies heavily on human experience, and constructing a good knowledge base is complicated and costly. Therefore, knowledge-based methods are often combined with machine learning methods, using the knowledge base to extract sentiment features from microblogs. Feature classification-based machine learning methods start with feature engineering to construct feature sets for microblog sentiment classification and then use supervised machine learning methods to classify microblog sentiments. The feature sets include n-gram features, lexical features, syntactic dependency features, TF-IDF (term frequency–inverse document frequency) features, and knowledge-based features. Commonly used machine learning methods, including Naive Bayes, SVM (support vector machines), CRF (conditional random fields), and ensemble learning methods [2], have been applied in microblog sentiment analysis research.

There are two main drawbacks of feature classification-based machine learning methods. First, supervised learning methods require a large number of manually labeled samples, and unsupervised learning methods are currently not mature enough, so weakly supervised learning methods [6] have recently received more attention and have been used in microblog sentiment analysis research [7]. Second, these methods rely heavily on feature sets, have low domain applicability, and often require considerable effort on feature engineering. Therefore, with the rapid development of deep learning, more and more scholars use deep learning-based methods to study microblog sentiment analysis.

The deep learning-based approach first segments the microblog text into words, represents the words as word vectors, builds a deep neural network model to extract the semantics of the microblog text and construct a representation vector of microblog sentiment, and finally performs sentiment classification. The commonly used deep learning models include recurrent neural networks and convolutional neural networks. In the literature [8], a bidirectional long short-term memory network model was used to classify the negative emotions of microblog users into anger, sadness, and fear. Sun et al. [9] used a convolutional neural network model to analyze the sentiment tendency of microblogs. Ke et al. [10] constructed a multi-channel convolutional neural network for microblog sentiment classification by combining different feature information to form different network inputs. To improve the sentiment semantic extraction ability of models, many deep learning techniques have been proposed in recent years to help solve sentiment analysis tasks, such as the attention mechanism [11] and deep memory networks [12, 13]. Li et al. [14] constructed a dual attention model over microblog text and sentiment symbols to classify microblog sentiment, and the accuracy of sentiment classification was improved by using sentiment symbols. Nevertheless, microblogs are characterized by diverse forms, low standardization of linguistic expression, and many online vocabularies and emojis, which makes it difficult for the above-mentioned approaches to extract semantic features and user sentiments. In addition, the memory storage capacity of those approaches is too weak to store much information, and some semantic information is easily lost, so these methods cannot store long-dependency information, which affects the performance of classifying users' polarity and subjective emotions.

To address the above limitations, this paper develops a deep memory network with a structural attention model for sentiment classification, capturing semantic dependencies between context words and emojis through a structural self-attention mechanism that is free of syntactic parsers, sentiment lexicons, and feature engineering. Then, this study conducts extensive experiments on two real datasets from NLPCC 2013 and NLPCC 2014. Results show that our approach outperforms feature-based SVM, LSTM, and other baseline models in the field of sentiment analysis. The main contributions of this work are summarized as follows:

  • To the best of our knowledge, this study is the first to introduce a structural self-attention mechanism into a deep memory model, combining the long-term contextual memory of deep memory networks with the ability of structural self-attention, to find the user subjectivity and sentiment tendency hidden in texts and emojis in online social media.

  • This study adds a penalty mechanism to the loss function of our deep learning model, so that each attention vector in the attention matrix of the last computation step focuses on different sentiment features, which helps improve the performance of the subjectivity recognition and sentiment classification tasks.

  • This model achieves state-of-the-art performance compared to the baseline models in the field of sentiment analysis.

The rest of the paper is organized as follows. “Related works” reviews related work on deep memory networks, attention mechanisms, and sentiment analysis. “Deep memory network with structural attention for microblog sentiment analysis” details the structural deep memory model. “Experimental results” and “Results and discussions” present the datasets and the model performance. Conclusion and future work are presented in “Conclusion”.

Related works

Sentiment analysis

Sentiment analysis (SA) is one of the most important tasks in the field of natural language processing in online social network platforms. Recent studies of sentiment classification can be mainly grouped into three categories: aspect-level SA, sequence-level SA, and document-level SA.

(1) Most current aspect-level SA recognizes the entities or aspects in a text sentence and makes a positive, neutral, or negative sentiment judgment on those aspects. For example, Gan et al. [15] proposed a self-attention-based hierarchical dilated convolutional neural network for multi-aspect sentiment analysis. Ding et al. [16] designed an aspect-level sentiment analysis tool consisting of sentiment classification and aspect recognition. However, these aspect-based sentiment analysis methods mainly depend on explicit aspect words in each sentence and involve word segmentation, semantic recognition, aspect extraction, and other complex processes. (2) Research on sequence-level sentiment analysis mainly focuses on the discrimination of implicit sentiment in sequential sentences, aiming to classify the sentiment polarities (positive or negative opinion) of those sentences. The long short-term memory model (LSTM) can effectively capture the semantics of long dependent sequences and automatically assign different weights to words in sentences via the introduction of an attention mechanism [17]. Chen et al. [18] performed sentiment analysis separately on sentences of each type using a BiLSTM-conditional random field (CRF) and convolutional neural networks (CNN) to improve sequence-level sentiment classification. (3) Document-level SA aims to classify the sentiment polarity of document-level reviews posted by online users about products and services, which contain a large number of sequential sentences [19]. In general, the task is referred to as document-level analysis because it considers each document as a whole and does not study entities or aspects inside the document or determine sentiments expressed about them, typically via supervised artificial neural network methods [20, 21].

In online social network platforms, the contents published by users are usually short sentences of fewer than 140 words. Furthermore, microblog texts may not contain explicit aspect and sentiment words, so aspect-level sentiment analysis is not well suited to them. Therefore, we regard each sentence as a whole, without studying entities or aspects inside it, and use document-level SA to determine the overall sentiment of those sentences via the deep memory network with structural attention.

Deep memory network

Memory networks are a special kind of neural network learning framework [11]. The main idea is to store contextual information as long-term memory and to generate representations of the input data through reading, writing, or updating operations on these long-term memories, which are then used for generative or predictive tasks.

The memory network consists of five components: storage (\(M\)), input (\(I\)), generalization (\(G\)), output (\(O\)), and response (\(R\)) [12]. Storage component \(M\) is used to store contextual information’s long-term memory. Unlike RNN, this information is stored by separate components, thus enabling the use of fairly long-term contextual information in subsequent computations. The input component \(I\) is used to transform the input data into a feature representation inside the model and acts as feature extraction. The generalization component \(G\) updates the memory information stored in the storage component based on new inputs. It can compress or generalize the long-term memory as needed to meet model prediction or generation needs. The output component \(O\) generates an output of the feature representation space based on the current input and the state of the long-term memory. The response component \(R\) gives a specific prediction or generation result based on this input.

Theoretically, the components of a memory network can be implemented by any appropriate machine learning model. Still, an end-to-end implementation based on deep neural networks is one of the most convenient and best-performing implementations. Sukhbaatar et al. [22] proposed an end-to-end memory network model incorporating attention mechanisms and multi-step (hops) computation to implement multi-level output components. Deep memory networks have been used with good results in a variety of fields, such as attribute-level sentiment classification [23] and question answering systems [24], and their ideas have been widely adopted for building more complex deep learning models [25].

Attention mechanism

The idea of attention mechanism originated from the mechanism of selective attention in human vision and was initially used mainly in visual images. The attention mechanism allows different parts of the sentence to play different roles in completing different tasks, thus avoiding encoding the information of the whole sentence as a fixed vector for all tasks.

The attention mechanism has been used extensively in various areas of text sequence processing and has become one of the essential components of deep neural network models [13]. Dot-product attention [26] and additive attention [27] are the two most commonly used attention functions. Dot-product attention calculates the alignment scores from the hidden states of the encoder and the hidden states of the decoder. Similar to dot-product attention, additive attention uses a one-hidden-layer feed-forward network to calculate the attention alignment scores. In addition, multi-head attention allows the model to stack several attention functions and run them in parallel [28]; in general, the dot-product attention function is used in the different heads. Multi-head contextual attention is a variant of the multi-head attention mechanism that applies attention over a window of fixed context words to learn the different semantics of subspaces at other locations [28]. Memory-based attention borrows the idea of memory networks, storing contextual information and updating it to calculate the attention scores [29].

Compared with the above-mentioned attention mechanisms, the structural self-attention mechanism extracts different aspects of the sentence into multiple vector representations, making global semantic information available to each word at each position and facilitating the establishment of long dependencies. Therefore, this study combines the structural self-attention mechanism and deep memory networks for sentiment analysis.

Deep memory network with structural attention for microblog sentiment analysis

In this section, we present a structural deep memory network model based on deep memory networks and structural self-attention mechanism for microblog sentiment analysis, which combines the storage mechanism of long-term contextual information of deep memory networks with the ability of structural self-attention.

Microblog sentiment analysis task

Microblog sentiment analysis is to determine the user sentiment \({e}_{i}\) of a given microblog document \({s}_{i}\in D\ \left(i=1,\cdots ,|D|\right)\). The document \({s}_{i}=\left\{{w}_{0},{w}_{1},\cdots ,{w}_{n}\right\}\) consists of a set of ordered words and emojis, while \(D\) is a collection of documents. Microblog sentiment analysis includes subjectivity classification and sentiment classification:

Subjectivity classification is classifying microblog documents into microblogs containing user sentiment and those that do not contain user sentiment, denoted as \({e}_{i}\in \{{o}_{0},{o}_{1}\}\).

Sentiment classification is classifying microblog documents \({s}_{i}\in \left\{D|{e}_{i}={o}_{1}\right\}\) into microblogs containing positive sentiment or negative sentiment, denoted as \({e}_{i}\in \{{o}_{p},{o}_{n}\}\).

The reason for dividing the microblog sentiment analysis process into these two parts is the imbalanced distribution of sentiment categories in the microblog data. Most of the contents in the microblog datasets only state facts about users and do not express any sentiment. The two-part design can reduce the impact of data imbalance on model performance.
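To make the two-stage setup concrete, the following minimal sketch shows how a subjectivity classifier can gate a downstream polarity classifier; the class interfaces and label names are illustrative assumptions, not part of the original model description.

```python
# Minimal sketch of the two-stage pipeline; `subjectivity_model` and
# `polarity_model` stand for any trained binary classifiers exposing a
# hypothetical `predict(text) -> bool` method (an assumption for illustration).

def classify_microblog(text, subjectivity_model, polarity_model):
    """Return 'o0' (no sentiment), 'op' (positive), or 'on' (negative)."""
    # Stage 1: subjectivity classification, e_i in {o0, o1}
    if not subjectivity_model.predict(text):
        return "o0"                      # purely factual post, no user sentiment
    # Stage 2: polarity classification, e_i in {op, on}, only for subjective posts
    return "op" if polarity_model.predict(text) else "on"
```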

Deep memory network with structural self-attention model

In this section, we present the details of our structural attention deep memory network, which is divided into five components.

(1) Input components

The input component of our model encodes the sentence sequence and extracts semantic information via a vector representation layer and a BiLSTM (bidirectional long short-term memory) layer. Following previous work [13], in which emojis also play an important role in expressing user emotions, we consider the microblog \(s\) to consist of a word and emoji sequence \(\left\{{w}_{0},{w}_{1},\cdots ,{w}_{n}\right\}\). The vector representation layer converts the sentence sequence into word and emoji vector sequences through a vector matrix \(E\in {\mathbb{R}}^{N\times e}\), where \(N\) is the number of words and emojis in the sentence and \(e\) is the embedding dimension. The microblog document \(s\) is thus converted into the vector sequence \(V=\left\{{v}_{1},{v}_{2},\cdots ,{v}_{n}\right\}\).

RNNs have a serious gradient vanishing problem, which makes such models difficult to apply to long sequences. Since the LSTM (long short-term memory) network is one of the most widely used deep learning models for sentence learning tasks, we select it to extract text features from microblog sequences in our model. The architecture of the LSTM model is presented in Fig. 1.

Fig. 1 The architecture of LSTM

More specifically, each LSTM memory cell contains three gates: (1) an input gate \({i}_{t}\); (2) an output gate \({o}_{t}\); (3) a forget gate \({f}_{t}\). At every timestep \(t\), each of the three gates is presented with the input \({x}_{t}\) and the previous hidden state \({h}_{t-1}\). The input gate \({i}_{t}\) specifies which information is added to the cell state. The output gate \({o}_{t}\) specifies which information from the cell state is used as output. The forget gate \({f}_{t}\) specifies which information is removed from the cell state. The equations below describe the update of the memory cell from timestep \(t-1\) to \(t\).

$${i}_{t}=\upsigma \left({W}_{ix}{x}_{t}+{W}_{ih}{h}_{t-1}+{b}_{i}\right)$$
$${f}_{t}=\upsigma \left({W}_{fx}{x}_{t}+{W}_{fh}{h}_{t-1}+{b}_{f}\right)$$
$${o}_{t}=\upsigma \left({W}_{ox}{x}_{t}+{W}_{oh}{h}_{t-1}+{b}_{o}\right)$$
$${g}_{t}=\mathrm{Tanh}\left({W}_{cx}{x}_{t}+{W}_{ch}{h}_{t-1}+{b}_{c}\right)$$
$${c}_{t}={f}_{t}\odot {c}_{t-1}+{i}_{t}\odot {g}_{t}$$
$${h}_{t}={o}_{t}\odot \mathrm{Tanh}\left({c}_{t}\right)$$
(1)

where \({h}_{t}\) denotes the output vector of the LSTM layer. \({W}_{ix},{W}_{ih},{W}_{fx},{W}_{fh},{W}_{ox},{W}_{oh},{W}_{cx}\) and \({W}_{ch}\) represent the weight matrices, and \({b}_{i}, {b}_{f}, {b}_{o}\) and \({b}_{c}\) denote the bias vectors. \(\odot \) is the element-wise multiplication. \(\sigma \) stands for the element-wise sigmoid function.
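As a concrete reference, the cell update in Eq. (1) can be written out directly with tensor operations; the sketch below uses PyTorch and assumes the weight matrices are stored in a dictionary, which is an illustrative choice rather than the authors' implementation.

```python
import torch

def lstm_cell_step(x_t, h_prev, c_prev, params):
    """One LSTM step following Eq. (1); params holds the W_* matrices and b_* vectors."""
    # Weights are stored so that x_t @ W matches W x_t in Eq. (1) for row-vector inputs.
    i_t = torch.sigmoid(x_t @ params["W_ix"] + h_prev @ params["W_ih"] + params["b_i"])
    f_t = torch.sigmoid(x_t @ params["W_fx"] + h_prev @ params["W_fh"] + params["b_f"])
    o_t = torch.sigmoid(x_t @ params["W_ox"] + h_prev @ params["W_oh"] + params["b_o"])
    g_t = torch.tanh(x_t @ params["W_cx"] + h_prev @ params["W_ch"] + params["b_c"])
    c_t = f_t * c_prev + i_t * g_t       # element-wise gating of the cell state
    h_t = o_t * torch.tanh(c_t)          # hidden state passed to the next timestep
    return h_t, c_t
```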

However, the standard LSTM can only use forward semantic information during information extraction and cannot use the backward semantic information of the sequence. Therefore, a BiLSTM model is used for semantic information extraction from the word sequences of microblog texts.

$${H}^{f}=LST{M}^{f}\left(V\right)$$
$${H}^{b}=LST{M}^{b}\left(V\right)$$
(2)
$$H=concat\left({H}^{f};{H}^{b}\right)$$

where \({H}^{f}\) and \({H}^{b}\) are the hidden layer outputs of the forward LSTM and the backward LSTM, respectively. The \(i\)-th hidden layer output vector of the bidirectional LSTM is obtained by concatenating the hidden layer output vectors at the corresponding positions of the forward and backward LSTMs.

$${h}_{i}=\left[{h}_{i}^{f};{h}_{i}^{b}\right] \left(i=1,\cdots ,n\right)$$
(3)
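A minimal PyTorch version of this input component (embedding lookup followed by a bidirectional LSTM, Eqs. (2)–(3)) could look as follows; the vocabulary size and dimensions are placeholder values, not the settings used in the experiments.

```python
import torch
import torch.nn as nn

class InputComponent(nn.Module):
    """Embedding + BiLSTM encoder producing the memory matrix H of Eqs. (2)-(3)."""
    def __init__(self, vocab_size, emb_dim=300, hidden_dim=128):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, emb_dim)   # words and emojis share one table
        self.bilstm = nn.LSTM(emb_dim, hidden_dim, batch_first=True,
                              bidirectional=True)            # forward and backward LSTMs

    def forward(self, token_ids):          # token_ids: (batch, n) word/emoji indices
        v = self.embedding(token_ids)      # V: (batch, n, emb_dim)
        h, _ = self.bilstm(v)              # H: (batch, n, 2*hidden_dim), [h_f; h_b] per position
        return h
```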
(2) Storage component

In the proposed model, the memory information is not modified during the computation, and the storage component only preserves the memory information. Therefore, the storage component of the model is the output matrix of the input component, i.e., the hidden output matrix \(H\) of the BiLSTM. During the computation, the model retrieves information from the storage component several times, thus enabling direct access to the LSTM outputs.

(3) Generalization and output components

The core function of the generalization and output components is the semantic feature extraction process based on the structural self-attention mechanism. The idea of the attention mechanism is to assign importance weights of different levels to different parts of the hidden layer output sequence. The weight is the core of the attention mechanism and is expressed in the following equation.

$${{\upalpha }}_{i}=softmax\left(f\left(q,{h}_{i}\right)\right)=\frac{\mathrm{exp}\left(f\left(q,{h}_{i}\right)\right)}{{\sum }_{j}\mathrm{exp}\left(f\left(q,{h}_{j}\right)\right)}$$
(4)

where \(q\) is the query vector; when \(q\) comes from \(h\) itself, this is called self-attention. \(f\) is the matching function between \(q\) and \({h}_{i}\), which can be scored by additive models, dot-product models, bilinear models, and other methods. Our study adopts the idea of multi-step (hops) computation. The illustration of the model is shown in Fig. 2. Assuming that the sequence length is \(n\) and the dimension of the BiLSTM output is \(d\), i.e., the memory component is \(H\in {\mathbb{R}}^{n\times d}\), the structural self-attention is

$${A}_{t}=softmax\left(\mathrm{tanh}\left(H{Q}_{t}\right){K}_{t}\right)$$
(5)

where \({Q}_{t}\in {\mathbb{R}}^{d\times q}\) is used to construct the query for self-attention and \({K}_{t}\in {\mathbb{R}}^{q\times k}\) is used to construct the key values for the structural self-attention computation. The width \(k\) of the matrix \({K}_{t}\) is the number of different aspects of the text sequence that the structural self-attention attends to. After the semantic extraction by structural self-attention, the feature representation matrix of the microblog is

Fig. 2 The architecture of structural deep memory network model

$${M}_{t}={H}^{\mathrm{\top }}{A}_{t}$$
(6)

Equations (5) and (6) constitute one step of the structural self-attention calculation of the model. The difference across steps is that, in the calculation of step \(t\), the key value construction matrix of the structural self-attention is reconstructed using the feature representation matrix of step \(t-1\).

$${K}_{t}=\upgamma K+\left(1-\upgamma \right){L}_{t}{M}_{t-1}$$
(7)

where \(\upgamma \in \left[\mathrm{0,1}\right]\) denotes the amount of information from the key value construction matrix that is retained in the step \(t\) calculation. In the training process, \(\gamma \) is treated as a trainable variable and trained together with the other variables of the model. The matrix \({L}_{t}\) is a shape parameter that ensures the product with \({M}_{t-1}\) has the same dimension as \({K}_{t}\).
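The multi-hop computation of Eqs. (5)–(7) can be sketched as follows for a single microblog; the number of hops, the softmax dimension (taken over sequence positions here), and the parameter shapes are stated assumptions for illustration.

```python
import torch

def structural_attention_hops(H, Q_list, K_init, L_list, gamma, hops):
    """Multi-hop structural self-attention over the memory H (n x d), Eqs. (5)-(7)."""
    K_t = K_init                           # (q, k) initial key value construction matrix
    A_t, M_t = None, None
    for t in range(hops):
        # Eq. (5): attention over the n positions for each of the k aspects
        A_t = torch.softmax(torch.tanh(H @ Q_list[t]) @ K_t, dim=0)   # (n, k)
        # Eq. (6): feature representation matrix of the microblog
        M_t = H.T @ A_t                    # (d, k)
        if t + 1 < hops:
            # Eq. (7): rebuild the keys from the previous step's representation
            K_t = gamma * K_init + (1 - gamma) * (L_list[t] @ M_t)    # L_t: (q, d)
    return M_t, A_t                        # M_T feeds the response component, A_T the penalty
```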

(4) Response component

The response component of the model consists of a fully connected neural network layer. Its input is the vectorized (flattened) form of the microblog text representation matrix \({M}_{T}\) from the last computation step. Finally, the sentiment category of the microblog is

$$c=argmax\left(softmax\left(W flatten\left({M}_{T}\right)+b\right)\right)$$
(8)

where \(W\) and \(b\) are the weight matrix and the bias vector of the fully connected layer, respectively.
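The response component of Eq. (8) then reduces to a single fully connected layer over the flattened representation; a minimal sketch with illustrative dimensions is shown below.

```python
import torch
import torch.nn as nn

class ResponseComponent(nn.Module):
    """Fully connected layer over the flattened representation M_T, Eq. (8)."""
    def __init__(self, d, k, num_classes=2):
        super().__init__()
        self.fc = nn.Linear(d * k, num_classes)          # W and b of Eq. (8)

    def forward(self, M_T):                              # M_T: (batch, d, k)
        logits = self.fc(torch.flatten(M_T, start_dim=1))
        probs = torch.softmax(logits, dim=-1)            # distribution over sentiment categories
        return torch.argmax(probs, dim=-1)               # predicted category c
```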

Model training

(1) Parameter sharing

The parameters to be trained in the structural deep memory network model comprise three parts: the parameters of the BiLSTM, the parameters of each computation step in the generalization and output components, and the parameters of the fully connected layer in the response component. Among them, the parameters of the generalization and output components include the parameter matrices \({Q}_{t}\), \({K}_{t}\), \({L}_{t}\) and the scalar \(\upgamma \) in each computation step. To reduce the number of parameters and improve the training speed of the model, parameter sharing is used across computation steps during training.

(2) Loss function

The activation function of the fully connected layer in the response component of the model is a softmax function, whose output can be viewed as a probability distribution over the microblog sentiment categories. Therefore, the model uses the cross-entropy loss function.

$$loss=-{\sum }_{s\in D}{\sum }_{e\in E}{I}_{e}\left(s\right)log({P}_{e}\left(s\right))$$
(9)

where \(D\) is the set of microblog documents, \(E\) is the set of sentiment categories, and \({I}_{e}\left(\cdot \right)\) is the indicator function, which equals 1 when the sentiment category of the microblog \(s\) in the training data is \(e\) and 0 otherwise; \({P}_{e}\left(s\right)\) is the probability of sentiment \(e\) for microblog \(s\) given by the structural self-attention deep memory model.

(3) Attention penalty factor

In the training process of the model, the number of microblog sentiment features (aspects) that can be attended to is \(k\); in other words, the matrix \({K}_{t}\) is composed of \(k\) attention vectors. During training, different attention vectors in \({K}_{t}\) may focus on the same microblog sentiment feature, which causes information redundancy and affects model performance. To force different attention vectors to focus on different features, a penalty mechanism has been proposed in the literature that drives different attention vectors toward different distributions whose non-zero elements are concentrated in different dimensions. The penalty mechanism is used to make each attention vector in the attention matrix of the last computation step focus on a different microblog sentiment feature. The penalty term is minimized as part of the objective function along with the loss function during training. Hence, the objective function for model training is

$$obj=loss+p\cdot || {A}_{T}{A}_{T}^{ \top }-I |{|}_{F}^{2}$$
(10)

where \(p\in \left[\mathrm{0,1}\right]\) is the penalty factor, \(||\cdot |{|}_{F}\) is the Frobenius norm of a matrix, and \(I\) is the identity matrix.
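In code, the objective of Eq. (10) combines the cross-entropy loss of Eq. (9) with the penalty term; the sketch below assumes a batch of logits and integer labels, and the value of the penalty factor p is illustrative.

```python
import torch
import torch.nn.functional as F

def training_objective(logits, labels, A_T, p=0.1):
    """Cross-entropy loss (Eq. 9) plus the attention penalty term (Eq. 10)."""
    loss = F.cross_entropy(logits, labels)                       # -sum_s sum_e I_e(s) log P_e(s)
    identity = torch.eye(A_T.size(0), device=A_T.device)
    penalty = torch.norm(A_T @ A_T.T - identity, p="fro") ** 2   # ||A_T A_T^T - I||_F^2
    return loss + p * penalty
```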

Experimental results

Dataset

The experiments used the NLPCC 2013 (Natural Language Processing and Chinese Computing, 2013) and NLPCC 2014 microblog sentiment evaluation task datasets. The data were first pre-processed to remove links, microblog usernames, and some punctuation marks from the microblog text. Microblog users usually publish emoticons, repeated punctuation, etc. to express specific emotions, so such punctuation was not removed. In addition, we vectorized emojis with Emoji2vec, which is similar to the word2vec embedding model and widely used for NLP tasks [30,31,32,33].

More specifically, there are seven types of emotion tags in the NLPCC2013 and NLPCC2014 datasets: happiness, like, anger, sadness, fear, disgust, and surprise. In this paper, like and happiness are considered positive emotions, while sadness, anger, fear, and disgust are negative emotions. The emotional polarity of the surprise emotion is uncertain, and the data show that both positive and negative surprise emotions exist, so the microblog data marked as surprise are excluded from sentiment classification. Furthermore, because some microblog texts in NLPCC2014 contain two emotions that may reflect opposing feelings, we eliminated these data in our experiments. For example, for the sentence in Table 1, “Should I consider reducing the frequency of coming home, once I come back to argue with my dad, I'm so annoyed!”, the emotion is disgust and the sentiment is ‘-1’, reflecting the negative emotion of the sentence.

Table 1 Snapshot of the dataset
Table 2 Description of subjective classification data
Table 3 Description of sentiment classification data
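The label mapping described above can be expressed as a simple lookup; the tag strings are assumptions about how the emotion labels appear in the raw files.

```python
# Map the seven NLPCC emotion tags to binary polarity labels; 'surprise' is
# excluded from sentiment classification, as described above.
POLARITY = {
    "happiness": "positive", "like": "positive",
    "sadness": "negative", "anger": "negative",
    "fear": "negative", "disgust": "negative",
}

def to_polarity(emotion_tag):
    return POLARITY.get(emotion_tag)   # returns None for 'surprise' -> sample dropped
```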

For microblog text processing, the HanLP (http://hanlp.linrunsoft.com) language processing toolkit was used for word segmentation. In model training, Chinese word vectors pre-trained on a large-scale microblog corpus were used [34]. A deep learning toolkit was used for the experiments, and the code was run on a GPU workstation with 8 Titan XP graphics cards.

Baseline models


To verify the effectiveness of our model, this section presents the baseline models as follows:

  • Feature-Naive Bayes: the approach employs a deep learning model (word2vec) to obtain text embeddings and a Naive Bayes classifier to conduct the subjectivity classification and sentiment classification experiments [12].

  • Feature-Random Forest: similar to Feature-Naive Bayes, the approach combines the word2vec model and a Random Forest classifier to conduct subjectivity and sentiment classification [34].

  • Feature-SVM: the support vector machine (SVM) is a representative traditional machine learning classifier and performs very well on aspect-level sentiment classification. This model feeds text features to an SVM to conduct the subjectivity classification and sentiment classification experiments [35].

  • LSTM: a recurrent neural network based on long short-term memory units. This model takes a sequence of word vectors as input, uses the last output vector of the hidden layer as the representation vector of the text sequence, and then applies a fully connected layer for classification [36].

  • BiLSTM: includes a forward LSTM and a backward LSTM; the output sequences of both are combined as the representation vector of the text sequence, and classification is again performed by a fully connected layer [37].

  • BiLSTM + Attention: an attention mechanism is added on top of BiLSTM, so that different hidden layer outputs play different roles in the final sentence representation [38].

  • Structural Attention: the structural self-attention model, which uses a structural attention mechanism that makes different attention vectors focus on different aspects of features in a text sequence [11].

  • BERT: a transformer-based deep learning model that has achieved state-of-the-art performance in a wide variety of NLP tasks and has become a ubiquitous baseline for text classification tasks [39].

Parameter setting

The hyper-parameter settings of the microblog sentiment classification experiments are shown in Table 4. More specifically, we set the learning rate to 0.001 with the Adam optimizer and the dropout rate to 0.5. In addition, the evaluation metrics are accuracy and F-score.

Table 4 Experimental parameter setting
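For reference, the optimizer-related settings above correspond roughly to the following PyTorch setup; the model here is a stand-in placeholder rather than the actual DMNSA architecture.

```python
import torch
import torch.nn as nn

# Placeholder model; only the dropout rate and optimizer settings follow Table 4.
model = nn.Sequential(nn.Linear(256, 128), nn.ReLU(),
                      nn.Dropout(p=0.5), nn.Linear(128, 2))
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
```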

Results and discussions

Overall performance

The overall performance of all competing methods on the datasets is shown in Tables 5 and 6. In both accuracy and F-score, our model outperforms the other baseline models on the tasks of subjectivity classification and sentiment classification, which verifies the effectiveness of the proposed model.

Table 5 Accuracy results of sentiment analysis
Table 6 F-score of sentiment analysis

As shown in Table 5 and Fig. 3, on the NLPCC2013 and NLPCC2014 datasets, Feature-Naive Bayes, Feature-Random Forest, and Feature-SVM, which combine deep learning models with classical classifiers, achieve 0.6502, 0.7175, and 0.7101 in the task of subjectivity classification, and 0.5920, 0.6848, and 0.7321 in sentiment classification, respectively. We notice that the SVM classifier outperforms the Naive Bayes and Random Forest classifiers on the overall tasks of subjectivity classification and sentiment classification.

Fig. 3 a Accuracy of microblog sentiment classification results; b Accuracy of subjectivity classification

Then, the structural attention model achieves 0.7588 and 0.7769 in the task of subjectivity classification, and 0.7837 and 0.8207 in sentiment classification, respectively. In addition, the performance of BiLSTM + Attention in the task of subjectivity classification is 0.7579 and 0.7792 on the two datasets, while for sentiment classification this baseline achieves 0.7830 and 0.8120. The LSTM, BiLSTM, BiLSTM + Attention, and Structural Attention models outperform the combinations of deep learning models and classical classifiers. Among them, the LSTM model has the lowest accuracy, while the BiLSTM model achieves much better results due to its ability to extract both forward and backward semantic information. The structural attention model does not have an absolute advantage over the general attention model in microblog text analysis, probably because the number of emotional features expressed in microblogs is limited, and focusing on too many features may instead affect the results. Our developed model suffers from the same problem as the structural attention model, with the difference that the effect of too many features is eliminated to some extent after multiple computation steps.

Finally, BERT achieves 0.7409 and 0.7805 in the task of subjectivity classification, and 0.796 and 0.8243 in sentiment classification. To our surprise, even though it has achieved state-of-the-art performance in many NLP tasks, it did not obtain the best performance on our datasets. One of the main reasons is that the BERT model cannot handle the emojis in the datasets directly, and it loses much important information in the process of sentence embedding [40, 41].

In summary, the comparative experiments verify that the model proposed in this paper outperforms the baseline models on the NLPCC2013 and NLPCC2014 datasets.

This study also compares the F-score of our model with those of the baseline models. Table 6 and Fig. 4 present the comparative results on the NLPCC2013 and NLPCC2014 datasets. We notice that our method achieves better F-scores than the baseline models: 0.7609 and 0.8756 in the task of subjectivity classification, and 0.8028 and 0.8364 in the task of sentiment classification, respectively.

Fig. 4 a F-score of microblog sentiment classification results; b F-score of subjectivity classification

Ablation analysis

As shown in Figs. 5 and 6, we further investigate the effect of each part of our model. The DMNSA (deep memory network with structural attention) model outperforms the variants that remove the penalty mechanism, the self-attention mechanism, or the deep memory network on the NLPCC 2013 and NLPCC 2014 datasets. Specifically, the DMNSA model without the deep memory network directly uses the BiLSTM to handle the tasks of sentiment classification and subjectivity classification.

Fig. 5 Accuracy of sentiment classification and subjectivity classification after removing penalty, self-attention, deep memory network on NLPCC 2013 and NLPCC 2014

Fig. 6 F-score of sentiment classification and subjectivity classification after removing penalty, self-attention, deep memory network on NLPCC 2013 and NLPCC 2014

The architecture of the deep memory network brings 8.33% and 8.06% absolute accuracy improvements in the above-mentioned tasks on NLPCC 2013, respectively. Also, the model with the self-attention mechanism or the penalty mechanism achieves better performance than directly using the BiLSTM model for sentiment and subjectivity classification. Besides, our proposed DMNSA model surpasses the variants without the three core components in terms of F-score on the different datasets.

The experimental results validate that the memory network, the self-attention mechanism, and the penalty mechanism are effective and necessary for the tasks of sentiment classification and subjectivity classification on different datasets.

Error analysis

Error analysis is important for building our sentiment classification framework based on the deep memory network with structural attention model. In this section, we carry out an error analysis of our model on the NLPCC 2013 and NLPCC 2014 datasets and find that most of the errors can be summarized as follows. The first factor is that microblogs have a wide variety of informal ways of expressing the same token. For example, for “It’s soooo del” and “It’s so delicious”, our model may incorrectly learn different sentence embeddings, although they have the same semantics and the same sentiment with a different expression. In addition, this model considers a single contextual word as the basic unit, so it cannot handle semantic phrases. For example, "Die for" is a feeling phrase whose meaning cannot be deduced from the words "Die" and "for". Finally, some sentences may contain comparative opinions or emotions, such as “compare to crying, we should try to smile to eat more”. It is difficult to infer the sentiment of those sentences.

Conclusion

In this study, a novel microblog sentiment analysis model called the structural deep memory network is proposed, which combines a deep memory network and a structural attention mechanism. The model employs a bidirectional LSTM to extract the semantic information contained in the microblog text, uses the extraction results as the memory component of the deep memory network, and uses multi-step (hops) structural attention operations as the generalization and output components. The model is used to classify the subjectivity of microblog texts and the sentiment they contain, and the experimental results show that it performs well on both tasks.

The model can be further extended to perform more fine-grained microblog sentiment analysis using a graph-based approach [42]. In addition, with the development of information technology, online social media may include not only texts and emojis but also audio and video [43], which may attract more attention and become a problem worthy of further research.