A multi-modal and multi-scale emotion-enhanced inference model based on fuzzy recognition

When a neural network performs a classification task, only the label corresponding to the maximum value of the fully connected layer is used as the output category. When this maximum is close to the second-largest value, a classification that considers only the maximum and ignores the second-largest value is not fully reliable. To reduce this noise and improve classification accuracy, this paper applies the principles of fuzzy reasoning and integrates all the outputs of the fully connected layer with the dictionary-based emotional tendency of the text to establish a multi-modal fuzzy recognition emotion enhancement model. The proposed model considers how negative words, degree adverbs, exclamation marks, and question marks enhance the emotion of emotional words within the smallest subtree, and defines a corpus-based global emotional membership function for emojis. A comparison of CNN, LSTM, BiLSTM and GRU results on Weibo and Douyin shows that the proposed model effectively improves text emotion recognition when the neural network output is not clear-cut, especially for long texts.


Introduction
People express their emotions in many ways, such as text [1,2], voice [3], intonation [4], facial expressions [5,6] and multimodal combinations of these [7]. With the popularity of online media, emojis have become one of the main ways people express their emotions intuitively. Existing work usually uses contextual emotions to predict or recommend suitable emojis. Human emotions are complex, and multiple emotions can be expressed at the same time [8]. In Chinese, sentences composed of the same words can express different emotional tendencies, and the objects modified by negative words differ with the position of those words. Emojis, as a vehicle for intentionally conveying affective states, can express the sentiment and emotion of a text. Take the sentence $S_1$ as an example. From the text alone, different people perceive different emotional tendencies in the same sentence. People with positive emotions will feel that they have visited all the other scenic spots except the Yangtze River Cableway, so there is no regret; people with negative emotions will find it regrettable that they did not visit the Yangtze River Cableway. If there are emojis in $S_1$, we can determine the emotional tendency more accurately based on the emojis, as shown in Table 1. A very happy emoticon in $S_2$ expresses a positive emotional tendency, while a crying emoticon in $S_3$ expresses a negative emotional tendency. Integrating the emotion of emojis into the sentiment analysis of a text therefore expresses the sentiment tendency of the text more accurately.
Table 1 Comparison of the sentiment influenced by emojis
Fig. 1 Overall architecture of the proposed model
Some works take negative words into account when analyzing the sentiment of a text. Wu et al. [9] discussed how negative words change the emotional polarity of emotional words: if a word adjoining the emotional word is negative, the polarity of the emotional word is reversed. Yan et al. [10] discussed the influence of negative words and emojis on the polarity of emotions, but ignored whether the negative words and emojis actually modify the emotion words. On this basis, this paper proposes a text emotion enhancement model based on the smallest subtree, discusses the emotion enhancement effects of negative words and degree adverbs on emotion words, and considers the enhancement of text emotion by question marks and exclamation marks. Since fuzzy reasoning can not only deal with the fuzzy expression of language [11,12] but also optimize a model without coupling between its components [13-15], we combine the emotional word dictionary, the emoticon dictionary and a deep learning model to establish a fuzzy recognition emotion enhancement reasoning model that optimizes the output of deep neural networks. The overall architecture of the sentiment analysis model, which fuses deep neural networks, emojis and lexicon-based sentiment enhancement fuzzy reasoning, is shown in Fig. 1.
The contributions of this paper are as follows.
(1) Construct a membership function for the emotion of emojis, and concatenate the emoji2vec with the word2vec of the input document.
(2) Establish a lexicon-based fuzzy inference emotional enhancement model, which consists of the emotional enhancement of negative words and degree adverbs on the emotional words they modify and the enhancement of question marks and exclamation marks on text emotions.
(3) Present a fuzzy classification membership function of the fully connected layer of neural networks combined with the membership.
(4) Combine the fuzzy classification membership function of the fully connected layer of the neural network, the emotional membership of emojis and the dictionary-based fuzzy inference model to establish a multimodal and multi-scale emotion-enhanced inference model based on fuzzy recognition.
The rest of this paper is organized as follows. The next section summarizes related literature on multimodal sentiment analysis and emojis in sentiment analysis. Section 3 explains in detail our proposed model for enhancing the sentiment output of deep neural networks with a lexicon-based fuzzy reasoning model. Section 4 describes the experimental procedure. Section 5 presents the experimental results and discussion. The last section draws some conclusions.

Multimodal sentiment analysis
Sentiment analysis has widespread commercial applications and practical value in various domains, such as decision analysis [16] and topic detection [17]. There are two main approaches to analyzing the sentiment of a text: lexicon-based methods [2] and deep learning-based methods [9,18,19]. A labelled corpus with moral foundation scores was introduced to illustrate how morality-related information can be inferred from text by shallow and deep learning models [2]. Residual memory networks can be used to comprehend multimodal human sentiment [20]. Huang et al. constructed a sentiment convolutional neural network to analyze sentence sentiment based on both the contextual and the sentiment information of sentiment words [21].
The sentiment and emotion analysis of multi-modal dialogue involving video, audio and images is attracting more and more attention [7,22-24]. To improve the performance of the classifier, the image-text pairs in existing unimodal data are usually used as a multi-modal comprehensive data set [25]. To obtain the true sentiment, the significant leaps responsible for the current relevance of the field were analyzed, pointing out its shortcomings and under-explored areas [26]. Considering both textual information and sentiment diffusion patterns, sentiment diffusion patterns were utilized to improve sentiment analysis on Twitter [27]. To exploit the rich correlations among social images, correlations among multiple social images were incorporated into visual sentiment analysis [28].

Emoji in sentiment analysis
With the continuous upgrading of smart devices, the number and variety of emojis keep growing, and people increasingly like to use emojis to express emotions.
Some scholars have studied the semantic information of emojis [35]. A model for emoji similarity was explored to enable novel emoji keyboard designs for good emoji entry [36]. An encoder-decoder model named Seq2Emoji was used to predict multiple emojis based on a short text, considering the correlations between emojis and predicting emojis after the fully connected layer of hierarchical neural networks [37]. The features of given texts and emojis could be automatically extracted by a multi-modal Siamese-based framework [38]. To study how conceptual content is represented through emojis from the perspective of cognitive strategies, participants were asked to use emojis to provide semantic representations for abstract and concrete concepts [39].
Some scholars have also studied the emotional information of emoji [40,41]. To learn the final sentiment classifier, Tweets and GitHub posts containing emojis were leveraged to learn text representations through emoji prediction [42]. An emoji sentiment lexicon using an unsupervised sentiment analysis system was constructed to analyze the sentiments expressed by emojis in online textual messages without human annotation [43]. A large-scale data set of emoji usage containing both text-emoji and image-emoji relationships within Tweets was used to predict emoji from both text and images [44]. Taking personal factors and contextual factors into account, a given textual microblog post of a user can be used to predict emojis [45].

Approach
In the emotion recognition task, the output of a neural network is the label corresponding to the maximum value of the fully connected layer. When the maximum and the second maximum of the fully connected layer output are close, using only the label of the maximum and ignoring the other options introduces errors. To reduce these errors, the emotional tendency of the input text is defined as the label corresponding to the maximum of the fused results of the neural network and the lexicon-based emotional fuzzy inference model.

Sentiment distribution of a document based on neural network
Define the normalized output of the fully connected layer of the neural network as the sentiment membership of the sentiment classification task, $P_{dp} = [P_{dp0}\ P_{dp1}\ P_{dp2}]$, where $P_{dp0}$ is the sentiment membership degree of neutral emotion, $P_{dp1}$ is the sentiment membership degree of positive emotion, $P_{dp2}$ is the sentiment membership degree of negative emotion, and $P_{dp0} + P_{dp1} + P_{dp2} = 1$.
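As a minimal sketch of this definition, the snippet below normalizes raw fully connected outputs and measures how close the two largest memberships are. The text only says the output is normalized; the use of softmax here is an assumption, and the helper names `softmax` and `top_two_gap` are hypothetical.

```python
import math


def softmax(logits):
    """Normalize raw outputs so that the memberships sum to 1 (assumed normalization)."""
    shift = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_two_gap(memberships):
    """Difference between the largest and second-largest membership."""
    first, second = sorted(memberships, reverse=True)[:2]
    return first - second


# Order follows the text: [neutral, positive, negative]. The logits are made up.
p_dp = softmax([0.2, 1.1, 1.0])
gap = top_two_gap(p_dp)
```

A small gap marks the case discussed above, where taking only the maximum label discards nearly as much evidence as it keeps.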

The smallest subtree-based sentiment enhancement reasoning model
To express the emotion of a text more completely, the modifiers in front of emotion words, such as negative words and degree adverbs that directly relate to the emotional words, are regarded as enhancing the emotion. The proposed model only considers 3 consecutive words that modify an emotional word and are in the same smallest subtree as that emotional word. Definition of the smallest subtree of $w_i$ and $w_j$: in the subtree set of a sentence, it is the subtree containing both $w_i$ and $w_j$ such that no other subtree contains both and has fewer branches and leaves.
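The definition can be illustrated with a short sketch. It assumes the parse tree is stored as a child-to-parent map, in which case the smallest subtree containing two words is the subtree rooted at their lowest common ancestor. The tree, the English stand-in words and all function names are hypothetical.

```python
def ancestors(parent, node):
    """Path from a node up to the root, the node itself included."""
    path = []
    while node is not None:
        path.append(node)
        node = parent[node]
    return path


def smallest_subtree_root(parent, a, b):
    """Root of the smallest subtree that contains both a and b."""
    seen = set(ancestors(parent, a))
    for node in ancestors(parent, b):
        if node in seen:
            return node
    raise ValueError("nodes are not in the same tree")


def subtree_nodes(parent, root):
    """All nodes whose path to the tree root passes through `root`."""
    return {n for n in parent if root in ancestors(parent, n)}


# Hypothetical parse: S -> NP("I"), VP -> ADVP("not", "very"), "happy"
parent = {
    "S": None, "NP": "S", "VP": "S", "I": "NP",
    "ADVP": "VP", "not": "ADVP", "very": "ADVP", "happy": "VP",
}
root = smallest_subtree_root(parent, "not", "happy")
members = subtree_nodes(parent, root)
```

In this toy tree the modifiers "not" and "very" share the smallest subtree with "happy", while "I" lies outside it and would not be treated as a modifier.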

Representation of emotional words
A two-tuple $(w_i, Po_i)$ is defined to represent the $i$th emotional word $w_i$ in a text $S$, where $Po_i$ is the emotional polarity of $w_i$. The domain of emotional polarity consists of three elements, $u_0$, $u_+$ and $u_-$, where $u_0$ denotes neutral emotion, $u_+$ denotes positive emotion and $u_-$ denotes negative emotion.
The emotional polarity of the emotional word $w_i$ on the domain $P$ is given by Eq. (4).

Influence of negative words
If consecutive negative words and the emotional word $w_i$ are in the same smallest subtree, the emotional polarity $Po_i$ is negated, where $n_{neg}$ is the number of consecutive negative words that directly modify $w_i$. If a negative word, an emotional word, and a question mark are in the same smallest subtree, the emotional polarity $Po_i$ is unchanged.
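A minimal sketch of this rule follows. It assumes that each negative word flips the sign once, so that the polarity is multiplied by $(-1)^{n_{neg}}$; this closed form is an assumption and is not stated in the text above. The function name and the boolean flag are hypothetical.

```python
def apply_negation(polarity, n_neg, question_in_subtree=False):
    """Flip the polarity once per negative word (assumed form: (-1) ** n_neg).

    If a question mark shares the smallest subtree with the negative word
    and the emotional word, the polarity is left unchanged.
    """
    if question_in_subtree:
        return polarity
    return polarity * (-1) ** n_neg


single = apply_negation(1.0, 1)   # one negative word reverses the polarity
double = apply_negation(1.0, 2)   # two negative words cancel out
asked = apply_negation(1.0, 1, question_in_subtree=True)
```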

Influence of degree adverbs
In Chinese, adverbs with fuzzy semantics are often used to express different degrees of emotion; they are called degree adverbs. Degree adverbs are stored in two dictionaries, $D_{str}$ and $D_{sli}$, according to their semantics, where $D_{str}$ contains strong emotional modifiers and $D_{sli}$ contains slight emotional modifiers. Some examples of degree adverbs are shown in Table 2.
An emotional enhancement factor $\beta_{adb}$ is defined to describe the emotional enhancement that a degree adverb applies to its adjacent emotional word $w_i$.
If the emotional word is modified by more than one degree adverb, the fuzzy product of the emotional memberships of all the degree adverbs is used as the emotion enhancement factor, where $\beta^{p}_{adb}$ is the $p$th consecutive emotional enhancement modifier of $w_i$. The emotional value of $w_i$ is then updated with this factor.
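The product rule can be sketched as follows. The factor values 1.2 and 0.8 are the settings reported in the experiments of this paper; the English dictionary entries are hypothetical stand-ins for the Chinese adverbs, and multiplying the polarity by the factor is an assumed form of the update.

```python
from math import prod

BETA_STR = 1.2  # strong modifiers (value reported in the experiments)
BETA_SLI = 0.8  # slight modifiers (value reported in the experiments)

# Hypothetical English stand-ins for the two degree adverb dictionaries.
D_STR = {"very", "extremely"}
D_SLI = {"slightly", "somewhat"}


def adverb_factor(word):
    """Enhancement factor of one modifier; non-adverbs leave the value as is."""
    if word in D_STR:
        return BETA_STR
    if word in D_SLI:
        return BETA_SLI
    return 1.0


def enhancement_factor(modifiers):
    """Product of the factors of all consecutive degree adverbs."""
    return prod(adverb_factor(w) for w in modifiers)


def enhance(polarity, modifiers):
    """Assumed update: scale the polarity by the combined factor."""
    return enhancement_factor(modifiers) * polarity


doubled = enhancement_factor(["very", "very"])
softened = enhance(-1.0, ["slightly"])
```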

Influence of degree adverbs and negative words
If the modifiers in front of the emotional word $w_i$ contain both degree adverbs and negative words, Eqs. 4 and 8 are combined to obtain the emotional enhancement representation of $w_i$.

Emotional polarity of a sentence
Substituting the $Po_i$ obtained from Eq. 9 into Eq. 3 gives the sentiment matrix $P_{w_i}$ of $(w_i, Po_i)$. The sentiment value of the sentence $S$ is defined as the sum of the sentiment values of all sentiment words, where $n_w$ is the number of emotional words in $S$.
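The summation can be sketched as below. How a signed polarity maps to the three-slot layout [neutral, positive, negative] is not recoverable from the text, so the mapping in `word_vector` is an assumption made only for illustration, and both function names are hypothetical.

```python
def word_vector(polarity):
    """Map a signed polarity to [neutral, positive, negative] (assumed layout)."""
    if polarity > 0:
        return [0.0, polarity, 0.0]
    if polarity < 0:
        return [0.0, 0.0, -polarity]
    return [1.0, 0.0, 0.0]


def sentence_vector(polarities):
    """Sum the sentiment vectors of all emotional words in the sentence."""
    total = [0.0, 0.0, 0.0]
    for po in polarities:
        for k, value in enumerate(word_vector(po)):
            total[k] += value
    return total


# Three emotional words after negation and adverb enhancement (made-up values).
po_s = sentence_vector([1.2, -0.8, 1.0])
```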

Influence of punctuation
Some punctuation marks, such as question marks and exclamation marks, can enhance emotional expression.
For question marks, $\lambda_{qm}$ $(> 0)$ is the emotional decay factor and $n_{qm}$ is the number of question marks at the end of $S$. The question mark only strengthens emotional expression and does not change the emotional polarity of a sentence; therefore, $1 - \lambda_{qm} n_{qm} > 0$.
For exclamation marks, $\lambda_{em}$ $(> 0)$ is the emotional enhancement factor and $n_{em}$ is the number of exclamation marks at the end of $S$. The emotional expression is then normalized.
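The following sketch counts the trailing marks and computes the two scaling factors. The question-mark factor $1 - \lambda_{qm} n_{qm}$ comes from the constraint stated above; the exclamation-mark factor $1 + \lambda_{em} n_{em}$ is an assumed counterpart, and the default value 0.05 is the setting reported in the experiments. All function names are hypothetical.

```python
QUESTION_MARKS = "?？"      # half-width and full-width forms
EXCLAMATION_MARKS = "!！"


def trailing_marks(sentence):
    """Count question and exclamation marks in the trailing punctuation run."""
    marks = QUESTION_MARKS + EXCLAMATION_MARKS
    tail = sentence[len(sentence.rstrip(marks)):]
    n_qm = sum(ch in QUESTION_MARKS for ch in tail)
    n_em = sum(ch in EXCLAMATION_MARKS for ch in tail)
    return n_qm, n_em


def question_factor(n_qm, lam_qm=0.05):
    """Factor 1 - lam_qm * n_qm, which must stay positive to keep the polarity."""
    factor = 1.0 - lam_qm * n_qm
    if factor <= 0:
        raise ValueError("1 - lam_qm * n_qm must stay positive")
    return factor


def exclamation_factor(n_em, lam_em=0.05):
    """Assumed form of the exclamation enhancement: 1 + lam_em * n_em."""
    return 1.0 + lam_em * n_em


counts = trailing_marks("really?!!")
```

With the reported setting of 0.05, the positivity constraint on the question-mark factor holds for up to 19 trailing question marks.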

Fuzzy recognition-based emotion analysis of emoji
To let emojis express emotions more accurately and comprehensively, membership is introduced to express their emotional orientation values; this is defined as the emotional membership of an emoji. The steps are as follows:
- Use the emotional word distribution of the contextual text of each emoji to train the emotional representation of the emoji, and normalize the emotional vector. We divide the emotional tendencies of texts into five categories: happiness, like, disgust, sadness, and calm. The normalized vector satisfies $\sum_{i=1}^{5} A_i = 1$, where the $A_i$ denote the respective probabilities of happiness, like, disgust, sadness, and calm.
- Define the values of the emoji embedding as the emotional membership degrees of the emoji in these five emotions, where $e_i$ denotes the $i$th emoji and $A_i(e_i)$ is the $i$th value in the emoji embedding $e_i$. If only the maximum emotional tendency of an emoticon is considered, the emotional tendency is determined by the maximum value.
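The two steps above can be sketched as follows, assuming the raw emotional scores of an emoji are non-negative. The scores are made up, and the names `normalize` and `dominant_emotion` are hypothetical.

```python
EMOTIONS = ["happiness", "like", "disgust", "sadness", "calm"]


def normalize(scores):
    """Scale non-negative scores so that the five memberships sum to 1."""
    total = sum(scores)
    if total <= 0:
        raise ValueError("scores must have a positive sum")
    return [s / total for s in scores]


def dominant_emotion(membership):
    """Emotion with the largest membership degree."""
    best = max(range(len(membership)), key=membership.__getitem__)
    return EMOTIONS[best]


# Made-up raw scores for one emoji, in the order of EMOTIONS.
membership = normalize([6.0, 3.0, 0.5, 0.25, 0.25])
label = dominant_emotion(membership)
```

Keeping the whole membership vector rather than only `label` is what lets the later steps weigh an emoji that is, for example, mostly happy but partly sad.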

Tree-based fuzzy recognition emotion enhanced reasoning
Emojis belong to different emotional tendencies, which not only reflect the emotional distribution of emojis in the corpus, but also reflect the different emotional expressions of the context. Combining the distribution of emotional words in a text and the fuzzy emotional recognition of emoticons, a tree-based fuzzy recognition emotion enhancement model is established.
For the emotional tendency of emojis, happiness and like are regarded as positive emotions, disgust and sadness are regarded as negative emotions, and calm is regarded as no emotion. The emotional tendency of multiple emojis is computed over all emojis in the text, where $n_e$ is the number of emojis, and is then normalized. Adding Eqs. 14-20 gives the emotion of the text containing emojis, where $P_{e0}$, $P_{e1}$, $P_{e2}$ respectively denote neutral emotion, positive emotion and negative emotion. The emotion of the text containing emojis is normalized as $P_e = [P_{e0}\ P_{e1}\ P_{e2}]$.
Algorithm 1 Framework of sentiment embedding of emoji based on emotional word dictionary.
Require: input seed emotional words $w_{e_i}$ of the five emotional word dictionaries $D_t$ $(t = 1, 2, 3, 4, 5)$;
Ensure: the sentiment embedding of emoji $e_i$, $P_{e_i}$;
1: compute the semantic similarity between the input words and the seed words in each sentiment dictionary to expand the sentiment dictionaries;
2: compute the semantic similarity between the emoji $e_i$ and each word $w_{e_j}$ in the sentiment dictionary and sum up all the similarities, $S_t \leftarrow \sum_j Sim(e_i, w_{e_j})$;
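A rough sketch of step 2 of Algorithm 1 and of the five-to-three grouping is given below. Several choices are assumptions rather than the method of this paper: cosine similarity stands in for Sim, negative similarities are clipped to zero before summing, and the memberships of multiple emojis are summed and renormalized. The toy 2-dimensional vectors and all function names are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity, used here as an assumed stand-in for Sim."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def emoji_membership(emoji_vec, dictionaries):
    """S_t = sum_j Sim(e_i, w_j) for each dictionary D_t, then normalize."""
    sums = [sum(max(cosine(emoji_vec, w), 0.0) for w in words)
            for words in dictionaries]
    total = sum(sums)
    if total == 0:
        return [1.0 / len(sums)] * len(sums)
    return [s / total for s in sums]


def to_three_classes(a):
    """[happiness, like, disgust, sadness, calm] -> [neutral, positive, negative]."""
    happiness, like, disgust, sadness, calm = a
    return [calm, happiness + like, disgust + sadness]


def combine_emojis(memberships):
    """Sum the three-class tendencies of all emojis and renormalize."""
    if not memberships:
        raise ValueError("at least one emoji is required")
    summed = [0.0, 0.0, 0.0]
    for a in memberships:
        for k, value in enumerate(to_three_classes(a)):
            summed[k] += value
    total = sum(summed)
    return [s / total for s in summed]


# Toy word vectors for the five dictionaries, in the order used above.
dictionaries = [[[1, 0]], [[1, 1]], [[0, 1]], [[-1, 0]], [[0, 1]]]
m = emoji_membership([1, 0], dictionaries)
p_e = combine_emojis([[0.6, 0.3, 0.05, 0.025, 0.025],
                      [0.1, 0.1, 0.4, 0.3, 0.1]])
```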

Fusion of neural network and lexicon
The deep learning results are combined with the lexicon-based results to obtain the emotional tendency of the text, where $P_{dpi}$ is the output of the fully connected layer of the deep neural network and $\lambda$ is the fusion factor, $0 < \lambda < 1$.
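One plausible reading of this fusion is a convex combination, sketched below. Both the form $\lambda P_{dp} + (1 - \lambda) P_{lex}$ and the rule of fusing only when the two largest network outputs are closer than a threshold are assumptions; the value 0.7 for $\lambda$ and the threshold 0.3 are settings that appear in the experiments. The function names and example vectors are hypothetical.

```python
def top_two_gap(p):
    """Difference between the largest and second-largest value."""
    first, second = sorted(p, reverse=True)[:2]
    return first - second


def fuse(p_dp, p_lex, lam=0.7, threshold=0.3):
    """Fuse network and lexicon memberships when the network is ambiguous.

    Assumed form: lam * p_dp + (1 - lam) * p_lex, applied only if the gap
    between the two largest network outputs is below `threshold`.
    Returns the winning label index and the membership vector used.
    """
    if top_two_gap(p_dp) >= threshold:
        fused = list(p_dp)
    else:
        fused = [lam * d + (1.0 - lam) * l for d, l in zip(p_dp, p_lex)]
    label = max(range(len(fused)), key=fused.__getitem__)
    return label, fused


# Order: [neutral, positive, negative]; both vectors are made up.
ambiguous_label, fused = fuse([0.10, 0.46, 0.44], [0.05, 0.25, 0.70])
confident_label, _ = fuse([0.05, 0.90, 0.05], [0.05, 0.25, 0.70])
```

In the ambiguous case the network alone would answer positive by a margin of 0.02, while the fused vector favors negative; in the confident case the network output is kept as is.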

Datasets and evaluation metrics
The emotional dictionary and emoji embedding in the proposed model are the extended dictionary given in [10]. To evaluate the proposed model, macro-precision, macro-recall and macro-F1 score are chosen to evaluate emotion recognition results on Weibo and Douyin. The Douyin data set contains 26,000 pieces of data with an average sentence length of 13.67 Chinese characters. Three college students manually annotated 3200 texts as a training set. NLPCC2013 is the public corpus of the Chinese Weibo sentiment analysis competition, with an average sentence length of 38.33 Chinese characters, in which each document combines a Weibo topic with its comments. In NLPCC2012, we treat one Weibo topic as a document; the NLPCC2012 dataset therefore consists of documents of varying length.
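For reference, the macro-averaged metrics can be computed as in the sketch below. Macro-F1 is taken here as the mean of the per-class F1 scores, which is one common convention; whether this paper uses that convention or the harmonic mean of macro-precision and macro-recall is not stated, so this is an assumption. The labels and predictions are made up.

```python
def macro_scores(y_true, y_pred, labels):
    """Macro-precision, macro-recall and macro-F1 over the given labels."""
    precisions, recalls, f1s = [], [], []
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)
    n = len(labels)
    return sum(precisions) / n, sum(recalls) / n, sum(f1s) / n


# 0 = neutral, 1 = positive, 2 = negative
y_true = [0, 1, 1, 2, 2, 2]
y_pred = [0, 1, 2, 2, 2, 1]
macro_p, macro_r, macro_f1 = macro_scores(y_true, y_pred, labels=[0, 1, 2])
```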

Experimental setting and process
The input word2vec and emoji2vec are 100-dimensional embeddings, trained on a corpus that consists of 32,000 comments captured from the Douyin app and 36,000 comments crawled from Sina Weibo. To evaluate the performance of our methods, we combine the results of deep neural networks (DP) with the lexicon-based emotion reasoning of texts and the emotion distribution of emojis to form different comparison models, as illustrated in Algorithm 2.

Results
The variants of the proposed model are listed as follows, where DP is replaced by convolutional neural networks (CNN), long short-term memory (LSTM), bi-directional long short-term memory (BiLSTM) or gated recurrent unit (GRU) in the following experimental results.
1. DP + emoji denotes that the emoji embedding is concatenated to the input embedding of DP, and the output result incorporates the fuzzy emotional reasoning enhancement of emojis.
2. DP + subtree denotes that the negative words and degree adverbs are considered in the input embedding of DP, and the output result combines the negative words and degree adverbs to enhance the fuzzy emotional reasoning.
3. DP + all denotes that the input embedding of DP considers emojis, negative words, degree adverbs, question marks, and exclamation marks, and the output result combines emojis, negative words, degree adverbs, question marks, and exclamation marks to enhance fuzzy emotional reasoning.

Algorithm 2 Framework of fusion of the output of neural networks and lexicon-based fuzzy inference emotional enhancement model.
Require: input sentence, $S$; emotional word dictionary; emoji dictionary; degree adverb dictionary; negative word dictionary
Ensure: the sentiment polarity of the sentence $S$, $T_e$;
1: divide the input document into a text $P_t$, emojis $P_e$ and punctuation at the end of the sentence;
2: input the word2vec and emoji2vec of the document into neural networks in order;
3: store the output of the fully connected layer of neural networks for sentiment analysis, $P_{dp}$;
4: extract the emotional word and its emotional polarity, $(w_i, Po_i)$;
5: generate the subtree set of $P_t$, $T_S$;
6: compute the smallest subtree-based emotional enhancement of degree adverbs to the modified emotional words;
7: compute the smallest subtree-based emotional enhancement of negative words to the modified emotional words;
8: sum up the sentiment distributions of all sentiment words in the sentence to obtain the sentiment distribution of text $P_t$, $Po_S$;
9: compute emotional enhancement of question mark and exclamation mark to the sentence sentiment;
10: compute the sentiment embedding of emojis, $P_e$;
11: fuse the neural network with lexicon-based fuzzy inference emotion enhancement model, $T_e$;
12: return $T_e$;
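Step 1 of Algorithm 2 can be sketched as follows. The sketch assumes that emojis appear as Unicode characters in two common code point ranges; platforms such as Weibo may instead encode emoticons as bracketed text codes, which this sketch does not handle. The function names and the example comment are hypothetical.

```python
END_MARKS = "?!？！"  # half-width and full-width question and exclamation marks


def is_emoji(ch):
    """Rough check against two common emoji code point ranges (an assumption)."""
    cp = ord(ch)
    return 0x1F300 <= cp <= 0x1FAFF or 0x2600 <= cp <= 0x27BF


def split_document(doc):
    """Split a document into plain text, emojis and trailing punctuation."""
    emojis = [ch for ch in doc if is_emoji(ch)]
    no_emoji = "".join(ch for ch in doc if not is_emoji(ch)).strip()
    text = no_emoji.rstrip(END_MARKS)
    tail = no_emoji[len(text):]
    return text, emojis, tail


text, emojis, tail = split_document("So happy today!! 😀😀")
```

The three parts feed different steps of the algorithm: the text goes to the subtree analysis, the emojis to the membership computation, and the trailing marks to the punctuation enhancement.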

Overall results
For the sake of comparison, each neural network uses the same parameters on all corpora, and the remaining parameters, other than those of the neural network, are set to the same values: $\lambda = 0.7$, $\beta_{str} = 1.2$, $\beta_{sli} = 0.8$ and $\lambda_{qm} = \lambda_{em} = 0.05$. The comparison shows that the results of all four neural networks are significantly improved by the proposed model on NLPCC2012, NLPCC2013 and Douyin, as shown in Table 3. In particular, on NLPCC2013 the precision of CNN_tree is 22% higher than that of CNN_fr and 31.6% higher than that of CNN_flip. The precision of BiLSTM_tree is 31% higher than that of BiLSTM_fr, the largest improvement. Across the four kinds of neural networks, the precision of the proposed model improves by an average of 27.5% over DP_fr and 26% over DP_flip. The main reason is that the texts of NLPCC2013 are mainly long texts with rich emotional expression and few emojis. For the Douyin corpus, by contrast, texts are short and emojis are frequently used to express emotions directly, so the improvement of the proposed model is not as high as on NLPCC2013. For CNN, LSTM, and BiLSTM in particular, DP_fr performs better than DP_flip. The main reason is that comments in Douyin are mainly based on the storyline of the video, so the emotional expression is very direct and concise. Emotional words, emojis, question marks, and exclamation marks are commonly stacked to highlight strong emotions, so directly counting the number of emotional words and emojis can capture the expression of emotions. The opposite conclusion is reached on NLPCC2012, mainly because sentences in NLPCC2012 contain fewer emojis than those in Douyin. Therefore, the experimental comparison and analysis show that the proposed model effectively enhances the emotional expression of the text from multiple angles.

Influence of different deep learning models on NLPCC2013
Weibo comments are context-based, with free and diverse expressions, including irony, metaphors, slang and other indirect emotional expressions, which makes it difficult for computers to evaluate the sentiment of texts. By comparing the fuzzy emotional reasoning results of different deep learning models, the effects of emojis, degree adverbs and negative words on text emotions are analyzed (Table 4). The effect of each element is measured by concatenating its embedding to the input vector of the deep learning model and adding it to the input of the inference model. Because the texts are long, netizens are accustomed to placing emojis at the beginning or end of the text to enhance their emotions, so the emotional expression of the text is complex and diverse while the emojis are few and uniform. Therefore, the emotional contributions of emojis in the three deep learning models are lower than the emotion enhancement effects of modifying adverbs, as shown in Fig. 2. Comparing the three figures shows that the emotional enhancement effect of modifying adverbs is stronger in LSTM and BiLSTM, which take word order into account, than in CNN, which does not. As shown in Fig. 2b, c, the emotional enhancement effects of degree adverbs and negative words are greater than those of emojis, while in Fig. 2a the opposite is true.
Here DP_fr denotes the model of [10], DP_flip denotes that the sentiment polarity of an emotional word is flipped when its adjoint word is a negative word [9], and DP_tree is the proposed model.

Comparing the effects of CNN on different datasets
Comparing the performance of inference-based CNN on three different corpora, we analyze the experimental effects of the proposed model on corpora with different characteristics of emotional expression. Comparing the distributions of the curves in the three panels of Fig. 3, we find that the main ways of expressing emotions differ between corpora. In Douyin, to express emotions visually and intuitively, the text is short, the emotions are highly targeted, and strong emotions are expressed in diverse ways. Exclamation marks, question marks, and emoticons are frequently used. Hence, the influence of emojis on sentiment is higher than the emotion enhancement of adverbs, as shown in Fig. 3a. Although both corpora consist of short texts, the sentiment impact factors of NLPCC2012 differ from those of Douyin. In NLPCC2012, the emotional expression of a text is related to its context. Some texts contain no emotional words and only echo the emotion of the previous comment, such as "same as above" or "plus one". From the context such a text should carry positive emotion, but the model cannot recognize it. Therefore, the influence of emojis in Weibo is lower than that of the subtree, as shown in Fig. 3b, c. Long text has more grammatical modifications than short text, although long text is also composed of short texts. Hence, the contribution of adverbs is higher than that of emojis, as reflected in Fig. 3b, c (Table 5).

Emotional orientation analysis of Weibo names on NLPCC2012
The sentiment prediction results in this section are obtained by deleting the Weibo names from the corpus one by one. As the number of ignored Weibo names increases, Precision fluctuates greatly while Accuracy changes very little, as listed in Table 6. Take $P_{dp} = 0.3$ as an example. Comparing the characteristics of the Weibo topic names shows that when the topic name describes an object or event in a strongly targeted way, the emotional tendency of the comments is direct and strong. The reasoning effect is therefore good, and the corresponding Precision is relatively large. When the event described by the topic name cannot simply be judged as right or wrong, the comments are scattered and inconsistent in topic, which makes Precision low and forms a trough, as shown in bold in Table 6.
Changing $P_{dp}$ changes the number of sentences enhanced by fuzzy inference. If $P_{dp}$ is reduced, the range of fuzzy inference is reduced; that is, the difference between the outputs of the fully connected layer of the deep neural network is reduced and the ambiguity is increased. As the number of ignored Weibo names increases, Precision for $P_{dp} = 0.5$ fluctuates at first and then almost stabilizes, while Precision for $P_{dp} = 0.3$ fluctuates greatly throughout, as illustrated in Fig. 4a. With the increase of the number of ignored Weibo names, the change trends of Precision in the two cases are almost the same, as shown in Fig. 4b. The research on Weibo name orientation shows that the outputs of the fully connected layer are indeed ambiguous when a neural network processes sentiment classification, and that this ambiguity does affect the effectiveness of the neural network in emotion prediction. Therefore, fuzzy inference based on the neural network is necessary (Table 7).
Note: delta is the difference between the maximum and second maximum of the output of the fully connected layer of the neural network, delta = $P_{dp}$.
Table 7 Detailed results of fuzzy reasoning based CNN
Table 8 Detailed results of fuzzy reasoning based LSTM

Analyze sentences in detail
Some sentences are analyzed in detail using the results calculated by the different deep learning models based on fuzzy inference, as illustrated in Tables 5, 8, and 9. Take $P_1$ calculated by CNN as an example to explain how the output result is computed.

Conclusion
In the emotion recognition task, the deep learning model uses the label corresponding to the maximum value in the output result of the fully connected layer as the output result.
In a multi-classification task, when the boundary between the maximum and the second maximum of the output of the fully connected layer is not clear, considering only the maximum value and ignoring the other values introduces noise. To reduce the errors caused by this unclear classification, a fuzzy classification membership function of the fully connected layer of the neural network, an emotional membership of emojis and a dictionary-based fuzzy inference model are combined to establish a multi-modal and multi-scale emotion-enhanced inference model based on fuzzy recognition. The emotional tendency of emojis based on the corpus, the emotional enhancement of negative words and degree adverbs on the emotional words they modify, and the sentiment enhancement of the text by question marks and exclamation marks are discussed. Experiments show that this model recognizes text emotions accurately, with a particularly significant improvement for long-text emotion recognition on Weibo.