1 Introduction

Opinion mining is a general framework for processing emotions, criticisms, suggestions, and comments expressed in different inputs such as text, video, and voice. Social networks such as Twitter and Facebook process textual information and then categorize it using opinion mining methods [5, 14, 44].

A large number of political [81], social [34], economic [29], health [17, 50], educational [82], and military groups are interested in obtaining the opinions of customers to modify or improve their services. For example, the information obtained from opinion mining helps financial companies identify the risks related to new investments [59]. Furthermore, politicians are interested in gauging their voters' satisfaction or dissatisfaction from their comments [31]. Governments want to know the consequences of their policies through domestic and foreign media [23]. Sentiment analysis leads to a better and faster understanding of challenges and shortcomings during e-learning and enables providers to solve problems [56]. Physicians can evaluate and improve their treatments through opinion mining. Some managers use sentiment analysis (SA) to decide on investing in new products or changing company strategies [42].

Sentiment analysis covers various tasks such as subjectivity and polarity detection [4, 77], emotion estimation [79], answering emotional questions [53], detecting spam comments [75], question answering [70], crime detection [38], sarcasm/irony detection [43], opinion summarization [44], and many other subjects. Much research is currently being conducted to extract users' opinions at the document [15], sentence [57], or aspect level [55]. Methods of analyzing sentiment can be classified into three categories [73]: the first comprises methods that determine sentiment polarity through opinion words, known as dictionary-based analysis methods [1, 52]; the second trains machine learning classifiers on labelled data; and the third combines the two in hybrid approaches.

Pattern classification is a common approach in text categorization (TC) and SA: a classifier is trained with a set of examples from different classes. In the first step, these methods must extract useful features from documents or phrases, so the performance of the classifier is highly dependent on the extracted features. Consequently, newly developed techniques inevitably generate large numbers of features and must select the useful ones for classification. The main purpose of sentiment analysis and polarity detection is to extract information from a large volume of unstructured text and distill a structured notion of sentiment. One advantage of polarity detection is that it classifies unlabeled comments according to their semantic bias. Based on the above, finding an algebraic method that captures the subjective concept of sentiment in a phrase or document is an interesting goal.

The contributions of our work are summarized as follows.

  • Proposing an approach that transforms text and labels into a set of equations.

  • Proposing a deterministic mathematical approach to solve the ill-posed equation system of the polarity detection problem.

2 Literature on opinion mining

The first documented work on opinion mining was conducted in 1979 at Yale University in the United States, where Carbonell designed a computer model of human subjective understanding [14]. It automatically determined the political views of the people of the United States and Russia. Subsequently, with the development of general computer systems, in early work in 2002 Pang and his colleagues [44] conducted an automated study using machine learning techniques in which they converted the textual data of movie reviews into various feature forms, including single words, and then used Naive Bayes, Maximum Entropy, and Support Vector Machine classifiers on the data prepared in the preprocessing stage. The best performance was obtained with support vector classification on the unigram and bigram feature sets. They manually tagged 2,000 different movie reviews for training purposes [14].

Furthermore, Pang and co-workers investigated machine learning and opinion mining techniques [45, 46]. They presented a study of social network data using a bag-of-words approach. The database used in this research consists of Twitter comments collected with a specific set of keywords. The simulations were performed in three main parts. In the first step, Naive Bayes was applied to the bag of words and 71% accuracy was achieved. Secondly, the Naive Bayes algorithm was applied to the bag of words without stop words, which obtained 72% accuracy. In the final step, the most informative features were selected using a quadratic probability density criterion, which obtained the best accuracy: the Naive Bayes algorithm yielded an accuracy of about 89% with this method.

Long short-term memory networks (LSTM) have performed well in sentiment analysis tasks. The usual way to use an LSTM is to combine word embeddings to represent the text. However, word embeddings carry more semantic than sentiment information, so using word embeddings alone to represent words is insufficient for sentiment analysis tasks. To address this issue, a lexicon-enhanced LSTM model was proposed that uses a sentiment lexicon as additional information: sentiment embeddings are pre-trained and then incorporated into the word representations, including for words that are not in the lexicon. The combination of sentiment embeddings and word embeddings makes the trained system more accurate. Test results on English and Chinese datasets show that the presented models achieve results comparable to existing models [25].

Other researchers have described the effectiveness of various sentiment classification techniques, ranging from simple rule-based and lexicon-based approaches to more sophisticated machine learning systems. Lexicon-based approaches suffer from limited vocabulary coverage; machine learning approaches, on the other hand, usually show lower accuracy than lexicon-based ones. Iqbal and his team [19] presented an integrated framework that bridges the gap between lexicon-based and machine learning approaches to achieve both accuracy and scalability. To solve the scalability problem created by a growing feature set, a new genetic-algorithm-based (GA-based) feature reduction method was suggested. Using this hybrid method, they reduced the feature dimensions by up to 42% without compromising accuracy. The GA-based feature reduction technique showed accuracy improvements of 15.4% and 40.2% compared to Principal Component Analysis (PCA) and Latent Semantic Analysis (LSA), respectively [30].

Text sentiment analysis based on a sentiment dictionary often runs into problems: the dictionary contains too few sentiment words and omits domain-specific ones. Besides, because some polysemous sentiment words can be positive, negative, or neutral, word polarity cannot always be expressed accurately, which reduces the accuracy of text sentiment analysis. In 2019, to improve this accuracy, an extensive sentiment vocabulary was constructed, with dictionaries covering the main sentiment words, contextual sentiment words, and multimedia sentiment words. Bayesian classification is used to identify the text domain of a polysemous word, so the word's sentiment value is derived from its polysemous senses in that context. Using the sentiment vocabulary and designed scoring rules, the text sentiment is obtained. The empirical results demonstrate the promising ability and accuracy of the proposed dictionary-based sentiment analysis method [72].

Due to the rich morphology of the Arabic language and the informal nature of language on Twitter, sentiment analysis of Arabic tweets is a complex task. Previous research on SA from tweets focused mainly on manual extraction of text features. Recently, neural word embeddings have been used as powerful representations that replace manual feature engineering. However, many of these embeddings model the syntactic information of words while ignoring their sentiment. To solve this problem, a model combining surface features and deep-neural-network-based (DNN-based) features was proposed. Surface features are hand-crafted, while DNN-based features comprise embeddings of generic words and sentiment-specific words. Experimental results reveal that: 1) the best model used a combination of surface-based and DNN-based features, and 2) the approach achieved state-of-the-art results on several benchmark datasets [3].

The widespread availability of online comments and posts on social media provides valuable information for businesses to make better decisions, guiding their marketing strategies toward the interests and preferences of users. Evaluating sentiment is therefore necessary to determine public opinion about a particular subject, product, or service. Traditionally, sentiment analysis was done on a single data source, for example, online product reviews or tweets. However, the need for more accurate and comprehensive results has led to sentiment analysis across multiple data sources. Using multiple data sources for a particular domain of interest can increase the amount of data available for training sentiment classifiers. So far, the problem of insufficient training data had only been addressed by multi-domain sentiment analysis [2].

Many researchers have suggested using machine learning algorithms to analyze tweets accurately based on regression. The proposed method includes pre-processing tweets and using a feature extraction method to create an efficient feature set. These features are then scored and balanced across several classes. Multinomial logistic regression (softmax), support vector regression (SVR), decision trees (DTs), and random forest (RF) algorithms are used to classify sentiment within the proposed framework. Experimental findings showed that these machine learning approaches can perform the regression-based classification with acceptable accuracy, and that the decision tree achieves the best results among the tested algorithms [58].

Recently, many works have used hierarchical structures to obtain aspect-specific sentiment information, which may in turn lead to sentiment mismatches for some specific aspects by extracting irrelevant words. To solve this problem, a collaborative extraction hierarchical attention network consisting of two hierarchical units was proposed: one unit extracts aspects and uses them to capture aspect-specific information in the other. Experiments show that the proposed approach performed better than recent methods that only use aspect features [26].

Sentiment lexicons are an important resource for opinion extraction. Lately, many state-of-the-art works have used deep learning techniques to build sentiment lexicons. Generally, they first learn sentiment-aware word embeddings and then use them as features to build the lexicon. However, these methods do not take into account the importance of each word in distinguishing sentiment polarities. As we know, most of the words in a document do not help to understand the meaning or sentiment of the entire document. For example, in the tweet "It's a good day, but I can't feel it. I'm really unhappy.", the words "unhappy", "feel", and "can't" are much more important than the words "good" and "day" in predicting the tweet's polarity, while many words, such as "this", "it", and "I'm", can be ignored. In Sparse Self-Attention LSTM (SSALSTM), a new self-attention mechanism is used to weight the importance of each word in identifying sentiment polarity. After the sentiment-aware word embeddings are learned, a classifier is trained that uses them as features to predict the sentiment polarity of words. Extensive experiments on four publicly available datasets, SemEval 2013-2016, show that the sentiment lexicon produced by the SSALSTM model achieves optimal performance in both supervised and unsupervised sentiment classification tasks [20].

Word embeddings, which provide low-dimensional vector representations of words, have been widely used for various natural language processing tasks. However, text-based embeddings such as Word2vec and GloVe typically fail to capture enough sentiment information, so words with similar representation vectors may have conflicting polarities (e.g., good and bad), degrading sentiment analysis performance. To address this problem, recent studies have suggested learning sentiment-aware embeddings that incorporate polarity information (positive and negative) from labeled corpora [28]. The study in [78] adopts another strategy for learning sentiment embeddings: instead of building new embeddings from labeled corpora, it proposes a word vector refinement model that refines existing pre-trained word vectors using real-valued sentiment intensity scores provided by a sentiment lexicon. The idea behind the refinement model is to adjust each word vector so that it moves closer to words that are both semantically and sentimentally similar (i.e., those with similar intensity scores) and away from sentimentally dissimilar words. A significant advantage of the proposed method is that it can be applied to any pre-trained embedding. Besides, intensity scores provide finer-grained (real-valued) sentiment information than binary polarity labels to guide the refinement process [78].

In recent methods, pre-training of language models has been shown to provide large improvements for a range of language understanding tasks [21, 47, 49, 54]. The main idea is to train a large generative model on big data and transfer the result to tasks with limited amounts of labeled data. Sequence pre-training models had previously been investigated for text classification [16] but not for text generation. In neural machine translation, there has been work on transferring representations from high-resource language pairs to low-resource settings [83]. Nemes et al. [41] used a Recurrent Neural Network (RNN) to classify sentiment in tweets: they created a sentiment dataset using the Twitter API and, after some simple preprocessing steps, applied it to the RNN.

2.1 Grammatical structures (language-limited)

In 2013, Kim proposed a supervised, support-vector-machine-based learning approach using a hierarchical sentiment recognition structure [32]. In this method, a tree is first built in which each node has two levels: the first level encodes the word's position in the sentence in terms of importance, and the second level determines the sentence's polarity in terms of the sentiment given by the users.

In 2014, Vinodini and coworkers proposed a hybrid machine learning method that attempts to classify users' comments as positive or negative using several classifiers. This method combines Bayesian boosting and bagging models with the PCA method to reduce feature dimensions. The article also examines the impact of different feature types such as unigrams, bigrams, and trigrams. The classifiers used include SVM and logistic regression. It was found that the best performance was observed when the combination of the three feature sets, namely unigrams, bi-grams, and tri-grams, was used [71].

In 2015, Tewari and his team proposed an e-learning recommendation system called A3. The system uses feature-based analysis that studies the details of individual student reviews of a topic [18].

Thus far, several methods have been introduced to analyze emotions and opinions from social media in SA and opinion mining [45, 46], as there is special interest in the opinions shared on social networks like Facebook [71], Twitter [18], and TripAdvisor [69] regarding many topics. These data are analyzed using two main methods: ML methods and lexicon-based methods. ML methods such as the support vector machine (SVM), Naive Bayes (NB), logistic regression, and multilayer perceptron (MLP) need a training dataset to learn the model from corpus data and a testing dataset to verify the built model [25, 27, 35, 36, 62]. Lexicon-based methods rely on a dictionary of words and phrases with positive and negative values.

Grammatical structures are tied to a particular language and text. One disadvantage of statistical methods is their uncertainty and language dependence, so we seek an algorithmic, algebraic formulation that solves these problems in a deterministic, non-probabilistic way while maintaining the advantage of language independence.

2.2 Statistical structure (Uncertainty)

SentiWordNet is the most widely used lexicon in the field of sentiment analysis [19, 40]. However, because words have different senses, this approach may not obtain good results in some domains. To overcome this problem, domain-dependent lexicons have been presented. In the field of text mining, the extraction of relevant information from social media is a challenging task, and many approaches have been proposed in the literature for information extraction and sentiment analysis, including lexical knowledge, deep learning, neural word embeddings, and fuzzy logic [11, 67, 76].

Statistical structures are used in many text mining methods. Mahmoudi et al. [37] analyzed tweets to find the behavior of tweet writers under COVID-19 conditions; the proposed scheme was based on statistical and mathematical analysis of user sentiment changes during COVID-19. Sharma et al. [61] analyzed textual entries posted by college students over a four-year period and found that education topics were less important than health issues during COVID-19 growth. Other research has also addressed SA during COVID-19 [50].

Although statistical methods are independent of a particular text or language and are therefore widely applicable, as mentioned, they are not deterministic.

2.3 N-gram properties

One of the most important phases of sentiment analysis is modeling the text with features capable of representing documents, with N-gram features playing a central role. N-gram features are divided into two categories:

  • Fixed N-grams: exact sequences at the character or word level, such as 1-grams or 2-grams;

  • Variable N-grams: templates for extracting information from text, such as noun + adjective.

Variable N-gram features can express more complex linguistic concepts [71]. Besides, N-POS features, which are combinations of N part-of-speech tags, are used in sentiment analysis [18]. The N-POS-word feature, which combines N words with their part-of-speech tags, has not been used extensively. Since POS-tag features combined with the word itself can reduce word ambiguity, they can improve the accuracy of evaluating and classifying texts, provided the problems of sparsity and redundancy are managed [62].
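For illustration only (the paper's own implementation is in MATLAB, and the exact templates are not specified here), the following Python sketch shows the three feature families using NLTK; the adjective + noun template and the example sentence are assumptions.

```python
import nltk  # assumes the 'punkt' and 'averaged_perceptron_tagger' resources are installed

def fixed_ngrams(tokens, n):
    """Fixed N-grams: exact word-level sequences of length n."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def npos_word(tokens):
    """N-POS-word features for N = 1: each word paired with its POS tag."""
    return nltk.pos_tag(tokens)  # e.g. [('boring', 'JJ'), ...]

def variable_ngrams(tagged, first="JJ", second="NN"):
    """Variable N-grams: a POS template (here adjective + noun)."""
    return [(w1, w2) for (w1, t1), (w2, t2) in zip(tagged, tagged[1:])
            if t1.startswith(first) and t2.startswith(second)]

tokens = nltk.word_tokenize("This was by far the most boring story")
tagged = npos_word(tokens)
print(fixed_ngrams(tokens, 2))   # fixed 2-grams
print(variable_ngrams(tagged))   # e.g. [('boring', 'story')]
```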

2.4 Machine-learning-based methods

Classifiers such as Maximum Entropy, Naive Bayes, and Support Vector Machines must be trained appropriately with a training set. For a specific subject such as hotel reviews, the training set should be drawn from the same domain as the test data; using unsuitable training data causes a severe loss of classification accuracy, which indicates the importance of appropriate training data.

Blankers and his colleagues [7] used the chi-square method for feature selection. They achieved their best results by combining SVM and maximum entropy classifiers. To improve classification, presenting a correct model of the documents is crucial. Using several classification algorithms simultaneously does not necessarily increase classification accuracy but may increase time complexity; combining multiple classifiers on one dataset is not, by itself, a solution for improving the speed and accuracy of text classification. Instead of using multiple classifiers, one can apply feature selection filters or look for a more suitable document model. Univariate methods have been utilized in many investigations due to their lower time complexity compared to multivariate methods.

2.5 Semantic-lexical methods

Advanced lexical dictionaries are used in methods developed for text analysis. These dictionaries include WordNet, which was built for language processing research. It emphasizes the meanings of words, clustering all English words into synonym sets, and each of these clusters can have a relation such as antonymy or similarity to other clusters [39].

For example, a simple lexical approach examines the text using a glossary in which all words in the domain are rated as positive or negative. Despite its simplicity, this method is rarely used alone because of the complexity of linguistic structure: a sentence can contain many negative words and yet convey a positive meaning. Another problem is the inability to recognize metaphors. This can be partially solved by a separate glossary of common metaphors together with similarity recognition methods, but in that case vocabulary alone is not enough and other tools, such as a grammar tree, must be used [67].

Considering the advantages and shortcomings of previous research, our focus in this work is on fully automated content polarity detection with features based on mathematical analysis, solving equation systems in a sparse space.

3 Prerequisites

Given the literature on extracting perspective or sentiment from sentences and the weaknesses of previously employed methods, we propose a mathematical weight-allocation method to enhance the accuracy of polarity detection in texts. This study is expected to overcome the disadvantages of available statistical methods, such as the lack of attention to mathematical structure and the shortcomings of text processing when different combinations of words are used. It should be noted that the processing required to mathematically assign weights to the vocabulary of different databases is the main source of the task's complexity. This section introduces the databases and dictionaries used, followed by the approach.

3.1 Amazon

Amazon is one of the first successful American e-commerce companies of its kind in the world. The site was launched in 1994 by Jeff Bezos in Seattle, USA, and its core business began in 1995 as an online book store, following legal and regulatory processes in the United States. The ESWC Semantic Sentiment Analysis 2016 challenge consisted of several tasks; fine-grained sentiment analysis, one of these challenges, was divided into five subtasks related to classifying and quantifying sentiment polarity on a two- or five-point scale in Amazon reviews. We tested our method on the two-point polarity detection task.

3.2 Taboada database

The Taboada database is a collection of eight different domains with fifty positive and negative comments in each domain [66]. This database was collected and labelled in 2004. Because of its variety of vocabulary, it is commonly used as a benchmark for comparing different methods.

3.3 Dictionary

The proposed method needs a list of sentiment words and the signs of their scores. The GI dictionary [64], the WordNet dictionary [39], the ANEW dictionary [9], and the SentiWordNet dictionary [6] were considered as standard sentiment dictionaries for our work. SentiWordNet 3.0 [6], an improved version of SentiWordNet 1.0, is the main dictionary used; it is a lexical resource built by automatically annotating WordNet.

4 Proposed method

4.1 Pre-processing

The preprocessing steps are consistent with many other articles and are implemented with the following procedure (a minimal code sketch follows the list):

Bag-of-words conversion:

A common way to prepare text for categorization is to parse the documents into their constituent words. We transform each document into a constant-length vector over the word list by creating a dictionary of the words present.

Lowercasing:

All characters are converted to lowercase so that uppercase and lowercase variants are identified as the same word.

Tokenization:

At this point, we remove all dots and punctuation marks and replace non-text characters with a space character; as a result, the text becomes a set of words.

Length filtering:

Based on the documents at hand, the range of useful word lengths is identified, and words of unhelpful lengths are removed to clean the collection of redundant words.

Stop words:

A list of words that carry no semantic orientation and are deleted during processing.

Removal of unnecessary parts:

Words that carry no semantic orientation, have multiple surface forms, or follow a specific representation pattern are deleted during processing.

Root extraction:

The roots of words are extracted so that different inflected forms of the same word are merged and redundant forms are eliminated.
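The sketch below illustrates these steps in Python (the paper's implementation is in MATLAB); the length thresholds, the NLTK stop-word list, and the Porter stemmer are illustrative assumptions, not the exact resources used in the paper.

```python
import re
from nltk.corpus import stopwords      # assumes nltk.download('stopwords') has been run
from nltk.stem import PorterStemmer

def preprocess(text, min_len=2, max_len=20):
    """Illustrative version of the preprocessing pipeline described above."""
    text = text.lower()                                   # lowercasing
    text = re.sub(r"[^a-z]+", " ", text)                  # tokenization: non-text chars -> space
    tokens = text.split()                                 # bag of words
    tokens = [t for t in tokens if min_len <= len(t) <= max_len]  # length filtering
    stop = set(stopwords.words("english"))                # stop-word removal
    tokens = [t for t in tokens if t not in stop]
    stemmer = PorterStemmer()                             # root extraction (stemming)
    return [stemmer.stem(t) for t in tokens]

print(preprocess("I have read all of Grisham's books and this was by far the most boring one!"))
```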

4.2 Proposed algorithm

The proposed system (shown in Fig. 1) takes all the inputs, such as the labelled database texts, dictionaries, intensifiers, and negators, and breaks the labelled texts into unigrams and bigrams (Match & Extract) that contain sentimental words, intensifiers, or negators.

Fig. 1 Proposed system block diagram

A bigram is discarded if even one of its two components falls outside these lists. We convert each unigram and bigram into an equivalent numeric code. Treating every unigram and bigram as a variable, with m variables in total and n labelled expressions, the problem becomes solving a system of n equations with m unknowns.
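As a minimal sketch of this mapping (the vocabulary and indices below are hypothetical, not the paper's actual numeric codes, and the paper's implementation is in MATLAB), each labelled document becomes one row of a coefficient matrix:

```python
import numpy as np
from collections import Counter

# Hypothetical tiny sentiment vocabulary: word -> variable index (a column of A).
# In the paper, the vocabulary comes from SentiWordNet 3.0 plus the intensifier
# and negator lists, so m is in the thousands.
vocab = {"boring": 0, "help": 1, "homeless": 2, "recommend": 3, "definitely": 4}

def document_to_equation(tokens, label):
    """One labelled document -> one coefficient row and one right-hand side.
    Coefficients count occurrences of each sentiment variable; label is +1/-1."""
    counts = Counter(t for t in tokens if t in vocab)
    row = np.zeros(len(vocab))
    for word, c in counts.items():
        row[vocab[word]] = c
    return row, float(label)

docs = [(["boring", "boring", "help"], -1),
        (["recommend", "definitely", "help"], +1)]
rows, b = zip(*(document_to_equation(d, y) for d, y in docs))
A = np.vstack(rows)            # n equations (documents) x m unknowns (word scores)
print(A, np.array(b))
```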

The above steps are given in the example below for a sample negative text from the BOOK section of the Taboada database.

Original text:

“I have read all of Grisham’s books and this was by far the most boring one! The story is about a lawyer who gives up his high paying job to help the homeless in an office staffed by 2 other people. There is no mystery to the story. Grisham’s books are normally mysterious, can’t wait to finish it, a “who done it” style. This book does not have any surprises, its dull. I expected a book like other books (f.i. Pelican Brief etc) he has written. It definitely was not! I would definitely not recommend it to anybody”

This phrase has 98 words, and only nine of them, occurrences of the eight distinct words boring, definitely (which appears twice), help, high, homeless, like, normally, and recommend, are in the sentiment dictionary.

This routine is applied to all phrases, and an equation is made for each phrase. In this equation system, each sentimental word is a variable, and the coefficient of a variable is the number of repetitions of that sentimental word in the phrase. The right-hand side of the equation is set to +1 for positive and -1 for negative documents. Equations (1)-(4) below are four sample documents.

$$ x_{602} +x_{693} +x_{2284} +x_{2442} + x_{2694} +x_{2847} +x_{3115} +x_{3175} +x_{3726} + x_{3975} + x_{4225} +x_{5389} +x_{5392} +x_{5427} +x_{5935} +x_{6140} +x_{6500} +x_{7501} = 1 $$
(1)
$$ x_{1034} +x_{2173} + x_{2505} +x_{2541} +x_{2884} +x_{3617} +x_{3759} +x_{4062} +x_{4364} + x_{4479}+ x_{4487} +x_{4715} +x_{5203} +x_{5387} +x_{6168} +x_{7393} +x_{7460} = -1 $$
(2)
$$ x_{823} +x_{1618} +x_{3075} +x_{3096} +x_{3142} +x_{3907} +x_{4560} +x_{5427} = -1 $$
(3)
$$ x_{2439} +2 x_{3156} +x_{3759} + x_{3907} + x_{3925} + x_{4195} + x_{5395} + x_{5427} + x_{7325} = -1 $$
(4)

Since the numbers of equations and unknowns are unbalanced, in a large database we solve the resulting sparse ill-posed equation system using the Tikhonov method. In a moderate-size database such as Taboada, we can use classical numerical methods such as Cholesky decomposition, LU decomposition, QR decomposition, or the least-squares method (in this article, we used LU decomposition).
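A minimal sketch of the two regimes, using a random stand-in for the coefficient matrix (the sizes, the count distribution, and the regularization value damp=0.1 are illustrative assumptions; the paper's implementation is in MATLAB):

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve
from scipy.sparse import csr_matrix
from scipy.sparse.linalg import lsqr

rng = np.random.default_rng(0)

# Toy stand-in for the document/word system: n documents (rows), m sentiment
# variables (columns), entries are word counts, labels are +1/-1.
A = rng.poisson(0.05, size=(400, 120)).astype(float)
b = rng.choice([-1.0, 1.0], size=400)

# Moderate, well-posed case (e.g. Taboada): direct LU solve of the normal
# equations A^T A x = A^T b, mirroring the classical decompositions above.
x_lu = lu_solve(lu_factor(A.T @ A), A.T @ b)

# Large sparse ill-posed case (e.g. Amazon): lsqr's `damp` argument minimizes
# ||Ax - b||^2 + damp^2 ||x||^2, i.e. the Tikhonov problem of Eq. (5) below.
x_tik = lsqr(csr_matrix(A), b, damp=0.1)[0]
print(x_lu[:3], x_tik[:3])
```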

4.3 Tikhonov method

The main difficulty with discrete ill-posed problems such as our system is the numerical rank deficiency of the coefficient matrix and the indeterminacy caused by its small singular values. Therefore, to stabilize the problem, it is necessary to add prior information about the solution of the system of equations. One of the methods for solving such equations is Tikhonov regularization, which is the most common method for stabilizing discrete ill-posed problems, especially inverse problems. The idea of this method was proposed almost simultaneously, but independently, by Phillips and Tikhonov. From a statistical point of view, it belongs to the family of methods for solving inverse problems that are used when initial information or assumptions about the unknowns are available.

In the Tikhonov method, as in the least-squares method, the assumption is that the observational error is random, that the probability distribution of the errors is normal, and that their expectation is zero. Therefore, in this method, as in least squares, we look for a solution with the smallest residual. However, in systems of discrete ill-posed equations, a solution cannot be obtained with the least-squares condition alone because of the ill-conditioning of the operator; so in Tikhonov's method, while minimizing the residual vector, a functional of the unknowns is also minimized, preventing the solution from growing without bound. Tikhonov regularization has many applications in various fields of computer science, simulation, and engineering, such as load identification [51], radiation problems [60], thermal-conductivity problems [8], hemivariational inequality problems [68], the time-fractional diffusion equation [74], and singular value decomposition [12].

The solutions of ill-posed equations are sensitive to errors in the input data: a small disturbance in the input leads to a fundamental disturbance in the solution. In practical applications, data always contain errors such as measurement errors, approximation errors, and rounding errors, which greatly affect the solution of the problem. Regularization methods therefore add extra information about the solution to the problem in order to obtain a stable solution.

By introducing this constraint, they seek an appropriate balance between minimizing the constraint and minimizing the residual \(\Vert Ax-b\Vert_{2}\) [13].

In the Tikhonov method, the regularized solution is defined as the minimizer of a weighted combination of the residual norm and the constraint.

In Tikhonov's method, the regularized solution of the linear system Ax = b, where A is an ill-conditioned matrix, is defined as follows.

$$ x_{\lambda}={\arg\min}_{x}\{\Vert Ax-b{\Vert_{2}^{2}}+\lambda^{2}\Vert x {\Vert_{2}^{2}}\} $$
(5)

where the parameter λ > 0 is the regularization parameter and must be carefully selected. A large λ (a large amount of regularization) shrinks the solution norm at the cost of enlarging the residual norm, while a small λ (a small amount of regularization) has the opposite effect.

In terms of the SVD of the matrix, Tikhonov's regularized solution is expressed as follows.

$$ x_{reg}=\sum\limits_{i=1}^{n} \frac{{\sigma_{i}^{2}}}{{\sigma_{i}^{2}}+\lambda^{2}}\frac{{u_{i}^{T}}b}{\sigma_{i}}v_{i} $$
(6)

where \(f_{i}=\frac {{\sigma _{i}^{2}}}{{\sigma _{i}^{2}}+\lambda ^{2}},i=1,2,{\dots } ,n\) are the filter coefficients, and we have

$$ \begin{array}{@{}rcl@{}} f_{i}=\frac{{\sigma_{i}^{2}}}{{\sigma_{i}^{2}}+\lambda^{2}}\approx\left\{ \begin{array}{ll} 1 &\ \ \ \ \sigma_{i} \gg \lambda\\ \frac{{\sigma_{i}^{2}}}{\lambda^{2}} &\ \ \ \ \sigma_{i}\ll \lambda \end{array}\right. \end{array} $$
(7)

If λ = 0, all filter coefficients equal one and relation (8) reduces to the unregularized least-squares solution; conversely, for \(\lambda =\infty\), \(x_{reg}=0\) is obtained.

$$ x_{reg}=\sum\limits_{i=1}^{n} f_{i} \frac{{u_{i}^{T}}b}{\sigma_{i}}v_{i} $$
(8)

Formula (6) can be derived directly from the Tikhonov functional

$$ \Vert Ax-b{\Vert_{2}^{2}}+\lambda^{2}\Vert x {\Vert_{2}^{2}} $$
(9)

Setting the gradient of (9) to zero, the minimizer satisfies

$$ (A^{T}A+\lambda^{2}I)x=A^{T}b $$
(10)

The terms \(A^{T}A\) and \(A^{T}b\) can be expressed using the right singular vectors.

$$ A^{T}A=V{{\varSigma}}^{2}V^{T}=\sum\limits_{i=1}^{n} v_{i}{\sigma_{i}^{2}}{v_{i}^{T}},A^{T}b=\sum\limits_{i=1}^{n}\sigma_{i}({u_{i}^{T}}b)v_{i} $$
(11)

Substituting (11) into (10) yields the solution (6).
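As a numerical check (with a random test matrix and an illustrative λ; this is a sketch, not the paper's code), the SVD filter-factor form of (6) and the regularized normal equations (10) give the same solution:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((60, 40))
b = rng.standard_normal(60)
lam = 0.5   # regularization parameter lambda (illustrative value)

# SVD form, Eqs. (6)-(8): x_reg = sum_i f_i * (u_i^T b / sigma_i) * v_i
# with filter factors f_i = sigma_i^2 / (sigma_i^2 + lambda^2).
U, s, Vt = np.linalg.svd(A, full_matrices=False)
f = s**2 / (s**2 + lam**2)                 # filter coefficients f_i of Eq. (7)
x_svd = Vt.T @ (f * (U.T @ b) / s)

# Regularized normal equations, Eq. (10): (A^T A + lambda^2 I) x = A^T b.
x_ne = np.linalg.solve(A.T @ A + lam**2 * np.eye(A.shape[1]), A.T @ b)
print(np.allclose(x_svd, x_ne))            # True: the two forms agree
```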

From the term \(\frac {{u_{i}^{T}}b}{\sigma _{i}}\) in (8), it is clear that the relative rate at which \({u_{i}^{T}}b\) and σi decay to zero plays an important role in the ill-conditioned behavior. Intuitively, when the coefficients \(\vert {u_{i}^{T}}b\vert \) decay to zero at a much slower rate than σi, Tikhonov regularization and other methods that filter out small singular values cannot provide a well-behaved regularized solution, and this causes a large regularization error. For regularization methods defined by filter coefficients, the error is expressed in (12).

$$ x_{exact}-x_{reg}=\sum\limits_{i=1}^{n} \frac{{u_{i}^{T}}b}{\sigma_{i}}v_{i}-\sum\limits_{i=1}^{n} f_{i} \frac{{u_{i}^{T}}b}{\sigma_{i}}v_{i}=\sum\limits_{i=1}^{n}(1- f_{i} )\frac{{u_{i}^{T}}b}{\sigma_{i}}v_{i} $$
(12)

For the regularized solution to approximate the exact solution of the problem well, xexact must satisfy the discrete Picard condition.

The discrete Picard condition holds if the Fourier coefficients \(\vert {u_{i}^{T}}b_{exact}\vert \) (at least on average) decay to zero faster than the singular values σi as i increases [63].
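A small numerical sketch of this check (the Vandermonde test matrix and noise level are illustrative assumptions): compute the SVD and compare the decay of the Fourier coefficients with that of the singular values.

```python
import numpy as np

rng = np.random.default_rng(2)
A = np.vander(np.linspace(0, 1, 30), 20)            # an ill-conditioned test matrix
b = A @ rng.standard_normal(20) + 1e-8 * rng.standard_normal(30)  # noisy data

# Discrete Picard check: |u_i^T b| should decay faster than sigma_i
# (at least on average) for regularization to recover a good solution.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
coef = np.abs(U.T @ b)                              # Fourier coefficients |u_i^T b|
for i in range(0, len(s), 4):
    print(f"i={i:2d}  sigma_i={s[i]:.2e}  |u_i^T b|={coef[i]:.2e}")
```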

5 Results

The important details of the training step are as follows:

  • In building the equations, the right-hand side has been set to +1 for positive and -1 for negative polarity.

  • The list and sign of sentimental words have been made using SentiWordNet 3.0 [6].

  • The list of intensifiers has been made using the list of general intensifiers [10, 25]. The negators are selected based on the research of Kiritchenko and Saif [35, 75].

The proposed dictionary has 14,072 negative and 15,023 positive sentimental words (a total of 29,095 words), 15 negators, and 176 general intensifiers.

We calculated scores on two different datasets.

  • First: the train and test sets were selected from the Taboada database. Since the number of equations is lower than the number of unknowns in the Taboada database, we used LU decomposition to solve the equation system. Sentimental words, intensifiers, and negators that do not appear in the training samples are removed from the respective lists.

  • Second: the train set was 70% (700,000 samples) of randomly selected samples from the Amazon dataset, and the calculated scores were tested on the remaining samples. Since the number of equations is much higher than the number of unknowns in this case, we solved the sparse ill-posed equation system using the Tikhonov method.

All implementations were done using MATLAB 2021a.

To compare the proposed method with other state-of-the-art methods, True Positives (TP), True Negatives (TN), False Positives (FP), and False Negatives (FN) were counted, and from these values precision, recall, F-measure, and accuracy were obtained.
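For reference, a minimal sketch of these metrics (the sign-thresholding decision rule mentioned in the comment is an assumption about how a document score maps to a polarity, and the label vectors are toy data):

```python
import numpy as np

def polarity_metrics(y_true, y_pred):
    """Precision, recall, F-measure, and accuracy from TP/TN/FP/FN,
    treating +1 (positive polarity) as the positive class."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_pred == 1) & (y_true == 1))
    tn = np.sum((y_pred == -1) & (y_true == -1))
    fp = np.sum((y_pred == 1) & (y_true == -1))
    fn = np.sum((y_pred == -1) & (y_true == 1))
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f_measure = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / len(y_true)
    return precision, recall, f_measure, accuracy

# E.g., predict a test document's polarity as the sign of the sum of the
# learned scores of its sentiment words, then evaluate:
print(polarity_metrics([1, 1, -1, -1], [1, -1, -1, -1]))
```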

5.1 Taboada dataset

To validate our method on a normal-size database, we compared it with Zargari et al. [80] and the Senti-N-Gram dictionary [22]: an equation system was built from 350 samples (equivalent to 70% of the Taboada dataset), and the results on all samples show the efficiency of our method. The other conditions were selected similarly to Zargari et al.'s research. Because two different methods are proposed in Zargari et al. [80], we compared our method against the best reported result. The Taboada database, including 500 positive and 500 negative samples, was used as the train and test set in this case.

According to the results in Table 1, our method improves scoring efficiency by increasing accuracy on negative polarity. This increase in negative polarity naturally decreases positive polarity detection accuracy somewhat, but the overall result shows that our system detects polarity better than the state-of-the-art methods.

Table 1 Comparison of previous dictionaries & proposed method

In fact, scoring sentimental words based on a mathematical solution is more accurate than statistical or ML approaches. ML methods face many challenges in scoring negative sentimental words, whereas our analytic approach finds both negative and positive scores more accurately.

5.2 Amazon dataset

In this case, we compared our method with Sygkounas et al. [65], Di Rosa & Durante [24], and Petrucci & Dragoni [48]. The evaluation conditions were selected similarly to those studies. Since the number of equations is much higher than the number of unknowns in this case, we solved the sparse ill-posed equation system using the Tikhonov method. Table 2 shows the results on the Amazon test set, in which the numbers of positive and negative samples are both equal to 150,000. In our method, to eliminate the effect of random training and testing splits, the training scheme was run 10 times, and the reported result is the median of the obtained answers.

Table 2 Comparison of Sygkounas et al. [65], Di Rosa & Durante [24], Petrucci & Dragoni [48] methods vs proposed method (Amazon test set)

Based on the simulation results, our method's overall accuracy, precision, recall, and F-measure are better than those of the other methods in this case. As with Taboada, the proposed scheme enhances scoring efficiency by increasing accuracy on negative polarity; although this somewhat decreases positive polarity detection accuracy, the overall result shows that our system detects polarity better than the state-of-the-art methods. In fact, the Tikhonov method, as an analytic approach, is more accurate than the other methods in finding the scores of sentimental words.

6 Conclusion

This research presents an application of the Tikhonov method to solving unbalanced, ill-posed linear algebraic equation systems. The suggested approach uses the Tikhonov method to solve a word-scoring scheme for polarity detection in NLP. The simulation results showed the proposed method's efficiency compared to machine learning, fuzzy, and stochastic methods.

The proposed method was tested on two different cases: the Taboada dataset and the ESWC (Amazon) database. In both cases, our method surpassed the state-of-the-art methods. The ESWC database is a very large, balanced database of Amazon reviews. We observed the improvement of our method on negative polarity thanks to the proposed mathematical scheme. Moreover, we demonstrated the effectiveness of the proposed method over the most common traditional machine learning, stochastic, and fuzzy methods.