1 Introduction

The study and analysis of the sensations, emotions, sentiments, opinions, judgments, and evaluations contained in digital recordings (like text, audio, and video) are among the main objectives of sentiment analysis (SA). Specifically, the term Sentiment Analysis refers to the field of study and research that analyzes the sensations, emotions, sentiments, opinions, viewpoints, judgments, evaluations, attitudes, and behaviors that people have with respect to entities (like products, services, organizations, individuals, events, and topics) and their attributes (Liu 2012). Nowadays, this analysis involves any possible medium that people use to express their opinions, like writing, speech, movements, facial expressions, and actions (Yadollahi et al. 2017). However, still today, among all the possible media, the main field of application of SA is the analysis of natural language. In this field, SA can be seen as the set of activities aimed at analyzing and extracting the opinions, judgments, emotions, and sentiments contained within a text written by a user regarding a specific entity. Currently, SA is applied to a plethora of different domains, such as marketing (products and services), political elections, tourism and cultural heritage (Casillo et al. 2020, 2019), healthcare, security, social events, education (Chang et al. 2020; Colace et al. 2014), and smart cities (D’Aniello et al. 2020).

From the point of view of marketing applications, despite the numerous techniques and applications that have been realized, traditional approaches to sentiment analysis still present some limitations. Indeed, sentiment analysis alone does not allow analyzing, with a sufficient level of detail, the specific aspects, components, parts, or functionalities of an entity that end-users have judged and commented on. To obtain a finer analysis that makes evident the specific aspects of a product or a service that users liked or disliked the most, novel approaches and techniques are needed. In this case, the opinions of the users have to be analyzed for each specific aspect of the product or service: this kind of analysis is called aspect-based sentiment analysis (ABSA).

In recent years, ABSA has received great attention from the scientific community, with a proliferation of original and novel techniques and approaches that have their roots in the SA field. The main tasks of ABSA are: the identification of the aspects of an entity from a text; the extraction of the linguistic expression used in the text to refer to the aspects; and the identification of the opinion or sentiment polarity for each aspect (Pontiki et al. 2016). Unfortunately, many of the techniques and approaches proposed so far adopt different interpretations of what “sentiment” is and how it should be measured. This situation generates a lot of confusion among what are very different concepts (e.g., sentiment, affective reaction, judgment, emotion), which are often treated in the same way by the same approach. This problem has also been widely discussed and highlighted by scholars of other disciplines (social sciences, economics, psychology, etc.) as a serious limitation of the current computational approaches adopted for real-life applications. Treating affective reactions as sentiments, for instance, has a great impact in terms of wrong evaluations of customer loyalty from the viewpoint of marketing applications. To date, the technical aspects and the consolidated approaches used for realizing sentiment analysis applications seem to receive much attention and to prevail in the research agenda, while simplifications with respect to the real differences between the concepts are accepted. Such simplifications, which probably had a rationale in the pioneering works on sentiment analysis as a way to make the problem tractable, are no longer needed thanks to the advancements in computer science and in the related disciplines. Rather, continuing to use such simplifications does not allow a real leap forward and a change of pace in the research in this area.

The main goal of such works is to detect users’ attitudes towards an entity or its aspects. Therefore, given their specific goals, they reduce the whole variety of human sentiments to two classes: positive or negative sentiment. In principle, this approach is correct, as it answers their specific goals. However, recent advances in artificial intelligence and sentiment analysis, together with the emerging needs of business companies and marketing strategies, demand a finer analysis of user opinions that cannot reduce the whole spectrum of human sentiments to only two classes (positive and negative). Sentiments, opinions, emotions, affects, and orientations are very different concepts and should be interpreted and treated separately with different approaches. Their correct interpretation is critical from the point of view of marketing applications because an incorrect interpretation may lead to incorrect results and, consequently, to wrong actions. Furthermore, the different interpretations used by these approaches also make it very difficult to compare these techniques and to use them in real-world applications. According to us, and as noted in several works in the literature (Munezero et al. 2014; Yadollahi et al. 2017) (as will be discussed in Sect. 3), a sentiment is a lasting, durable emotional disposition that a user develops over a long period of time with respect to an entity or one of its aspects. Therefore, it is very different from an immediate declaration of an affect, which has not matured over time but is rather a reaction to an entity or an aspect: we refer to the latter with the term affective reaction. As a consequence, these two different kinds of elements (sentiment and affective reaction) should be treated in different ways, as they lead to different decisions in marketing contexts.
Sentiment needs time to develop; therefore, it can be an indication of loyal customers and requires long-term marketing strategies, while affective reactions call for completely different, more immediate strategies. Section 3 provides further details on these definitions.

What seems to be lacking is a formal definition of the fundamental concepts of ABSA (although some works have proposed important definitions and clarified such concepts, as we will report in the next section). In this context, this work proposes an overview of the literature in the fields of SA and ABSA which identifies the main families of techniques and approaches and the interpretations they give to the fundamental concepts of ABSA. Furthermore, this work proposes a new model for ABSA, namely the KnowMIS-ABSA model, with the aim of providing a shared model on which further approaches and techniques can be grounded. The KnowMIS-ABSA model recognizes the importance of some recent advancements in the field and tries to harmonize and integrate such results into a new formal representation that can be used for novel ABSA techniques. The model underlines the main differences between the ways an opinion can be expressed (sentiment, reactions, emotion, affect) while providing a common representation that allows dealing with these different dimensions in an integrated way. The model, indeed, allows representing the results of a sentiment analysis technique at different levels of detail (document, sentence, and aspect level) by using different metrics, always keeping the interpretations of the different dimensions of a sentiment (affective reactions, sentiment, opinion) separate. The proposed model is both technology- and technique-agnostic, in the sense that it can be adopted and used to represent the information regarding sentiments and reviews in different approaches. Prior to the presentation of the proposed model, in what follows, we describe the main reasons and limitations that motivate this work.

1.1 Motivations

In this work, we mainly refer to the analysis of sentiment in opinionated texts in which users express their opinion regarding a product, a service, or an event by writing a review. A model for ABSA has to define the different dimensions of sentiment by defining concepts like emotion, affect, and opinion. The analysis of the literature, presented in the next section, highlights the many different interpretations of sentiment. Those definitions lead to different ways of measuring the same thing or, conversely, to measuring different aspects with the same metric. This situation makes evident the need for a shared definition of the basic concepts of ABSA, which is among the objectives of our work. Alongside the definitions of the basic concepts of ABSA, it is also important to identify the metrics that should be used for each different measure. Considering, for instance, that expressing a sentiment towards a product is very different from just having a positive opinion, it is clear that these two elements should be evaluated and measured in different ways. The approximation between sentiment and opinion, which may be acceptable to a computer scientist, is not so in other domains like social sciences, psychology, marketing and communications, neurosciences, etc. We consider this aspect the real obstacle to realizing successful and trustworthy applications for the real world.

Another element that a model for ABSA should address is the characteristics of the document or medium under consideration. Usually, when considering a review, the analysis is focused especially on the text written by the users. However, modern websites, as well as social networks, offer many features and tools that allow end-users to create a very complex and detailed review of a product or a service. Besides the main text, which is written in natural language, the user typically can add a rating (for instance, on a scale of five stars), add tags to catalogue the review, add emoticons to express their opinions, and so on. Traditional ABSA techniques usually take into account only some of the above-mentioned elements. More recent approaches also consider rating scales associated with the review (Nakov et al. 2016; Rosenthal et al. 2017), which transform the sentiment classification problem from a binary classification into an ordinal classification. Other approaches consider elements like emoticons (Wang and Castanon 2015), domain-specific language (Jiaxiang 2020), jargon (Aly and van der Haar 2020), and so on. Therefore, it is evident that a new ABSA model that takes into account all these elements, in addition to the plain text of the reviews, is needed.
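A minimal sketch may help fix ideas about how such a richer review could be represented; the field names below are purely illustrative (they are not part of the KnowMIS-ABSA model and do not correspond to any specific platform's API):

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class RichReview:
    """Illustrative container for a modern review: the natural-language
    text is only one of several signals the user can attach."""
    text: str                                           # main review body
    rating: Optional[int] = None                        # e.g., 1-5 stars (enables ordinal classification)
    tags: List[str] = field(default_factory=list)       # user-chosen catalogue tags
    emoticons: List[str] = field(default_factory=list)  # e.g., ":)", ":("

review = RichReview(
    text="Battery life is great, but the camera disappoints.",
    rating=4,
    tags=["smartphone", "battery"],
    emoticons=[":)"],
)
```

Each extra field is a distinct signal that an ABSA technique could exploit alongside, or in place of, the plain text.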

1.2 Contributions

The main contributions of this work can be summarized as follows: (1) an overview of the modern approaches and techniques for SA and ABSA; (2) an analysis of current definitions of the basic concepts of SA and ABSA; (3) the KnowMIS-ABSA model, a reference model for ABSA which is grounded on the major results of the recent literature; (4) a qualitative case study, using a public dataset, to highlight the capabilities of the proposed model in the analysis of sentiment. Let us underline that the KnowMIS-ABSA model is intentionally generic and abstract in order to be used with different techniques (from lexicon-based to learning-based), and it can be applied to different levels of analysis (from document level to aspect level). The aim of this model is not to propose novel techniques for ABSA; rather, it represents a reference to guide the use or definition of techniques that take into account the deep differences between sentiments, affective reactions, and statements.

1.3 Organization

The rest of the article is organized as follows. Section 2 contains an overview of the state-of-the-art techniques and approaches to SA and ABSA. Section 3 describes the KnowMIS-ABSA model. Section 4 contains the qualitative case study with a comparison between a traditional approach to ABSA and the new model. Section 5 concludes the work with final discussions and future work.

2 Sentiment analysis and aspect-based sentiment analysis: an overview

2.1 Sentiment, opinion, emotion and affect

In the last twenty years, research on Opinion Mining (OM) and Sentiment Analysis (SA) has reached a high level of maturity, with wide adoption in many (also commercial) applications. Despite the great ferment and innovation in this research field, together with the great interest from the business world in such applications, there is still a lot of confusion regarding the meaning of OM and SA and the differences (if any) between the concepts of opinion, sentiment, emotion, affect, and related terms. Even though one may think that this is just a terminological debate and that there is no real difference between the above-mentioned concepts, an increasing number of researchers argue that not only are such concepts profoundly different, but they must also be treated using completely different approaches, as they lead to different kinds of results. Specifically, while almost all scholars agree that the tasks of Opinion Mining and Sentiment Analysis concern understanding the subjectivity embedded in a text (or other multimedia content), they have different opinions on the meaning of opinion and sentiment.

Some researchers consider OM and SA as synonyms. For instance, in the work of Farhadloo and Rolland (2016), SA and OM are considered synonyms and the tasks of SA and OM are treated in the same way. In particular, the word “sentiment” is associated with a personal experience which leads to having a certain opinion on a specific topic. Haider et al. (2021) affirm that feelings are expressed by opinions and emotions that are usually collectively defined as sentiments. A similar vision is shared by Liu (2012) and by Saad and Saberi (2017), who consider the two terms as referring essentially to the same concept. Medhat et al. (2014) also affirm that the terms SA and OM are interchangeable, but they recognize that for some researchers the terms have slightly different meanings: OM is the activity of extraction and analysis of opinions regarding an entity, while SA is the identification of a sentiment expressed in a text, thus considering opinion and sentiment as two separate concepts. In Novielli et al. (2020), the authors define their task of sentiment analysis, applied to the Software Engineering field, as the identification of six basic emotions: love, joy, anger, sadness, fear, and surprise; therefore, sentiment and emotion can be seen as synonyms (or at least very strictly related concepts). Moreover, they identify the need for defining a theoretical model of affect on which sentiment analysis should be grounded, from which it seems to emerge that affect, too, can be considered a concept similar to emotion and sentiment.

Our vision is different, as we consider opinion and sentiment as two profoundly different concepts. This idea is indeed shared by many recent works, such as Munezero et al. (2014) and Yadollahi et al. (2017). In what follows, we explain why, according to us, the two concepts are different, supported by several works that we will analyze and review. Kim and Hovy (2004), for instance, affirm that an opinion can be subjective without necessarily implying a sentiment. Indeed, an opinion may not convey a sentiment at all. On the other hand, Yadollahi et al. (2017) propose a partition of SA into two categories: Opinion Mining, which is considered a specialization of SA related to the analysis of the neutral, positive, or negative opinion in the text, and Emotion Mining, which is the analysis of the emotions in the text. Furthermore, we not only consider sentiment and opinion as different concepts, but we also think that it is critical to analyze the different facets of sentiment, by identifying and defining key elements such as emotion, affect, and orientation. In this regard, too, recent works have tried to analyze and define such elements.

The work of Munezero et al. (2014) proposes an enlightening discussion, founded on physiological and psychological studies, which tries to establish a definitive framework of definitions for sentiment analysis. We consider these definitions as the foundational pillars on which we construct and define a novel reference model for ABSA, described in Sect. 3. In the cited work, opinions are considered personal interpretations of information about a topic, while sentiments are prompted by emotions. As an example, an opinion could be: (1) “the battery of the smartphone has a good capacity”, while a sentence containing a sentiment could be: (2) “I always loved the longevity of the battery of all the smartphones of this manufacturer, since the first model”. Considering this difference, it appears clear that an opinion can also be expressed without any emotion, for instance in a descriptive way, as in the sentence: (3) “In my experience, the average duration of the battery is 8 hours with normal use”. Sentiment can also be considered a social construct of emotions that develops over time and is enduring, meaning that the temporal aspects are critical for sentiment, as is evident from the previous example. Opinions, on the other hand, are just personal interpretations of facts that may or may not be emotionally charged (in the above example, the first sentence contains an opinion with a positive orientation, while the third sentence contains an opinion without orientation or with a neutral orientation). And even when an opinion expresses a certain kind of subjectivity and judgment, it does not necessarily imply that there is a sentiment (Kim and Hovy 2004; Sokolova and Bobicev 2011). Indeed, considering again the work of Novielli et al. (2020), although they seem to consider affect, emotion, and sentiment as similar concepts, they affirm that a sentence containing a negative opinion does not always have to contain an emotion as well, providing some examples supporting this claim. From this consideration, the authors conclude that it is important to identify the desired output of an SA tool, whether it is to classify an opinion or to identify emotions in the text.

Continuing the literature analysis with the aim of defining the groundings on which our model will be built in the next section, it is important here to define the concept of emotion, considering that sentiments are usually prompted by emotions (Loia and Senatore 2014). Munezero et al. (2014) define emotions as social expressions of feelings and affect influenced by culture. Feelings are defined as person-centered psychological and physiological sensations and as affective phenomena to which we have conscious access. Emotions, on the other hand, are typically preconscious phenomena. Lastly, affect is a more abstract concept that may encompass all the other concepts of emotions, feelings, and sentiments. According to these authors and other scholars, affect is typically difficult to express (and therefore to find) in language. Usually, what we can identify in language and text is not affect, but emotions (which can be considered the expression of affect) or the so-called affective reactions. Affective reactions are the conscious representation of affect (for instance, the fact that we consider an object beautiful or awful, or that we have an immediate good or bad reaction to a fact). Notice that an affective reaction is completely different from a sentiment: a sentiment is an enduring phenomenon and a social construct related to emotions, while an affective reaction is usually more limited in time and can be related to affect and feelings.

According to this analysis, our vision, supported by the recent literature, leverages the deep differences among the concepts of sentiment, opinion, emotions, and affective reactions. Consequently, the techniques used to determine their values should be substantially different, as such techniques should look for different aspects of subjectivity. For instance, when analyzing sentiments, we are looking for the polarity (positive or negative) of an enduring experience towards an object, a fact, or an aspect. When looking for an opinion, instead, we just want to understand the idea or judgment (which could also be neutral, and which may contain no emotion or sentiment) regarding an object or a fact. In Sect. 3 we will propose our definitions of sentiments, affective reactions, and opinions (conveyed in what we call statements) as the main elements of the proposed model. As discussed, such definitions will be our original synthesis and elaboration of the recent literature.

2.2 Sentiment analysis: levels of analysis

The analysis of sentiment, or of opinion in general, can be performed at different levels of detail, from the most generic to the most specific: document level, sentence level, aspect level, and concept level (Hemmatian and Sohrabi 2019; Pontiki et al. 2016; Nakov et al. 2016; Rosenthal et al. 2017). Document-level analysis [sometimes referred to as text-level (Pontiki et al. 2016)] aims at understanding the polarity of a whole document [e.g., reviews (Behdenna et al. 2018), news articles (Shirsat et al. 2017), posts, tweets (Gurini et al. 2013)]. Therefore, the information is quite general, since it summarizes the polarity of many sentences as a unique positive or negative score [usually measured on a two-point or five-point scale (Nakov et al. 2016)]. Lexicon-based approaches for document-level analysis typically identify the polarity of each term and then aggregate the polarity scores to classify the sentiment of the whole document. Considering the level of importance of each word in this kind of approach can improve the accuracy of the classification. Asif et al. (2020) propose an approach to identify the sentiment of posts and comments on social networks by considering the sentiment polarity of each term and by weighting such polarity with a degree of importance based on the \(tf-idf\) measure of the term.
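The general idea of tf-idf-weighted, lexicon-based document scoring can be sketched as follows. Note that the lexicon and corpus below are toy examples of our own, and the scoring function is a generic illustration of the principle, not the actual method of Asif et al. (2020):

```python
import math
from collections import Counter

# Toy sentiment lexicon (illustrative polarity scores, not a published resource).
LEXICON = {"good": 1.0, "great": 1.0, "excellent": 1.0,
           "bad": -1.0, "poor": -1.0, "terrible": -1.0}

def tfidf_weighted_polarity(doc_tokens, corpus):
    """Score a document by summing each lexicon term's polarity,
    weighted by the term's tf-idf importance in the corpus."""
    n_docs = len(corpus)
    tf = Counter(doc_tokens)
    score = 0.0
    for term, polarity in LEXICON.items():
        if term not in tf:
            continue
        df = sum(1 for doc in corpus if term in doc)      # document frequency
        idf = math.log((1 + n_docs) / (1 + df)) + 1       # smoothed idf
        score += polarity * (tf[term] / len(doc_tokens)) * idf
    return score

corpus = [["good", "screen"], ["bad", "battery"], ["great", "camera", "good", "price"]]
print(tfidf_weighted_polarity(corpus[2], corpus) > 0)   # the third document is positive
```

A score above zero classifies the document as positive, below zero as negative; frequent, uninformative terms are down-weighted by the idf factor.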

A finer analysis is provided by sentence-level SA (Appel et al. 2016). The aim is to understand whether a sentence expresses a positive or negative opinion. Nowadays, there is not a strong difference between the techniques used for the document and sentence levels, although the latter problem is usually harder, especially for short sentences and sentences without a surrounding context. In fact, as Neviarouskaya et al. (2007) and Yadollahi et al. (2017) pointed out, the same sentence used in two different contexts could express two opposite sentiments. Usually, a sentence-level approach can also be used to analyze the sentiment of a complete document, by analyzing its sentences and using an aggregation operator to combine the polarity of each sentence into the global polarity of the document (Fang and Zhan 2015; Rao et al. 2018).
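The aggregation step just described can be sketched in a few lines; the choice of `sum` and arithmetic mean as operators is only illustrative, as any aggregation operator could be plugged in:

```python
def document_polarity(sentence_scores, aggregator=sum):
    """Combine per-sentence polarity scores into a single document score.
    Any aggregation operator (sum, mean, majority vote, ...) can be used."""
    if not sentence_scores:
        return 0.0
    return aggregator(sentence_scores)

scores = [0.8, -0.2, 0.5]                    # polarities of the document's sentences
print(round(document_polarity(scores), 2))   # sum of the sentence polarities
mean = lambda xs: sum(xs) / len(xs)
print(round(document_polarity(scores, aggregator=mean), 2))
```

The choice of operator matters: a sum rewards documents with many mildly positive sentences, while a mean is insensitive to document length.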

An analysis at the document or sentence level provides useful insights regarding the opinion of users about specific entities, but it does not allow understanding which elements, features, or aspects of such entities have influenced the (positive or negative) opinion of the users. To perform such a kind of analysis, aspect-level (or aspect-based) sentiment analysis should be performed (Liu 2012). This level of analysis clearly provides a more accurate result compared with document- and sentence-level analysis. An interesting description of the main tasks and subtasks of ABSA has been proposed in the SemEval competition (the International Workshop on Semantic Evaluation) and in particular in the work of Pontiki et al. (2016). In what follows, we will refer to many of these tasks both to analyze the literature and to propose the process in Sect. 3.1. According to this work, an aspect-based analysis technique typically involves the following tasks: (1) identification of the opinionated/subjective sentences in the document; (2) identification of the entity (also called aspect category) for each sentence; (3) identification of the aspect of the entity to which the sentence refers; (4) extraction of the linguistic expression used in the sentence to refer to the aspect; (5) for each identified aspect, identification of the polarity (e.g., on a binary scale, a five-point scale, or as a real number in a range) (Pontiki et al. 2016).
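A deliberately naive rule-based sketch of this pipeline follows. The aspect and opinion lexicons are hypothetical, the word-window heuristic is only for illustration (real systems use learned extractors), and for brevity tasks 2 and 3 are collapsed into a single dictionary lookup:

```python
# Hypothetical aspect and opinion lexicons (illustration only).
ASPECT_TERMS = {"battery": "battery", "screen": "screen", "camera": "camera"}
OPINION_TERMS = {"great": 1, "good": 1, "poor": -1, "blurry": -1}

def absa_pipeline(sentence):
    tokens = sentence.lower().replace(",", "").split()
    # Task 1: keep only opinionated/subjective sentences.
    if not any(t in OPINION_TERMS for t in tokens):
        return []
    results = []
    for i, tok in enumerate(tokens):
        if tok in ASPECT_TERMS:                       # Tasks 2-4: aspect + its expression
            window = tokens[max(0, i - 2): i + 3]     # nearby words carry the opinion
            polarity = sum(OPINION_TERMS.get(t, 0) for t in window)
            results.append({"aspect": ASPECT_TERMS[tok],
                            "expression": tok,
                            "polarity": polarity})    # Task 5: polarity per aspect
    return results

print(absa_pipeline("The battery is great, but the camera is blurry"))
```

On the example sentence, the sketch assigns a positive polarity to "battery" and a negative one to "camera", showing how aspect-level output is richer than a single sentence-level score.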

Let us underline that the task of ABSA can be performed at the aspect level, where each portion of a sentence referring to a single aspect is considered; at the sentence level, where a whole sentence is analyzed in order to understand the opinion with respect to each aspect contained in it (there could be more than one fragment of the sentence referring to the same aspect); and at the document level or text level, where the whole document is processed and an overall analysis for each aspect is performed (Pontiki et al. 2016; Nakov et al. 2016; Rosenthal et al. 2017).

Recently, a finer level of analysis has been introduced by Cambria (Cambria 2013; Cambria et al. 2013) with concept-level sentiment analysis. The idea is to have machines able to deeply understand natural language and, in particular, to understand emotions, opinions, and sentiments. Such approaches are based on semantic analysis techniques able to identify and analyze the concepts in a text. SenticNet (Cambria et al. 2020) is a lexical resource that has been proposed to assign emotion labels and mood tags to support concept-level sentiment analysis, with interesting results.

2.3 Aspect-based sentiment analysis

The first task of ABSA is aspect extraction (or identification). Several works, like Zhang and Liu (2014), Rana and Cheah (2016), and Hemmatian and Sohrabi (2019), have proposed analyses and classifications of the techniques for aspect extraction. Specifically, Rana and Cheah (2016) propose a comprehensive analysis of recent techniques for aspect extraction, identifying three main categories: unsupervised, semi-supervised, and supervised approaches. In the first category, we can find approaches based on frequency or statistics (Hu and Liu 2004; Bafna and Toshniwal 2013; Rana and Cheah 2018; Wang et al. 2015; Luo et al. 2015) and on heuristics, like Singh et al. (2013), the work of Bancken et al. (2014), which uses syntactic dependency paths to identify entities, or Poria et al. (2014), which adopts a rule-based approach. In the semi-supervised category, we can find techniques based on lexicons (Yan et al. 2015; D’Aniello et al. 2018; Shah and Swaminarayan 2021; Klyuev and Oleshchuk 2011), dependency trees (Yu et al. 2011), and graphs (Xu et al. 2013). Supervised techniques typically use machine learning approaches like conditional random fields, SVMs (Manek et al. 2017), decision trees, neural networks, and autoencoders (Angelidis and Lapata 2018; Tomasiello 2020).

The next task is sentiment identification (or polarity identification), which requires that a sentiment score (in terms of polarity or orientation) be identified for each aspect extracted from the text (Pontiki et al. 2016). Typically, the score lies in a numerical range which indicates the intensity of positiveness or negativeness of the opinion regarding the aspect [for instance, on a five-point scale (Nakov et al. 2016), or as a decimal value in the range \([-1, +1]\), where \(-1\) is very negative and \(+1\) is very positive]. In case the objective is only to understand whether the sentiment is positive or negative, the task is also called sentiment classification.
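The two scales mentioned above are easily related; a simple (illustrative) conversion splits the continuous range into five equal-width ordinal bins:

```python
def to_five_point(score):
    """Map a polarity score in [-1, +1] to an ordinal five-point scale
    (1 = very negative ... 5 = very positive)."""
    assert -1.0 <= score <= 1.0
    # Split [-1, +1] into five bins of width 0.4.
    return min(5, int((score + 1.0) / 0.4) + 1)

print(to_five_point(-1.0))  # 1 (very negative)
print(to_five_point(0.0))   # 3 (neutral)
print(to_five_point(0.9))   # 5 (very positive)
```

Binary sentiment classification is the degenerate case of this mapping with only two bins (negative/positive).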

Many techniques have been proposed for sentiment identification, which can be classified into three categories (Schouten and Frasincar 2016): (1) lexicon-based approaches; (2) machine learning approaches; (3) hybrid approaches.

2.3.1 Lexicon-based

Lexicon-based approaches for ABSA use lexical resources, like a dictionary, to find the scores of the sentiment words identified in the text. Typically, they also use an aggregation step to combine the scores of the single words, together with mechanisms that take into account grammar rules and surrounding words, in order to assign the final score to each aspect.
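A minimal sketch of such a scorer, with a toy lexicon of our own and a single grammar rule (a negation word flips the polarity of the next sentiment word), could look like this:

```python
# Toy sentiment lexicon and negation list (illustration only).
LEXICON = {"good": 1.0, "nice": 0.5, "bad": -1.0, "awful": -1.0}
NEGATIONS = {"not", "never", "no"}

def aspect_score(tokens):
    """Sum lexicon scores over the tokens of an aspect's text fragment,
    flipping the polarity of a sentiment word preceded by a negation."""
    score, flip = 0.0, False
    for tok in tokens:
        if tok in NEGATIONS:
            flip = True            # remember the negation...
        elif tok in LEXICON:
            score += -LEXICON[tok] if flip else LEXICON[tok]
            flip = False           # ...and consume it on the next sentiment word
    return score

print(aspect_score("the screen is good".split()))       # positive
print(aspect_score("the screen is not good".split()))   # flipped to negative
```

Real lexicon-based systems apply many more such rules (intensifiers, contrastive conjunctions, scope of negation), but the aggregation principle is the same.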

The approach proposed by Hu and Liu (2004) uses a semantic dictionary obtained from the WordNet dictionary. The dictionary considers only adjectives and uses a binary score for the sentiment. The approach computes a sentiment score for a sentence and then simply assigns the same score to all the aspects of that sentence. The work proposed in Zhu et al. (2009), instead, splits the sentence into segments, each containing one aspect. A sentiment lexicon is used to identify the polarity of each segment and thus of each aspect. Mowlaei et al. (2020) propose an approach to adapt lexicons used for sentiment analysis to the specific problem of ABSA. In particular, they use statistical methods and a genetic algorithm to extend two lexicon generation methods to the aspect-based problem, obtaining good experimental results that outperform existing methods. Alexopoulos and Wallace (2015) propose an approach for creating and using domain-specific lexicons based on an ontology containing aspect-polarity relations, which can be used in domains where it is difficult to find a suitable lexicon.

2.3.2 Machine learning

Supervised and unsupervised machine learning techniques are widely used for the identification of the sentiment score in ABSA. Some approaches also use a lexicon as part of the training data for machine learning methods.

Traditional machine learning techniques, like Naive Bayes (Mubarok et al. 2017; Anand and Naorem 2016), Decision Trees/Random Forests (Fitri et al. 2019; Karthika et al. 2019), and SVMs (Pannala et al. 2016; Wilson et al. 2005; Tripathy et al. 2016), have been widely used in the past. Esuli et al. (2020) propose an approach for cross-lingual sentiment quantification based on Structural Correspondence Learning (SCL), a technique that can be applied to different kinds of classifiers to transfer knowledge through a mapping between pivot terms of two feature spaces. More recently, given the increasing diffusion of deep learning approaches in all fields of computer science, many deep learning techniques have been used for sentiment identification. Convolutional Neural Networks (CNNs), for instance, are used in Cahyadi and Khodra (2018) and Maglogiannis et al. (2020); approaches based on LSTMs (Wang et al. 2016; Yu et al. 2019; Bao et al. 2019; Xing et al. 2019) and Recurrent Neural Networks (Al-Smadi et al. 2018; Nguyen and Shirai 2015; Colace et al. 2019) have also been proposed. A comprehensive review of deep learning approaches used for sentiment analysis (not only for ABSA) can be found in Yadav and Vishwakarma (2020).

2.3.3 Hybrid

Several works propose hybrid approaches that aim at combining the advantages of lexicon-based and machine learning (especially deep learning) approaches to ABSA. Meškelė and Frasincar (2020) propose a solution combining a lexicalized domain ontology with a neural attention model for ABSA. Asif et al. (2020) propose a hybrid approach to classify sentiment in social media posts. Specifically, the authors use a domain-specific multilingual lexicon to annotate posts and comments containing extremism sentences; then, a machine learning-based approach is used to classify the sentiment of posts and comments published on social networks. This technique is able to reduce the effort and time needed to annotate large textual corpora for training classifiers. Mumtaz and Ahuja (2018) propose an architecture that uses a lexical resource (an English dictionary) given in input to an SVM. Pekar et al. (2014) combine a lexical resource with a deep linguistic processing technique. Bao et al. (2019) integrate an attention network with a lexical resource for ABSA, while Zeng et al. (2019) combine a convolutional neural network with lexical resources.

3 A reference model for aspect-based sentiment analysis: KnowMIS-ABSA

In this section, we propose a new reference model, the KnowMIS-ABSA model, whose objective is to bring more clarity to the definition of the main concepts related to ABSA. The main aspects that have driven the definition of the model are:

  • preliminary works which recognize the importance of clarifying what a sentiment, an opinion, and an emotion are, and the main differences between SA and OM, like the work of Munezero et al. (2014); however, these works do not provide a formal model for SA;

  • state-of-the-art approaches and techniques which provide very interesting results but could be better interpreted in light of what sentiments, opinions, emotions, and affects are;

  • the need for providing a more recent model of SA that takes into account the characteristics of reviews and the differences between sentiments, opinions, and affects. Many recent approaches still use a definition of a review that was proposed ten years ago (Liu 2012). More recent approaches have started introducing relevant concepts, such as the helpfulness of the review (Fang and Zhan 2015). The Web in recent years has undergone a tremendous evolution. We cannot consider a review just as a traditional opinionated text: it contains more than that. It contains emoticons, likes and dislikes, comments made on an original review, numbers of views or web shares, etc.

The KnowMIS-ABSA model consists of the following definitions.

Definition 1

(Entity) An entity \(e_k \in E\) is any abstract and/or real subject of the real or virtual world which is of interest for a user \(u_m\); there is at least one digital record \(rec_j \in REC\) involving the entity \(e_k \in E\) created by \(u_m \in U\).

Definition 2

(User) A user \(u_m \in U\) interacts with a website or a social platform or another kind of virtual community; his/her interactions are represented by records \(rec_j \in REC\).

Definition 3

(Record) A record \(rec_j \in REC\) is the digital representation of an interaction of a user \(u_m \in U\) with the digital platform; it consists of one or more symbols, words, sentences, texts, or other digital contents, related to one or more entities \(e_k \in E\), created by the user in a specific time instant, and which may contain the expression of:

  • emotions: social expressions of feelings and affect influenced by culture;

  • sentiments: social constructs of emotions that develop over time and are enduring;

  • opinions: personal interpretations of information about a topic;

  • judgments: a personal opinion formed after thinking carefully, based on grounds that could also be incorrect; it usually contains an orientation

owned by an opinion holder \(h_l \in H\) (Munezero et al. 2014).

Definition 4

(Opinion holder) An opinion holder \(h_l \in H\) is the subject that has a specific opinion (or sentiment, emotion, judgment, etc.) on the entity or its aspects. Such an opinion can be expressed directly by the holder (in this case, the user and the holder are the same person) or reported by another user. To make the difference between user and opinion holder clear, consider the following review written by the user John: “My wife Mary thinks this smartphone is exceptional. I don’t think so, I hate it”. The user \(u_m\) that wrote the review is John. The opinion holder \(h_{l1}\) of the first sentence is Mary; the opinion holder \(h_{l2}\) of the second sentence is John himself (so \(h_{l2}\equiv u_m\)).

Definition 5

(Aspect) An aspect \(a_t^{e_k} \in A^{e_k}\) is a component, a part, a functionality, or an attribute of the entity \(e_k \in E\); the set \(A^{e_k}\) contains all the aspects of the entity \(e_k\). Moreover, the aspects could also be hierarchically organized, belonging to different aspect categories, as in Pontiki et al. (2016), Rosenthal et al. (2017).

Using the above definitions, it is possible to give a definition of a review.

Definition 6

(Review) A review \(R_r \in R \subseteq REC\) is the set of one or more symbols, characters, words, sentences and/or texts, by means of which a user \(u_m\), in a specific time instant, expresses sensations, emotions, sentiments, opinions, viewpoints, judgments, evaluations, made by himself or other holders, about one or more entities \(e_k \in E\) and optionally on their aspects \(a_t^{e_k} \in A^{e_k}, \forall e_k \in E\), published on online platforms.

The analysis of a review aims at studying the opinions contained in it. A review consists of multiple sentences, and it may contain different, also contrasting, opinions with affective reactions or sentiments expressed by the opinion holder. To analyze such a review, the KnowMIS-ABSA defines (based on the definitions analyzed in Sect. 3) three different subjectivity forms, or sentiment dimensions (depicted in Fig. 1). The opinion holder expresses his/her opinion along these dimensions in the review. The three subjectivity forms are:

  • Sentiment: a sentence or an opinion contained in a review may indicate a sentiment of the holder, which is a lasting, durable emotional disposition developed by the holder over a long period of time with respect to the opinion target (the entity or one of its aspects). A sentiment can only be positive or negative: it makes no sense to consider a neutral sentiment (while we may instead have a neutral opinion or a neutral sentence expressed by the opinion holder). Therefore, the score of a sentiment is indicated by a polarity value. The time dimension of the sentiment is critical: sentiments last longer than affective reactions or emotions because they are more stable and develop over a long period of time. A sentiment can be expressed in natural language (written or spoken) with one or more words or sentences which usually indicate a durable, well-established opinion. An example containing a sentiment is: “I’ve always hated smartphones with oversized screens”.

  • Affective Reaction: a conscious declaration of an affect that describes the way the holder perceives the entity. It differs from a sentiment in that it is less stable and less lasting. Like a sentiment, an affective reaction is positive or negative and cannot be neutral; therefore, its score is measured with a polarity value. An example containing an affective reaction is: “The screen of this smartphone is fantastic!”

  • Statement (or Affirmation): a sentence or an opinion without sentiment or emotion. It is a statement, an affirmation, a claim to describe a fact, to provide information, or to give a judgment. Such statements can include (implicitly) an orientation that can be positive, negative, or neutral. The score of a statement is given by an orientation value, which differs from the polarity value as it can be also neutral. An example containing a statement is: “The smartphone has a large screen of \(6.1^{\prime \prime }\)”.

Each of the three elements of subjectivity (Sentiment, Affective Reactions, Statement) has a different meaning, is expressed by the holder in a different form, and represents different aspects of the opinion of the holder. Therefore, they are measured and represented using different metrics. In particular, we use polarity metrics (which contain only positive or negative values) for the affective reaction and sentiment, and orientation metrics (which contain positive, neutral, and negative values) for the statements.
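As a minimal illustration of the two metric types (the enum names below are ours, not part of the model), the constraint that only statements admit a neutral value can be encoded as:

```python
from enum import Enum

class Polarity(Enum):
    """Metric for sentiments and affective reactions: no neutral value exists."""
    POSITIVE = 1
    NEGATIVE = -1

class Orientation(Enum):
    """Metric for statements: neutral is a valid value."""
    POSITIVE = 1
    NEUTRAL = 0
    NEGATIVE = -1

# A sentiment/affective-reaction score has two possible values;
# a statement score has three.
assert len(Polarity) == 2 and len(Orientation) == 3
```

Encoding the metrics as distinct types makes it impossible to assign a neutral score to a sentiment or an affective reaction by construction.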

Fig. 1 The KnowMIS-ABSA model

3.1 Metrics

For each opinion contained in the review, we can define a tuple that allows us to computationally represent such an opinion along with the three different levels of subjectivity (sentiment, affective reaction, or statement) and to process them. We define the following tuples, depicted in Fig. 1.

  • Sentiment tuple \(S{-}tuple: (R_r, u_m, h_l, e_k, a_n^k, S_{kn}, t_i)\)

  • Affective Reaction tuple \(AR{-}tuple: (R_r, u_m, h_l, e_k, a_n^k, AR_{kn}, t_i)\)

  • Statement tuple \(ST{-}tuple: (R_r, u_m, h_l, e_k, a_n^k, ST_{kn}, t_i)\)

where \(R_r\) is the review written by the user \(u_m \in U\); \(h_l \in H\) is the holder of the opinion; \(e_k \in E\) is the target entity of the opinion; \(a_n^k\) is the specific aspect belonging to the entity \(e_k\). If the opinion is related to the whole entity and not to a specific aspect, \(a_n^k\) assumes the special value GENERAL to indicate that the opinion refers to the entity. \(S_{kn}\) is the polarity value of the sentiment of the \(S{-}tuple\) for the aspect \(a_n^k\); \(AR_{kn}\) is the polarity value of the affective reaction of the \(AR{-}tuple\) for the aspect \(a_n^k\); \(ST_{kn}\) is the orientation value for the statement of the \(ST{-}tuple\) for the aspect \(a_n^k\). \(t_i\) is the time instant in which the opinion has been expressed.
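As a sketch (the field names are our invention), the three tuples share a common computational structure and can be represented as:

```python
from dataclasses import dataclass

@dataclass
class OpinionTuple:
    """Shared structure of the S-, AR-, and ST-tuples of the KnowMIS-ABSA model.

    The 'kind' field distinguishes the three tuple types; 'score' holds a
    polarity value for S/AR tuples and an orientation value for ST tuples.
    """
    review_id: str   # R_r: the review the opinion comes from
    user: str        # u_m: the user who wrote the review
    holder: str      # h_l: the opinion holder
    entity: str      # e_k: the target entity
    aspect: str      # a_n^k, or "GENERAL" when the whole entity is targeted
    kind: str        # "S" (sentiment), "AR" (affective reaction), "ST" (statement)
    score: float     # polarity (S, AR) or orientation (ST)
    time: str        # t_i: when the opinion was expressed

# Example AR-tuple for "The screen of this smartphone is fantastic!"
t = OpinionTuple("r1", "john", "john", "smartphone", "screen", "AR", +1.0, "t0")
assert t.aspect == "screen" and t.kind == "AR"
```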

Each review can be split into a set of \(S{-}tuple\), \(AR{-}tuple\), and \(ST{-}tuple\), one for each opinion related to a specific aspect. The identification of the opinions, together with the separation of sentiments, reactions, and judgments/statements, sustains the subsequent process of analysis.

In Fig. 2, we propose an abstract process with the main steps that an approach based on the KnowMIS-ABSA model should follow. This is not a concrete implementation of a system; rather, it is a generic flow that underlines the main tasks that should be done to complete an ABSA analysis. In particular, the focus of the process is the identification of the different kinds of tuples (sentiment, affective reaction, and statement) and their subsequent processing using a different technique and metric for each of them. Therefore, the process can be concretely instantiated with any kind of computational technique in each step. However, in this section, we suggest some recent techniques for each step, while in Sect. 4 we propose the implementation of the process we realized for the case study. Let us define the process \(\rho (R_r)\) as the process that takes a review \(R_r\) as its input and produces three sets as output: \(\mathbf {\overline{S_r}}\), \(\mathbf {\overline{AR_r}}\), \(\mathbf {\overline{ST_r}}\), which are respectively the sets of all the \(S{-}tuple\), \(AR{-}tuple\), and \(ST{-}tuple\) extracted from the review \(R_r\). The process \(\rho (R_r)\) consists of the following steps:

  1. Identification of the candidate opinion sentences: the review is analyzed to identify the sentences that may contain an opinion. This requires the use of data preprocessing techniques (tokenization, stop words removal, part-of-speech tagging, lemmatization) together with a feature extraction task to identify terms, opinion words and phrases, negations, etc. NLP techniques, text mining, and information retrieval approaches can be used for these initial tasks, as analyzed in the recent survey of Birjali et al. (2021).

  2. Entity recognition: for each sentence, the entity \(e_k \in E\), representing the target of the sentence, is identified. This step should also deal with the presence of synonyms of the entity and indirect references to an entity (e.g., by using pronouns) in the sentence. In SemEval competitions, entity recognition, or target identification, is reported as a critical task even for ABSA (Pontiki et al. 2016; Rosenthal et al. 2017; Nakov et al. 2016). Both grammar-based (Ding et al. 2018; Basiri et al. 2020; Wiegand et al. 2016) and machine learning-based techniques (Gan et al. 2020; Liu et al. 2015; Li et al. 2019) can be used in this step (Liu and Zhang 2012; Khan et al. 2014).

  3. Aspect extraction: the aspect targets \(a_n^k \in A^{e_k}\) are identified in this step for each sentence. As for the entity, issues like synonymy, polysemy, and indirect reference have to be addressed by the aspect identification technique. This step is not trivial as some aspects can also be implicit and should be inferred from the text (Ganganwar and Rajalakshmi 2019; Tubishat et al. 2018). Many works are based on natural language processing techniques, using linguistic feature patterns to identify grammatical classes and syntactic relations of the text and, from these, the aspects (Tubishat et al. 2021). Some approaches use inductive supervised learning algorithms (Rana and Cheah 2016). Other approaches perform aspect extraction and aspect sentiment classification in a unified technique (Akhtar et al. 2020). However, recent approaches heavily leverage deep learning and word embedding approaches for aspect extraction (Do et al. 2019). A recent survey on aspect extraction techniques is proposed in Nazir et al. (2020).

  4. Opinion holder identification: another element of the tuple that should be identified in this step is the opinion holder \(h_l \in H\). This is an important task for discriminating between opinions that are viewed from different perspectives (Seki et al. 2009; Wiegand et al. 2016). Many approaches have been proposed for this task, and are integrated into many ABSA techniques, ranging from NLP to learning-based approaches (Sima and Vunvulea 2013; Wiegand et al. 2016; Xu et al. 2015; Colhon et al. 2014; Kolya et al. 2012; Wiegand and Klakow 2010).

  5. Time extraction: identify the time \(t_i\) at which the opinion, regarding the aspect \(a_n^k\) of the entity \(e_k\), has been given by the opinion holder \(h_l\) (Liu and Zhang 2012; Tu et al. 2015). This task requires identifying time expressions and tenses in order to correctly interpret the time reference of each sentence (Medhat et al. 2014; Min and Park 2012; Preethi et al. 2015). Adverbs, especially adverbs of time and comparative adverbs, play a critical role in identifying the time dimension. The work of Haider et al. (2021) proposes an interesting approach to deal with adverbs in sentiment classification, addressing an open research challenge.

  6. Classification of the type of tuple: Sentiment, Affective Reaction, or Statement. This step is critical, although many existing approaches skip this phase. Before evaluating the sentence, it should be determined whether it contains a sentiment, an affective reaction, or just a statement, since each category should be evaluated with a different technique and using a different metric. The classification could be done with approaches similar to those used for the first step (identification of candidate opinion sentences). Some approaches based on machine learning can be adopted, as in Carrillo-de Albornoz et al. (2019), Xu et al. (2020), as well as text classification techniques (Minaee et al. 2021; Vieira and Moura 2017).

  7. Evaluation of the polarity/orientation for each tuple: for each kind of sentence (sentiment, affective reaction, statement), a technique is used to measure the polarity or the orientation. Many different techniques can be used for this task, as discussed in Sect. 3. Recent surveys of sentiment and orientation classification techniques are in Yue et al. (2019), Birjali et al. (2021).

  8. Construction of the tuples: the tuples for each sentence are then defined as described above.
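The eight steps above can be sketched as a pipeline skeleton. The toy rules below (keyword matching, fixed stubs for entity, holder, and time) only illustrate the data flow; in a real system each step would be instantiated with one of the techniques cited above:

```python
def rho(review: str):
    """Abstract process rho(R_r): returns the S-, AR-, and ST-tuples of a review."""
    tuples = {"S": [], "AR": [], "ST": []}
    for sentence in review.split("."):              # step 1: candidate sentences (naive split)
        sentence = sentence.strip()
        if not sentence:
            continue
        entity = "smartphone"                       # step 2: entity recognition (stub)
        aspect = "screen" if "screen" in sentence else "GENERAL"  # step 3: aspect extraction (stub)
        holder = "author"                           # step 4: opinion holder (stub)
        time = "t0"                                 # step 5: time extraction (stub)
        # step 6: classify the kind of tuple (toy keyword rules)
        if "always hated" in sentence or "always loved" in sentence:
            kind = "S"   # durable disposition -> sentiment
        elif any(w in sentence for w in ("fantastic", "awful", "love", "hate")):
            kind = "AR"  # affective reaction
        else:
            kind = "ST"  # plain statement
        # step 7: polarity/orientation score (toy lexicon)
        score = +1 if any(w in sentence for w in ("fantastic", "love", "large")) else 0
        # step 8: construction of the tuple
        tuples[kind].append((holder, entity, aspect, kind, score, time))
    return tuples

out = rho("The screen is fantastic. The phone has a large screen.")
assert len(out["AR"]) == 1 and len(out["ST"]) == 1
```

Note how the second sentence, a factual statement, still receives a (positive) orientation, while the first yields an affective reaction with a polarity: the two kinds of tuple are kept separate throughout.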

Fig. 2 Abstract process \(\rho (R_r)\) based on the KnowMIS-ABSA model to generate tuples from the review

4 A qualitative case study

The objective of this case study is to highlight the existing differences between affective reactions, sentiments, and statements. The case study is based on the annotated dataset created by Hu and Liu (2004). The dataset contains real reviews on five products extracted from Amazon.com. We evaluate the performances of three existing sentiment analysis tools regarding their capability to evaluate the polarity or orientation of each sentence contained in the dataset.

The aim of the comparison is not to identify the tool which gives the best accuracy in the sentiment analysis task. Rather, to fulfill the objective of this case study, we will compare two cases: (1) the case where each subjective sentence is considered as containing a polarity/sentiment (as in traditional approaches to ABSA); (2) the case in which a sentence may contain one of the following: an affective reaction, a sentiment, or a statement (which can be neutral or carry an orientation), as per the KnowMIS-ABSA model. In the second case, we instantiate some of the steps of the process of Fig. 2, by classifying the sentences as sentiments, affective reactions, and statements. Comparing the results, we will see the differences between a traditional approach and the proposed model. What emerges is that the state-of-the-art tools perform well in the analysis of sentiments and affective reactions, but worse when analyzing statements. This is because such tools have been built to find sentiments, not the orientation that could be hidden in factual sentences like statements. A finer analysis, with ad-hoc tools for such kinds of sentences, could greatly improve the performance of these tools. The case study, therefore, will demonstrate that a difference between these concepts exists and that they should be treated differently.

4.1 Data

The dataset “Customer Review Dataset” (Hu and Liu 2004) contains the reviews posted on Amazon.com regarding five electronic devices: Apex AD2600 Progressive-scan DVD player, Canon G3, Creative Labs Nomad Jukebox Zen Xtra 40 GB, Nokia 6610, and Nikon Coolpix 4300. In the case study, we consider the reviews about the Nikon Coolpix 4300. The dataset contains 34 unstructured reviews regarding this product and has been manually annotated by the authors of Hu and Liu (2004). In what follows, we refer to this dataset as the “Hu-Liu dataset”. Specifically, Hu and Liu have split each review into sentences. For each sentence, they have indicated the product features (i.e., the aspects) to which the sentence refers (if any). Then, for the sentences containing an opinion or judgment, defined as “sentences with sentiment”, the authors have classified the polarity of the sentence. The polarity is expressed with a discrete value in the interval \([-3; +3]\), without the zero value. Sentences with a negative value have a negative sentiment; those with a positive value have a positive sentiment. The number indicates the strength of the sentiment according to the authors. Some of the sentences which do not contain a subjective opinion are defined as “sentences with claims” and are considered by the authors as objective or neutral. The 34 reviews contain 346 unique sentences (excluding the titles of the reviews). Considering that some of the sentences may refer to more than one aspect, we decided to evaluate the opinion with respect to each aspect (since the opinions regarding the different aspects may differ although contained in the same sentence). Therefore, in such cases, we consider the same sentence more than once, each time for one of the aspects therein contained. Following this approach, we obtained 403 sentences, with some duplicates for the sentences with multiple aspects.
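The expansion from 346 unique sentences to 403 per-aspect evaluation units can be sketched as follows (the example sentences and aspect names are invented):

```python
def expand_by_aspect(annotated_sentences):
    """Duplicate each sentence once per annotated aspect, so that every
    resulting row carries exactly one (sentence, aspect) pair to evaluate."""
    rows = []
    for text, aspects in annotated_sentences:
        # A sentence with no explicit aspect is evaluated against the whole product.
        for aspect in (aspects or ["GENERAL"]):
            rows.append((text, aspect))
    return rows

data = [
    ("Great zoom and battery life.", ["zoom", "battery"]),  # two aspects -> two rows
    ("I like this camera.", []),                            # no aspect -> one row
]
rows = expand_by_aspect(data)
assert len(rows) == 3
```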

Subsequently, we constructed a new dataset by elaborating on the Hu-Liu dataset. We asked five experts to manually analyze and classify each sentence in the three categories of the KnowMIS-ABSA model: (1) affective reactions, (2) sentiments, (3) statements. Therefore, we obtained a second dataset, namely the “KnowMIS dataset”, containing the classifications in these three classes of all the sentences of the Hu-Liu dataset. Moreover, the experts have indicated the polarity for sentiments and affective reactions (positive or negative) and the orientation (positive, neutral, and negative) for the statements.

The annotation procedure was the following. We provided each annotator with the instructions shown in Table 1 and made available some examples to let them familiarize themselves with the procedure (reported in Table 2). Each annotator performed the task independently to avoid mutual influences. For each sentence, the annotator should indicate:

  • the type of sentence (sentiment, affective reaction, statement);

  • in the case of sentiment or affective reaction, the polarity (negative or positive) of the overall sentence; in the case of statement, the orientation (negative, neutral, or positive).

  • for each aspect contained in the sentence, the polarity or orientation as above.

The annotator could add a comment to explain the reasons for the choices made and to report any doubts. Once the annotation process was completed, we proceeded with the aggregation and consolidation of the annotations. We discarded the annotations wherein the annotator indicated a strong doubt in the comments. In cases where a majority emerged (three out of the five annotators agreed on the classification and the polarity/orientation), we accepted the annotation. Otherwise, we adopted a consensus reaching process [proposed by Herrera et al. (1996), Dong et al. (2018) for group decision making] in which the annotators are asked to discuss the specific sentence together, each one explaining the reasons behind his/her previous choices. After this, several rounds of voting are performed, trying to reach a consensus on the final choice. In this case, however, to give more flexibility to the voting process, the annotators express their preferences using utility values. This allows them to express more than one preference but with different weights (e.g., a sentence could be considered as positive with a utility value of 0.7 and neutral with a utility value of 0.3). Further details on this process are in Herrera-Viedma et al. (2014), Herrera et al. (1996), D’Aniello et al. (2021). If a consensus value of at least 0.8 is reached, the process is stopped and the alternative with the maximum consensus is chosen as the label of that sentence (both for the class and for the polarity/orientation). If consensus is not reached, another round of discussion and voting is performed. The process ends when the consensus threshold has been reached or after three rounds (in the latter case, we select the choices on which there is the maximum consensus).
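A simplified sketch of this aggregation logic follows. The utility handling is our simplification (per-label utilities averaged across annotators and compared against the 0.8 threshold); the actual consensus model is the one described in Herrera et al. (1996):

```python
from collections import Counter

def aggregate_label(votes, utilities=None, threshold=0.8):
    """Majority first (>= 3 of 5 annotators agree); otherwise a consensus
    round in which each annotator assigns utility values to the labels.
    Returns the agreed label, or None when another round is needed."""
    label, n = Counter(votes).most_common(1)[0]
    if n >= 3:
        return label
    if utilities:
        labels = {lab for u in utilities for lab in u}
        mean = {lab: sum(u.get(lab, 0.0) for u in utilities) / len(utilities)
                for lab in labels}
        best = max(mean, key=mean.get)
        if mean[best] >= threshold:
            return best
    return None  # no consensus yet: discuss and vote again

assert aggregate_label(["pos", "pos", "pos", "neu", "neg"]) == "pos"
```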

Table 1 The instructions given to the annotators and a screenshot of the form to provide the annotations
Table 2 Examples of sentences and annotation provided to the annotators

The classifications provided by the experts are reported in Table 3. Of the 403 sentences, the experts have identified: (1) 16 sentences containing an expression of sentiment; (2) 193 sentences with affective reactions; and (3) 194 sentences containing statements. In this table, each sentence is indicated by three numbers x.y.z, where x refers to the number (or position) of the review in the Hu-Liu dataset (considering the order in which the reviews appear in the dataset); y refers to the number (position) of the sentence in each review (considering the order and the splitting performed by Hu and Liu in their dataset); z refers to the aspect (in the case where more than one aspect is cited in the sentence).

Table 3 Dataset KnowMIS: Classification of the sentences of the Hu-Liu dataset (Hu and Liu 2004) performed by five experts according to the KnowMIS-ABSA model

4.2 Methods

Three tools for sentiment analysis are used for the case study:

  1. Google Cloud Natural Language API (https://cloud.google.com/natural-language): the API uses machine learning techniques to support different tasks of natural language analysis, among which entity recognition, content classification, and sentiment analysis. In particular, the library supports both sentiment analysis and aspect-based sentiment analysis, at a document or sentence level. The polarity of the sentiment is given in the range [−1; +1].

  2. Python Natural Language Toolkit (NLTK, https://www.nltk.org/): NLTK is a platform for building Python programs that work with natural language. It provides interfaces to over 50 corpora and lexical resources, as well as a suite of text processing libraries for classification, parsing, and semantic reasoning. In our case study, we used an already available sentiment analysis demo tool implemented with Python NLTK (available at https://text-processing.com/demo/sentiment/). This demo classifies a text as positive, negative, or neutral. The tool uses a hierarchical classification: first, it decides if the text is neutral or not (also giving a value between 0 and 1 to the level of neutrality); if it is not neutral, a sentiment polarity is determined (with two values between 0 and 1 for the positive and negative sentiment).

  3. ParallelDots Text Analysis API (https://www.paralleldots.com/text-analysis-apis): ParallelDots offers a text classification and an NLP API that uses machine learning models (trained on more than a billion documents) supporting different tasks like sentiment analysis, emotion recognition, entity recognition, and semantic analysis. The sentiment score is given as three percentages, respectively for positive, neutral, and negative sentiment.

We compare the results of the three tools applied to the sentences of the Hu-Liu dataset. Specifically, each tool is used to perform an analysis of the sentiment at a document level (i.e., the whole review) and at aspect-level. Indeed, in our case, each sentence has only one aspect, and thus even if we analyze a sentence, we are analyzing the sentiment towards a specific aspect. The results of these tools are compared with the annotations of the dataset to obtain a confusion matrix for each tool.

To obtain these confusion matrices for each tool, we need to compare the results with the annotated dataset. Considering that each tool has a different scoring mechanism for the sentiment, we need to make these scores uniform. Moreover, considering that the annotations of the dataset consist only of discrete categorical values (positive, neutral, or negative), we need to transform the score of each tool into such discrete values. We perform the following transformations for the score of each tool:

  • Google Cloud NL API: [0.25; 1]: Positive; [− 0.25; 0.25): Neutral; [− 1; − 0.25): Negative.

  • Python NLTK: when no polarity is identified by the tool, we consider the value Neutral; in the case the positive polarity has a score greater than the negative one, we consider the Positive value, otherwise the Negative value.

  • ParallelDots API: since this tool always gives a score on the three values (positive, neutral, and negative), as the result of the analysis we take the categorical value with the maximum score.
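The three transformations can be sketched as below (the input formats are our assumptions about each tool's output; the function names are invented):

```python
def google_label(score):
    """Map a Google Cloud NL score in [-1, +1] to a categorical value."""
    if score >= 0.25:
        return "Positive"
    if score >= -0.25:
        return "Neutral"
    return "Negative"

def nltk_label(pos_score, neg_score, is_neutral):
    """NLTK demo: neutral when flagged so, else the stronger of the two polarities."""
    if is_neutral:
        return "Neutral"
    return "Positive" if pos_score > neg_score else "Negative"

def paralleldots_label(scores):
    """ParallelDots: take the categorical value with the maximum percentage."""
    return max(scores, key=scores.get)

assert google_label(0.6) == "Positive"
assert nltk_label(0.7, 0.3, False) == "Positive"
assert paralleldots_label({"Positive": 0.2, "Neutral": 0.5, "Negative": 0.3}) == "Neutral"
```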

Using this setting, we execute two experiments. First, we compare the tools on the Hu-Liu dataset, in which the sentences belong only to two categories: (1) sentences with sentiments and (2) sentences with claims (neutral). Second, we execute the process of Fig. 2 on the annotations of the KnowMIS dataset. For this experiment, the process has been executed in a semi-automatic way, as depicted in Fig. 3. Indeed, considering that we already have the sentences (candidate opinions) with the related entity and aspects in the two datasets, steps 1–3 are not needed, as they have been executed by humans. Furthermore, as our aim is to evaluate only the polarity or the orientation of the sentences, we do not need steps 4, 5, and 8 (opinion holder identification, time extraction, and tuple generation). We do need step 6 (classification of the type of tuples), which was performed by the experts during the annotation of the KnowMIS dataset (see Sect. 4.1), and step 7, in which the three tools have been used to evaluate the polarity or the orientation. It is clear that in a complete ABSA system, all these steps would be performed automatically (as explained in Sect. 3.1). Specifically, we underline that the Google NL tool is also able to extract both entities and aspects from sentences with good accuracy and to give a polarity for both.

The comparison between the two experiments (the one based on the Hu-Liu dataset and that based on the KnowMIS-ABSA dataset with the above-described process) allows us to qualitatively evaluate the advantages of the proposed model.

Fig. 3 Instantiation of the KnowMIS-ABSA process for the case study. The evaluation of sentiments, affective reactions, and statements is performed by comparing three tools: Google Cloud NL API, Python NLTK, and ParallelDots

4.3 Results and discussion

The first set of experiments concerns the evaluation of the three tools in analyzing the sentiment of all the sentences of the Hu-Liu dataset at the sentence level. In this first case, we do not distinguish between affirmations, affective reactions, and sentiments. Table 4 contains the confusion matrices for the three tools. On the rows, there is the classification of the sentences in the three classes (Pos: Positive, Neu: Neutral, Neg: Negative) as annotated in the dataset (i.e., this represents the ground truth). On the columns, for each tool, there is the number of sentences classified as positive, neutral, and negative. Therefore, on the diagonal we have the sentences correctly classified; off the diagonal, there are the sentences incorrectly classified. The accuracy of each tool, computed as the ratio between the sentences correctly classified and the total number of sentences, is depicted in Fig. 4a. Figure 4b shows the F-score for each tool and each class. The F-score for each class is computed as follows:

$$\begin{aligned} \text {F-score} = \frac{2 \times \text {Precision} \times \text {Recall}}{\text {Precision} + \text {Recall}} \end{aligned}$$
(1)

where the precision and recall for a given class is given by:

$$\begin{aligned} \text {Precision} = \frac{TP}{TP+FP} \end{aligned}$$
(2)
$$\begin{aligned} \text {Recall} = \frac{TP}{TP+FN} \end{aligned}$$
(3)

where TP is the number of true positives, FP the number of false positives and FN the number of false negatives.
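As a sketch, the per-class metrics and the accuracy can be computed from a confusion matrix stored as matrix[true_label][predicted_label] (the numbers below are invented, not those of Table 4):

```python
def per_class_metrics(matrix, labels, cls):
    """Precision, recall, and F-score of one class from a confusion matrix."""
    tp = matrix[cls][cls]
    fp = sum(matrix[t][cls] for t in labels if t != cls)  # predicted cls, true other
    fn = sum(matrix[cls][p] for p in labels if p != cls)  # true cls, predicted other
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_score = (2 * precision * recall / (precision + recall)
               if precision + recall else 0.0)
    return precision, recall, f_score

labels = ["Pos", "Neu", "Neg"]
m = {"Pos": {"Pos": 50, "Neu": 5, "Neg": 5},
     "Neu": {"Pos": 10, "Neu": 20, "Neg": 10},
     "Neg": {"Pos": 5, "Neu": 5, "Neg": 40}}
precision, recall, f_score = per_class_metrics(m, labels, "Pos")
# Accuracy: correctly classified sentences over the total.
accuracy = sum(m[l][l] for l in labels) / sum(sum(row.values()) for row in m.values())
```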

Table 4 Confusion matrices for each tool applied to all the sentences of the Hu-Liu dataset
Fig. 4 Accuracy and F-score of the three tools applied to all the sentences of the Hu-Liu dataset

It can be observed, both from the accuracy and the F-score, that Google NL API obtains the best performance. We can also observe that the performance regarding the neutral class is quite poor. In our view, this could also be related to the fact that the dataset does not distinguish between sentences that may be neutral (like sentences containing statements) and sentences that cannot be neutral by definition (like sentences containing sentiments and affective reactions). This suggests that we cannot process all the sentences in the same way. Moreover, if we observe the confusion matrices, and in particular the row containing neutral sentences, we see that all the tools have often classified these as positive or negative. This is in line with the main idea of the KnowMIS-ABSA model: a sentence that contains just a statement or an affirmation can also have an orientation (towards positive or negative), and thus it is not true that such sentences are always neutral. This, in part, seems to be revealed by the tools.

The second set of experiments regards the analysis of the whole reviews (review-level sentiment analysis) of the Hu-Liu dataset. Table 5 contains the confusion matrices. Figure 5a, b show the accuracy and F-score of the tools applied to the reviews. Again, Google NL API seems to outperform the other two, but what is interesting for our analysis is that, in the considered dataset, there is only one review that has been considered neutral. When compared with the results of the sentence-level analysis, we can observe how, in an opinionated document, sentences that contain an opinion or a form of subjectivity can coexist with other sentences that are just neutral and do not contain subjectivity. We also have a confirmation that modern tools are better at analyzing the sentiment of an entire review (or a longer text) rather than single sentences (or shorter texts).

Table 5 Confusion matrices for each tool applied to all the reviews of the Hu-Liu dataset (review-level sentiment analysis)
Fig. 5 Accuracy and F-Score of the three tools applied to all the reviews of the Hu-Liu dataset

In the last set of experiments, the KnowMIS-ABSA model is applied to first classify the sentences of the KnowMIS dataset (Table 3) into the three classes (sentiment, affective reaction, and statement), and only then are the three tools applied to the sentences of each class.
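The pre-classification step described above can be sketched as a simple routing pipeline. The lexicon-based `classify_sentence` below is a hypothetical toy stand-in (the paper does not prescribe a specific classifier, and a real system would use a trained model); it only illustrates how sentences are grouped by class before any polarity tool is invoked:

```python
# Sketch of the KnowMIS-ABSA pre-classification step: sentences are
# first routed into the three categories (sentiment, affective
# reaction, statement) and only then analyzed per category.
# The word lists and router are illustrative assumptions.
from collections import defaultdict

SENTIMENT_WORDS = {"love", "hate", "like", "dislike"}
AFFECT_WORDS = {"amazing", "awful", "exciting", "disappointing"}

def classify_sentence(sentence: str) -> str:
    """Toy lexicon-based router; a real system would use a trained model."""
    tokens = set(sentence.lower().split())
    if tokens & SENTIMENT_WORDS:
        return "sentiment"
    if tokens & AFFECT_WORDS:
        return "affective_reaction"
    return "statement"

def route_sentences(sentences):
    """Group sentences by category, ready for per-category analysis."""
    groups = defaultdict(list)
    for s in sentences:
        groups[classify_sentence(s)].append(s)
    return dict(groups)

review = [
    "I love the battery life",
    "The screen is amazing",
    "The phone weighs 180 grams",
]
print(route_sentences(review))
```

Once routed, each group can be passed to whichever polarity tool performs best on that category, which is exactly the comparison carried out in the experiments below.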

Table 6 shows the confusion matrices of the three tools applied only to the sentences containing sentiments, while Fig. 6a, b show the accuracy and F-scores. In this case, the ParallelDots API seems to perform slightly better than the Google Cloud NL API. An interesting result is that all three tools improved their performance with respect to the case where all the sentences are considered. This is because we selected only the sentences that contain a sentiment. Such sentences usually contain sentiment words (love, hate, etc.), so it is easier for all the tools to identify the polarity, considering also that by definition their polarity cannot be neutral (they surely have a polarity, unlike statements, which can also be neutral).

Table 6 Confusion matrices for each tool applied only to the sentences with sentiment of the KnowMIS dataset
Fig. 6 Accuracy and F-Score of the three tools applied to all the sentences with sentiment of the KnowMIS dataset

Table 7 shows the confusion matrices for the sentences with affective reactions, and Fig. 7a, b the accuracy and F-scores. Google NL API has slightly better performance, but what is interesting is that, again, all the tools improve their performance compared with the case in which all the sentences are analyzed. Indeed, many affective reactions are quite easy to detect, especially due to the presence of affective words and adjectives that indicate a positive or negative polarity.

Table 7 Confusion matrices for each tool applied to the sentences with affective reactions of the KnowMIS dataset
Fig. 7 Accuracy and F-Score of the three tools applied to all the sentences with affective reactions of the KnowMIS dataset

Lastly, Table 8 contains the confusion matrices related to the sentences with statements, with Fig. 8a, b showing the accuracy and F-scores. What emerges is that none of the tools performs very well on this set of sentences, with only Google NL API reaching an accuracy greater than 50%. This shows that sentences containing a statement (which are usually factual sentences but may still carry an orientation) are often more complex to analyze than sentences containing sentiments or reactions. But what is more interesting for our analysis is that all three tools find that some kind of orientation (positive or negative) can be present in sentences that are usually considered mere objective claims. Indeed, such statements may conceal a judgment by the opinion holder that would be worth investigating.

Table 8 Confusion matrices for each tool applied to the sentences with statements of the KnowMIS dataset
Fig. 8 Accuracy and F-Score of the three tools applied to all the sentences with statements of the KnowMIS dataset

The results obtained by this qualitative evaluation can be summarized as follows. Typically, sentiment analysis at the review level is more effective than analysis at the sentence level. The class that seems most problematic for the three tools is the neutral one: it is more difficult to understand whether an opinion is neutral than whether it is positive or negative. This can also be explained by considering that, in the style of a review, even a sentence that might seem neutral often contains at least an orientation of the opinion holder towards the product. And this is one of the main characteristics of the KnowMIS-ABSA model: a statement can have an orientation even when it does not contain a sentiment or an affective reaction. Indeed, if we look at the results obtained only for the sentences with statements, which were annotated in the original dataset by Hu and Liu as “claims” that do not contain subjectivity, only 39% of such sentences (75 out of 194) were effectively classified as neutral. In the remaining 61%, the tools identified that an orientation (positive or negative) is present in these sentences, even though there is no affective or emotional content. Therefore, differently from what classical approaches like the one proposed by Liu (2012) foresee, according to which a statement should necessarily be neutral, the obtained results seem to confirm that statements may contain a positive or negative orientation. Finally, an analysis according to the KnowMIS-ABSA model allows the identification of other kinds of subjectivity, besides the traditional concept of sentiment, which can be useful, for instance, in the context of marketing analysis.
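The statement-level figures quoted above follow directly from the reported counts (75 neutral classifications out of 194 statement sentences):

```python
# Share of statement sentences classified as neutral, from the
# counts reported in the text: 75 of the 194 statement sentences.
neutral, total = 75, 194
share = neutral / total
print(f"{share:.0%} classified as neutral, "
      f"{1 - share:.0%} assigned an orientation")
```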

Another relevant result is that analyzing the sentences containing sentiments separately from those containing affective reactions and from the statements seems to slightly improve the results. This suggests that research should focus on improving performance on statements and factual sentences, using different tools and techniques for each of the three categories, in order to improve the results of ABSA solutions.

5 Conclusion

The analysis of the literature regarding the latest advancements in the fields of SA and ABSA made evident the remarkable progress that has been made in the automatic understanding and evaluation of human opinions. Such an analysis has also highlighted the considerable confusion in the field regarding the basic concepts that underlie ABSA, like sentiment, opinion, affect, and emotion. Moved by these considerations, in this work we proposed an overview of the field and a new reference model for ABSA solutions. The model has been applied in a case study on the analysis of product reviews and compared with a traditional approach to ABSA. This qualitative evaluation highlighted that the distinctions between the dimensions (sentiment, affective reaction, statement) and metrics (polarity or orientation) defined in the model allow us to better analyze a set of reviews, even when using the same computational techniques and software tools.

From the marketing perspective, the KnowMIS-ABSA model could be effective in interpreting the results of any ABSA technique used to analyze end-user reviews. In particular, the capability to distinguish between sentiments and affective reactions could drive the identification of marketing strategies that more effectively target a specific group of users. Furthermore, the analysis of statements, which can contain an orientation even when there is no emotion or sentiment in them, can represent another important source of information for marketing.

In future work, thanks to the clearer definition of the basic concepts of ABSA proposed by the KnowMIS-ABSA model, new approaches and techniques can be built upon it. This work, indeed, gave rise to further reflections on additional elements of ABSA solutions, in particular on the new characteristics of reviews in modern websites. Therefore, in future work we will extend the KnowMIS-ABSA model with considerations on reviews and other elements of ABSA. One of the elements to consider is the analysis of comments and opinions related to reviews published by other users. This requires evaluating the importance of the review itself, as well as of the comments, and understanding whether a comment expresses a positive, neutral, or negative opinion with respect to the original review. An approach similar to the one proposed in Aljuaid et al. (2021), which presents a technique to evaluate the importance and polarity of citations in scientific documents, will be investigated. Lastly, having demonstrated in the current work that an implementation of the model is feasible and provides some advantages, we will develop a completely automatic solution for ABSA based on the KnowMIS-ABSA model, by selecting and implementing adequate computational techniques.