1 Introduction

One of the main challenges of recent years is to create more natural, sensitive and socially intelligent machines that are able not only to communicate but also to understand social signals and make sense of various social contextual settings [24]. Thus, besides communicating through various channels and through verbal content (semantics), machines also need to be able to recognize, interpret and process emotional information as humans do. In human cognition, thinking and feeling are mutually present: emotions are often the product of our thoughts, just as our reflections are often the product of our affective states. But what does it mean to be socially intelligent when interaction context is taken into account? So far, context awareness in natural conversations has been defined in terms of past visual information [10], general situational understanding [7], past verbal information [13], cultural background [15], the gender of the participants, knowledge of the general interaction setting in which an emotional phenomenon takes place [4], and discourse and social situations [2]. Accordingly, studies in intelligent Human-Computer Interfaces (iHCI) that incorporate context address the following contextual aspects, known as the W5+ formalization: Who you are with (e.g., dyadic/multiparty interactions [25]), What is communicated (e.g., a (non-)linguistic message/conversational signal, and emotion), How the information is communicated (the person’s affective cues), Why, i.e., in which context the information is passed on, Where the user is, What his/her current task is, How he/she feels (e.g., whether his/her mood has shifted from negative to positive), and which (re)action should be taken to satisfy the human’s needs, goals and tasks [9].

So far, however, efforts toward understanding human affective behavior have usually been context-independent [12]. In light of these observations, understanding the natural progression of context-related questions when people interact in a social environment could provide new insights into the mechanisms of interaction context and affectivity. The “Who”, “What” and “Where” context-related questions have mainly been answered either separately or in groups of two or three, using information extracted from multimodal input streams [28]. Thus, to date, no general W5+ formalization exists: the systems that answer most of the “W” questions are founded on different psychological theories of emotion, and each fits the specific purposes of a particular line of research.

Recent research progressing toward the questions of “Why” and “How” has led to the emerging field of sentiment analysis [6, 14, 19], which mines opinions and sentiments from natural language and requires a deep understanding of the semantic rules of a language. Furthermore, interpreting the cognitive and affective information associated with natural language, and hence inferring new knowledge and making decisions in connection with one’s social and emotional values and ideals, is of crucial importance. The problem when trying to emulate such cognitive and affective processes is that, while cognitive information is usually objective and unbiased, answering the “Why” context-related question through affective information is rather subjective and argumentative.

Under this view, our long-term goal is to understand whether and how context is incorporated in the automatic analysis of human affective behavior, and to propose a novel context-aware incorporation framework (Fig. 1) which (1) detects and extracts semantic context concepts, (2) better enriches a number of Psychological Foundations with sentiment values, and (3) enhances emotional models with context information and context concept representation in appraisal estimation, using publicly available online knowledge sources (OKS) in natural language processing [26]. As a first step, we focus on bridging the gap at concept level by exploiting the semantic, cognitive and affective information associated with the verbal content (semantics), which for the needs of our research is the contextual interactional information between the user and the operator of the SEMAINE database [16], keeping the “Where” context-related question fixed. The context concept-based annotation method that we examine allows the system to go beyond a mere syntactic analysis of the semantics associated with fixed window sizes. In most traditional annotation methods, emotions and contextual information are not always inferred from appraisals, and thus contextual information about the causes of an emotion is not taken into account [8].

Fig. 1.
System’s overview: (a) semantic context concepts are discovered from the verbal content (semantics) of the SEMAINE dataset; (b) each concept is represented by multi-word expressions, (c) enhanced with sentiment values; (d) a number of Psychological Foundations are enriched in terms of visuality; (e) finally, the proposed approach could reveal a clear connection between semantics and the prediction of cognitive and affective information.

The structure of the paper is as follows: Sect. 2 discusses the challenges of existing emotion categorization models w.r.t. context concept semantic models; Sect. 3 details the methodology followed; Sect. 4 presents an analysis of indicative context-concept examples generated from the SEMAINE corpus; Sect. 5 discusses a number of suggestions for further enhancing the framework’s robustness; and finally, Sect. 6 sets out conclusions and a description of future work.

2 Related Work

Emotions are complex states of feeling that result in physical and psychological reactions influencing both our thought and behavior. The study of emotions remains an essential and open area of psychology. Of interest to Natural Language Processing (NLP) is the ability to tell which emotion is expressed in a text. Research on detecting emotions from text has predominantly focused on capturing emotion words based on three emotion models, i.e., categories of basic emotions, emotion dimensions, and cognitive-appraisal categories, particularly the componential model [5, 12].

Unlike the categorical and dimensional approaches, another set of psychological models, referred to as componential models of emotion, has recently received increasing attention; these models are based on appraisal theory and might be more appropriate for developing context-aware frameworks [18]. However, how to use the appraisal approach for the automatic analysis of affect is an open research problem. Within componential models of emotion, various ways of linking automatic emotion analysis and appraisal models of emotion have been proposed. This link aims to enable the addition of contextual information to automatic emotion analyzers, and to enrich their interpretation capability through a more sensitive and richer representation.

However, these emotional models have some limitations. Categorical approaches usually fail to describe the complex range of emotions that can occur in daily communication. Furthermore, the dimensional space neither allows affect words to be compared according to their mutual distance, nor models the fact that two or more emotions might be experienced at the same time.

In particular, a number of two-dimensional approaches are mainly used to visualize Psychological Foundations. An early example is Russell’s circumplex model [21], which uses the dimensions of arousal and valence to plot 150 affective labels. Similarly, Whissell considers emotions as a continuous 2D space with evaluation and activation as dimensions [27]. Another bi-dimensional model is Plutchik’s wheel of emotions [20], according to which emotions are adaptive because they are based on evolutionary principles, even though we conceive of emotions as feeling states. These feeling states are part of a process that involves both cognition and behavior and contains several feedback loops. Ultimately, all such approaches work at word level, so they are unable to grasp the affective valence of multiple-word concepts.
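To make the word-level limitation concrete, the sketch below places words in a valence-arousal plane in the spirit of Russell’s circumplex [21]. The coordinates in `LEXICON` are invented for illustration (they are not taken from [21] or [27]), and summarizing a point by its angle and its distance from the neutral origin is one common way of reading a circumplex, not a procedure prescribed by the cited works. Because the lexicon is keyed on single words, a multi-word concept such as “surprised friend” has no entry of its own.

```python
import math
from typing import Optional, Tuple

# Hypothetical word-level lexicon: (valence, arousal), both in [-1, 1].
# Values are illustrative only, not taken from any published norms.
LEXICON = {
    "happy": (0.8, 0.5),
    "sad": (-0.7, -0.4),
    "angry": (-0.6, 0.8),
    "calm": (0.4, -0.6),
    "surprised": (0.2, 0.9),
}

def circumplex(word: str) -> Optional[Tuple[float, float]]:
    """Return (angle in degrees, intensity) of a word in the valence-arousal plane,
    or None if the word is not in the single-word lexicon."""
    coords = LEXICON.get(word)
    if coords is None:
        return None
    valence, arousal = coords
    angle = math.degrees(math.atan2(arousal, valence)) % 360  # 0 deg = pure positive valence
    intensity = math.hypot(valence, arousal)                   # distance from the neutral origin
    return angle, intensity

print(circumplex("angry"))             # angle between 90 and 180: negative valence, high arousal
print(circumplex("surprised friend"))  # None: multi-word concepts are not covered
```

Averaging the coordinates of “surprised” and “friend” would be one workaround, but it ignores how the words combine, which is the gap concept-level approaches target.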

However, since the above models currently focus on the objective inference of affective information associated with natural language opinions, appraisal-based emotions are not taken into account. Nevertheless, in view of their suitability for context modeling, emphasis should be placed on emotional models based on cognitive appraisal, which characterize emotional states in terms of a detailed evaluation of emotion acquisition, especially through implicit methods. For an extended overview of affect modeling, the reader is referred to [5, 12].

Semantic context concepts. In pursuit of a more applicable semantic context concept model, rather than a theoretical one such as the componential model, research has focused on mining opinions and sentiments from natural language. This is challenging, as it requires a deep understanding of explicit and implicit semantic language rules and must cope with NLP’s unresolved problems such as negation handling, named-entity recognition and word-sense disambiguation. Concept-based approaches [23] aim to grasp the conceptual and affective information associated with natural language opinions. Additionally, concept-based approaches can analyze multi-word expressions that do not explicitly convey emotion but are related to concepts that do. For example, instead of gathering isolated opinions about a whole “event” (e.g., a birthday party), users are generally more interested in comparing different events according to their specific sets of semantically related concepts, e.g., “cake”, “surprised friend” or “gift” (which can be considered contextual information for improving search results), associated with a set of affectively related concepts, e.g., “celebration” or “special occasion”. This taken-for-granted information, referring to obvious things people normally know and usually leave unstated, is necessary to properly deconstruct natural language text into sentiments. For example, the concept “small room” should be appraised as negative in a hotel review and “small queue” as positive at a post office, and the concept “go read the book” as positive in a book review but negative in a movie review.
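The hotel/post-office example amounts to a polarity lookup keyed on both the concept and the domain. The sketch below illustrates this under our own assumptions: the polarity values and domain labels are hypothetical, and falling back to a domain-independent prior is one simple design choice, not the behavior of any particular concept-level resource.

```python
from typing import Dict, Optional, Tuple

# Hypothetical domain-independent priors for concepts (polarity in [-1, 1]).
PRIOR: Dict[str, float] = {
    "celebration": 0.7,
    "gift": 0.6,
}

# Hypothetical domain-specific polarities, following the examples in the text.
DOMAIN_POLARITY: Dict[Tuple[str, str], float] = {
    ("small room", "hotel review"): -0.5,
    ("small queue", "post office review"): 0.5,
    ("go read the book", "book review"): 0.6,
    ("go read the book", "movie review"): -0.6,
}

def concept_polarity(concept: str, domain: str) -> Optional[float]:
    """Prefer a domain-specific polarity; otherwise fall back to the prior (or None)."""
    if (concept, domain) in DOMAIN_POLARITY:
        return DOMAIN_POLARITY[(concept, domain)]
    return PRIOR.get(concept)

print(concept_polarity("go read the book", "book review"))   # 0.6
print(concept_polarity("go read the book", "movie review"))  # -0.6
print(concept_polarity("gift", "hotel review"))              # 0.6 (prior)
```

Returning `None` for unseen (concept, domain) pairs, rather than 0.0, keeps “unknown” distinct from “neutral”.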

3 Methodology

3.1 Corpus for Semantic Context Concepts Extraction

The model is applied to the SEMAINE corpus [16]. This corpus comprises manually transcribed sessions in which a human user interacts with a human operator playing the role of a virtual agent. These interactions are based on a scenario involving four agent characters: Poppy (happy and outgoing), Prudence (sensible and level-headed), Spike (angry and confrontational) and Obadiah (depressive and gloomy). The agent’s utterances are constrained by a script; however, some deviations from the script occur in the database.

3.2 Pre-processing

The pre-processing submodule first interprets all the affective valence indicators usually contained in the verbal content of the transcriptions, such as special punctuation, all-upper-case words, exclamation words and negations. Handling negation is an important concern in such a scenario, as it can reverse the meaning of the examined sentence. Second, it converts the text to lower case and, after lemmatizing it, splits each sentence into single clauses according to grammatical conjunctions and punctuation.
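The two pre-processing stages can be sketched in plain Python. This is a hypothetical, simplified version: the negation and conjunction lists and the tiny `LEMMAS` table are our own assumptions standing in for a real lemmatizer, and the indicator names are illustrative rather than those of the actual submodule.

```python
import re
from typing import Dict, List

NEGATIONS = {"not", "no", "never", "don't", "doesn't", "can't", "isn't"}
CONJUNCTIONS = {"and", "but", "or", "because", "so", "while"}
# Toy lemma table standing in for a real lemmatizer (assumption).
LEMMAS = {"bought": "buy", "presents": "present", "was": "be", "is": "be"}

def valence_indicators(text: str) -> Dict[str, int]:
    """Count surface cues that may carry affective valence (computed before lower-casing)."""
    tokens = re.findall(r"[A-Za-z']+", text)
    return {
        "exclamations": text.count("!"),
        "special_punct": len(re.findall(r"\.\.\.|\?!|!!", text)),
        "upper_case_words": sum(1 for t in tokens if len(t) > 1 and t.isupper()),
        "negations": sum(1 for t in tokens if t.lower() in NEGATIONS),
    }

def split_clauses(text: str) -> List[List[str]]:
    """Lower-case, lemmatize with the toy table, and split on punctuation and conjunctions."""
    clauses: List[List[str]] = []
    for chunk in re.split(r"[.,;:!?]+", text.lower()):
        current: List[str] = []
        for token in re.findall(r"[a-z']+", chunk):
            if token in CONJUNCTIONS:
                if current:
                    clauses.append(current)
                current = []
            else:
                current.append(LEMMAS.get(token, token))
        if current:
            clauses.append(current)
    return clauses

sentence = "I did NOT like work, but I bought very nice Easter presents!"
print(valence_indicators(sentence))
print(split_clauses(sentence))
```

The indicators are counted on the raw text because lower-casing would erase the upper-case cue; the negation is kept inside its clause so that later stages can reverse the polarity of that clause only.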

The n-grams obtained from these clauses are not used blindly as fixed word patterns but are exploited as a reference for the module, in order to extract multiple-word concepts from information-rich sentences. So, unlike other shallow parsers, the module can recognize complex concepts even when irregular verbs are used or when these are interspersed with adjectives and adverbs, for example, the concept “buy easter present” in the sentence “I bought a lot of very nice Easter presents”.
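The “buy easter present” example can be illustrated by matching a concept lexicon against a lemmatized clause after skipping modifiers. The concept lexicon, the lemma table and the set of skippable words below are hypothetical stand-ins; a real system would derive them from a lemmatizer, a POS tagger and resources such as ConceptNet or WordNet.

```python
from typing import List, Set

# Toy lemma table handling irregular forms (assumption).
LEMMAS = {"bought": "buy", "presents": "present", "went": "go"}
# Words treated as skippable modifiers (determiners, quantifiers, adjectives, adverbs).
# A real system would decide this from POS tags; this fixed set is an assumption.
SKIPPABLE = {"a", "an", "the", "lot", "of", "very", "nice", "really", "some"}
# Hypothetical lexicon of multi-word concepts.
CONCEPTS = {"buy easter present", "go holiday", "road trip"}

def normalize(sentence: str) -> List[str]:
    """Lower-case, lemmatize and drop skippable modifiers."""
    tokens = sentence.lower().replace(".", " ").split()
    lemmas = [LEMMAS.get(t, t) for t in tokens]
    return [t for t in lemmas if t not in SKIPPABLE]

def match_concepts(sentence: str, concepts: Set[str] = CONCEPTS) -> Set[str]:
    """Return the lexicon concepts that occur contiguously in the normalized sentence."""
    tokens = normalize(sentence)
    found = set()
    for concept in concepts:
        parts = concept.split()
        n = len(parts)
        if any(tokens[i:i + n] == parts for i in range(len(tokens) - n + 1)):
            found.add(concept)
    return found

print(match_concepts("I bought a lot of very nice Easter presents"))  # {'buy easter present'}
```

Matching on normalized lemmas rather than surface strings is what lets the irregular “bought” and the interspersed “a lot of very nice” still yield the concept.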

3.3 Semantic Context Concept Parser

The aim of the semantic parser is to break sentences into clauses and hence to deconstruct such clauses into concepts. This deconstruction uses lexicons based on sequences of lexemes that represent multiple-word concepts extracted from ConceptNet, WordNet [17] and other linguistic resources.

Under this view, the Stanford Parser has been used through Python NLTK; a general assumption during clause separation is that, if a piece of text contains a preposition or subordinating conjunction, the words preceding these function words are interpreted not as events but as objects. Second, the dependency structure elements of each sentence are processed by means of the Stanford Lemmatizer. Each potential noun chunk associated with an individual verb chunk is paired with the stemmed verb in order to detect multi-word expressions of the form “verb plus object”. The POS-based bigram algorithm extracts concepts, but in order to capture event concepts, matches between the object concepts and the normalized verb chunks are searched for. It is important to build the dependency tree before lemmatization, as swapping the two steps results in several imprecisions caused by the lower grammatical accuracy of lemmatized sentences. Each verb and its associated noun phrase are considered in turn, and one or more concepts are extracted from them.
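The verb-plus-object step can be illustrated on a sentence that has already been POS-tagged and lemmatized. In this sketch the tagged input is written by hand with Penn Treebank-style tags instead of coming from the Stanford Parser, lemmas stand in for the stemmed verbs, and the chunking rule (a noun chunk is a run of adjectives and nouns ending in a noun; each verb pairs with the first such chunk before the next verb) is a simplifying assumption of ours.

```python
from typing import List, Tuple

Tagged = List[Tuple[str, str]]  # (lemma, Penn Treebank-style tag)

def noun_chunks(tagged: Tagged) -> List[Tuple[int, str]]:
    """Return (start index, text) for maximal adjective/noun runs, trimmed to end in a noun."""
    chunks, i = [], 0
    while i < len(tagged):
        j = i
        while j < len(tagged) and (tagged[j][1].startswith("NN") or tagged[j][1] == "JJ"):
            j += 1
        k = j
        while k > i and not tagged[k - 1][1].startswith("NN"):
            k -= 1  # drop trailing adjectives so the chunk ends in its head noun
        if k > i:
            chunks.append((i, " ".join(w for w, _ in tagged[i:k])))
        i = max(j, i + 1)
    return chunks

def event_concepts(tagged: Tagged) -> List[str]:
    """Pair each verb with the first noun chunk that follows it before the next verb."""
    chunks = noun_chunks(tagged)
    events = []
    for idx, (lemma, tag) in enumerate(tagged):
        if not tag.startswith("VB"):
            continue
        for start, text in chunks:
            if start <= idx:
                continue
            if any(t.startswith("VB") for _, t in tagged[idx + 1:start]):
                break  # another verb intervenes, so this chunk belongs to it
            events.append(f"{lemma} {text}")
            break
    return events

# Hand-tagged lemmas for "four guys doing a road trip from Houston to New Orleans".
sentence = [("guy", "NNS"), ("do", "VBG"), ("a", "DT"), ("road", "NN"), ("trip", "NN"),
            ("from", "IN"), ("houston", "NNP"), ("to", "TO"), ("new", "NNP"), ("orleans", "NNP")]
print(noun_chunks(sentence))     # object concepts, e.g. "road trip", "new orleans"
print(event_concepts(sentence))  # ['do road trip']
```

Object concepts come straight from the noun chunks, while event concepts require the extra verb-chunk match, mirroring the two-stage description above.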

3.4 Opinion and Sentiment Lexicon

Current approaches to concept-level sentiment analysis mainly leverage existing affective knowledge bases such as ANEW [3], WordNet-Affect [22] and SentiWordNet [11]. For the needs of our current work, we use SentiWordNet, which is a concept-level opinion lexicon containing multi-word expressions labeled with polarity scores.
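SentiWordNet assigns each WordNet synset three scores (positivity, negativity, objectivity) that sum to 1. The sketch below shows how concept polarities could be derived from such triples. The entries in `MINI_SWN` are made up for illustration (they are not actual SentiWordNet values), and averaging positivity minus negativity over the known words of an unlisted multi-word concept is a fallback we assume, not part of SentiWordNet itself. With NLTK and its `sentiwordnet` corpus installed, real scores can be read through `nltk.corpus.sentiwordnet` (`senti_synsets`, then `pos_score`, `neg_score`, `obj_score`); that dependency is omitted here to keep the example self-contained.

```python
from typing import Dict, Optional, Tuple

# Illustrative (pos, neg, obj) triples; NOT real SentiWordNet scores.
# Multi-word entries use underscores, as WordNet lemmas do.
MINI_SWN: Dict[str, Tuple[float, float, float]] = {
    "fantastic": (0.75, 0.0, 0.25),
    "absurd": (0.0, 0.5, 0.5),
    "road_trip": (0.125, 0.0, 0.875),
    "excessive": (0.0, 0.625, 0.375),
}

def polarity(term: str) -> Optional[float]:
    """Polarity of a single entry as positivity minus negativity."""
    scores = MINI_SWN.get(term)
    if scores is None:
        return None
    pos, neg, obj = scores
    if abs(pos + neg + obj - 1.0) > 1e-9:
        raise ValueError(f"scores for {term!r} do not sum to 1")
    return pos - neg

def concept_polarity(concept: str) -> Optional[float]:
    """Use the multi-word entry if present, else average the polarity of its known words."""
    whole = polarity(concept.replace(" ", "_"))
    if whole is not None:
        return whole
    parts = [p for p in (polarity(w) for w in concept.split()) if p is not None]
    return sum(parts) / len(parts) if parts else None

print(concept_polarity("road trip"))       # 0.125 (multi-word entry)
print(concept_polarity("excessive trip"))  # -0.625 (word-level fallback)
```

Preferring the multi-word entry over the word-level average reflects the concept-level motivation: the whole expression, when listed, is taken as more reliable than its parts.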

4 Research Findings

In this section, we present an analysis of indicative and representative, but not exhaustive, context-concept examples generated from the SEMAINE corpus [16]. Additionally, we provide a list of the main research findings observed during the analysis.

The first example is extracted from Session 70 for Prudence, focusing on phrases [16–32] and [48–64]. In this interactional context, the discussion revolves around the context-related question “What is discussed?”, referring to the topic “holidays” and, in phrases [50–64], the topic “trip”. The user is not interested in the topic of “work” and thus says that it is boring. Throughout this interaction, the operator and the user share the same subjective opinion, as both of them repeat the words “excessive” and “absurd” several times with reference to the “trip” topic.

19 - Prudence: “And have you considered where you might go for holidays?”

20 - User: “(Looks around in thought). Ah yeah. I’m thinking about going to Australia.”

...

56 - User: “(Nods smiling). That is absolutely absurd. I concur. (Looks around smiling). But it was... fantastic fun. Eh... four guys doing a road trip from Houston to New Orleans as well. Emm... Obviously myself included. (Licks lips). An... A lot of eating, a lot of drinking, a lot of not talking about research. It was fantastic.”

57 - Prudence: “Well it sounds like it was an... excessive trip.”

58 - User: “Very excessive.”

The second example refers to the Spike operator role in Session 73. Studying phases 56–73, we observe that the operator empathizes with the user. Multi-word expressions such as “that’s a lot” and “piss me off” are highlighted. The user says that he is often mistaken for an American, whereas he is Canadian, and shows that he has been annoyed by this confusion. Taking into consideration that Spike’s role is to make the user angry, these expressions are employed as a way to reinforce the user’s annoyance.

56 - User: “Hmm... The world doesn’t really think highly of Americans.”

57 - Spike: “Yeah... But Canadians are just the same aren’t they?”

...

72 - User: “That’s a lot. (Smiles, nods).”

73 - Spike: “That would piss me off.”

In Session 72 (Obadiah role), we identified a number of appraisal expressions. This finding is aligned with Obadiah’s affective style. In this role, the operator expresses attitudes about life and about the user, which triggers short-distance repetitions. This is shown in phases [37–38] and [41–42], in which affect-related words such as “happy”, “feeling”, “sad”, “bored” and “interested” are highly repeated.

37 - Obadiah: “Life is hard sometimes.”

38 - User: “(Nods). Life can suck sometimes. I agree.”

...

41 - Obadiah: “Yeah. But you can’t be cheery all the time.”

42 - User: “(Shakes head). Oh God I’m not cheery (laughs).[...]”
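One simple way to quantify such short-distance repetitions is to intersect the content words of adjacent turns, as sketched below for turns 37–38 and 41–42 quoted above. The stop-word list is a small hypothetical one, and non-verbal annotations in parentheses are stripped before comparison. This illustrates the idea only; it is not the procedure used to produce the findings above.

```python
import re
from typing import List, Set

# Small hypothetical stop-word list.
STOP_WORDS = {"i", "is", "can", "you", "the", "a", "but", "oh", "not", "be", "all", "yeah", "i'm"}

def content_words(turn: str) -> Set[str]:
    """Lower-case content words of a turn, ignoring non-verbal annotations in parentheses."""
    text = re.sub(r"\([^)]*\)", " ", turn.lower())
    return {w for w in re.findall(r"[a-z']+", text) if w not in STOP_WORDS}

def adjacent_repetitions(turns: List[str]) -> List[Set[str]]:
    """Words shared by each pair of consecutive turns (a distance of one turn)."""
    words = [content_words(t) for t in turns]
    return [words[k] & words[k + 1] for k in range(len(turns) - 1)]

turns = ["Life is hard sometimes.", "(Nods). Life can suck sometimes. I agree."]
print([sorted(s) for s in adjacent_repetitions(turns)])  # [['life', 'sometimes']]
```

Widening the comparison from consecutive turns to a window of k turns would capture repetitions at slightly longer distances.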

Finally, in Session 71 (phases [24–32]), Poppy, owing to her happy and outgoing operator character, seems to align with the user’s sentences, providing feedback such as “hmh” or “yeah” but without repeating the user’s words.

24 - User: “Yeah it’s... very fast. Very... high contact which I... tend to like. Emm... Haven’t done it in a while so (smiling and wide eyed) I guess that makes me a bit sad.”

25 - Poppy: “Ah...”

26 - User: “Emm... Yeah.”

“What” is discussed: identifying the topic. Because only the user acts as a teller in the SEMAINE corpus, the user occupies 65.5% of the total speech duration; hence, speech activity is not equally distributed between the user and the operator. A more in-depth analysis shows that this phenomenon is also observed when computing the percentage of user speech over all sessions corresponding to a specific operator role. The speech activity percentages vary from 60.6% for Obadiah (minimum) to 70.4% for Prudence (maximum). This could be partially explained by taking into account the personalities of the four agents (played by human operators). For example, Prudence is even-tempered and sensible, prompting the user to talk more, while Obadiah’s depressive mood may lead the user to talk less. In contrast, in the case of Poppy, in session 26 the happy operator asks the user, “where is the best wake you ever had?”. The user answers “in a tent in kilimanjaro”. Here, the agent’s question opens a new topic without completely defining it: it is the user’s answer that chooses the new topic, following the indications given by the agent’s questions.
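The user-speech percentages above are ratios of speech durations. The sketch below shows the per-role computation; the session records and their durations are invented for illustration and do not reproduce the SEMAINE figures reported above.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical (role, user_seconds, operator_seconds) per session.
SESSIONS: List[Tuple[str, float, float]] = [
    ("Prudence", 220.0, 80.0),
    ("Prudence", 200.0, 100.0),
    ("Obadiah", 150.0, 100.0),
    ("Poppy", 195.0, 105.0),
    ("Spike", 160.0, 90.0),
]

def user_share_by_role(sessions: List[Tuple[str, float, float]]) -> Dict[str, float]:
    """Percentage of total speech duration produced by the user, pooled over each role's sessions."""
    user: Dict[str, float] = defaultdict(float)
    total: Dict[str, float] = defaultdict(float)
    for role, user_s, operator_s in sessions:
        user[role] += user_s
        total[role] += user_s + operator_s
    return {role: 100.0 * user[role] / total[role] for role in total}

print(user_share_by_role(SESSIONS))
# {'Prudence': 70.0, 'Obadiah': 60.0, 'Poppy': 65.0, 'Spike': 64.0}
```

Pooling durations weights longer sessions more heavily; averaging per-session percentages is the alternative, and the two can differ when session lengths vary. Which one underlies the reported figures is not stated, so this choice is an assumption.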

“Why” and “How” he/she feels, context-related questions: identifying the affective style and the sentiments of operators and users. Apart from the interactional topic, the lexical and affective styles of the user and operator, as well as the operator’s role, also depend on the type of corpus. On the whole, the vocabulary used is expected to correspond to the 5 min interaction and to a restricted vocabulary specific to the operator’s role and linguistic style. Examining the most frequent words used by the depressive Obadiah (“miserable”, “suffering”, “disappointed”) and by Poppy (“excellent”, “cool”, “exciting”, “happiness”), we observe that it is possible to extract information about the affective style of each operator role. Overall, with regard to the operator’s identity, expressions of affect are more numerous, except in the Prudence sessions. This is probably due to the personality that operators have to play as Prudence: a sensible and level-headed person who expresses appraisals about the user’s behavior and asks the user to express attitudes about specific things. Furthermore, for the role of Spike, we found that the most frequent words (Table 1), such as “fool” and “annoyed”, reflect a more offensive affective style, probably because, when the operator playing Spike offends the user, the user sometimes repeats the operator’s words.

Table 1. Top-ranked words (score > 0.025) in the SEMAINE corpus
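The caption of Table 1 does not define the ranking score. Assuming it is a word’s relative frequency among the content words of a role’s transcripts (our assumption), the thresholding could be sketched as follows; the stop-word list and the toy Spike-style text are hypothetical.

```python
import re
from collections import Counter
from typing import List, Tuple

# Small hypothetical stop-word list.
STOP_WORDS = {"i", "am", "so", "and", "the", "you", "are", "a", "it", "is", "that"}

def top_ranked(text: str, threshold: float = 0.025) -> List[Tuple[str, float]]:
    """Words whose relative frequency (count / number of content tokens) exceeds threshold."""
    tokens = [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOP_WORDS]
    if not tokens:
        return []
    counts = Counter(tokens)
    total = len(tokens)
    ranked = [(w, c / total) for w, c in counts.most_common()]
    return [(w, s) for w, s in ranked if s > threshold]

spike_text = "You fool. You are a fool and I am annoyed. So annoyed. Fool!"
print(top_ranked(spike_text))                 # [('fool', 0.6), ('annoyed', 0.4)]
print(top_ranked(spike_text, threshold=0.5))  # [('fool', 0.6)]
```

On a toy text almost every word clears a 0.025 threshold; the cut-off only becomes selective on full-length transcripts.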

On the other hand, for SEMAINE users, the most frequent words correspond to different topics; the users’ lexical opinions, or the topics of their conversations, are defined by specific adverbs and include words such as “weekend” and “holiday” that are indicative of the discussed topic. Consequently, the users’ sentiments are in accordance with the type of agent played by the operator. As expected, the Poppy and Prudence sessions express affective information with negative sentiments. Finally, the distribution is more balanced for Spike.

5 Discussion

Gradually, the new multidisciplinary area at the crossroads of Affective Computing, Human-Computer Interaction (HCI), social sciences, linguistics, psychology and context awareness is distinguishing itself as a separate field. It thus becomes possible to better recognize, interpret and process opinions and sentiments, to incorporate contextual information and, finally, to understand the related ethical issues about the nature of mind and the creation of emotional machines. For applications in fields such as real-time HCI and big social data analysis [1], deep natural language understanding is not strictly required: a sense of the semantics associated with a text, plus some extra information (affect) associated with those semantics, is often sufficient to quickly perform tasks such as emotion recognition and the detection of cognitive and affective information.

We have illustrated a method for extracting context concept aspects from SEMAINE corpus interactions. The proposed framework leverages only taken-for-granted information. By allowing sentiments to flow from multi-word concept to multi-word concept, we could possibly achieve a better understanding of the contextual role of each concept within the sentence.
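One way to let sentiment “flow” between concepts, assumed here purely for illustration, is iterative propagation over a concept graph: concepts with a known polarity keep it, and each unknown concept repeatedly takes the mean polarity of its neighbours. The graph and seed polarities below are hypothetical, and this generic label-propagation sketch is not the framework’s implemented method.

```python
from typing import Dict, List

# Hypothetical concept graph (undirected adjacency lists).
GRAPH: Dict[str, List[str]] = {
    "birthday party": ["cake", "gift", "celebration"],
    "cake": ["birthday party"],
    "gift": ["birthday party", "surprised friend"],
    "celebration": ["birthday party"],
    "surprised friend": ["gift"],
}
SEEDS: Dict[str, float] = {"celebration": 0.8, "gift": 0.6}  # hypothetical known polarities

def propagate(graph: Dict[str, List[str]], seeds: Dict[str, float],
              iterations: int = 20) -> Dict[str, float]:
    """Synchronously update unknown concepts to the mean polarity of their neighbours."""
    polarity = {c: seeds.get(c, 0.0) for c in graph}
    for _ in range(iterations):
        updated = dict(polarity)
        for concept, neighbours in graph.items():
            if concept in seeds or not neighbours:
                continue  # seeds are clamped; isolated concepts stay at 0.0
            updated[concept] = sum(polarity[n] for n in neighbours) / len(neighbours)
        polarity = updated
    return polarity

result = propagate(GRAPH, SEEDS)
print({c: round(p, 3) for c, p in result.items()})  # "birthday party" and "cake" approach 0.7
```

Clamping the seeds keeps known polarities fixed, so unknown concepts converge toward values determined by their neighbourhood rather than drifting.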

As far as the selection of the corpus on which experiments are performed is concerned, the new trend is to collect data in real time through the many new sources available for opinion mining and sentiment analysis. Webcams installed in smartphones, touchpads or other devices let users post opinions in an audio or audiovisual format rather than as text. Beyond allowing spoken language to be converted into written text for analysis, the audiovisual format provides further opportunities to mine opinions and sentiment. Many new areas, such as facial expressions and body movements, might be useful in opinion mining. Affect analysis, a related field, addresses the use of linguistic, acoustic and (potentially) video information. This field focuses on a broader set of emotions or on the estimation of continuous emotion primitives; for example, valence can be related to sentiment.

Furthermore, the presence and position of multi-word concepts in the text unit require further examination, as bi-grams and tri-grams are typically taken into account as useful features. Some methods also rely on the distance between terms. Part-of-speech (POS) information (nouns, adjectives, adverbs, verbs, etc.) is also commonly exploited in general textual analysis as a basic form of word-sense disambiguation. Certain adjectives, in particular, have been shown to be good indicators of sentiment and have sometimes been used to guide feature selection for sentiment classification. In other works, sentiments were detected through selected phrases chosen via a number of pre-specified POS patterns, most of which include an adjective or an adverb. However, such approaches and their performance are strictly bound to the considered application domain and its related topics.
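Phrase selection via pre-specified POS patterns can be sketched as a scan over tagged bigrams. The pattern set below (adjective + noun, adverb + adjective, adverb + verb) is an illustrative assumption, not the exact list used in any particular work, and the input is hand-tagged with Penn Treebank-style tags. Matching on tag prefixes lets, for example, `NN` also cover `NNS` and `NNP`.

```python
from typing import List, Tuple

# Illustrative patterns: (tag prefix of first word, tag prefix of second word).
PATTERNS = [("JJ", "NN"), ("RB", "JJ"), ("RB", "VB")]

def pattern_phrases(tagged: List[Tuple[str, str]]) -> List[str]:
    """Return adjacent word pairs whose tags match one of the POS patterns."""
    phrases = []
    for (w1, t1), (w2, t2) in zip(tagged, tagged[1:]):
        if any(t1.startswith(p1) and t2.startswith(p2) for p1, p2 in PATTERNS):
            phrases.append(f"{w1} {w2}")
    return phrases

tagged = [("it", "PRP"), ("was", "VBD"), ("really", "RB"), ("fantastic", "JJ"),
          ("fun", "NN"), ("and", "CC"), ("a", "DT"), ("very", "RB"),
          ("excessive", "JJ"), ("trip", "NN")]
print(pattern_phrases(tagged))
# ['really fantastic', 'fantastic fun', 'very excessive', 'excessive trip']
```

Every extracted pair contains an adjective or an adverb, which is what makes such patterns useful sentiment cues; the domain dependence noted above enters when these phrases are later scored.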

Finally, most of the literature on sentiment analysis has focused on text written in English and, consequently, most of the resources developed, e.g., sentiment lexicons, are in English. Adapting such resources to other languages should be seriously considered, as the choice of words and their intended meaning are personally, contextually, culturally and socially dependent and differ according to the expertise and purposes of tagging users, often resulting in tags that use various levels of abstraction to describe a resource.

6 Conclusions

Technology has the potential to tackle the issues of context awareness in Human-Computer analysis and to progress towards real-world affect analysis. In this work, we attempted to automatically detect semantic concepts and to broaden the scope of affect analysis both quantitatively (identifying and describing more (non-)emotional states) and qualitatively (enriching the contextual information content by establishing links with contextual appraisal determinants and with cognitive and affective information). We would like to emphasize that our findings are clearly preliminary, with inevitable limitations. Probably the main limitation is the absence of a more appropriate corpus.

Our future research work will concentrate on further refining the existing corpora w.r.t. their productivity and reproducibility. These indicative but not exhaustive results provide insight into the effectiveness of our proposed framework for the automatic recognition of spontaneous affective states in a human-agent interaction scenario based on nonverbal behavior and contextual information, and additionally make an important contribution to research on affect recognition “in the wild”. Future work will involve re-evaluating objective words in SentiWordNet by assessing the sentimental relevance of such words and of their associated sentiment sentences. In addition, we will explore applying the proposed method in a fully unsupervised manner, depending only on the accuracy of the context-concept parser and on the sentiments rather than on training with the SEMAINE corpus, along with an enhanced set of rules and an enhanced opinion lexicon.