Keywords

1 Introduction

Automatic sentiment analysis of texts, that is, the identification of the author’s opinion about the subject discussed in the text, has been one of the most significant tasks in natural language processing in the past two decades. The interest in sentiment analysis is connected to the large volume of electronic texts available on social networks and online recommendation services that contain an abundance of individuals’ opinions on various issues: from products and services to the current political and economic situation (for more on corpora, see Chap. 17).

A large number of scholarly works is devoted to sentiment analysis of user reviews stored in recommendation services (Pang et al. 2002; Pang and Lee 2008; Liu 2012). Another important area of sentiment analysis is the so-called reputation monitoring that tracks positive and negative feedback about a company and its products (Amigo et al. 2012). Sentiment analysis of financial reports and financial news is used to determine trends in the stock and currency markets (Nassirtoussi et al. 2015). The sentiment of mentioning terms in scientific articles is used to predict the most important concepts and scientific trends (McKeown et al. 2016). Sentiment information extracted from texts can be used to determine the personal characteristics of the author (Volkova et al. 2015).

The role of automatic sentiment analysis of social network messages for political and social research is growing. Such studies include the identification of political preferences (Volkova et al. 2014), the prediction of election results (Vepsäläinen et al. 2017; Vilares et al. 2015), and the identification of attitudes toward various political decisions. Also, automatic sentiment analysis can be used to recognize hate speech and calls for violence or fake news (Volkova and Bell 2016).

The first approaches to sentiment analysis aimed to determine the overall sentiment of the document or its fragment (Pang et al. 2002). This level of analysis assumes that a document expresses a unanimous opinion about a single entity, such as in a review of a product. Since the document can express multiple attitudes in relation to the different entities it contains, at the next stage scholars studied the tasks of sentiment analysis aimed toward specified entities mentioned in the text (Amigo et al. 2012; Jiang et al. 2011; Loukachevitch et al. 2015; Loukachevitch and Rubtsova 2016). Finally, an even more detailed level of sentiment analysis is the analysis of opinions on specific properties or parts (the so-called aspects) of the entity (Liu and Zhang 2012; Pontiki et al. 2016; Popescu and Etzioni 2007).

Liu and Zhang (2012, 4) define opinion as a five tuple (ei, aij, sijkl, hk, tl), where ei is the name of an entity to which the opinion relates, aij is an aspect (part or characteristic) of ei, sijkl is the sentiment regarding the entity and its aspect, hk is the author of the opinion (opinion holder), and tl is the time when the opinion is expressed by hk. The sentiment sijkl may be positive, negative, or neutral, or may be expressed with varying degrees of intensity that is measured, for example on a scale of 1–5.

In this chapter, we first describe the problems that can be encountered in automatic sentiment analysis. Then, we briefly consider the main methods to conduct sentiment analysis and approaches to creating sentiment vocabularies. Finally, Russian-specific components of automatic sentiment analysis are described, including publicly available vocabularies and sentiment-related shared tasks.

2 Problems in Sentiment Analysis

If we ask native speakers what the most significant problems in sentiment analysis would be, the respondents often name irony and sarcasm. Certainly, the difficulties with these language phenomena really exist but problems of automatic sentiment analysis are much more diverse. In what follows, six additional challenges of sentiment analysis are presented.

2.1 Multiple Opinions in a Single Text

Approaches to extracting the main components of opinion largely depend on the genre of the analyzed text. One of the most studied genres of text in the task of sentiment analysis are user reviews on products or services. Such texts usually consider a single entity (but, perhaps in its different aspects), and the opinion is expressed by one author, namely the reviewer (Pang et al. 2002; Pang and Lee 2008; Liu 2012).

Another popular type of texts for sentiment extraction is Twitter messages (Pak and Paroubek 2010; Rosenthal et al. 2017; Loukachevitch and Rubtsova 2016). Tweets (Twitter posts) were limited to 140 symbols before 2017, when they were extended to 280 characters. Such short texts often require precise sentiment analysis but most of them mention the only opinion target and opinion holder (for more on Twitter analysis, see Chap. 30). The following tweet shows an example of a negative attitude toward Russian phone company Megaphone, presented in sarcastic form, which requires the use of sophisticated methods to reveal the correct attitude:

Megafon, spasibo tebe za zablokirovannye uvedomleniâ ot Rajffajzena

[Megaphone, hank you for the blocked notifications from Raiffeisen]

It can seem that in longer texts the author’s opinion can be repeated several times in different ways, which would facilitate the analysis. However, long texts may include various entities and related sentiments (Choi et al. 2016; Loukachevitch and Rusnachenko 2018) and they may mention opinions of different persons. If the task is to find an attitude toward the entities mentioned, then the problem of determining the scope of the sentiments arises. For example, sentiment extraction is often carried out in relation to an entity mentioned in the same sentence. However, the author can refer to an entity using the means of reference, for example, pronouns. In addition, if the entire text is devoted to the discussion of one entity, then it can be explicitly mentioned far from the sentiment location (Ben-Ami et al. 2014).

In such document genres as news texts, or especially analytical texts, many opinions from different sources can be simultaneously mentioned. These texts contain opinions conveyed by different subjects, including the author(s)’ attitudes, the positions of cited sources, and the relations of the mentioned entities to each other. Analytical texts usually contain a lot of named entities, and only a few of them are subjects or objects of a sentiment attitude (Loukachevitch and Rusnachenko 2018). It is clear that in texts with multiple subjects and/or objects of opinion, the complexity of high-quality automatic analysis of sentiment increases manifold.

2.2 Implicit vs. Explicit Sentiment

It is usually assumed that sentiment is expressed using specialized sentiment words (such as good, bad, awful), which is an explicit way of conveying attitudes. However, sentiment can be expressed also implicitly with the so-called sentiment facts (Liu 2012; Loukachevitch and Levchik 2016; Tutubalina 2015) or words with connotations (Feng et al. 2013).

According to the definition provided by Liu (2012, 26), an implicit opinion is an objective statement, from which the sentiment follows, that is, an implicit opinion that conveys a desirable or undesirable fact. In preparation of datasets for testing sentiment analysis systems, such sentiment facts can be specifically annotated (Loukachevitch et al. 2015; Nozza et al. 2017). For example, Russian restaurant reviews may include such sentences as: “Dolgo ždali” (Waited for a long time) or “Našli muhu v supe” (Found a fly in the soup), which, on the one hand, describe what happened (report real facts), but on the other hand convey sentiment.

Connotation is a feeling or idea that is suggested by a particular word, although it need not be a part of the word’s meaning. Connotations often convey positive or negative sentiment (Feng et al. 2013). The appearance of words with positive or negative connotations in a text correlates with the corresponding sentiment expressed in the text. For example, in movie reviews, names of famous actors usually have positive connotations. In restaurant reviews, the noun muha (fly) is associated with a negative sentiment in different contexts, for example:

No sil’no dulo ot okna, pri ètom letala nazojlivaâmuhai ne hvatalo oficiantov [But there was a strong draft from the window, while an annoying fly was flying around and there were not enough waiters].

Prišli v kafe na Ozernoj, oficiantku ele doždalis’ ležalamuhamertvaâ na stole [Went to the cafe on Lake street, barely waited for the waitress, there was a dead fly on the table].

An interesting example of a word with specific connotations in Russian restaurant reviews is the word majonez (mayonnaise). Many sources indicate majonez as a key component of Soviet and Russian cuisine (Shearlaw 2014; Whalley 2018). However, when mentioned in contemporary Russian restaurant reviews, this word usually conveys negative sentiment, for example:

Absolûtno vse salaty soderžatmajonez, pričem ego vezde mnogo [Absolutely all salads contain mayonnaise, and lots of it in everything].

Edinstvennye teplye rolly byli tâželovaty vvidu naličiâ v nihmajoneza [The only hot rolls were heavy due to the presence of mayonnaise].

In news and analytical texts, we can find a lot of words with international negative connotations such as war, unemployment, segregation, or traffic jam. Positive connotations are often associated with achievements of a nation. For example, in Russia positive connotations are associated with cosmos-related concepts such as sputnik, Yuri Gagarin, or MKS (International Space Station).

Gradual adjectives (such as long − short, large − small, etc.) can often convey sentiment facts but their sentiment orientation is very dependent of the context (Cambria et al. 2010). For example, the word long can be both negative and positive in the digital camera domain: if it has a long battery life, it means the battery is good; if you need to adjust the focus for a long time, then the opinion about the camera is negative.

Because of the existence of implicit sentiments and connotations, it is impossible to create general sentiment lexicons, which can be equally useful across many domains. It is, therefore, necessary to develop specialized sentiment lexicons using domain-specific text collections or update existing general lexicons to adapt them to a specific domain. (Hamilton et al. 2016; Severyn and Moschitti 2015; Chetviorkin and Loukachevitch 2012).

2.3 Ambiguity of Sentiment Words

Difficulties with the interpretation of explicit sentiment vocabulary may also arise. Sentiment words can be ambiguous: in one sense, they can be neutral, while in other senses they are negative or positive (Akkaya et al. 2009; Baccianella et al. 2010). For example, the Russian word presnyj (fresh) bears a positive connotation in the phrase presnaâ voda (freshwater), while in other senses of the word presnyj (tasteless for food and uninteresting as in movie reviews) this word is negative.

A word can change or lose its polarity depending on the subject area or the current context. For example, the Russian sentiment words verolomstvo (treachery) and predatel’stvo (betrayal) cannot be considered as conveying an opinion in movie reviews, because they are usually mentioned in a movie synopsis to retell the plot of a movie. The word smešnoj (funny), most likely, is negative in the sphere of politics, yet indicates a positive orientation when it is used in reviews of comedies. When characterizing other movie genres, this word can be both positive and negative.

2.4 Sentiment Modifiers

The appearance of sentiment words in the text may be accompanied by sentiment modifiers that enhance (for example, much, more), reduce (too, less) or inverse prior word sentiment (negation: no, not). Thus, when analyzing the sentiment, such modifiers should be taken into account, and it is necessary to have some numerical model that modifies the original polarities of the word (Taboada et al. 2011; Wilson et al. 2005; Wiegand et al. 2010). One of the common models of accounting for polarity modifiers ascribes some coefficients to them, which are considered as factors modifying the initial polarity of the words to which these modifiers relate.

Another important issue is determining the scope of the polarity modifier in a particular sentence (Taboada et al. 2011). Most approaches suppose that polarity modifiers, such as negation, modify sentiment of neighbor words, but long-distance influence is also possible. For example, in the sentence “Â ne dumaû, čto èto zasluživaet upominaniâ” (I do not think it is worth mentioning), the negation changes the sentiment orientation of the word zasluživaet (worth) from positive to negative.

If negation stands before several sentiment-bearing words, it is important to calculate the overall sentiment of the whole group and then to apply negation to it. In the following sentence, we see the phrase “ne boitsâ raskola” (is not afraid of a split), where negation stands before two words with negative sentiment. To obtain the positive mood of the sentence, it is necessary to determine the sentiment of the phrase as negative and then to apply negation to it:

Sekretar’ prezidiuma gensoveta “Edinoj Rossii,” zampredsedatelâ Gosdumy Sergej Neverov v subbotu zaâvil, čto partiâ ne boitsâ raskola v svâzi s poâvleniem v nej raznyh ideologičeskih platform [Secretary of the Presidium of the General Council of “United Russia,” Deputy Chairman of the State Duma Sergei Neverov, on Saturday stated that the party is not afraid of a split in connection to the appearance of various ideological platforms within it].

Polarity modifiers can also form groups such as double negation. We can see such double negation ne bez in well-known Russian proverb “V sem’ye ne bez uroda,” which translates into English without any negations: “Every family has its black sheep.” In this example, we see negative sentiment as if negation coefficients were multiplied.

2.5 Factors of Irreal Context

When analyzing the sentiment, it is important to consider how a proposition conveying sentiment corresponds to reality (Saurí and Pustejovsky 2012; Taboada et al. 2011; Wilson et al. 2005). For example, in the sentence “My nadeâlis’, čto nam ponravitsâ kino” (We hoped that we would like the movie), one can see the positive word ponravitsâ (like), but it says nothing about whether the author really liked the movie.

In linguistics, this is covered by the concept of irreals or irreal mood, which is a group of grammatical means that is used to denote that what is said in a sentence does not refers to what really happens (Taboada et al. 2011). In every language, there are some factors showing that the proposition is not factual (the so-called irrealis markers). In Russian modal verbs, private-state verbs, such as nadeât’sâ (to hope), ožidat’ (to expect), dumat’ (to think), can be used as such markers.

According to Kuznetsova et al. (2013, 72), for the Russian language, such function words as esli, by, li, esli by also often mark the irrealis mood. When selecting parameters on the training set, Kuznetsova et al. (2013, 72) indicated that the prior sentiment scores of sentiment words found in the sentences with irrealis markers should be decreased (but not nullified).

2.6 Comparisons

Comparisons complicate the process of determining sentiment, because additional entities are mentioned in the text, and some sentiments can refer to them. It is often supposed that comparisons are conveyed with the so-called comparative constructions such as lučše čem (better than) or dorože čem (more expensive than). In most cases, comparisons may be introduced without any specialized constructions. Additional entities mentioned for comparison are sometimes very difficult to detect, and it can also be a complex task to single out the attitudes related to them. For example, in the following extract from a restaurant review, the comparison is marked with word drugoy (another), and positive words naslaždalis’ (enjoyed) and volšebnym (wonderful) characterize a restaurant distinct from the restaurant under review (example from Loukachevitch et al. 2015, 8):

My rešili ne brat’ zdes’ desert i kofe, a pošli v drugoj restoran, gde naslaždalis’ volšebnym zaveršeniem našego večera (We decided not to have dessert and coffee there, but instead went to another restaurant where we enjoyed a wonderful end to our evening).

2.7 Irony and Sarcasm

The processing of irony and sarcasm is a serious problem for sentiment analysis systems, since the sentiment of an ironic (sarcastic) utterance differs from its literal sentiment (Wilson and Sperber 2007). In Benamara et al. (2017, 37), a generalized understanding of irony is proposed as “an incongruity between the literal meaning of an utterance and its intended meaning.” Most often, a positive-looking statement (containing more positive sentiment words or an equal number of positive and negative words) hides a negative opinion, for example, “Sberbank—naibolee krupnaâ set’ nerabotaûŝih bankomatov v Rossii” (Sberbank is the largest network of nonoperating ATMs in Russia). Sarcasm is regarded as a sharper, more aggressive, possibly degrading form of irony (Benamara et al. 2017).

The annotation of textual data for the study of irony and sarcasm is a complex task. Interesting data for analyzing these phenomena are Twitter messages that the user can mark with special hashtags: #irony, #sarcasm and some others (Reyes et al. 2013; Sulis et al. 2016). However, recent studies of irony in Twitter show that ironic tweets marked with hashtags and annotated by experts have different characteristics (Kunneman et al. 2015). In addition, in the Russian segment of Twitter, users do not use similar Russian hashtags in the same way as American or European audience (Zefirova and Loukachevitch 2019, 48). The “ironiâ” (irony) hashtag is mostly used as a description for images or jokes alongside with such hashtags as “#šutka” (joke), “#smeh” (laugh) and does not seems to express the desired content.

3 Methods and Resources Used in Sentiment Analysis

Automatic analysis of sentiment can utilize two main types of approaches (Liu 2012; Pang and Lee 2008): knowledge-based methods using sentiment lexicons and rules (Taboada et al. 2011; Kuznetsova et al. 2013) and approaches based on machine learning (Liu 2012; Pang and Lee 2008). Knowledge-based methods require the creation of a specialized sentiment lexicon for a specific domain. Linguistic rules are necessary to sum up sentiment scores of several sentiment words and for accounting for the word context (sentiment modifiers, irreal context, etc.).

Supervised machine learning requires preliminary annotation of a training collection. Depending on the task, different classification algorithms, features of the text representation, and feature weights can be chosen (Pang et al. 2002; Pang and Lee 2008; Liu 2012). Currently, the best results in machine-learning sentiment analysis are achieved by deep learning with neural networks (Rosenthal et al. 2017; Cliché 2017; Arkhipenko et al. 2016), which substituted a previous leader: Support vector machine (SVM) classifier (Pang et al. 2002; Pang and Lee 2008; Chetviorkin and Loukachevitch 2013).

At present, there exist also approaches that integrate available sentiment vocabularies (both manually created and automatically generated) into machine learning methods, transforming them into specialized features (Rosenthal et al. 2017; Mohammad et al. 2013; Loukachevitch and Levchik 2016). The use of preliminary created lexicons helps to overcome data sparsity of training collections (Loukachevitch and Rubtsova 2016). Below, we consider some approaches to creating sentiment lexicons and publicly available Russian lexicons.

Most sentiment vocabularies look like lists of words and expressions with scores of their sentiment (Wilson et al. 2005). Some vocabularies also provide additional characteristics of the word sentiment called “strength.” Sentiment scores can also be assigned to specific senses of ambiguous words (Baccianella et al. 2010; Loukachevitch and Levchik 2016).

For many languages, general sentiment vocabularies have been published. Despite the fact that in each particular domain, specialized vocabularies are needed, general lexicons are also useful since they can serve as source material, which can then be adapted to a domain. Domain-specific sentiment vocabularies are usually generated with automatic or semiautomatic methods using domain-specific text collections (Hamilton et al. 2016; Severyn and Moschitti 2015; Chetviorkin and Loukachevitch 2012).

For Russian, Chetviorkin and Loukachevitch (2012) have described an automatically generated Russian sentiment lexicon in the domain of products and services called ProductSentiRus (ProductSentiRus 2012). The ProductSentiRus lexicon is obtained by applying a supervised machine-learning model to several domain review collections. It is presented as a list of 5000 words ordered by the decreased probability of their sentiment orientation without any positive or negative labels. For example, the most probable sentiment words in ProductSentiRus are as follows: bespodobnyj (peerless), nevnâtnyj (slurred), obaldennyj (awesome), otvratnyj (disgusting), et cetera.

The general Russian lexicon of sentiment words and expressions, RuSentiLex (RuSentiLex 2017), was created in a semiautomatic way (Loukachevitch and Levchik 2016). The entries of the RuSentiLex lexicon are classified according to four sentiment categories (positive, negative, neutral, or positive/negative) and three sources of sentiment (opinion, emotion, or fact). The words in the lexicon that have different sentiment scores in different senses are linked to the appropriate concepts of the thesaurus of the Russian-languageRuThes (Loukachevitch and Dobrov 2014; RuThes 2016), which can help disambiguate sentiment ambiguity in specific domains or contexts. The lexicon was gathered from several sources: opinionated words from general Russian thesaurusRuThes, slang and curse words extracted from Twitter, and objective words with positive or negative connotations from a news collection (Loukachevitch and Levchik 2016; for more on RuThes, see Chap. 18). For example, the description of word presnyj in RuSentiLex is as follows (labels in quotes correspond to the names of RuThes concepts):

  • presnyj, Adj, presnyj, negative, emotion, “NEVKUSNYJ” [tasteless];

  • presnyj, Adj, presnyj, negative, opinion, “NEINTERESNYJ” [insipid];

  • presnyj, Adj, presnyj, positive, fact, “PRESNAÂ VODA” [fresh water]

The Russian sentiment lexicon LINIS Crowd was created via crowdsourcing (Koltsova et al. 2016; LINIS Crowd SENT 2016). The lexicon is aimed at detecting sentiment in user-generated content (blogs, social media) related to social and political issues. Each word was assessed by at least three volunteers in the context of different texts. The words were scored from −2 (negative) to +2 (positive). For example, the word anarhizm (anarchism) obtained three 0 (neutral) scores and three −1 (weakly negative) scores in the considered contexts.

Several international lexicons were automatically constructed for Russian. The Chen-Skiena’s lexicon (2876 words) (Chen and Skiena 2014; Chen-Skiena’s Lexicon 2014) was generated for 136 languages via graph propagation from seed words. However, from the human point of view, the words included in this automatically generated lexicon seem extremely strange. For example, positive words in the Chen-Skiena’s Lexicon include such words as tipa (type of), post (post), sootvetstvenno (correspondingly), sovsem (at all), et cetera.

Mohammad and Turney (2013) generated the Russian variant of the EmoLex lexicon (EmoLex 2017) with automatic translation from the English lexicon obtained by crowdsourcing (4412 Russian words).

Kotelnikov et al. (2018) studied available Russian sentiment lexicons and found that all the lexicons have relatively small intersection with each other. Besides, the translated lexicons (EmoLex and Chen-Skiena’s lexicon) have a smaller intersection with other lexicons than on average (10.0%), and at the same time are relatively similar to each other (18.2%). Kotelnikov et al. (2018) also compared available Russian lexicons as features in machine-learning text categorization. Users’ reviews from five domains (books, movies, banks, hotels, and kitchens) were used as text collections for the experiments. The study found that the best results of classification using a single lexicon in all domains were obtained with ProductSentiRus (Chetviorkin and Loukachevitch 2012). The union of all lexicons gives slightly better results.

As was mentioned, useful sentiment lexicons should be fine-tuned or constructed specially for the domain under analysis. Therefore, to apply sentiment lexicons in a specific domain, it is recommended to gather all available lexicons and to collect a domain-specific text collection as large as possible. Having such data, it is possible to filter out sentiment words and constructions relevant in the domain.

4 Russian Sentiment-Related Shared Tasks

For the evaluation of Russian sentiment analysis systems, several shared tasks have been organized. In 2011–2013, two evaluations of document-level sentiment approaches were carried out. Two types of text collections were used for the evaluation: users’ reviews in three domains (movies, books, and digital cameras) and news quotations (Chetviorkin and Loukachevitch 2013).

For training in the review track, users’ reviews were collected from recommendation services (Imhonet and Yandex.market). The reviews had users’ scores on a ten-point scale for the Imhonet reviews (movies and books) and on a five-point scale for the Yandex reviews (digital cameras). The participants could choose any of tracks classifying reviews into two, three, or five classes. The reviews for the test collections were extracted from social network messages. The sentiment annotation was created manually by human experts. The participants utilized various machine-learning and knowledge-based approaches, but the best methods in all review-related tasks were SVM-based classifiers.

In the quotation track, direct or indirect speech fragments extracted from news reports had to be classified into three classes (positive, negative, or neutral). About 5000 fragments each were prepared for the training and test collection. Both collections were annotated manually; therefore, the size of the training collection was much smaller than for the review task. In this quotation task, the knowledge-based approaches showed the best results (Chetviorkin and Loukachevitch 2013).

The second series of Russian sentiment analysis evaluations (SentiRuEval 2014–2016) was devoted to the entity-oriented and aspect-based tasks of sentiment analysis. Namely, the tasks included aspect-based analysis of reviews in two domains (car and restaurant reviews). Using the prepared collections, Russian training and test datasets were further utilized in the international SemEval aspect-based sentiment evaluation in 2016 (Pontiki et al. 2016; ABSA SemEval-2006 2016).

The entity-oriented task was based on Twitter messages. The participants were asked to classify messages into three classes from the point of view of reputation monitoring (positive, negative, or neutral) in two separate domains: banks and mobile operators. For example, positive tweets could contain a positive opinion or positive fact about the company. The training and test collections were prepared via crowdsourcing (SentiRuEval-2016 data 2016).

The approaches of the participants for the Twitter sentiment analysis differed significantly in 2015 and 2016 (Loukachevitch and Rubtsova 2016). In 2015, the basic approach was the SVM classifier trained on only the training collection without any additional data (unlabeled text collections or sentiment lexicons). Due to this, the participating systems could make mistakes in the classification of the test tweets if a tweet contained sentiment words absent in the training dataset (Loukachevitch and Rubtsova 2016).

In 2016, the best approach was based on neural networks, which used word embeddings (vector representations of words) calculated on a large collection of user comments (Arkhipenko et al. 2016). Such representations allowed the winner to overcome the differences in the training and test collections because words that have semantic similarity also have similar vector representations. The next most successful approaches in terms of the quality of results combined machine learning and the existing Russian lexicons (Loukachevitch and Rubtsova 2016).

5 Conclusion

Automatic sentiment analysis of texts is among the popular applications in natural language processing of texts. In this chapter, we described the problems that can be encountered in automatic sentiment analysis. Then, we briefly considered the main methods for sentiment analysis and approaches to creating sentiment vocabularies. Finally, Russian-specific components of automatic sentiment analysis—publicly available vocabularies and sentiment-related shared tasks—were presented.

The current state of affairs in sentiment analysis (including in its application to the Russian language) can be characterized as follows: approaches to sentiment analysis of some text genres, such as user reviews or short posts on social networking sites, are well studied, but there are a lot of complicated phenomena in sentiment analysis that require further research, especially in the processing of full-text news and analytical articles.

From the practical point of view, there are at least four Russian sentiment vocabularies currently available on the Internet. To apply sentiment lexicons in a specific domain, it is recommended to gather all available lexicons and to collect a domain-specific text collection as large as possible. Having such data, it is possible to filter out sentiment words and constructions that are relevant in the domain.