Using EmotiBlog to annotate and analyse subjectivity in the new textual genres

Data Mining and Knowledge Discovery

Abstract

With the growing amount of subjective data on the Web 2.0, tools to manage and exploit such data have become essential. Our research focuses on the creation of EmotiBlog, a fine-grained annotation scheme for labelling subjectivity in non-traditional textual genres. We also present the EmotiBlog corpus, a collection of blog posts comprising 270,000 tokens, covering 3 topics in 3 languages: Spanish, English and Italian. Additionally, we carry out a series of experiments to check the robustness of the model and its applicability to Natural Language Processing tasks in the 3 languages. The experiments on inter-annotator agreement and on feature selection gave satisfactory results, which encouraged us to continue working with the model and to extend the annotated corpus. To check its applicability, we tested different Machine Learning models, built using the EmotiBlog annotation, on other corpora to see whether the annotation is domain and genre independent, and obtained positive results. Finally, we applied EmotiBlog to Opinion Mining, showing that our resource improves the performance of systems built for this task.

Author information

Corresponding author

Correspondence to Ester Boldrini.

Additional information

Responsible editor: Fei Wang, Hanghang Tong, Phillip Yu, Charu Aggarwal.

About this article

Cite this article

Boldrini, E., Balahur, A., Martínez-Barco, P. et al. Using EmotiBlog to annotate and analyse subjectivity in the new textual genres. Data Min Knowl Disc 25, 603–634 (2012). https://doi.org/10.1007/s10618-012-0259-9

  • DOI: https://doi.org/10.1007/s10618-012-0259-9
