Abstract
With the growing amount of subjective data on the Web 2.0, tools to manage and exploit such data have become essential. Our research focuses on the creation of EmotiBlog, a fine-grained annotation scheme for labelling subjectivity in non-traditional textual genres. We also present the EmotiBlog corpus, a collection of blog posts comprising 270,000 tokens on 3 topics and in 3 languages: Spanish, English and Italian. Additionally, we carry out a series of experiments to check the robustness of the model and its applicability to Natural Language Processing tasks in the 3 languages. The experiments on inter-annotator agreement and on feature selection gave satisfactory results, which encouraged us to continue working with the model and to extend the annotated corpus. To check its applicability, we tested different Machine Learning models, built using the EmotiBlog annotation, on other corpora to see whether the annotation is domain- and genre-independent, and obtained positive results. Finally, we also applied EmotiBlog to Opinion Mining, showing that our resource improves the performance of systems built for this task.
Additional information
Responsible editor: Fei Wang, Hanghang Tong, Phillip Yu, Charu Aggarwal.
Cite this article
Boldrini, E., Balahur, A., Martínez-Barco, P. et al. Using EmotiBlog to annotate and analyse subjectivity in the new textual genres. Data Min Knowl Disc 25, 603–634 (2012). https://doi.org/10.1007/s10618-012-0259-9
DOI: https://doi.org/10.1007/s10618-012-0259-9