
Challenges and solutions in the opinion summarization of user-generated content

Abstract

The present is marked by the influence of the Social Web on societies and people worldwide. In this context, users generate large amounts of data, much of it containing opinions, which have proven useful for many real-world applications. In order to extract knowledge from user-generated content, automatic methods must be developed. In this paper, we present different approaches to multi-document summarization of opinion from blogs and reviews. We apply these approaches to: (a) identify positive and negative opinions in blog threads in order to produce a list of arguments in favor of and against a given topic and (b) summarize the opinion expressed in reviews. Subsequently, we evaluate the proposed methods on two distinct datasets, analyze the quality of the obtained results and discuss the errors produced. Although much remains to be done, the approaches we propose obtain encouraging results and point to clear directions in which further improvements can be made.


Fig. 1
Fig. 2
Fig. 3
Fig. 4

Notes

  1. http://technorati.com/blogging/article/state-of-the-blogosphere-2009-introduction/

  2. http://alias-i.com/lingpipe/

  3. The degree of importance of each ‘latent’ topic is given by its singular value, and the optimal number of latent topics (i.e., dimensions) r can be fine-tuned on training data.

  4. The annotation described was done by two Computer Science graduates. They were asked to annotate the sentences that appeared in the obtained summaries but not in the Gold Standard, indicating whether each sentence was relevant to the topic of the post and whether it expressed a positive, negative or no opinion. The results presented are for the cases in which they agreed; in case of disagreement, the sentence was simply considered as incorrectly included in the final summary.

  5. The motivation for not considering negation in our system is twofold. 1) The paper “A Survey on the Role of Negation in Sentiment Analysis” by Wiegand et al. (2010) describes the various ways in which negation has been handled in sentiment analysis. One of its important conclusions is that, unless negation is handled within a sentiment analysis method that uses syntactic analysis (i.e., one able to detect the scope of the negation precisely), it brings negligible or no improvement over the simpler approach that does not consider negation. 2) We use this simpler approach with multilinguality in view. Since the position of negation differs across languages, as does the manner in which it should be handled in each of them, we do not consider negation for the time being (until a thorough analysis has been done of how negation should be handled correctly in the various languages).

  6. As far as word sense disambiguation (WSD) is concerned, work by Akkaya et al. (2009) has shown that WSD in the sense in which the research community understands it (i.e., assigning a WordNet synset to each word) is not useful for subjectivity analysis. Instead, the authors propose “Subjectivity WSD”, which aims solely at discriminating between subjective and objective usages of word synsets. More generally, performing WSD in the traditional sense has not been a priority for the sentiment analysis community; the only step that has proven useful is shallow disambiguation using part-of-speech information (Wiegand and Klakow 2010).

  7. Although ROUGE has many shortcomings, it remains the only widely adopted automatic method for measuring the performance of summarization systems. An evaluation based on linguistic quality might be more informative, but it is more costly and subjective, and at present there is no common agreement on alternative evaluation metrics.

  8. http://www.nist.gov/tac/

  9. We used the F1 score instead of the recall used at TAC because the lengths of our model summaries and system summaries differ; that is, model summaries can be longer than the system-produced summaries.

  10. In order to use the gold polarity alongside the score produced by the sentiment analysis tool, as we do, we first had to automatically align all the automatically identified sentences with the annotated comments.

  11. We note, however, that the results on our corpus are not directly comparable with those of TAC08, since the data sets are different and the tasks involved differ significantly.

  12. Blog posts in our corpus were annotated as important with respect to the main topic of the respective blog threads.

  13. http://infomap-nlp.sourceforge.net/

  14. The Medical Subject Headings (MeSH) thesaurus is prepared by the US National Library of Medicine for indexing, cataloguing, and searching for biomedical and health-related information and documents. Although it was initially meant for biomedical and health-related documents, it represents a large IS-A taxonomy and can therefore be used in more general tasks (http://www.nlm.nih.gov/mesh/meshhome.html).

  15. Europe Media Monitor (EMM) (Steinberger et al. 2009c) and the related text mining tools were developed entirely at the European Commission’s Joint Research Centre (JRC). Unfortunately, the tools cannot currently be made publicly available, but the end product (the news analysis output) is freely accessible at http://emm.newsbrief.eu/overview.html. The tools used for the experiments described here, i.e., named entity recognition (NER) (Steinberger and Pouliquen 2009) and disambiguation (Pouliquen et al. 2006), can in principle be replaced by any other tools. For a detailed description of how these tools were built and how they function, please see Pouliquen et al. (2006) and Pouliquen and Steinberger (2009).
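Note 3 refers to the latent semantic analysis (LSA) step, in which singular values give the importance of each ‘latent’ topic. The following is a minimal Python sketch, assuming the sentence-scoring scheme of Steinberger and Ježek (2004) (each sentence is scored by the length of its vector in the space of the top r latent topics, weighted by the singular values) and assuming a plain term-by-sentence count matrix as input. The function names are hypothetical, and r is set by hand here rather than tuned on training data.

```python
import numpy as np


def lsa_sentence_scores(A: np.ndarray, r: int) -> np.ndarray:
    """Score each sentence (column of the term-by-sentence matrix A).

    score_j = sqrt(sum_{i < r} (sigma_i * v_ij)^2), i.e. the length of
    sentence j in the space of the r most important latent topics.
    """
    # Thin SVD: A = U @ diag(sigma) @ Vt, singular values in descending order.
    _, sigma, Vt = np.linalg.svd(A, full_matrices=False)
    r = min(r, len(sigma))
    # Weight each latent topic (row of Vt) by its singular value.
    weighted = sigma[:r, None] * Vt[:r, :]
    return np.sqrt((weighted ** 2).sum(axis=0))


def select_sentences(A: np.ndarray, r: int, k: int) -> list:
    """Return the indices of the k top-scoring sentences, in document order."""
    scores = lsa_sentence_scores(A, r)
    return sorted(np.argsort(-scores)[:k].tolist())
```

If r equals the full rank of A, each score reduces to the Euclidean norm of the sentence's column in A, so it is the truncation to the r most important latent topics that makes the ranking topic-sensitive.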
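The agreement rule in note 4 can be made explicit with a small sketch. It assumes (the note does not say so) that the two annotators must agree on both aspects, relevance and opinion, for their label to be kept; the class, constant and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Union

# Hypothetical marker for sentences counted as wrongly included in a summary.
INCORRECT = "incorrectly included"


@dataclass(frozen=True)
class Judgement:
    relevant: bool  # is the sentence relevant to the topic of the post?
    opinion: str    # "positive", "negative" or "none"


def resolve(a: Judgement, b: Judgement) -> Union[Judgement, str]:
    """Keep the shared label only if both annotators agree (assumed: on both aspects)."""
    if a == b:
        return a
    # Disagreement: the sentence is considered incorrectly included.
    return INCORRECT
```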
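Note 5 motivates a scoring approach that ignores negation. A minimal sketch of such a lexicon-based sentence scorer follows. The four-word lexicon is a hypothetical toy (a real system would draw on resources such as SentiWordNet (Esuli and Sebastiani 2006)), and summing prior polarities is an assumed aggregation rule, not necessarily the one used in the paper.

```python
import re

# Hypothetical toy lexicon of prior polarities.
LEXICON = {"good": 1.0, "great": 1.0, "bad": -1.0, "awful": -1.0}


def sentence_score(sentence: str, lexicon: dict = LEXICON) -> float:
    """Sum the prior polarities of lexicon words; negation is deliberately ignored."""
    tokens = re.findall(r"[a-z']+", sentence.lower())
    return sum(lexicon.get(t, 0.0) for t in tokens)


def polarity(sentence: str) -> str:
    score = sentence_score(sentence)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

The cost of the simplification is easy to see: polarity("This is not good") returns "positive", because "not" carries no weight, which is the trade-off note 5 accepts in exchange for language independence.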
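Note 6 mentions shallow disambiguation using part-of-speech information. The sketch below assumes a sentiment lexicon keyed by (word, POS) pairs, so that the same surface form can carry a prior polarity under one tag and none under another. The entries and the Universal-Dependencies-style tags are hypothetical, and the input is assumed to have been tagged already by some POS tagger.

```python
# Hypothetical lexicon keyed by (lowercased word, coarse POS tag).
POS_LEXICON = {
    ("like", "VERB"): 1.0,  # "I like this phone": opinionated
    ("like", "ADP"): 0.0,   # "it looks like a brick": preposition, no opinion
    ("kind", "ADJ"): 1.0,   # "a kind reviewer"
    ("kind", "NOUN"): 0.0,  # "this kind of camera"
}


def score_tagged(tagged) -> float:
    """tagged: list of (word, POS) pairs; unknown pairs contribute 0."""
    return sum(POS_LEXICON.get((word.lower(), pos), 0.0) for word, pos in tagged)
```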
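Note 9 explains the switch from recall to F1. The sketch below is a simplified ROUGE-N (Lin 2004) computed from clipped n-gram counts, without the stemming, stopword handling or multiple-reference jackknifing of the official ROUGE package; it is meant only to show how a model summary that is longer than the system summary caps recall.

```python
from collections import Counter


def ngrams(tokens: list, n: int) -> Counter:
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(system: str, model: str, n: int = 2):
    """Return (recall, precision, F1) of n-gram overlap with clipped counts."""
    sys_ng = ngrams(system.lower().split(), n)
    mod_ng = ngrams(model.lower().split(), n)
    # Counter & Counter keeps the minimum count of each shared n-gram.
    overlap = sum((sys_ng & mod_ng).values())
    recall = overlap / max(sum(mod_ng.values()), 1)
    precision = overlap / max(sum(sys_ng.values()), 1)
    f1 = 2 * recall * precision / (recall + precision) if overlap else 0.0
    return recall, precision, f1
```

For instance, rouge_n("the battery life is great", "the battery life is great but the screen is too dim", n=1) yields a recall of 5/11 (about 0.45) although every system unigram occurs in the model summary, a precision of 1.0, and an F1 of 0.625.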
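Note 10 does not say how the automatically identified sentences were aligned with the annotated comments. One plausible approach, shown here purely as an assumption, maps each sentence to the comment that contains the largest fraction of its tokens and leaves it unaligned below a hypothetical threshold of 0.8; all names are illustrative.

```python
import re


def tokens(text: str) -> set:
    return set(re.findall(r"\w+", text.lower()))


def containment(sentence_toks: set, comment_toks: set) -> float:
    """Fraction of the sentence's tokens that also occur in the comment."""
    if not sentence_toks:
        return 0.0
    return len(sentence_toks & comment_toks) / len(sentence_toks)


def align(sentences, comments, threshold: float = 0.8) -> dict:
    """Map each sentence index to its best-matching comment index, or None."""
    comment_toks = [tokens(c) for c in comments]
    alignment = {}
    for i, sentence in enumerate(sentences):
        s_toks = tokens(sentence)
        scores = [containment(s_toks, c) for c in comment_toks]
        best = max(range(len(scores)), key=scores.__getitem__, default=None)
        ok = best is not None and scores[best] >= threshold
        alignment[i] = best if ok else None
    return alignment
```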

References

  • Akkaya, C., Wiebe, J., & Mihalcea, R. (2009). Subjectivity word sense disambiguation. In Proceedings of the 2009 conference on empirical methods in natural language processing: Volume 1. EMNLP ’09 (Vol. 1, pp. 190–199). Stroudsburg, PA, USA: Association for Computational Linguistics. http://dl.acm.org/citation.cfm?id=1699510.1699535.


  • Balahur, A., Boldrini, E., Montoyo, A., & Martínez-Barco, P. (2009a). Cross-topic opinion mining for real-time human-computer interaction. In Proceedings of ICEIS 2009 conference.

  • Balahur, A., Kabadjov, M., Steinberger, J., Steinberger, R., & Montoyo, A. (2009b). Summarizing opinions in blog threads. In Proceedings of the 23rd pacific asia conference on language, information and computation (PACLIC) (pp. 606–613).

  • Balahur, A., Lloret, E., Boldrini, E., Montoyo, A., Palomar, M., & Martínez-Barco, P. (2009c). Summarizing threads in blogs using opinion polarity. In Proceedings of the workshop on events in emerging text types at RANLP, Borovetz, Bulgaria.

  • Balahur, A., Lloret, E., Ferrández, O., Montoyo, A., Palomar, M., & Muñoz, R. (2008). The DLSIUAES team’s participation in the TAC 2008 tracks. In Proceedings of the text analysis conference (TAC) 2008. National Institute of Standards and Technology (NIST).

  • Balahur, A., Steinberger, R., Kabadjov, M., Zavarella, V., van der Goot, E., Halkia, M., et al. (2010). Sentiment analysis in the news. In Proceedings of LREC2010.

  • Balahur, A., Steinberger, R., van der Goot, E., Pouliquen, B., & Kabadjov, M. (2009). Opinion mining from newspaper quotations. In Proceedings of the workshop on intelligent analysis and processing of web news content at the IEEE/WIC/ACM international conferences on web intelligence and intelligent agent technology (WI-IAT).

  • Beineke, P., Hastie, T., Manning, C., & Vaithyanathan, S. (2004). An exploration of sentiment summarization. In J. G. Shanahan, J. Wiebe, & Y. Qu (Eds.), Proceedings of the AAAI spring symposium on exploring attitude and affect in text: Theories and applications, Stanford, US. http://nlp.stanford.edu/~manning/papers/rotup.pdf.

  • Bossard, A., Généreux, M., & Poibeau, T. (2008). Description of the LIPN systems at TAC 2008: Summarizing information and opinions. In Proceedings of the text analysis conference (TAC) 2008. National Institute of Standards and Technology (NIST).

  • Cerini, S., Compagnoni, V., Demontis, A., Formentelli, M., & Gandini, G. (2007). Micro-WNOp: A gold standard for the evaluation of automatically compiled lexical resources for opinion mining. In A. Sansò (Ed.), Language resources and linguistic theory: Typology, second language acquisition, english linguistics, Franco Angeli, Milano, IT.

  • Conroy, J., & Schlesinger, S. (2008). CLASSY at TAC 2008 metrics. In Proceedings of the text analysis conference (TAC) 2008. National Institute of Standards and Technology (NIST).

  • Cruz, F., Troyani, J., Ortega, J., & Enríquez, F. (2008). The Italica system at TAC 2008 opinion summarization task. In Proceedings of the text analysis conference (TAC) 2008. National Institute of Standards and Technology (NIST).

  • Erkan, G., & Radev, D. R. (2004). LexRank: Graph-based centrality as salience in text summarization. Journal of Artificial Intelligence Research (JAIR), 22, 457–479.


  • Esuli, A., & Sebastiani, F. (2006). SentiWordNet: A publicly available resource for opinion mining. In Proceedings of the 6th international conference on language resources and evaluation, Italy.

  • Gong, Y., & Liu, X. (2002). Generic text summarization using relevance measure and latent semantic analysis. In Proceedings of ACM SIGIR, New Orleans, US.

  • He, T., Chen, J., Gui, Z., & Li, F. (2008). CCNU at TAC 2008: Proceeding on using semantic method for automated summarization yield. In Proceedings of the text analysis conference (TAC) 2008. National Institute of Standards and Technology (NIST).

  • Hovy, E. H. (2005). Automated text summarization. In R. Mitkov (Ed.), The Oxford handbook of computational linguistics (pp. 583–598). Oxford, UK: Oxford University Press.


  • Kabadjov, M., Balahur, A., & Boldrini, E. (2009). Sentiment intensity: Is it a good summary indicator? In Proceedings of the 4th language and technology conference (LTC) (pp. 380–384).

  • Kabadjov, M. A., Steinberger, J., Pouliquen, B., Steinberger, R., & Poesio, M. (2009). Multilingual statistical news summarisation: Preliminary experiments with English. In Proceedings of the workshop on intelligent analysis and processing of web news content at the IEEE/WIC/ACM international conferences on web intelligence and intelligent agent technology (WI-IAT).

  • Kupiec, J., Pedersen, J., & Chen, F. (1995). A trainable document summarizer. In Proceedings of the 18th annual international ACM SIGIR conference on research and development in information retrieval, Seattle, Washington (pp. 68–73).

  • Lerman, K., Blair-Goldensohn, S., & McDonald, R. (2009). Sentiment summarization: Evaluating and learning user preferences. In Proceedings of the 12th conference of the European chapter of the ACL (EACL 2009) (pp. 514–522). Athens, Greece: Association for Computational Linguistics.


  • Lerman, K., & McDonald, R. (2009). Contrastive summarization: An experiment with consumer reviews. In Proceedings of human language technologies: The 2009 annual conference of the North American chapter of the association for computational linguistics, companion volume: Short papers. (pp. 113–116). Boulder, Colorado: Association for Computational Linguistics.

  • Lin, C. Y. (2004). ROUGE: A package for automatic evaluation of summaries. In Proceedings of the workshop on text summarization branches out, Barcelona, Spain.

  • Lin, C.Y., & Hovy, E. (2003). Automatic evaluation of summaries using n-gram co-occurrence statistics. In Proceedings of HLT-NAACL, Edmonton, Canada.

  • Pang, B., & Lee, L. (2008). Opinion mining and sentiment analysis. Foundations and Trends in Information Retrieval, 2(1–2), 1–135.


  • Pouliquen, B., Kimler, M., Steinberger, R., Ignat, C., Oellinger, T., Blackler, K., et al. (2006). Geocoding multilingual texts: Recognition, disambiguation and visualisation. In Proceedings of the 5th international conference on language resources and evaluation (LREC 2006), Genoa, Italy (pp. 53–58).

  • Pouliquen, B., & Steinberger, R. (2009). Automatic construction of multilingual name dictionaries. In C. Goutte, N. Cancedda, M. Dymetman, & G. Foster (Eds.), Learning machine translation. MIT Press, NIPS series.

  • Riloff, E., Wiebe, J., & Phillips, W. (2005). Exploiting subjectivity classification to improve information extraction. In Proceedings of the 20th conference of the association for the advancement of artificial intelligence (AAAI).

  • Saggion, H., & Funk, A. (2010). Interpreting SentiWordNet for opinion classification. In Proceedings of LREC 2010.

  • Saggion, H., Lloret, E., & Palomar, M. (2010). Using text summaries for predicting rating scales. In Proceedings of the 1st workshop on subjectivity and sentiment analysis (WASSA 2010).

  • Steinberger, J., & Ježek, K. (2004). Text summarization and singular value decomposition. In Proceedings of the 3rd ADVIS conference, Izmir, Turkey.

  • Steinberger, J., & Ježek, K. (2009). Update summarization based on novel topic distribution. In Proceedings of the 9th ACM DocEng, Munich, Germany.

  • Steinberger, J., Kabadjov, M., Pouliquen, B., Steinberger, R., & Poesio, M. (2009a). WB-JRC-UT’s participation in TAC 2009: Update summarization and AESOP tasks. In National Institute of Standards and Technology (Ed.), Proceedings of the text analysis conference, Gaithersburg, MD.

  • Steinberger, J., Lenkova, P., Ebrahim, M., Ehrman, M., Hurriyetoglu, A., Kabadjov, M., et al. (2011). Creating sentiment dictionaries via triangulation. In Proceedings of the 2nd workshop on computational approaches to subjectivity and sentiment analysis (WASSA 2.011) (pp. 28–36). Portland, Oregon: Association for Computational Linguistics. http://www.aclweb.org/anthology/W11-1704.


  • Steinberger, J., Poesio, M., Kabadjov, M. A., & Ježek, K. (2007). Two uses of anaphora resolution in summarization. Information Processing and Management, 43(6), 1663–1680. Special Issue on Text Summarisation (Donna Harman, ed.).


  • Steinberger, R., & Pouliquen, B. (2009). Cross-lingual named entity recognition. In Named entities—recognition, classification and use, Benjamins current topics (Vol. 19). John Benjamins Publishing Company. ISBN 978–90–272–8922.

  • Steinberger, R., Pouliquen, B., & Ignat, C. (2009b). Using language-independent rules to achieve high multilinguality in text mining. In F. Fogelman-Soulié, D. Perrotta, J. Piskorski, & R. Steinberger (Eds.), Mining massive data sets for security. Amsterdam, Holland: IOS-Press.


  • Steinberger, R., Pouliquen, B., & Van der Goot, E. (2009c). An introduction to the Europe media monitor family of applications. In Information access in a multilingual world—proceedings of the SIGIR 2009 workshop (SIGIR–CLIR 2009) (Vol. 19).

  • Stone, P. J., Dunphy, D. C., Smith, M. S., & Ogilvie, D. M. (1966). The General Inquirer: A computer approach to content analysis (Vol. 8). MIT Press. http://mitpress.mit.edu/catalog/item/default.asp?tid=5144&ttype=2.

  • Stoyanov, V., & Cardie, C. (2006). Toward opinion summarization: Linking the sources. In Proceedings of the COLING-ACL workshop on sentiment and subjectivity in text. Sydney, Australia: Association for Computational Linguistics.


  • Strapparava, C., & Valitutti, A. (2004). WordNet-Affect: An affective extension of WordNet. In Proceedings of the 4th international conference on language resources and evaluation, Lisbon, Portugal (pp. 1083–1086).

  • Titov, I., & McDonald, R. (2008). A joint model of text and aspect ratings for sentiment summarization. In Proceedings of ACL-08: HLT, Columbus, Ohio (pp. 308–316).

  • Turney, P. D., & Littman, M. L. (2003). Measuring praise and criticism: Inference of semantic orientation from association. ACM Transactions on Information Systems (TOIS), 21(4), 315–346.


  • Varma, V., Pingali, P., Katragadda, R., Krishna, S., Ganesh, S., Sarvabhotla, K., et al. (2008). IIIT Hyderabad at TAC 2008. In Proceedings of the text analysis conference (TAC) 2008. National Institute of Standards and Technology (NIST).

  • Wiegand, M., Balahur, A., Roth, B., Klakow, D., & Montoyo, A. (2010). A survey on the role of negation in sentiment analysis. In Proceedings of the workshop on negation and speculation in natural language processing (pp. 60–68). NeSp-NLP ’10, Stroudsburg, PA, USA: Association for Computational Linguistics. http://dl.acm.org/citation.cfm?id=1858959.1858970.

  • Wiegand, M., & Klakow, D. (2010). Convolution kernels for opinion holder extraction. In Human language technologies: The 2010 annual conference of the North American chapter of the association for computational linguistics. HLT ’10 (pp. 795–803). Stroudsburg, PA, USA: Association for Computational Linguistics. http://dl.acm.org/citation.cfm?id=1857999.1858120.


  • Zhuang, L., Jing, F., & Zhu, X. Y. (2006). Movie review mining and summarization. In CIKM ’06: Proceedings of the 15th ACM international conference on Information and knowledge management (pp. 43–50).


Author information


Corresponding author

Correspondence to Alexandra Balahur.

About this article

Cite this article

Balahur, A., Kabadjov, M., Steinberger, J. et al. Challenges and solutions in the opinion summarization of user-generated content. J Intell Inf Syst 39, 375–398 (2012). https://doi.org/10.1007/s10844-011-0194-z

  • DOI: https://doi.org/10.1007/s10844-011-0194-z

Keywords

  • Opinion mining
  • Blog threads
  • User-generated content
  • Sentiment analysis
  • Opinion summarization