Extracting common emotions from blogs based on fine-grained sentiment clustering

  • Regular Paper
  • Published in: Knowledge and Information Systems

Abstract

Recently, blogs have emerged as the major platform for people to express their feelings and sentiments in the age of Web 2.0. Common emotions, which reflect people’s collective and overall sentiments, are becoming a major concern for governments, businesses and individual users. Unlike previous work on sentiment classification and summarization, common emotion extraction aims to discover people’s collective sentiments and their corresponding distributions on the Web. Most existing blog clustering methods take into account keywords, stories or timelines but neglect the embedded sentiments, which are very important features of blogs. In this paper, a novel method based on Probabilistic Latent Semantic Analysis (PLSA) is presented to model the hidden sentiment factors, and an emotion-oriented clustering approach is proposed to find common emotions according to the fine-grained sentiment similarity between blogs. Extensive experiments are conducted on real-world datasets covering different topics. The results show that our approach can partition blogs into sentiment-coherent clusters and that the extracted common emotion words provide good navigation guidelines for the embedded sentiments in each cluster.
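
As a rough illustration of the kind of model the abstract refers to, the sketch below fits plain PLSA with EM on a toy blog-by-word count matrix and then groups blogs by their dominant latent factor. It is not the authors' method: the vocabulary, the counts, the function name `plsa`, and the argmax grouping rule are all assumptions made for illustration, and the paper's own sentiment features and clustering procedure are not given in the abstract.

```python
import math
import random


def plsa(counts, n_topics, n_iter=50, seed=0):
    """Fit PLSA by EM on a document-term count matrix (list of lists).

    Returns (p_z_d, p_w_z, loglik), where p_z_d[d][z] = P(z|d),
    p_w_z[z][w] = P(w|z), and loglik holds the log-likelihood of the
    parameters at the start of each iteration.
    Assumes every document contains at least one word.
    """
    rng = random.Random(seed)
    n_docs, n_words = len(counts), len(counts[0])

    def normalize(row):
        total = sum(row)
        return [x / total for x in row]

    # Random positive initialisation; each row is a probability distribution.
    p_z_d = [normalize([rng.random() + 0.1 for _ in range(n_topics)])
             for _ in range(n_docs)]
    p_w_z = [normalize([rng.random() + 0.1 for _ in range(n_words)])
             for _ in range(n_topics)]
    loglik = []

    for _ in range(n_iter):
        new_z_d = [[0.0] * n_topics for _ in range(n_docs)]
        new_w_z = [[0.0] * n_words for _ in range(n_topics)]
        ll = 0.0
        for d in range(n_docs):
            for w in range(n_words):
                n = counts[d][w]
                if n == 0:
                    continue
                # E-step: P(z|d,w) is proportional to P(z|d) * P(w|z).
                joint = [p_z_d[d][z] * p_w_z[z][w] for z in range(n_topics)]
                p_w_given_d = sum(joint)
                ll += n * math.log(p_w_given_d)
                for z in range(n_topics):
                    resp = n * joint[z] / p_w_given_d
                    new_z_d[d][z] += resp
                    new_w_z[z][w] += resp
        # M-step: renormalise the expected counts into distributions.
        p_z_d = [normalize(row) for row in new_z_d]
        p_w_z = [normalize(row) for row in new_w_z]
        loglik.append(ll)
    return p_z_d, p_w_z, loglik


# Hypothetical toy data: 4 blogs over a 4-word sentiment vocabulary.
vocab = ["happy", "joy", "angry", "sad"]
counts = [
    [5, 3, 0, 0],
    [4, 6, 0, 0],
    [0, 0, 7, 2],
    [0, 0, 3, 5],
]

p_z_d, p_w_z, loglik = plsa(counts, n_topics=2)

# Assumed grouping rule: assign each blog to its most probable factor.
labels = [max(range(2), key=lambda z: row[z]) for row in p_z_d]

# Words ranked by P(w|z) give a rough description of each factor.
top_words = [sorted(vocab, key=lambda w: -p_w_z[z][vocab.index(w)])[:2]
             for z in range(2)]
```

Here `labels` holds one factor index per blog and `top_words` lists the two highest-probability words of each factor, loosely mirroring the idea of describing each cluster by its characteristic emotion words. EM only reaches a local optimum, so results can depend on the seed.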

Author information

Corresponding author

Correspondence to Shi Feng.

About this article

Cite this article

Feng, S., Wang, D., Yu, G. et al. Extracting common emotions from blogs based on fine-grained sentiment clustering. Knowl Inf Syst 27, 281–302 (2011). https://doi.org/10.1007/s10115-010-0325-9

  • DOI: https://doi.org/10.1007/s10115-010-0325-9
