Abstract
Blogs have recently emerged as a major platform for people to express their feelings and sentiments in the age of Web 2.0. Common emotions, which reflect people’s collective and overall sentiments, are becoming a major concern for governments, businesses and individual users. Unlike previous work on sentiment classification and summarization, the central problem in common emotion extraction is to identify people’s collective sentiments and their corresponding distributions on the Web. Most existing blog clustering methods take into account keywords, stories or timelines but neglect the embedded sentiments, which are important features of blogs. In this paper, a novel method based on Probabilistic Latent Semantic Analysis (PLSA) is presented to model the hidden sentiment factors, and an emotion-oriented clustering approach is proposed to find common emotions according to the fine-grained sentiment similarity between blogs. Extensive experiments are conducted on real-world datasets covering different topics. The results show that our approach can partition blogs into sentiment-coherent clusters and that the extracted common emotion words provide useful navigation guidance for the sentiments embedded in each cluster.
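As a rough illustration of the idea of modelling hidden sentiment factors with PLSA, the sketch below fits a generic asymmetric PLSA model by EM on a toy blog-by-emotion-word count matrix. It is not the model or the clustering procedure of the paper. The four-word vocabulary, the counts, the function names `plsa` and `normalize`, and the step that assigns each blog to its most probable factor are all assumptions made for this example.

```python
import random


def normalize(row):
    """Scale non-negative numbers to sum to 1 (uniform if all are zero)."""
    total = sum(row)
    if total > 0:
        return [x / total for x in row]
    return [1.0 / len(row)] * len(row)


def plsa(counts, n_factors, n_iter=100, seed=0):
    """Fit asymmetric PLSA, P(w|d) = sum_z P(w|z) P(z|d), by EM.

    counts[d][w] is the number of times word w occurs in document d.
    Returns (p_z_d, p_w_z) as nested lists of probabilities.
    """
    rng = random.Random(seed)
    n_docs, n_words = len(counts), len(counts[0])
    # Random initialisation breaks the symmetry between factors.
    p_z_d = [normalize([rng.random() for _ in range(n_factors)])
             for _ in range(n_docs)]
    p_w_z = [normalize([rng.random() for _ in range(n_words)])
             for _ in range(n_factors)]
    for _ in range(n_iter):
        acc_z_d = [[0.0] * n_factors for _ in range(n_docs)]
        acc_w_z = [[0.0] * n_words for _ in range(n_factors)]
        for d in range(n_docs):
            for w in range(n_words):
                n_dw = counts[d][w]
                if n_dw == 0:
                    continue
                # E-step: P(z|d,w) is proportional to P(w|z) * P(z|d).
                post = normalize([p_w_z[z][w] * p_z_d[d][z]
                                  for z in range(n_factors)])
                # Accumulate expected counts used by the M-step.
                for z in range(n_factors):
                    acc_z_d[d][z] += n_dw * post[z]
                    acc_w_z[z][w] += n_dw * post[z]
        # M-step: renormalise the expected counts.
        p_z_d = [normalize(row) for row in acc_z_d]
        p_w_z = [normalize(row) for row in acc_w_z]
    return p_z_d, p_w_z


# Hypothetical toy data: 4 blogs x 4 emotion words.
vocab = ["happy", "joy", "angry", "sad"]
counts = [
    [4, 3, 0, 0],
    [5, 2, 0, 0],
    [0, 0, 3, 4],
    [0, 0, 2, 5],
]
p_z_d, p_w_z = plsa(counts, n_factors=2)

# Simplified grouping (an assumption): each blog goes to its dominant factor.
labels = [max(range(2), key=lambda z: row[z]) for row in p_z_d]
```

Here `p_z_d[d]` plays the role of a blog's distribution over hidden factors, and the highest-probability words in each row of `p_w_z` can be read as candidate emotion words for that factor. Comparing the `p_z_d` rows with a similarity measure, rather than taking the dominant factor, would be closer in spirit to clustering by sentiment similarity.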
Cite this article
Feng, S., Wang, D., Yu, G. et al. Extracting common emotions from blogs based on fine-grained sentiment clustering. Knowl Inf Syst 27, 281–302 (2011). https://doi.org/10.1007/s10115-010-0325-9
DOI: https://doi.org/10.1007/s10115-010-0325-9