An Empirical Study of Massively Parallel Bayesian Networks Learning for Sentiment Extraction from Unstructured Text

Chen, Wei; Zong, Lang; Huang, Weijing; Ou, Gaoyan; Wang, Yue; Yang, Dongqing

doi:10.1007/978-3-642-20291-9_47


Part of the book series: Lecture Notes in Computer Science (LNISA, volume 6612)

Included in the following conference series:

Asia-Pacific Web Conference


Abstract

Extracting sentiments from unstructured text has emerged as an important problem in many disciplines, for example, mining online opinions from the Internet. Many algorithms have been applied to this problem, but most of them fail to handle large-scale web data. In this paper, we present a parallel algorithm for Bayesian network (BN) structure learning from large-scale datasets using a MapReduce cluster. We then apply this parallel BN learning algorithm to capture the dependencies among words and, at the same time, to find a vocabulary that is efficient for extracting sentiments. The benefits of using MapReduce for BN structure learning are discussed. The performance of BN-based sentiment extraction is demonstrated on real web blog data. Experimental results on this web data set show that our algorithm selects a parsimonious feature set with substantially fewer predictor variables than the full data set, and that it yields better predictions of sentiment orientation than several commonly used methods.
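The abstract does not spell out the algorithm. What follows is only a minimal sketch of one common pattern under stated assumptions, and it is not the authors' method. A MapReduce job first counts sufficient statistics (here, word-presence/label co-occurrence counts). Mutual information, a dependency measure of the kind used in information-theoretic BN structure learning, then scores each word against the sentiment label, and the top-scoring words form a compact vocabulary. The job is simulated locally in plain Python rather than run on a cluster. All function names (`map_doc`, `reduce_counts`, `run_job`, `mutual_information`, `select_vocabulary`), the key layout, and the toy corpus are hypothetical.

```python
from collections import defaultdict
from itertools import chain
import math

# Hypothetical sketch, not the paper's algorithm.
# Assumed record layout: each document is (tokens, label), label in {"pos", "neg"}.


def map_doc(doc):
    """Mapper: emit (key, 1) pairs for one document.

    Keys are ("label", c) for the document's class and ("word_label", w, c)
    for every distinct word w in it (binary presence, not term frequency).
    """
    tokens, label = doc
    yield ("label", label), 1
    for word in set(tokens):
        yield ("word_label", word, label), 1


def reduce_counts(pairs):
    """Reducer: sum the values for each key (the shuffle is simulated by a dict)."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


def run_job(documents):
    """Run one simulated MapReduce job locally: map every document, then reduce."""
    return reduce_counts(chain.from_iterable(map_doc(d) for d in documents))


def mutual_information(counts, word):
    """I(W; C) in bits between presence of `word` and the sentiment label C."""
    label_counts = {key[1]: n for key, n in counts.items() if key[0] == "label"}
    total = sum(label_counts.values())
    with_word = sum(counts.get(("word_label", word, c), 0) for c in label_counts)
    mi = 0.0
    for c, n_c in label_counts.items():
        n_present = counts.get(("word_label", word, c), 0)
        # Joint cells (W=1, C=c) and (W=0, C=c), each paired with its marginal count of W.
        for n_joint, n_w in ((n_present, with_word), (n_c - n_present, total - with_word)):
            if n_joint:  # cells with zero count contribute 0 (0 * log 0 := 0)
                mi += (n_joint / total) * math.log2(n_joint * total / (n_w * n_c))
    return mi


def select_vocabulary(counts, k):
    """Keep the k words with the highest I(W; C); ties broken alphabetically."""
    vocab = {key[1] for key in counts if key[0] == "word_label"}
    return sorted(vocab, key=lambda w: (-mutual_information(counts, w), w))[:k]


if __name__ == "__main__":
    corpus = [
        (["good", "great"], "pos"),
        (["good", "fun"], "pos"),
        (["bad", "boring"], "neg"),
        (["bad", "fun"], "neg"),
    ]
    counts = run_job(corpus)
    for word in select_vocabulary(counts, 5):
        print(word, round(mutual_information(counts, word), 3))
```

On the toy corpus, words that occur under only one label score highest. A word spread evenly across both labels scores zero and is dropped first. Capturing dependencies among words themselves would follow the same map/reduce pattern, with word-pair keys in place of word-label keys. On a real Hadoop cluster, a combiner could pre-sum counts on each mapper before the shuffle.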



Author information

Authors and Affiliations

Key Laboratory of High Confidence Software Technologies (Ministry of Education), School of EECS, Peking University, Beijing, 100871, China
Wei Chen, Lang Zong, Weijing Huang, Gaoyan Ou, Yue Wang & Dongqing Yang


Editor information

Editors and Affiliations

School of Information, Renmin University of China, 100872, Beijing, China
Xiaoyong Du
LFCS, School of Informatics, University of Edinburgh, 10 Crichton Street, EH8 9AB, Edinburgh, Scotland, UK
Wenfei Fan
School of Software, Tsinghua University, Room 819, Main Building, 100084, Beijing, China
Jianmin Wang
Computer School, Wuhan University, Luojiashan Road, 430072, Wuhan, Hubei, China
Zhiyong Peng
School of Information Technology and Electrical Engineering, The University of Queensland, QLD 4072, St. Lucia, Australia
Mohamed A. Sharaf

About this paper

Cite this paper

Chen, W., Zong, L., Huang, W., Ou, G., Wang, Y., Yang, D. (2011). An Empirical Study of Massively Parallel Bayesian Networks Learning for Sentiment Extraction from Unstructured Text. In: Du, X., Fan, W., Wang, J., Peng, Z., Sharaf, M.A. (eds) Web Technologies and Applications. APWeb 2011. Lecture Notes in Computer Science, vol 6612. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-20291-9_47

DOI: https://doi.org/10.1007/978-3-642-20291-9_47
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-20290-2
Online ISBN: 978-3-642-20291-9
eBook Packages: Computer Science, Computer Science (R0)
