An Empirical Study of Massively Parallel Bayesian Networks Learning for Sentiment Extraction from Unstructured Text

  • Wei Chen
  • Lang Zong
  • Weijing Huang
  • Gaoyan Ou
  • Yue Wang
  • Dongqing Yang
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6612)

Abstract

Extracting sentiments from unstructured text has emerged as an important problem in many disciplines, for example in mining online opinions from the Internet. Many algorithms have been applied to this problem, but most of them fail to handle large-scale web data. In this paper, we present a parallel algorithm for Bayesian network (BN) structure learning from large-scale data sets using a MapReduce cluster. We then apply this parallel BN learning algorithm to capture the dependencies among words and, at the same time, to find a vocabulary that is efficient for extracting sentiments. The benefits of using MapReduce for BN structure learning are discussed. The performance of BN-based sentiment extraction is demonstrated on real web blog data. Experimental results on this web data set show that our algorithm selects a parsimonious feature set with substantially fewer predictor variables than the full data set and predicts sentiment orientation better than several commonly used methods.
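The abstract does not spell out how the counting work is distributed, so the following is only a minimal single-process sketch of the general idea, not the paper's algorithm. It assumes that the statistics needed to score a word against the sentiment label are document-level counts, that a map step emits those counts per document, and that a reduce step sums them. The mutual-information score, the function names (`map_counts`, `reduce_counts`, `mutual_information`) and the toy documents are all illustrative assumptions.

```python
from collections import Counter
from functools import reduce
from math import log2


def map_counts(doc):
    """Map step (assumed): emit local counts for one (words, label) document."""
    words, label = doc
    counts = Counter()
    counts[("label", label)] += 1
    for w in set(words):  # word presence, not frequency
        counts[("word", w)] += 1
        counts[("joint", w, label)] += 1
    return counts


def reduce_counts(a, b):
    """Reduce step (assumed): merge two partial count tables by summing."""
    return a + b


def mutual_information(counts, word, labels, n_docs):
    """Mutual information (bits) between presence of `word` and the label."""
    mi = 0.0
    n_w = counts[("word", word)]
    for label in labels:
        n_l = counts[("label", label)]
        n_wl = counts[("joint", word, label)]
        # Two cells per label: word present, word absent.
        for n_xy, n_x in ((n_wl, n_w), (n_l - n_wl, n_docs - n_w)):
            if n_xy > 0:
                mi += (n_xy / n_docs) * log2(n_xy * n_docs / (n_x * n_l))
    return mi


# Hypothetical toy corpus; a real run would shard documents across workers.
docs = [
    (["great", "movie"], "pos"),
    (["great", "plot"], "pos"),
    (["awful", "movie"], "neg"),
    (["awful", "plot"], "neg"),
]
totals = reduce(reduce_counts, map(map_counts, docs))
labels = ["pos", "neg"]
mi_great = mutual_information(totals, "great", labels, len(docs))
mi_movie = mutual_information(totals, "movie", labels, len(docs))
```

In this toy corpus "great" separates the two labels perfectly while "movie" appears equally often under both, so a score of this kind would keep the former and drop the latter. Because the reduce step is a plain sum, partial count tables can be merged in any order, which is the property that makes this style of counting straightforward to spread over a cluster.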

Keywords

Sentiment Analysis, Bayesian Networks, MapReduce, Cloud Computing, Opinion Mining

Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Wei Chen (1)
  • Lang Zong (1)
  • Weijing Huang (1)
  • Gaoyan Ou (1)
  • Yue Wang (1)
  • Dongqing Yang (1)

  1. Key Laboratory of High Confidence Software Technologies (Ministry of Education), School of EECS, Peking University, Beijing, China
