Abstract
A joint model for annotation bias and document classification is presented in the context of media sentiment analysis. We consider an Irish online media data set comprising online news articles with user annotations of negative, positive or irrelevant impact on the Irish economy. The joint model combines a statistical model for user annotation bias and a Naive Bayes model for the document terms. An EM algorithm is used to estimate the annotation bias model, the unobserved biases in the user annotations, the classifier parameters and the sentiment of the articles. The joint modeling of both the user biases and the classifier is demonstrated to be superior to estimation of the bias followed by the estimation of the classifier parameters.
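The abstract does not spell out the model equations, so the following is only a minimal sketch of how a joint EM of this kind could look. It assumes a Dawid and Skene style confusion matrix per annotator and a multinomial Naive Bayes model over term counts; the function name `joint_em`, its arguments, the smoothing constant `alpha` and the vote-based initialisation are all illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def joint_em(X, Y, K, n_iter=50, alpha=1.0):
    """Hypothetical joint EM for annotator bias and a Naive Bayes classifier.

    X: (N, V) term counts per document.
    Y: (N, J) annotator labels in 0..K-1, with -1 marking a missing annotation.
    Returns class responsibilities R (N, K), class prior (K,),
    annotator confusion matrices conf (J, K, K) and term probabilities theta (K, V).
    """
    N, V = X.shape
    # One-hot annotations, shape (N, J, K); missing labels give all-zero rows.
    A = (Y[:, :, None] == np.arange(K)[None, None, :]).astype(float)

    # Initialise responsibilities from smoothed vote fractions (an assumption).
    R = A.sum(axis=1) + 1e-2
    R /= R.sum(axis=1, keepdims=True)

    for _ in range(n_iter):
        # M-step: update every parameter block from the current responsibilities.
        prior = (R.sum(axis=0) + alpha) / (N + K * alpha)
        # conf[j, k, l] estimates P(annotator j reports l | true class k).
        conf = np.einsum("ik,ijl->jkl", R, A) + alpha
        conf /= conf.sum(axis=2, keepdims=True)
        # theta[k, v] estimates P(term v | class k), with additive smoothing.
        theta = R.T @ X + alpha
        theta /= theta.sum(axis=1, keepdims=True)

        # E-step: combine the prior, the term likelihood and the annotations.
        log_r = np.log(prior)[None, :] + X @ np.log(theta).T
        log_r += np.einsum("ijl,jkl->ik", A, np.log(conf))
        log_r -= log_r.max(axis=1, keepdims=True)  # for numerical stability
        R = np.exp(log_r)
        R /= R.sum(axis=1, keepdims=True)

    return R, prior, conf, theta
```

In this sketch the term likelihood enters the E-step alongside the annotations, so the document text informs the bias estimates and the annotations inform the classifier. Dropping the `X @ np.log(theta).T` term from the E-step would give a two-stage variant closer to estimating the bias first and the classifier afterwards.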
Acknowledgements
This work is supported by the Science Foundation Ireland under Grant No. 08/SRC/I1407: Clique: Graph & Network Analysis Cluster.
Copyright information
© 2013 Springer International Publishing Switzerland
About this paper
Cite this paper
Salter-Townshend, M., Murphy, T.B. (2013). Sentiment Analysis of Online Media. In: Lausen, B., Van den Poel, D., Ultsch, A. (eds) Algorithms from and for Nature and Life. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Cham. https://doi.org/10.1007/978-3-319-00035-0_13
DOI: https://doi.org/10.1007/978-3-319-00035-0_13
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-00034-3
Online ISBN: 978-3-319-00035-0
eBook Packages: Mathematics and Statistics (R0)