A multi-stage method for content classification and opinion mining on weblog comments
In this paper, we illustrate how to combine supervised machine learning algorithms and unsupervised learning techniques for sentiment analysis and opinion mining. To this end, we describe a multi-stage method for the automatic detection of different opinion trends. The proposal has been tested on real textual data: comments posted on a weblog dealing with organizational and administrative affairs in a public educational institution. Because the tool can extract valuable knowledge from the opinion streams that commenters create, its use may be straightforwardly extended, for example, to the detection of opinion trends concerning policy decision making or electoral campaigns.
Keywords: Multi-stage method, Text classification, Text analytics, Opinion mining, Sentiment analysis, k-Nearest neighbors, Support vector machines
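The abstract does not spell out the individual stages, so the following is only a minimal sketch of the general idea of chaining a supervised step with an unsupervised one, not the authors' actual method. It assumes a first stage that assigns each comment to a topic with a k-nearest-neighbors vote over bag-of-words vectors, and a second stage that groups the comments of each topic with a simple greedy similarity clustering (a stand-in chosen for brevity). All names, the stop-word list, the threshold, and the toy comments are hypothetical.

```python
import math
from collections import Counter

# Hypothetical stop-word list; a real system would use a fuller one.
STOPWORDS = {"the", "is", "and", "was", "a"}

def bag_of_words(text):
    """Lower-case, split on whitespace, and count non-stop-word tokens."""
    return Counter(t for t in text.lower().split() if t not in STOPWORDS)

def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def knn_predict(train, text, k=3):
    """Stage 1 (supervised): majority vote among the k most similar labelled comments."""
    vec = bag_of_words(text)
    ranked = sorted(train, key=lambda item: cosine(vec, bag_of_words(item[0])),
                    reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

def cluster(texts, threshold=0.5):
    """Stage 2 (unsupervised): greedy single-pass clustering.

    Each comment joins the first cluster whose seed comment it resembles
    (cosine >= threshold); otherwise it starts a new cluster.
    """
    clusters = []  # list of (seed_vector, member_texts)
    for text in texts:
        vec = bag_of_words(text)
        for seed, members in clusters:
            if cosine(vec, seed) >= threshold:
                members.append(text)
                break
        else:
            clusters.append((vec, [text]))
    return [members for _, members in clusters]

def opinion_trends(train, comments, k=3, threshold=0.5):
    """Chain the stages: classify by topic, then cluster within each topic."""
    by_topic = {}
    for comment in comments:
        by_topic.setdefault(knn_predict(train, comment, k), []).append(comment)
    return {topic: cluster(texts, threshold) for topic, texts in by_topic.items()}

# Toy, invented data: topic-labelled training comments and unlabelled new ones.
TRAIN = [
    ("enrolment registration form deadline", "enrolment"),
    ("online enrolment system registration", "enrolment"),
    ("lecture timetable schedule room", "timetable"),
    ("exam timetable schedule clash", "timetable"),
]
COMMENTS = [
    "new enrolment system works well quick and easy",
    "enrolment was quick and easy works well",
    "enrolment system keeps crashing slow and broken",
    "slow broken enrolment page keeps crashing",
    "exam timetable clash again",
]

trends = opinion_trends(TRAIN, COMMENTS)
for topic, groups in trends.items():
    print(topic, groups)
```

In this sketch each inner list is a candidate opinion trend within a topic. A support vector machine, also named in the keywords, could replace the k-nearest-neighbors vote in the first stage without changing the overall structure.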
Research supported by grants from the Spanish Ministry of Science and Innovation, the Ministry of Industry, Tourism and Trade, and the Government of Madrid: RIESGOS-CM (Ref. CAM s2009/esp-1594), Agora.net, e-COLABORA, Corporate Community, Democracy4All, EDUCALAB (Ref. IPT-2011-1071-430000) and MyUniversity.