Annals of Operations Research, Volume 236, Issue 1, pp 197–213

A multi-stage method for content classification and opinion mining on weblog comments

  • César Alfaro
  • Javier Cano-Montero
  • Javier Gómez
  • Javier M. Moguerza
  • Felipe Ortega

Abstract

In this paper, we illustrate how to combine supervised machine learning algorithms and unsupervised learning techniques for sentiment analysis and opinion mining. To this end, we describe a multi-stage method for the automatic detection of different opinion trends. The proposal has been tested on real textual data: comments posted on a weblog dealing with organizational and administrative affairs at a public educational institution. Given its potential to extract valuable knowledge from the opinion streams created by commenters, the described tool may be straightforwardly extended to other settings, for example, to the detection of opinion trends concerning policy decision making or electoral campaigns.
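The abstract describes the method only at a high level, so what follows is a minimal, hypothetical sketch of how a supervised stage and an unsupervised stage could be chained; it is not the authors' implementation. It assumes whitespace-tokenized bag-of-words vectors, a cosine-similarity k-nearest-neighbor classifier (one of the techniques named in the keywords) that routes each comment to a content category, and plain k-means within each category to group comments into candidate opinion trends. The function names, the toy comments and category labels, k=3 neighbors, two groups, and the first-k centroid initialization are all illustrative assumptions.

```python
import math
from collections import Counter

# Hypothetical two-stage sketch (supervised k-NN, then unsupervised k-means).
# Not the authors' method; names, data, and parameters are illustrative only.

def tokenize(text):
    # Lowercase and split on whitespace (no stemming or stop-word removal).
    return text.lower().split()

def cosine(a, b):
    # Cosine similarity between two sparse term-count vectors (Counters).
    dot = sum(v * b[t] for t, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def knn_classify(text, labeled, k=3):
    # Stage 1 (supervised): majority label among the k most similar labeled comments.
    q = Counter(tokenize(text))
    ranked = sorted(labeled, key=lambda p: cosine(q, Counter(tokenize(p[0]))), reverse=True)
    return Counter(label for _, label in ranked[:k]).most_common(1)[0][0]

def dense(texts):
    # Unit-length term-count vectors over the vocabulary of `texts`.
    vocab = sorted({t for s in texts for t in tokenize(s)})
    vecs = []
    for s in texts:
        c = Counter(tokenize(s))
        v = [float(c[t]) for t in vocab]
        n = math.sqrt(sum(x * x for x in v)) or 1.0
        vecs.append([x / n for x in v])
    return vecs

def kmeans(vectors, k, iters=10):
    # Stage 2 (unsupervised): plain k-means, initialized with the first k vectors.
    centroids = [list(v) for v in vectors[:k]]
    assign = [0] * len(vectors)
    for _ in range(iters):
        for i, v in enumerate(vectors):
            assign[i] = min(
                range(k),
                key=lambda c: sum((x - y) ** 2 for x, y in zip(v, centroids[c])),
            )
        for c in range(k):
            members = [vectors[i] for i, a in enumerate(assign) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign

def multi_stage(comments, labeled, k_neighbors=3, n_groups=2):
    # Stage 1 routes each comment to a category; stage 2 groups comments within it.
    by_class = {}
    for c in comments:
        by_class.setdefault(knn_classify(c, labeled, k_neighbors), []).append(c)
    return {
        label: list(zip(texts, kmeans(dense(texts), min(n_groups, len(texts)))))
        for label, texts in by_class.items()
    }

# Hypothetical toy data: labeled training comments and new, unlabeled comments.
labeled = [
    ("exam schedule changed again", "academic"),
    ("exam grades published late", "academic"),
    ("exam room without heating", "academic"),
    ("parking fees went up", "administrative"),
    ("enrolment office closes early", "administrative"),
    ("office lost my enrolment form", "administrative"),
]
comments = [
    "exam was unfair and too long",
    "exam was well organised and clear",
    "exam was unfair and badly graded",
    "exam was clear and well organised",
    "office fees are unfair",
]
result = multi_stage(comments, labeled)
# academic comments get groups [0, 1, 0, 1]; the single administrative one gets [0]
```

On this toy data the two "unfair" comments end up in one group and the two "clear, well organised" comments in the other. The deterministic initialization only keeps the example reproducible; a realistic version would likely use TF-IDF weighting, stop-word removal, and several random k-means restarts.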

Keywords

Multi-stage method, Text classification, Text analytics, Opinion mining, Sentiment analysis, k-Nearest neighbors, Support vector machines
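The keywords also name support vector machines. This section does not say which SVM formulation, kernel, or solver the authors used, so the following is only an illustrative sketch under stated assumptions: a linear SVM (hinge loss with L2 regularization, no bias term) on binary bag-of-words features, trained with a Pegasos-style stochastic sub-gradient method that cycles through the examples in a fixed order instead of sampling at random. The function names, `lam=0.1`, the epoch count, and the four toy comments with +1/-1 polarity labels are hypothetical.

```python
from collections import Counter

# Illustrative linear SVM sketch (Pegasos-style sub-gradient descent).
# Not the authors' SVM setup; data and parameters are hypothetical.

def features(text):
    # Binary bag-of-words features for one comment.
    return Counter(set(text.lower().split()))

def train_linear_svm(samples, lam=0.1, epochs=20):
    """Minimize lam/2 * ||w||^2 + mean(max(0, 1 - y * w.x)), with y in {+1, -1}."""
    w = Counter()
    data = [(features(text), y) for text, y in samples]
    t = 0
    for _ in range(epochs):
        for x, y in data:  # fixed cyclic order keeps the sketch deterministic
            t += 1
            eta = 1.0 / (lam * t)  # Pegasos step size
            margin = y * sum(w[f] * v for f, v in x.items())
            # Regularizer gradient: shrink every existing weight.
            for f in list(w):
                w[f] *= 1.0 - eta * lam
            # Hinge-loss sub-gradient: only margin violations move w toward y * x.
            if margin < 1:
                for f, v in x.items():
                    w[f] += eta * y * v
    return w

def predict(w, text):
    # Sign of the linear score; unseen words contribute zero.
    score = sum(w[f] * v for f, v in features(text).items())
    return 1 if score > 0 else -1

samples = [
    ("clear helpful lecture", 1),
    ("unfair exam", -1),
    ("helpful staff", 1),
    ("slow office", -1),
]
w = train_linear_svm(samples)
print(predict(w, "clear and helpful"))  # 1
print(predict(w, "unfair and slow"))    # -1
```

Because the toy vocabularies of the two classes do not overlap, every weight keeps the sign of the class it came from, which is what makes the printed predictions stable; real comments share many words, so results there depend on `lam`, the number of epochs, and the sampling order.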

Copyright information

© Springer Science+Business Media New York 2013

Authors and Affiliations

  • César Alfaro (1)
  • Javier Cano-Montero (1)
  • Javier Gómez (1)
  • Javier M. Moguerza (1)
  • Felipe Ortega (1)
  1. Dept. of Statistics and Operations Research, Rey Juan Carlos University, Fuenlabrada, Spain
