Probabilistic Approaches for Sentiment Analysis: Latent Dirichlet Allocation for Ontology Building and Sentiment Extraction

  • Francesco Colace
  • Massimo De Santo
  • Luca Greco
  • Vincenzo Moscato
  • Antonio Picariello
Part of the Studies in Computational Intelligence book series (SCI, volume 639)


People’s opinions have always driven human choices and behaviors, even before the diffusion of Information and Communication Technologies. Thanks to the World Wide Web and the spread of online collaborative tools such as blogs, focus groups, review web sites, forums and social networks, millions of messages appear on the web, which is becoming a rich source of opinionated data. Sentiment analysis refers to the use of natural language processing, text analysis and computational linguistics to identify and extract subjective information from documents, comments and posts. The aim of this work is to show how a probabilistic approach based on Latent Dirichlet Allocation (LDA), used as a Sentiment Grabber, can serve as an effective Sentiment Analyzer. Through this approach, a graph, the Mixed Graph of Terms, can be automatically extracted from a set of documents belonging to the same knowledge domain. This graph, which contains a set of Mixed Graphs of Terms, can be transformed into a Sentiment Oriented Terminological Ontology through a methodology that involves the introduction of an annotated lexicon such as WordNet. The chapter shows how the obtained ontology can be discriminative for sentiment classification. The proposed method has been tested in different contexts: standard datasets and comments extracted from social networks. The experimental evaluation shows that the proposed approach is effective and that the results are satisfactory.


  • Sentiment analysis
  • Ontologies
  • Latent Dirichlet Allocation
  • Natural language processing



Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  • Francesco Colace (1)
  • Massimo De Santo (1)
  • Luca Greco (1)
  • Vincenzo Moscato (2)
  • Antonio Picariello (2)
  1. Dipartimento di Ingegneria dell’Informazione, Ingegneria Elettrica e Matematica Applicata, Università degli Studi di Salerno, Fisciano, Salerno, Italy
  2. Dipartimento di Ingegneria elettrica e delle Tecnologie dell’Informazione, Università degli Studi di Napoli Federico II, Napoli, Italy
