Information geometry enhanced fuzzy deep belief networks for sentiment classification

  • Meng Wang
  • Zhen-Hu Ning (corresponding author)
  • Tong Li
  • Chuang-Bai Xiao
Original Article


With the development of the internet, more and more people share reviews online. Efficient sentiment analysis of such reviews using deep learning techniques has become an emerging research topic that attracts growing attention from the natural language processing community. However, how to improve the performance of a deep neural network remains an open question. In this paper, we propose an algorithm that combines deep learning, fuzzy clustering and information geometry. In particular, the distribution of the training samples is treated as prior knowledge and is encoded in fuzzy deep belief networks using an improved Fuzzy C-Means (FCM) clustering algorithm. To improve FCM, we adopt information geometry to construct a geodesic distance between the distributions over the features used for classification. Based on the clustering results, we then embed the fuzzy rules learned by FCM into fuzzy deep belief networks to improve their performance. Finally, we evaluate our proposal on empirical data sets dedicated to sentiment classification. The results show that our algorithm achieves a significant improvement over existing methods.


Keywords: Fuzzy neural networks; Information geometry; Semi-supervised learning; Sentiment classification
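
The abstract describes replacing the distance used by FCM with a geodesic distance between feature distributions. The following is a minimal Python sketch of that idea, not the authors' implementation. It assumes that each sample is a discrete probability distribution (a row of non-negative values summing to 1), that the geodesic is the Fisher-Rao distance on the probability simplex, 2·arccos(Σ√(p_i q_i)), and that cluster centers can be approximated by a weighted mean in square-root coordinates. The function names `fisher_rao_distance` and `geodesic_fcm` and all parameters are hypothetical.

```python
import numpy as np


def fisher_rao_distance(p, q):
    """Pairwise Fisher-Rao geodesic distance between rows of p and rows of q.

    Rows are assumed to be discrete probability distributions.
    Returns an array of shape (len(p), len(q)).
    """
    # Bhattacharyya coefficient: sum_i sqrt(p_i * q_i) for every pair of rows.
    bc = (np.sqrt(p)[:, None, :] * np.sqrt(q)[None, :, :]).sum(axis=-1)
    # Clip guards against round-off pushing the coefficient above 1.
    return 2.0 * np.arccos(np.clip(bc, 0.0, 1.0))


def geodesic_fcm(X, n_clusters=2, m=2.0, n_iter=100, seed=0):
    """Fuzzy C-Means with the Fisher-Rao distance in place of the Euclidean one.

    X: array of shape (n_samples, n_features), rows sum to 1 (assumption).
    m: fuzzifier (> 1). Returns (memberships, centers).
    """
    rng = np.random.default_rng(seed)
    # Random initial memberships; each row sums to 1.
    U = rng.random((X.shape[0], n_clusters))
    U /= U.sum(axis=1, keepdims=True)

    for _ in range(n_iter):
        W = U ** m
        # Simplified center update (assumption): weighted mean of sqrt(X),
        # projected back onto the unit sphere, then squared so that each
        # center is again a probability distribution.
        S = W.T @ np.sqrt(X)
        S /= np.linalg.norm(S, axis=1, keepdims=True)
        centers = S ** 2

        # Standard FCM membership update, using the geodesic distance.
        D = fisher_rao_distance(X, centers) + 1e-12
        inv = D ** (-2.0 / (m - 1.0))
        U = inv / inv.sum(axis=1, keepdims=True)

    return U, centers
```

Calling `geodesic_fcm(X, n_clusters=2)` on a small matrix of normalized term-frequency vectors returns a soft membership for each sample in each cluster. In the approach outlined in the abstract, memberships of this kind would serve as the fuzzy prior knowledge embedded into the deep belief network; how that embedding is done is not specified here and is not covered by this sketch.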




Copyright information

© Springer-Verlag GmbH Germany, part of Springer Nature 2019

Authors and Affiliations

  • Meng Wang (1)
  • Zhen-Hu Ning (1, corresponding author)
  • Tong Li (1)
  • Chuang-Bai Xiao (1)
  1. Faculty of Information Technology, Beijing University of Technology, Beijing, People’s Republic of China
