On Interpretability and Feature Representations: An Analysis of the Sentiment Neuron

  • Jonathan Donnelly
  • Adam Roegiest
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11437)


We investigate the apparent effectiveness of Radford et al.’s “Sentiment Neuron” [9], which they claim encapsulates sufficient knowledge to accurately predict sentiment in reviews. In our analysis of the Sentiment Neuron, we find that removing the neuron only marginally affects a classifier’s ability to detect and label sentiment and may even improve performance. Moreover, the effectiveness of the Sentiment Neuron can be surpassed by simply using 100 random neurons as features for the same classifier. Using adversarial examples, we show that the generated representation containing the Sentiment Neuron (i.e., the final hidden cell state of an LSTM) is particularly sensitive to the end of a processed sequence. Accordingly, we find that caution is needed when interpreting neuron-based feature representations, and that potential flaws should be addressed to ensure real-world applicability.
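The comparison described in the abstract (one unit alone, the representation with that unit removed, and a random subset of units, all fed to the same classifier) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' experimental code: the hidden states are synthetic, the classifier is a plain SGD logistic regression, and every name (`make_synthetic_states`, `compare_feature_sets`, the unit index `7`, the subset size `k`) is hypothetical. Its output says nothing about the real model; it only shows the shape of such an ablation study.

```python
import math
import random


def make_synthetic_states(n, d, unit, rng):
    """Toy stand-in for final hidden states (entirely synthetic, an assumption):
    one strongly label-correlated unit plus many weakly correlated ones."""
    X, y = [], []
    for _ in range(n):
        label = rng.randint(0, 1)
        sign = 1.0 if label == 1 else -1.0
        row = [0.3 * sign + rng.gauss(0.0, 1.0) for _ in range(d)]
        row[unit] = 1.5 * sign + rng.gauss(0.0, 1.0)
        X.append(row)
        y.append(label)
    return X, y


def select_columns(X, cols):
    """Keep only the listed units (columns) of every state vector."""
    return [[row[c] for c in cols] for row in X]


def sigmoid(z):
    z = max(-30.0, min(30.0, z))  # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-z))


def train_logreg(X, y, epochs=20, lr=0.02, l2=1e-3):
    """Logistic regression trained with per-example SGD and L2 decay."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for row, label in zip(X, y):
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, row)) + b)
            g = p - label  # gradient of the log loss w.r.t. the logit
            w = [wi - lr * (g * xi + l2 * wi) for wi, xi in zip(w, row)]
            b -= lr * g
    return w, b


def accuracy(model, X, y):
    w, b = model
    hits = 0
    for row, label in zip(X, y):
        pred = 1 if sum(wi * xi for wi, xi in zip(w, row)) + b > 0 else 0
        hits += pred == label
    return hits / len(y)


def compare_feature_sets(n_train=400, n_test=200, d=64, unit=7, k=16, seed=0):
    """Train the same classifier on three feature sets and return test accuracy."""
    rng = random.Random(seed)
    X_tr, y_tr = make_synthetic_states(n_train, d, unit, rng)
    X_te, y_te = make_synthetic_states(n_test, d, unit, rng)
    others = [c for c in range(d) if c != unit]
    feature_sets = {
        "single_unit": [unit],                  # the one unit on its own
        "unit_removed": others,                 # ablation: every unit but that one
        "random_units": rng.sample(others, k),  # assumption: sample excludes the unit
    }
    results = {}
    for name, cols in feature_sets.items():
        model = train_logreg(select_columns(X_tr, cols), y_tr)
        results[name] = accuracy(model, select_columns(X_te, cols), y_te)
    return results


if __name__ == "__main__":
    for name, acc in compare_feature_sets().items():
        print(f"{name}: {acc:.3f}")
```

Holding the classifier and the train/test split fixed across the three feature sets is the design choice that matters here: any difference in accuracy can then be attributed to the features rather than to the model or the data.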


  1. Howard, J., Ruder, S.: Universal language model fine-tuning for text classification. In: Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) (2018)
  2. Karpathy, A., Johnson, J., Li, F.: Visualizing and understanding recurrent networks. In: International Conference on Learning Representations (2016)
  3. Krause, B., Lu, L., Murray, I., Renals, S.: Multiplicative LSTM for sequence modelling. In: International Conference on Learning Representations (2017)
  4. Maas, A.L., Daly, R.E., Pham, P.T., Huang, D., Ng, A.Y., Potts, C.: Learning word vectors for sentiment analysis. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1, HLT 2011, pp. 142–150. Association for Computational Linguistics, Stroudsburg (2011)
  5. McAuley, J.: Amazon product data.
  6. Morcos, A.S., Barrett, D.G., Rabinowitz, N.C., Botvinick, M.: On the importance of single directions for generalization. In: International Conference on Learning Representations (2018)
  7. Nvidia: Sentiment discovery. GitHub repository (2017)
  8. Pang, B., Lee, L.: Seeing stars: exploiting class relationships for sentiment categorization with respect to rating scales. In: Proceedings of ACL, pp. 115–124 (2005)
  9. Radford, A., Józefowicz, R., Sutskever, I.: Learning to generate reviews and discovering sentiment (2017).
  10. Socher, R., et al.: Recursive deep models for semantic compositionality over a sentiment treebank. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, pp. 1631–1642. Association for Computational Linguistics, Seattle, Washington, October 2013
  11. Wang, S., Manning, C.D.: Baselines and bigrams: simple, good sentiment and topic classification. In: Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Short Papers - Volume 2, ACL 2012, pp. 90–94 (2012)
  12. Wiebe, J., Wilson, T., Cardie, C.: Annotating expressions of opinions and emotions in language. Lang. Resour. Eval. 1 (2005)
  13. Zhou, B., Khosla, A., Lapedriza, À., Oliva, A., Torralba, A.: Object detectors emerge in deep scene CNNs. In: International Conference on Learning Representations (2014)

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. Kira Systems, Toronto, Canada
