Deep Learning for Suicide and Depression Identification with Unsupervised Label Correction

Part of the Lecture Notes in Computer Science book series (LNTCS, volume 12895)

Abstract

Early detection of suicidal ideation in depressed individuals can allow for adequate medical attention and support, which can be life-saving. Recent NLP research focuses on classifying, from given text, whether an individual is suicidal or clinically healthy. However, there have been no major attempts to differentiate between depression and suicidal ideation, which is a separate and important clinical challenge. Due to the scarce availability of EHR data, suicide notes, or other verified sources, web query data has emerged as a promising alternative. Online sources, such as Reddit, allow for anonymity, prompting honest disclosure of symptoms and making them a plausible source even in a clinical setting. However, web-scraped labels in online datasets are inherently noisy, which necessitates a noise-removal process to improve performance. Thus, we propose SDCNL, a deep neural network approach for suicide versus depression classification. We utilize online content to train our algorithm, and to verify and correct noisy labels, we propose a novel unsupervised label correction method which, unlike previous work, does not require prior noise distribution information. Our extensive experimentation with various deep word embedding models and classifiers displays strong performance of SDCNL as a new clinical application for a challenging problem. We make our supplemental material, dataset, web-scraping script, and code (with hyperparameters) available at https://github.com/ayaanzhaque/SDCNL.
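
The abstract does not spell out how the unsupervised label correction works, so the following is only a minimal sketch of one way such a step could look, not the SDCNL implementation. It assumes that texts have already been turned into embedding vectors, swaps in a plain 2-means clustering as the unsupervised component, and replaces a noisy label with its cluster's majority label only when the point sits clearly closer to its own centroid. The synthetic 2-D "embeddings", the `margin` threshold, and every function name here are hypothetical and chosen for illustration.

```python
import random


def sqdist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def two_means(points, iters=20):
    """Plain 2-means clustering; returns (assignments, centroids)."""
    # Deterministic init: the first point and the point farthest from it.
    far = max(points, key=lambda p: sqdist(p, points[0]))
    centroids = [list(points[0]), list(far)]

    def nearest(p):
        return 0 if sqdist(p, centroids[0]) <= sqdist(p, centroids[1]) else 1

    for _ in range(iters):
        assign = [nearest(p) for p in points]
        for k in (0, 1):
            members = [p for p, a in zip(points, assign) if a == k]
            if members:
                centroids[k] = [sum(c) / len(members) for c in zip(*members)]
    return [nearest(p) for p in points], centroids


def correct_labels(points, noisy, margin=0.5):
    """Replace noisy binary labels with cluster-majority labels when confident.

    No noise distribution is supplied; the only inputs are the embeddings
    and the noisy labels themselves. `margin` is an assumed threshold.
    """
    assign, centroids = two_means(points)
    cluster_label = {}
    for k in (0, 1):
        votes = [y for y, a in zip(noisy, assign) if a == k]
        if not votes:
            return list(noisy)  # degenerate clustering: change nothing
        cluster_label[k] = 1 if 2 * sum(votes) > len(votes) else 0
    if cluster_label[0] == cluster_label[1]:
        return list(noisy)  # clusters do not separate the classes
    corrected = []
    for p, y, a in zip(points, noisy, assign):
        own = sqdist(p, centroids[a])
        other = sqdist(p, centroids[1 - a])
        # Trust the cluster only for points well inside their own cluster.
        corrected.append(cluster_label[a] if own <= margin * other else y)
    return corrected


def accuracy(pred, ref):
    return sum(p == r for p, r in zip(pred, ref)) / len(ref)


# Synthetic stand-in for sentence embeddings: two well-separated 2-D blobs.
rng = random.Random(0)
points = [[rng.gauss(0, 0.5), rng.gauss(0, 0.5)] for _ in range(50)]
points += [[rng.gauss(5, 0.5), rng.gauss(5, 0.5)] for _ in range(50)]
true = [0] * 50 + [1] * 50
# Flip every fifth label to simulate 20% label noise.
noisy = [1 - y if i % 5 == 0 else y for i, y in enumerate(true)]

corrected = correct_labels(points, noisy)
acc_before = accuracy(noisy, true)
acc_after = accuracy(corrected, true)
print(f"label accuracy before: {acc_before:.2f}, after: {acc_after:.2f}")
```

On real text embeddings the clusters would be far less cleanly separated than in this toy data, which is why the sketch keeps the original label whenever the point lies near the boundary between the two centroids.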

Keywords

  • Suicide/Depression
  • Noisy labels
  • Deep learning
  • Online content
  • Natural Language Processing
  • Unsupervised learning

A. Haque and V. Reddi—Equal contribution.

Copyright information

© 2021 Springer Nature Switzerland AG

Cite this paper

Haque, A., Reddi, V., Giallanza, T. (2021). Deep Learning for Suicide and Depression Identification with Unsupervised Label Correction. In: Farkaš, I., Masulli, P., Otte, S., Wermter, S. (eds) Artificial Neural Networks and Machine Learning – ICANN 2021. ICANN 2021. Lecture Notes in Computer Science, vol 12895. Springer, Cham. https://doi.org/10.1007/978-3-030-86383-8_35

  • DOI: https://doi.org/10.1007/978-3-030-86383-8_35

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-86382-1

  • Online ISBN: 978-3-030-86383-8

  • eBook Packages: Computer Science, Computer Science (R0)