Datasets for Medical Sentiment Analysis

  • Chapter
Sentiment Analysis in the Medical Domain

Abstract

Data is necessary for training sentiment classifiers and for testing and comparing sentiment analysis algorithms. This chapter describes the datasets that have been used for medical sentiment analysis, distinguishing datasets built from clinical records from those built from social media content. Most of the currently accessible corpora are in English. Clinical datasets such as MIMIC-II and the i2b2 datasets were not originally designed for medical sentiment analysis, but researchers have used them for this purpose by manually adding sentiment annotations. Data privacy concerns are one reason for the limited availability of clinical datasets. The chapter first summarises the challenges of creating datasets for medical sentiment analysis and then describes the available datasets that researchers have already used in this field.

Notes

  1. https://physionet.org/

  2. https://n2c2.dbmi.hms.harvard.edu/

  3. https://www.muse-challenge.org/challenge/data

  4. https://www.kaggle.com/datasets/kazanova/sentiment140

  5. http://yanran.li/dailydialog.html

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this chapter

Cite this chapter

Denecke, K. (2023). Datasets for Medical Sentiment Analysis. In: Sentiment Analysis in the Medical Domain. Springer, Cham. https://doi.org/10.1007/978-3-031-30187-2_6

  • DOI: https://doi.org/10.1007/978-3-031-30187-2_6

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-30186-5

  • Online ISBN: 978-3-031-30187-2

  • eBook Packages: Computer Science, Computer Science (R0)
