Datasets for Medical Sentiment Analysis

Denecke, Kerstin

doi:10.1007/978-3-031-30187-2_6

Abstract

Data is necessary for training sentiment classifiers and for testing and comparing sentiment analysis algorithms. This chapter describes the datasets that have been used for medical sentiment analysis, distinguishing datasets of clinical records from datasets drawn from social media. Most of the currently accessible corpora are in English. Although available clinical datasets such as MIMIC-II or the i2b2 datasets were not originally designed for medical sentiment analysis, researchers have used them for this purpose by manually adding sentiment annotations. Data privacy concerns are one reason for the limited availability of clinical datasets. The chapter first summarises the challenges of creating datasets for medical sentiment analysis and then describes the available datasets that researchers have already used in this field.

Author information

Authors and Affiliations

Bern University of Applied Sciences, Biel, Switzerland
Kerstin Denecke

About this chapter

Cite this chapter

Denecke, K. (2023). Datasets for Medical Sentiment Analysis. In: Sentiment Analysis in the Medical Domain. Springer, Cham. https://doi.org/10.1007/978-3-031-30187-2_6

DOI: https://doi.org/10.1007/978-3-031-30187-2_6
Published: 24 March 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-30186-5
Online ISBN: 978-3-031-30187-2
eBook Packages: Computer Science, Computer Science (R0)
