Abstract
Data is necessary for training sentiment classifiers as well as for testing and comparing sentiment algorithms. This chapter describes the datasets that have been used for medical sentiment analysis. We distinguish datasets built from clinical records from those built from social media content. Most of the currently accessible corpora are in English. Although available clinical datasets such as MIMIC-II or the i2b2 datasets were not originally designed for medical sentiment analysis, researchers have used them for this purpose and added sentiment annotations manually. Data privacy concerns are one reason for the limited availability of clinical datasets. The chapter first summarises the challenges of creating datasets for medical sentiment analysis. It then describes the available datasets that researchers have already used in medical sentiment analysis.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this chapter
Cite this chapter
Denecke, K. (2023). Datasets for Medical Sentiment Analysis. In: Sentiment Analysis in the Medical Domain. Springer, Cham. https://doi.org/10.1007/978-3-031-30187-2_6
DOI: https://doi.org/10.1007/978-3-031-30187-2_6
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-30186-5
Online ISBN: 978-3-031-30187-2
eBook Packages: Computer Science (R0)