Abstract
Cancer is one of the leading diseases, affecting millions of people and families around the world. Monitoring the mood of cancer patients plays a vital role in their treatment. In recent years, social media has given many people a platform to share their experiences of cancer through blogs and online communities. In this study, we analyse the moods of cancer patients by collecting tweets from several online cancer support communities. We applied several text mining and machine learning strategies to perform sentiment analysis on a distributed framework and developed a model for easier and faster analysis. The proposed distributed framework with a long short-term memory (LSTM) neural network offers an alternative to conventional sentiment analysis approaches for analysing large volumes of data. The effectiveness of the proposed framework was evaluated on the proposed dataset (corpus-1) and on two other benchmark datasets, Health news Tweets (corpus-2) and Medical abstracts (corpus-3). The performance of each text mining and classification method was evaluated separately on all three datasets, and the methods were compared with one another. The results show that the proposed approach outperformed the other methods in terms of both accuracy and execution time.
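The abstract names an LSTM network as the core sentiment classifier. As a rough illustration of what such a network computes for a single tweet, here is a minimal, untrained pure-Python sketch. It assumes simple regex tokenization, randomly initialised word embeddings, and a single-layer LSTM followed by a sigmoid output unit. The names `preprocess` and `TinyLSTMClassifier`, the vocabulary, and all dimensions are hypothetical and are not taken from the paper; training and the distributed part of the framework are omitted.

```python
import math
import random
import re


def preprocess(tweet):
    """Lowercase, drop URLs and @mentions, replace non-letters, split into tokens."""
    tweet = tweet.lower()
    tweet = re.sub(r"https?://\S+|@\w+", " ", tweet)
    tweet = re.sub(r"[^a-z\s]", " ", tweet)
    return tweet.split()


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def matvec(m, v):
    return [sum(w * x for w, x in zip(row, v)) for row in m]


class TinyLSTMClassifier:
    """Hypothetical single-layer LSTM over word embeddings with a sigmoid output.

    Weights are random (untrained); this only shows the forward computation.
    """

    def __init__(self, vocab, emb_dim=8, hidden=4, seed=0):
        rng = random.Random(seed)
        self.index = {w: i for i, w in enumerate(vocab)}
        # One embedding row per vocabulary word; the last row is for unknown tokens.
        self.emb = rand_matrix(len(vocab) + 1, emb_dim, rng)
        self.hidden = hidden
        # One (W, U, b) triple per gate: input (i), forget (f), output (o), candidate (g).
        self.gates = {
            name: (rand_matrix(hidden, emb_dim, rng),
                   rand_matrix(hidden, hidden, rng),
                   [0.0] * hidden)
            for name in "ifog"
        }
        self.w_out = [rng.uniform(-0.1, 0.1) for _ in range(hidden)]
        self.b_out = 0.0

    def _gate(self, name, x, h, act):
        # act(W x_t + U h_{t-1} + b), element-wise
        W, U, b = self.gates[name]
        return [act(a + u + bb) for a, u, bb in zip(matvec(W, x), matvec(U, h), b)]

    def forward(self, tokens):
        """Run the LSTM recurrence and return the final hidden state h_T."""
        h = [0.0] * self.hidden
        c = [0.0] * self.hidden
        unk = len(self.emb) - 1
        for tok in tokens:
            x = self.emb[self.index.get(tok, unk)]
            i = self._gate("i", x, h, sigmoid)
            f = self._gate("f", x, h, sigmoid)
            o = self._gate("o", x, h, sigmoid)
            g = self._gate("g", x, h, math.tanh)
            # c_t = f_t * c_{t-1} + i_t * g_t ;  h_t = o_t * tanh(c_t)
            c = [ft * ct + it * gt for ft, ct, it, gt in zip(f, c, i, g)]
            h = [ot * math.tanh(ct) for ot, ct in zip(o, c)]
        return h

    def predict_proba(self, tweet):
        """Sigmoid score in (0, 1); after training it would estimate P(positive)."""
        h = self.forward(preprocess(tweet))
        return sigmoid(sum(w * v for w, v in zip(self.w_out, h)) + self.b_out)


model = TinyLSTMClassifier(["feel", "better", "scared", "chemo", "hope", "today"])
score = model.predict_proba("Feeling scared before chemo today @friend https://t.co/x")
print(f"score = {score:.3f}")
```

With untrained weights the score stays close to 0.5; in practice the weights would be learned from labelled tweets, for example by backpropagation through time. Because scoring one tweet does not depend on any other, a distributed setup can partition the tweets and apply the same model to each partition independently. The abstract does not say how the authors' framework divides the work.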
Cite this article
Edara, D.C., Vanukuri, L.P., Sistla, V. et al. Sentiment analysis and text categorization of cancer medical records with LSTM. J Ambient Intell Human Comput 14, 5309–5325 (2023). https://doi.org/10.1007/s12652-019-01399-8
DOI: https://doi.org/10.1007/s12652-019-01399-8