Abstract
This paper presents a prototype that classifies stroke by combining text mining tools with machine learning algorithms. Suitably trained machine learning algorithms are widely used in areas such as surveillance, medicine, and data management. The data mining techniques applied in this work extract information from both semantic and syntactic perspectives. The proposed idea is to mine patients’ symptoms from case sheets and train the system with the acquired data. In the data collection phase, the case sheets of 507 patients were collected from Sugam Multispecialty Hospital, Kumbakonam, Tamil Nadu, India. Next, the case sheets were mined using tagging and maximum entropy methodologies, and the proposed stemmer extracted the common and unique sets of attributes needed to classify the strokes. The processed data were then fed into several machine learning algorithms: artificial neural networks, support vector machines, boosting and bagging, and random forests. Among these, artificial neural networks trained with a stochastic gradient descent algorithm performed best, with a classification accuracy of 95% and a smaller standard deviation of 14.69.
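The pipeline outlined in the abstract (stem the case-sheet text, turn the stems into attributes, train a classifier with stochastic gradient descent) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the case-sheet snippets and labels are invented, the suffix-stripping `stem` function is a naive stand-in for the proposed stemmer, the tagging and maximum entropy steps are omitted, and a single logistic unit stands in for the artificial neural network. It is not the authors' implementation.

```python
import math
import random
import re

# Invented toy case-sheet snippets (1 = ischemic, 0 = hemorrhagic).
# Hypothetical data for illustration only; not from the paper's 507 case sheets.
CASE_SHEETS = [
    ("sudden weakness of right arm and slurred speech", 1),
    ("slurring of speech with facial drooping", 1),
    ("numbness and weakness in left leg", 1),
    ("severe headache with vomiting and neck stiffness", 0),
    ("vomiting, severe headaches and loss of consciousness", 0),
    ("sudden headache, vomited twice, stiff neck", 0),
]

SUFFIXES = ("ing", "ness", "ed", "es", "s")


def stem(word):
    """Naive suffix stripping; a stand-in for the paper's proposed stemmer."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def tokens(text):
    """Lowercase the text, keep alphabetic words, and stem each one."""
    return [stem(t) for t in re.findall(r"[a-z]+", text.lower())]


def vectorize(text, vocab):
    """Binary attribute vector: 1.0 if the stem occurs in the text, else 0.0."""
    present = set(tokens(text))
    return [1.0 if term in present else 0.0 for term in vocab]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_sgd(X, y, epochs=200, lr=0.1, seed=0):
    """Train one logistic unit with per-sample (stochastic) gradient updates."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    b = 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)  # visit the samples in a fresh random order each epoch
        for i in order:
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, X[i])) + b)
            err = p - y[i]  # gradient of the log loss with respect to the logit
            w = [wj - lr * err * xj for wj, xj in zip(w, X[i])]
            b -= lr * err
    return w, b


def predict(text, vocab, w, b):
    """Return 1 if the predicted probability is at least 0.5, else 0."""
    x = vectorize(text, vocab)
    z = sum(wj * xj for wj, xj in zip(w, x)) + b
    return 1 if sigmoid(z) >= 0.5 else 0


# Build the vocabulary of stems, the attribute matrix, and the trained model.
VOCAB = sorted({t for text, _ in CASE_SHEETS for t in tokens(text)})
X = [vectorize(text, VOCAB) for text, _ in CASE_SHEETS]
Y = [label for _, label in CASE_SHEETS]
W, B = train_sgd(X, Y)

if __name__ == "__main__":
    for text, label in CASE_SHEETS:
        print(label, predict(text, VOCAB, W, B), text)
```

Stemming maps variants such as "slurred" and "slurring" to one attribute, so the classifier sees a smaller, denser attribute set. A real system would replace the toy unit with a multilayer network and evaluate on held-out case sheets rather than on the training data.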
Acknowledgements
We are grateful to Dr. Sundarrajan S, Neurologist, Sugam Multispecialty Hospital, for permitting us to access the real-time data of the patients and for his valuable suggestions in classifying the types of stroke. We also thank the management of Sugam Multispecialty Hospital, Kumbakonam, for their assistance in collecting the case sheets. We acknowledge the Department of Science and Technology, India, for providing financial support through an INSPIRE fellowship (No. IF120649) to carry out this research work. The second author also thanks the Department of Science & Technology for financial aid from grant No. SR/FST/ETI-349/2013.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest regarding the publication of this article.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Govindarajan, P., Soundarapandian, R.K., Gandomi, A.H. et al. Classification of stroke disease using machine learning algorithms. Neural Comput & Applic 32, 817–828 (2020). https://doi.org/10.1007/s00521-019-04041-y
DOI: https://doi.org/10.1007/s00521-019-04041-y