Combining Data Mining Techniques to Enhance Cardiac Arrhythmia Detection

  • Christian Gomes
  • Alan Cardoso
  • Thiago Silveira
  • Diego Dias
  • Elisa Tuler
  • Renato Ferreira
  • Leonardo Rocha
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10861)

Abstract

Detection of Cardiac Arrhythmia (CA) is performed through clinical analysis of a patient's electrocardiogram (ECG) to prevent cardiovascular diseases. Machine learning algorithms have been presented as promising tools to aid CA diagnosis, with emphasis on those related to automatic classification. However, these algorithms suffer from two traditional classification problems: (1) the excessive number of numerical attributes generated from the decomposition of an ECG; and (2) the number of patients diagnosed with CAs is much lower than the number classified as "normal", leading to highly unbalanced datasets. In this paper, we combine, in a coordinated way, several data mining techniques, such as clustering, feature selection, oversampling strategies, and automatic classification algorithms, to create more effective classification models for identifying the disease. In our evaluations, using a traditional dataset provided by UCI, we were able to significantly improve the effectiveness of the Random Forest classification algorithm, achieving an accuracy of over 88%, higher than the best result previously reported in the literature.
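The kind of combination the abstract describes (feature selection, oversampling of the rare arrhythmia classes, and a Random Forest classifier evaluated by cross-validation on the UCI Arrhythmia dataset) can be sketched with off-the-shelf tooling. The snippet below is a minimal illustration under stated assumptions, not the authors' implementation: it assumes the standard UCI download path for the Arrhythmia data file, uses univariate feature selection and SMOTE as stand-ins for the paper's specific feature-selection and oversampling choices, omits the clustering stage entirely, and its parameter values (k=100 features, 500 trees, the rare-class filter) are arbitrary.

```python
# Sketch of a combined pipeline: feature selection + SMOTE oversampling +
# Random Forest, evaluated with stratified 10-fold cross-validation on the
# UCI Arrhythmia dataset. Illustrative only; not the paper's exact method.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, VarianceThreshold, f_classif
from sklearn.impute import SimpleImputer
from sklearn.model_selection import StratifiedKFold, cross_val_score
from imblearn.over_sampling import SMOTE
from imblearn.pipeline import Pipeline

# Assumed standard UCI download path; 452 instances, 279 attributes, '?' marks missing values.
URL = ("https://archive.ics.uci.edu/ml/machine-learning-databases/"
       "arrhythmia/arrhythmia.data")

data = pd.read_csv(URL, header=None, na_values="?")
X, y = data.iloc[:, :-1], data.iloc[:, -1]          # last column is the class label (1 = normal)

# Drop extremely rare classes so 10-fold stratified CV and SMOTE are well defined.
# (How the paper handles these rare classes is not reproduced here.)
counts = y.value_counts()
keep = y.isin(counts[counts >= 10].index)
X, y = X[keep], y[keep]

pipeline = Pipeline([
    ("impute", SimpleImputer(strategy="median")),      # fill the few missing values
    ("variance", VarianceThreshold()),                 # discard constant attributes
    ("select", SelectKBest(f_classif, k=100)),         # keep the most discriminative attributes
    ("smote", SMOTE(k_neighbors=3, random_state=42)),  # oversample minority arrhythmia classes
    ("rf", RandomForestClassifier(n_estimators=500, random_state=42)),
])

cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)
scores = cross_val_score(pipeline, X, y, cv=cv, scoring="accuracy")
print(f"accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")
```

Placing the imputer, selector, and oversampler inside the cross-validation pipeline (rather than applying them to the full dataset beforehand) keeps synthetic minority samples out of the test folds and avoids leaking information across folds.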

Keywords

Cardiac Arrhythmia Detection • Automatic classification • Machine learning

References

  1. Agrawal, R., Gehrke, J., Gunopulos, D., Raghavan, P.: Automatic subspace clustering of high dimensional data for data mining applications. In: Proceedings of SIGMOD 1998, pp. 94–105. ACM, New York (1998)
  2. Alelyani, S., Tang, J., Liu, H.: Feature selection for clustering: a review. Data Clust.: Algorithms Appl. 29, 110–121 (2013)
  3. Arlot, S., Celisse, A., et al.: A survey of cross-validation procedures for model selection. Stat. Surv. 4, 40–79 (2010)
  4. Barua, S., Islam, M.M., Yao, X., Murase, K.: MWMOTE: majority weighted minority oversampling technique for imbalanced data set learning. IEEE Trans. Knowl. Data Eng. 26(2), 405–425 (2014)
  5. Berkhin, P.: A survey of clustering data mining techniques. In: Kogan, J., Nicholas, C., Teboulle, M. (eds.) Grouping Multidimensional Data, pp. 25–71. Springer, Heidelberg (2006). https://doi.org/10.1007/3-540-28349-8_2
  6. Breiman, L.: Random forests. Mach. Learn. 45(1), 5–32 (2001)
  7. Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Int. Res. 16(1), 321–357 (2002). http://dl.acm.org/citation.cfm?id=1622407.1622416
  8. Douzas, G., Bacao, F.: Self-organizing map oversampling (SOMO) for imbalanced data set learning. Expert Syst. Appl. 82, 40–52 (2017)
  9. Faber, V.: Clustering and the continuous k-means algorithm. Los Alamos Sci. 22, 138–144 (1994)
  10. Farivar, R., Rebolledo, D., Chan, E., Campbell, R.H.: A parallel implementation of k-means clustering on GPUs. In: Proceedings of PDPTA 2008, USA, pp. 340–345, July 2008
  11. Guvenir, H.A., Acar, B., Demiroz, G., Cekin, A.: A supervised machine learning algorithm for arrhythmia analysis. In: Computers in Cardiology, pp. 433–436. IEEE (1997)
  12. Hall, M.A.: Correlation-based feature subset selection for machine learning. Ph.D. thesis, University of Waikato, Hamilton, New Zealand (1998)
  13. Han, H., Wang, W.-Y., Mao, B.-H.: Borderline-SMOTE: a new over-sampling method in imbalanced data sets learning. In: Huang, D.-S., Zhang, X.-P., Huang, G.-B. (eds.) ICIC 2005. LNCS, vol. 3644, pp. 878–887. Springer, Heidelberg (2005). https://doi.org/10.1007/11538059_91
  14. Jadhav, S.M., Nalbalwar, S., Ghatol, A.: Artificial neural network based cardiac arrhythmia classification using ECG signal data. In: 2010 International Conference on Electronics and Information Engineering (ICEIE), vol. 1, p. V1-228. IEEE (2010)
  15. Joachims, T.: Making large-scale support vector machine learning practical. In: Advances in Kernel Methods, pp. 169–184. MIT Press, Cambridge (1999). http://dl.acm.org/citation.cfm?id=299094.299104
  16. Lichman, M.: UCI machine learning repository (2013). https://archive.ics.uci.edu/ml/datasets/Arrhythmia
  17. Liu, H., Motoda, H.: Feature Selection for Knowledge Discovery and Data Mining, vol. 454. Springer, Heidelberg (2012). https://doi.org/10.1007/978-1-4615-5689-3
  18. Özçift, A.: Random forests ensemble classifier trained with data resampling strategy to improve cardiac arrhythmia diagnosis. Comput. Biol. Med. 41(5), 265–271 (2011)
  19. Portela, F., Santos, M.F., Silva, Á., Rua, F., Abelha, A., Machado, J.: Preventing patient cardiac arrhythmias by using data mining techniques. In: 2014 IEEE Conference on Biomedical Engineering and Sciences (IECBES), pp. 165–170. IEEE (2014)
  20. Salles, T., Gonçalves, M., Rodrigues, V., Rocha, L.: BROOF: exploiting out-of-bag errors, boosting and random forests for effective automated classification. In: Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2015, pp. 353–362. ACM, New York (2015). http://doi.acm.org/10.1145/2766462.2767747
  21. Salles, T., Rocha, L., Mourão, F., Gonçalves, M., Viegas, F., Meira, W.: A two-stage machine learning approach for temporally-robust text classification. Inf. Syst. 69(Suppl. C), 40–58 (2017). https://doi.org/10.1016/j.is.2017.04.004
  22. Samad, S., Khan, S.A., Haq, A., Riaz, A.: Classification of arrhythmia. Int. J. Electr. Energy 2(1), 57–61 (2014)
  23. Viegas, F., Gonçalves, M.A., Martins, W., Rocha, L.: Parallel lazy semi-naive Bayes strategies for effective and efficient document classification. In: Proceedings of the 24th ACM International Conference on Information and Knowledge Management, CIKM 2015, pp. 1071–1080. ACM, New York (2015). http://doi.acm.org/10.1145/2806416.2806565
  24. Viegas, F., Rocha, L., Gonçalves, M., Mourão, F., Sá, G., Salles, T., Andrade, G., Sandin, I.: A genetic programming approach for feature selection in highly dimensional skewed data. Neurocomputing (2017). https://doi.org/10.1016/j.neucom.2017.08.050
  25. Weka: Weka - interface classifier (2016). http://weka.sourceforge.net/doc.dev/weka/classifiers/Classifier.html. Accessed 02 Dec 2017
  26. Wu, J., Xiong, H., Wu, P., Chen, J.: Local decomposition for rare class analysis. In: Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 814–823. ACM (2007)
  27. Zhang, M.L., Zhou, Z.H.: A k-nearest neighbor based algorithm for multi-label classification. In: 2005 IEEE International Conference on Granular Computing, pp. 718–721. IEEE (2005)
  28. Zheng, Z., Wu, X., Srihari, R.: Feature selection for text categorization on imbalanced data. ACM SIGKDD Explor. Newsl. 6, 80–89 (2004)

Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  • Christian Gomes (1)
  • Alan Cardoso (1)
  • Thiago Silveira (2)
  • Diego Dias (1)
  • Elisa Tuler (1)
  • Renato Ferreira (3)
  • Leonardo Rocha (1)

  1. Universidade Federal de São João del-Rei, São João del-Rei, Brazil
  2. Tsinghua University, Beijing, China
  3. Universidade Federal de Minas Gerais, Belo Horizonte, Brazil