Combining Data Mining Techniques to Enhance Cardiac Arrhythmia Detection

  • Christian Gomes
  • Alan Cardoso
  • Thiago Silveira
  • Diego Dias
  • Elisa Tuler
  • Renato Ferreira
  • Leonardo Rocha
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10861)


Detection of Cardiac Arrhythmia (CA) is performed through clinical analysis of a patient's electrocardiogram (ECG) in order to prevent cardiovascular diseases. Machine learning algorithms, particularly those for automatic classification, have been presented as promising tools to aid CA diagnosis. However, these algorithms suffer from two traditional classification problems: (1) the excessive number of numerical attributes generated by the decomposition of an ECG; and (2) the number of patients diagnosed with CAs is much lower than the number classified as “normal”, which leads to highly unbalanced datasets. In this paper, we combine several data mining techniques in a coordinated way, such as clustering, feature selection, oversampling strategies and automatic classification algorithms, to create more efficient classification models for identifying the disease. In our evaluations, using a traditional dataset from the UCI Machine Learning Repository, we were able to significantly improve the effectiveness of the Random Forest classification algorithm, achieving an accuracy of over 88%, a value higher than the best previously reported in the literature.
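The abstract lists oversampling as one of the combined techniques for handling the small number of arrhythmia cases, but does not say which strategy is used. As an illustration only, the sketch below implements SMOTE, a well-known interpolation-based oversampler, in plain Python, assuming a binary normal-versus-arrhythmia labelling and purely numeric ECG-derived attributes. The function names `smote` and `balance`, the default `k = 5`, and the toy data are hypothetical choices, not the authors' configuration.

```python
import math
import random
from collections import Counter
from typing import List, Sequence, Tuple

Vector = List[float]


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def smote(minority: List[Vector], n_synthetic: int, k: int = 5,
          seed: int = 0) -> List[Vector]:
    """Create n_synthetic SMOTE-style samples from minority-class vectors.

    Hypothetical helper: SMOTE is shown as one example oversampler,
    not necessarily the strategy used in the paper. Each new point lies
    on the segment between a randomly chosen minority sample and one of
    its k nearest minority-class neighbours.
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # a point has at most len-1 neighbours
    # Precompute the k nearest minority neighbours (by index) of every sample.
    neighbours = []
    for i, x in enumerate(minority):
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: euclidean(x, minority[j]))
        neighbours.append(others[:k])
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        neighbour = minority[rng.choice(neighbours[i])]
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


def balance(X: List[Vector], y: List[int], minority_label: int, k: int = 5,
            seed: int = 0) -> Tuple[List[Vector], List[int]]:
    """Oversample minority_label until it matches the largest class count."""
    minority = [x for x, label in zip(X, y) if label == minority_label]
    target = max(Counter(y).values())
    extra = smote(minority, target - len(minority), k=k, seed=seed)
    return X + extra, y + [minority_label] * len(extra)


if __name__ == "__main__":
    # Made-up toy data: label 0 = "normal", label 1 = arrhythmia.
    X = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [0.0, 0.3], [0.3, 0.0],
         [0.2, 0.2], [1.0, 1.0], [1.2, 0.8]]
    y = [0, 0, 0, 0, 0, 0, 1, 1]
    X_bal, y_bal = balance(X, y, minority_label=1)
    print(Counter(y_bal))  # both classes now have 6 samples
```

Whatever order the combined steps are applied in, oversampling is normally restricted to the training folds so that synthetic points never leak into the data used to measure accuracy.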


Keywords: Cardiac Arrhythmia Detection; Automatic classification; Machine learning



Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  • Christian Gomes (1)
  • Alan Cardoso (1)
  • Thiago Silveira (2)
  • Diego Dias (1)
  • Elisa Tuler (1)
  • Renato Ferreira (3)
  • Leonardo Rocha (1)

  1. Universidade Federal de São João del-Rei, São João del-Rei, Brazil
  2. Tsinghua University, Beijing, China
  3. Universidade Federal de Minas Gerais, Belo Horizonte, Brazil
