A SMOTE Extension for Balancing Multivariate Epilepsy-Related Time Series Datasets

  • Enrique de la Cal
  • José R. Villar
  • Paula Vergara
  • Javier Sedano
  • Álvaro Herrero
Conference paper
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 649)


In some cases, big data comes in the form of Time Series (TS) in which complex TS events occur only rarely. In this scenario, learning algorithms must cope with the TS data balancing problem, which has barely been studied for TS datasets. This research addresses the issue by describing a very simple TS extension of the well-known SMOTE algorithm for balancing datasets. To validate the proposal, it is applied to a realistic, publicly available dataset containing epilepsy-related TS. A study of the characteristics of the dataset before and after applying this TS balancing algorithm is performed, showing evidence of the requirements for research on this topic, among them the energy efficiency of the algorithm and the TS generation process.
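The abstract does not spell out how the TS extension works, so the following is only a minimal sketch of the classic SMOTE interpolation step applied to whole multivariate series. It is not the authors' algorithm. It assumes that all minority series have equal length and are aligned in time, and that Euclidean distance on the flattened series is an acceptable neighbour criterion. The function name `smote_time_series` and its parameters are hypothetical.

```python
import numpy as np


def smote_time_series(minority, n_synthetic, k=3, seed=0):
    """SMOTE-style oversampling of multivariate time series (illustrative only).

    minority: array of shape (n_series, n_timesteps, n_variables), all taken
    from the minority class. Assumption: equal-length, time-aligned series.
    Returns an array of shape (n_synthetic, n_timesteps, n_variables).
    """
    rng = np.random.default_rng(seed)
    minority = np.asarray(minority, dtype=float)
    n = minority.shape[0]
    if n < 2:
        raise ValueError("need at least two minority series to interpolate")
    k = min(k, n - 1)

    # Flatten each series to one vector and compute pairwise Euclidean distances.
    flat = minority.reshape(n, -1)
    dists = np.linalg.norm(flat[:, None, :] - flat[None, :, :], axis=2)
    np.fill_diagonal(dists, np.inf)  # a series is never its own neighbour

    # Indices of the k nearest minority neighbours of every series.
    neighbours = np.argsort(dists, axis=1)[:, :k]

    synthetic = np.empty((n_synthetic,) + minority.shape[1:])
    for i in range(n_synthetic):
        base = rng.integers(n)                     # random minority series
        neigh = neighbours[base, rng.integers(k)]  # one of its k neighbours
        gap = rng.random()                         # interpolation factor in [0, 1)
        # Same gap for every time step and variable, as in classic SMOTE.
        synthetic[i] = minority[base] + gap * (minority[neigh] - minority[base])
    return synthetic


# Usage with made-up data: 5 minority series, 50 time steps, 3 variables.
example = np.random.default_rng(1).normal(size=(5, 50, 3))
synthetic = smote_time_series(example, n_synthetic=10)
```

Because each synthetic series is a convex combination of two real ones, every generated value lies between the corresponding values of its two parent series. A single interpolation factor per series keeps the variables of a multivariate TS consistent with each other, although other choices are possible.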


Keywords: Dataset balancing algorithms, SMOTE, Time series



This research has been funded by the Spanish Ministry of Science and Innovation, under project MINECO-TIN2014-56967-R.



Copyright information

© Springer International Publishing AG 2018

Authors and Affiliations

  • Enrique de la Cal (1)
  • José R. Villar (1)
  • Paula Vergara (1)
  • Javier Sedano (2)
  • Álvaro Herrero (3)
  1. University of Oviedo, Oviedo, Spain
  2. Instituto Tecnológico de Castilla y León, Burgos, Spain
  3. Department of Civil Engineering, University of Burgos, Burgos, Spain
