A SMOTE Extension for Balancing Multivariate Epilepsy-Related Time Series Datasets
In some cases, large volumes of data take the form of Time Series (TS) in which complex TS events occur only rarely. In this scenario, learning algorithms need to cope with the TS data balancing problem, which has barely been studied for TS datasets. This research addresses this issue by describing a very simple TS extension of the well-known SMOTE algorithm for balancing datasets. To validate the proposal, it is applied to a realistic, publicly available dataset containing epilepsy-related TS. The characteristics of the dataset are studied before and after applying this TS balancing algorithm, providing evidence of the requirements for further research on this topic, among them the energy efficiency of the algorithm and the TS generation process.
Keywords: Dataset balancing algorithms, SMOTE, Time series
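The abstract does not detail how the TS extension of SMOTE works, so the following is only a minimal sketch of one plausible reading, not the authors' method. It treats each minority-class sample as a whole multivariate window and interpolates between a window and one of its nearest minority neighbours, as classic SMOTE does for feature vectors. The function name `smote_time_series`, the equal-length windows, the Euclidean distance over flattened windows, and all parameter names are assumptions made for illustration.

```python
import numpy as np


def smote_time_series(minority, n_synthetic, k=3, seed=0):
    """SMOTE-style oversampling of multivariate TS windows (illustrative sketch).

    minority: array of shape (n_samples, n_timesteps, n_channels) holding
              equal-length minority-class windows (an assumption of this sketch).
    Returns an array of shape (n_synthetic, n_timesteps, n_channels).
    """
    minority = np.asarray(minority, dtype=float)
    n = minority.shape[0]
    if n < 2:
        raise ValueError("need at least two minority samples")
    k = min(k, n - 1)  # cannot have more neighbours than other samples
    rng = np.random.default_rng(seed)

    # Assumption: similarity is Euclidean distance between flattened windows.
    flat = minority.reshape(n, -1)
    dists = np.linalg.norm(flat[:, None, :] - flat[None, :, :], axis=2)
    np.fill_diagonal(dists, np.inf)  # a sample is not its own neighbour
    neighbours = np.argsort(dists, axis=1)[:, :k]

    synthetic = np.empty((n_synthetic,) + minority.shape[1:])
    for i in range(n_synthetic):
        base = rng.integers(n)                     # random minority window
        neigh = neighbours[base, rng.integers(k)]  # one of its k nearest
        gap = rng.random()                         # interpolation factor in [0, 1)
        # Point-wise interpolation across all timesteps and channels.
        synthetic[i] = minority[base] + gap * (minority[neigh] - minority[base])
    return synthetic


if __name__ == "__main__":
    # Hypothetical data: 5 minority windows, 50 timesteps, 3 channels.
    demo = np.random.default_rng(42).normal(size=(5, 50, 3))
    print(smote_time_series(demo, n_synthetic=10).shape)
```

Using a single interpolation factor for the whole window keeps each synthetic series a convex combination of two real ones. The pairwise distance matrix is computed densely, which is only reasonable for small minority classes.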
This research has been funded by the Spanish Ministry of Science and Innovation, under project MINECO-TIN2014-56967-R.