
Knowledge and Information Systems, Volume 28, Issue 1, pp 1–23

Improving SVM classification on imbalanced time series data sets with ghost points

  • Suzan Köknar-Tezel
  • Longin Jan Latecki
Regular Paper

Abstract

Imbalanced data sets present a particular challenge to the data mining community. Often, it is the rare event that is of interest, and the cost of misclassifying the rare event is higher than that of misclassifying the usual event. When the data is highly skewed toward the usual event, it can be very difficult for a learning system to accurately detect the rare event. Many approaches for handling imbalanced data sets have been proposed in recent years, from under-sampling the majority class to adding synthetic points to the minority class in feature space. However, distances between time series are known to be non-Euclidean and non-metric, since comparing time series requires warping in time. This fact makes it impossible to apply standard methods such as SMOTE, which insert synthetic data points in feature space. We present an innovative approach that instead augments the minority class by adding synthetic points in distance space. We then use Support Vector Machines (SVMs) for classification. Our experimental results on standard time series data sets show that our synthetic points significantly improve the classification rate of the rare events and, in most cases, also improve the overall accuracy of SVMs. We also show how adding our synthetic points can aid in the visualization of time series data sets.
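To make the general idea concrete, the following is a minimal sketch under stated assumptions, not the ghost-point construction described in the paper: each series is represented by its row of dynamic time warping (DTW) distances to the training set (a distance-space representation), the minority class is then oversampled by SMOTE-style interpolation between minority rows in that space, and a standard SVM is trained on the augmented representation. The helpers dtw and distance_space, the toy signals, and the interpolation scheme are all illustrative assumptions introduced here.

```python
# Illustrative sketch only: distance-space oversampling for imbalanced time
# series, NOT the authors' exact ghost-point method.
import numpy as np
from sklearn.svm import SVC

def dtw(a, b):
    """Classic dynamic time warping distance between two 1-D series."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

def distance_space(series):
    """Pairwise DTW matrix; row i is series i's coordinates in distance space."""
    n = len(series)
    M = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            M[i, j] = M[j, i] = dtw(series[i], series[j])
    return M

# Hypothetical toy imbalanced data set: 45 majority vs. 5 minority series.
rng = np.random.default_rng(0)
t = np.linspace(0, 6, 50)
X = [np.sin(t) + 0.1 * rng.standard_normal(50) for _ in range(45)]
X += [np.sin(1.5 * t) + 0.1 * rng.standard_normal(50) for _ in range(5)]
y = np.array([0] * 45 + [1] * 5)

D = distance_space(X)

# SMOTE-style interpolation, but between rows of the distance matrix rather
# than between the raw series (which time warping does not allow us to average
# meaningfully in feature space).
minority = np.where(y == 1)[0]
ghost_rows, ghost_labels = [], []
while len(ghost_rows) < (y == 0).sum() - (y == 1).sum():
    i, j = rng.choice(minority, size=2, replace=False)
    lam = rng.uniform()
    ghost_rows.append(lam * D[i] + (1 - lam) * D[j])
    ghost_labels.append(1)

D_aug = np.vstack([D, np.array(ghost_rows)])
y_aug = np.concatenate([y, np.array(ghost_labels)])

clf = SVC(kernel="rbf", gamma="scale").fit(D_aug, y_aug)
print("training accuracy on original points:", clf.score(D, y))
```

The essential point of the sketch is that the synthetic points are created between distance-space rows, never between the raw series, since DTW-comparable series cannot be meaningfully interpolated in feature space.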

Keywords

Imbalanced data sets · Support Vector Machines · Time series



Copyright information

© Springer-Verlag London Limited 2010

Authors and Affiliations

  1. Department of Computer and Information Sciences, Temple University, Philadelphia, USA
