Kernel Machines for Imbalanced Data Problem in Biomedical Applications

  • Chapter
Support Vector Machines Applications

Abstract

Kernel machines such as support vector machines (SVMs) have been reported to perform well in many applications. However, the performance of a binary SVM can be adversely affected when the training samples are unevenly distributed across the classes, a situation known as the imbalanced data problem. One-class SVMs, as a recognition-based approach, can be trained on the majority class and used to recognize it, and such kernel machines have already been developed. In this chapter, we review and study the effects of imbalanced datasets on the performance of both one-class SVMs and binary SVMs. We show that a hybrid kernel machine that combines one-class SVMs and binary SVMs in a multi-classifier system alleviates the imbalanced data problem. We also report the deployment of such hybrid kernel machines in two biomedical applications where the imbalanced data problem exists.
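To make the imbalanced data problem concrete, the following minimal sketch (not taken from the chapter) shows why plain accuracy can hide a classifier's failure on the minority class. The toy labels (95 normal samples and 5 abnormal samples), the helper names `confusion_counts` and `summarize`, and the choice of sensitivity, specificity, and their geometric mean as summary metrics are all illustrative assumptions.

```python
from math import sqrt


def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    return tp, fn, tn, fp


def summarize(y_true, y_pred, positive=1):
    """Return accuracy alongside per-class rates and their geometric mean."""
    tp, fn, tn, fp = confusion_counts(y_true, y_pred, positive)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0  # minority-class recall
    specificity = tn / (tn + fp) if (tn + fp) else 0.0  # majority-class recall
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "g_mean": sqrt(sensitivity * specificity),
    }


# Hypothetical toy data: 95 majority (label 0) and 5 minority (label 1) samples.
y_true = [0] * 95 + [1] * 5

# A degenerate classifier that always predicts the majority class.
majority_only = [0] * len(y_true)

report = summarize(y_true, majority_only)
print(report)
```

On this toy data the always-majority classifier reaches an accuracy of 0.95 while its sensitivity and geometric mean are both 0, which is the kind of behavior that motivates evaluating and training classifiers differently when the classes are imbalanced.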

The research presented in this chapter was carried out when all authors were with the Nanyang Technological University, Singapore.

Corresponding author

Correspondence to Kap Luk Chan.

Copyright information

© 2014 Springer International Publishing Switzerland

About this chapter

Cite this chapter

Li, P., Chan, K.L., Fu, S., Krishnan, S.M. (2014). Kernel Machines for Imbalanced Data Problem in Biomedical Applications. In: Ma, Y., Guo, G. (eds) Support Vector Machines Applications. Springer, Cham. https://doi.org/10.1007/978-3-319-02300-7_7

  • DOI: https://doi.org/10.1007/978-3-319-02300-7_7

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-02299-4

  • Online ISBN: 978-3-319-02300-7

  • eBook Packages: Engineering, Engineering (R0)