Abstract
Kernel machines such as support vector machines (SVMs) have been reported to perform well in many applications. However, the performance of a binary SVM can be adversely affected when the training samples are unevenly distributed across classes, a situation known as the imbalanced data problem. One-class SVMs, a recognition-based approach, can be trained on the majority class to recognize it, and such kernel machines have already been developed. In this chapter, we review and study the effects of imbalanced datasets on the performance of both one-class SVMs and binary SVMs. We show that a hybrid kernel machine that combines one-class SVMs and binary SVMs in a multi-classifier system alleviates the imbalanced data problem. We also report the deployment of such hybrid kernel machines in two biomedical applications where the imbalanced data problem exists.
The research presented in this chapter was carried out when all authors were with the Nanyang Technological University, Singapore.
© 2014 Springer International Publishing Switzerland
About this chapter
Cite this chapter
Li, P., Chan, K.L., Fu, S., Krishnan, S.M. (2014). Kernel Machines for Imbalanced Data Problem in Biomedical Applications. In: Ma, Y., Guo, G. (eds) Support Vector Machines Applications. Springer, Cham. https://doi.org/10.1007/978-3-319-02300-7_7
DOI: https://doi.org/10.1007/978-3-319-02300-7_7
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-02299-4
Online ISBN: 978-3-319-02300-7
eBook Packages: Engineering, Engineering (R0)