
Applied Intelligence, Volume 49, Issue 8, pp 3109–3122

Multi-view learning with fisher kernel and bi-bagging for imbalanced problem

  • Zhe Wang
  • Yiwen Zhu
  • Zhaozhi Chen
  • Jing Zhang
  • Wenli Du

Abstract

Existing approaches for handling the imbalanced problem are mostly discriminant approaches, while little attention has been paid to mining the probability information provided by generative approaches. Moreover, multi-view learning trains a classifier by combining different representations of the data, which improves classifier performance in imbalanced classification. In this paper, a learning framework consisting of the Fisher kernel and Bi-Bagging is proposed for the imbalanced problem. The Fisher kernel is employed to integrate probability information into the pristine features of the data, so that the generated Fisher vector contains better discriminatory information. However, the high dimension of the Fisher vector may lead to overfitting. The dataset represented by Fisher vectors is therefore processed by Bi-Bagging to generate multi-view data and balanced training subsets, which not only reduces the high dimension of the Fisher vector but also improves accuracy on minority instances. In short, the combination of the Fisher kernel and Bi-Bagging exploits the probability information hidden in the pristine features and generates balanced multi-view training subsets of adequate dimension. The proposed learning framework is therefore independent of any specific model, and its base classifier can be replaced by different linear classifiers. Two experimental strategies on 30 KEEL datasets validate the effectiveness of the proposed learning framework for imbalanced datasets.
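To make the pipeline concrete, below is a minimal Python sketch of the two ingredients the abstract describes: a Fisher vector computed from a fitted Gaussian mixture model (gradients of the log-likelihood with respect to the component means), and a Bi-Bagging-style ensemble that resamples along both axes, instances (balanced bootstraps) and features (random subsets acting as views). The helper names (fisher_vector, bi_bagging_fit, bi_bagging_predict) and the exact resampling scheme are illustrative assumptions inferred from the abstract, not the authors' published algorithm; the sketch assumes binary labels with y = 1 as the minority class.

    # A minimal sketch, not the authors' implementation: Fisher-vector
    # features from a diagonal-covariance GMM, followed by a Bi-Bagging-style
    # ensemble over balanced instance subsets and random feature subsets.
    import numpy as np
    from sklearn.mixture import GaussianMixture
    from sklearn.linear_model import LogisticRegression

    def fisher_vector(X, gmm):
        """Fisher scores w.r.t. the GMM means (covariance_type='diag')."""
        resp = gmm.predict_proba(X)                    # (n, K) responsibilities
        sigma = np.sqrt(gmm.covariances_)              # (K, d) per-dim std devs
        parts = []
        for k in range(gmm.n_components):
            diff = (X - gmm.means_[k]) / sigma[k]      # whitened deviations
            parts.append(resp[:, [k]] * diff / np.sqrt(gmm.weights_[k]))
        return np.hstack(parts)                        # (n, K*d) Fisher vector

    def bi_bagging_fit(X, y, n_views=10, feat_frac=0.5, seed=0):
        """Balanced bootstraps over instances, random subsets over features."""
        rng = np.random.default_rng(seed)
        minority = np.flatnonzero(y == 1)              # assumes 1 = minority
        majority = np.flatnonzero(y == 0)
        models = []
        for _ in range(n_views):
            feats = rng.choice(X.shape[1], max(1, int(feat_frac * X.shape[1])),
                               replace=False)          # one "view" per learner
            maj = rng.choice(majority, size=minority.size, replace=False)
            idx = np.concatenate([minority, maj])      # balanced subset
            clf = LogisticRegression(max_iter=1000)
            clf.fit(X[np.ix_(idx, feats)], y[idx])
            models.append((feats, clf))
        return models

    def bi_bagging_predict(models, X):
        """Average per-view minority probabilities and threshold at 0.5."""
        score = np.mean([clf.predict_proba(X[:, feats])[:, 1]
                         for feats, clf in models], axis=0)
        return (score >= 0.5).astype(int)

    # Usage: append Fisher-vector features to the pristine ones, then train.
    # gmm = GaussianMixture(n_components=4, covariance_type='diag').fit(X_tr)
    # Z = np.hstack([X_tr, fisher_vector(X_tr, gmm)])
    # preds = bi_bagging_predict(bi_bagging_fit(Z, y_tr), Z)

Because each base learner sees a balanced, lower-dimensional view of the Fisher-augmented features, the logistic regression above could be swapped for any linear classifier, which matches the model-independence claim in the abstract.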

Keywords

Fisher kernel · Multi-view learning · Ensemble learning · Imbalanced learning · Pattern recognition


Acknowledgments

This work is supported by the Natural Science Foundation of China under Grant No. 61672227, the "Shuguang Program" of the Shanghai Education Development Foundation and the Shanghai Municipal Education Commission, and the National Science Foundation of China for Distinguished Young Scholars under Grant No. 61725301.


Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2019

Authors and Affiliations

  1. Key Laboratory of Advanced Control and Optimization for Chemical Processes, Ministry of Education, East China University of Science and Technology, Shanghai, China
  2. Department of Computer Science and Engineering, East China University of Science and Technology, Shanghai, People's Republic of China
