Abstract
Most research on class imbalance has been carried out on standard (binary or multi-class) classification problems. In recent years, however, researchers have addressed new classification frameworks that go beyond standard classification in several respects, and several variations of the class imbalance problem appear within these frameworks. This chapter reviews the problem of class imbalance for a spectrum of these non-classical problems. Section 12.2 reviews research on class imbalance where only partially labeled data is available (semi-supervised learning, SSL). Section 12.3 discusses label imbalance in problems where more than one label can be associated with an instance (Multilabel Learning). Section 12.4 analyzes class imbalance when labels are associated with bags of instances rather than with individual instances (Multi-instance Learning). Section 12.5 addresses class imbalance when there is an ordinal relation among classes (Ordinal Classification). Finally, Sect. 12.6 presents some concluding remarks.
Notes
1. Multilabel learning differs from multi-class classification in that, in the latter, exactly one label, drawn from a set of more than two possible classes, is associated with each instance.
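The distinction drawn in this note can be made concrete with a small sketch of how the targets are typically represented. The data, the label names, and the helper `to_indicator_rows` are all invented for illustration and do not come from the chapter; the per-label counts at the end only hint at why imbalance in the multilabel setting has to be assessed label by label.

```python
# Multi-class: every instance carries exactly one label from a set of
# more than two classes (toy data, assumed for illustration).
multiclass_targets = ["cat", "dog", "bird", "cat"]

# Multilabel: every instance carries a set of labels, which may be empty
# or contain several members (toy data, assumed for illustration).
multilabel_targets = [
    {"beach", "sunset"},
    {"urban"},
    set(),
    {"beach", "urban", "people"},
]


def to_indicator_rows(label_sets):
    """Hypothetical helper: encode label sets as 0/1 rows, one column per label."""
    labels = sorted(set().union(*label_sets))
    rows = [[1 if lab in s else 0 for lab in labels] for s in label_sets]
    return labels, rows


labels, rows = to_indicator_rows(multilabel_targets)

# Number of instances in which each label appears; skew shows up per label.
label_counts = {lab: sum(row[i] for row in rows) for i, lab in enumerate(labels)}

print(labels)
print(rows)
print(label_counts)
```

In the multi-class list each position holds a single string, whereas each multilabel row may contain any number of ones, including none.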
Copyright information
© 2018 Springer Nature Switzerland AG
About this chapter
Cite this chapter
Fernández, A., García, S., Galar, M., Prati, R.C., Krawczyk, B., Herrera, F. (2018). Non-classical Imbalanced Classification Problems. In: Learning from Imbalanced Data Sets. Springer, Cham. https://doi.org/10.1007/978-3-319-98074-4_12
DOI: https://doi.org/10.1007/978-3-319-98074-4_12
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-98073-7
Online ISBN: 978-3-319-98074-4
eBook Packages: Computer Science, Computer Science (R0)