AutoIDL: Automated Imbalanced Data Learning via Collaborative Filtering

Zhang, Jingqi; Sun, Zhongbin; Qi, Yong

doi:10.1007/978-3-030-55393-7_9

Jingqi Zhang¹⁴,
Zhongbin Sun¹⁴ &
Yong Qi¹⁴

Part of the book series: Lecture Notes in Computer Science ((LNAI,volume 12275))

Included in the following conference series:

International Conference on Knowledge Science, Engineering and Management

1313 Accesses

Abstract

AutoML aims to select an appropriate classification algorithm and corresponding hyperparameters for an individual dataset. However, existing AutoML methods usually ignore the intrinsic imbalance nature of most real-world datasets and lead to poor performance. For handling imbalanced data, sampling methods have been widely used since their independence of the used algorithms. We propose a method named AutoIDL for selecting the sampling methods as well as classification algorithms simultaneously. Particularly, AutoIDL firstly represents datasets as graphs and extracts their meta-features with a graph embedding method. In addition, meta-targets are identified as pairs of sampling methods and classification algorithms for each imbalanced dataset. Secondly, the user-based collaborative filtering method is employed to train a ranking model based on the meta repository to select appropriate sampling methods and algorithms for a new dataset. Extensive experimental results demonstrate that AutoIDL is effective for automated imbalanced data learning and it outperforms the state-of-the-art AutoML methods.

Supported by National Natural Science Foundation of China under Grant No. 61702405 and the China Postdoctoral Science Foundation under Grant No. 2017M623176.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Log in via an institution

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 39.99; Price excludes VAT (USA)

Softcover Book: USD 54.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

References

Barua, S., Islam, M.M., Yao, X., Murase, K.: Mwmote-majority weighted minority oversampling technique for imbalanced data set learning. IEEE Trans. Knowl. Data Eng. 26(2), 405–425 (2012)
Article Google Scholar
Bunkhumpornpat, C., Sinapiromsaran, K., Lursinsap, C.: Safe-level-smote: safe-level-synthetic minority over-sampling technique for handling the class imbalanced problem. In: PAKDD, pp. 475–482 (2009)
Google Scholar
Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: Smotesynthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002)
Article Google Scholar
Douzas, G., Bacao, F., Last, F.: Improving imbalanced learning through a heuristic oversampling method based on k-means and smote. Inf. Sci. 465, 1–20 (2018)
Article Google Scholar
Elshawi, R., Maher, M., Sakr, S.: Automated machine learning: state-of-the-art and open challenges. CoRR (2019). http://arxiv.org/abs/1906.02287
Feurer, M., Klein, A., Eggensperger, K., Springenberg, J., Blum, M., Hutter, F.: Efficient and robust automated machine learning. In: NeurIPS, pp. 2962–2970 (2015)
Google Scholar
Guo, H., Li, Y., Jennifer, S., Gu, M., Huang, Y., Gong, B.: Learning from class-imbalanced data: review of methods and applications. Expert Syst. Appl. 73, 220–239 (2017)
Article Google Scholar
Han, H., Wang, W.Y., Mao, B.H.: Borderline-smote: a new over-sampling method in imbalanced data sets learning. In: ICIC, pp. 878–887 (2005)
Google Scholar
Hart, P.: The condensed nearest neighbor rule. IEEE Trans. Inf. Theory 14(3), 515–516 (1968)
Article Google Scholar
He, H., Bai, Y., Garcia, E.A., Li, S.: ADASYN: adaptive synthetic sampling approach for imbalanced learning. In: IJCNN, pp. 1322–1328 (2008)
Google Scholar
Herlocker, J.L., Konstan, J.A., Borchers, A., Riedl, J.: An algorithmic framework for performing collaborative filtering. In: SIGIR, pp. 227–234 (1999)
Google Scholar
Kubat, M., Matwin, S., et al.: Addressing the curse of imbalanced training sets: one-sided selection. In: ICML, pp. 179–186 (1997)
Google Scholar
Laurikkala, J.: Improving identification of difficult small classes by balancing class distribution. In: Quaglini, S., Barahona, P., Andreassen, S. (eds.) AIME 2001. LNCS (LNAI), vol. 2101, pp. 63–66. Springer, Heidelberg (2001). https://doi.org/10.1007/3-540-48229-6_9
Chapter Google Scholar
Lin, W.C., Tsai, C.F., Hu, Y.H., Jhang, J.S.: Clustering-based undersampling in class-imbalanced data. Inf. Sci. 409, 17–26 (2017)
Article Google Scholar
López, V., Fernández, A., García, S., Palade, V., Herrera, F.: An insight into classification with imbalanced data: empirical results and current trends on using data intrinsic characteristics. Inf. Sci. 250, 113–141 (2013)
Article Google Scholar
Mani, I., Zhang, I.: KNN approach to unbalanced data distributions: a case study involving information extraction. In: Proceedings of Workshop on Learning from Imbalanced Datasets, pp. 1–7 (2003)
Google Scholar
Narayanan, A., Chandramohan, M., Venkatesan, R., Chen, L., Liu, Y., Jaiswal, S.: graph2vec: learning distributed representations of graphs. CoRR (2017). http://arxiv.org/abs/1707.05005
Olson, R.S., Moore, J.H.: TPOT: a tree-based pipeline optimization tool for automating machine learning. In: Hutter, F., Kotthoff, L., Vanschoren, J. (eds.) Automated Machine Learning. TSSCML, pp. 151–160. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-05318-5_8
Chapter Google Scholar
Pedregosa, F., et al.: Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011)
MathSciNet MATH Google Scholar
Thornton, C., Hutter, F., Hoos, H.H., Leyton-Brown, K.: Auto-WEKA: combined selection and hyperparameter optimization of classification algorithms. In: KDD, pp. 847–855 (2013)
Google Scholar
Tomek, I.: Two modifications of CNN. IEEE Trans. Syst. Man Cybern. 6, 769–772 (1976)
MathSciNet MATH Google Scholar
Wilson, D.L.: Asymptotic properties of nearest neighbor rules using edited data. IEEE Trans. Syst. Man Cybern. 3, 408–421 (1972)
Article MathSciNet Google Scholar
Yang, C., Akimoto, Y., Kim, D.W., Udell, M.: OBOE: collaborative filtering for AutoML model selection. In: KDD, pp. 1173–1183 (2019)
Google Scholar
Yen, S.J., Lee, Y.S.: Cluster-based under-sampling approaches for imbalanced data distributions. Expert Syst. Appl. 36(3), 5718–5727 (2009)
Article Google Scholar
Zhu, X.J.: Semi-supervised learning literature survey. Technical report. University of Wisconsin-Madison Department of Computer Sciences (2005)
Google Scholar

Download references

Author information

Authors and Affiliations

School of Computer Science and Technology, Xi’an Jiaotong University, Xi’an, China
Jingqi Zhang, Zhongbin Sun & Yong Qi

Authors

Jingqi Zhang
View author publications
You can also search for this author in PubMed Google Scholar
Zhongbin Sun
View author publications
You can also search for this author in PubMed Google Scholar
Yong Qi
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Zhongbin Sun .

Editor information

Editors and Affiliations

Deakin University, Geelong, VIC, Australia
Gang Li
University of Electronic Science and Technology of China, Chengdu, China
Heng Tao Shen
Beijing Institute of Technology, Beijing, China
Ye Yuan
Zhejiang Gongshang University, Hangzhou, China
Xiaoyang Wang
Zhejiang Normal University, Jinhua, China
Huawen Liu
National University of Defense Technology, Changsha, China
Xiang Zhao

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Zhang, J., Sun, Z., Qi, Y. (2020). AutoIDL: Automated Imbalanced Data Learning via Collaborative Filtering. In: Li, G., Shen, H., Yuan, Y., Wang, X., Liu, H., Zhao, X. (eds) Knowledge Science, Engineering and Management. KSEM 2020. Lecture Notes in Computer Science(), vol 12275. Springer, Cham. https://doi.org/10.1007/978-3-030-55393-7_9

Download citation

DOI: https://doi.org/10.1007/978-3-030-55393-7_9
Published: 20 August 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-55392-0
Online ISBN: 978-3-030-55393-7
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics