A density weighted fuzzy outlier clustering approach for class imbalanced learning

Wang, Xiaokang; Wang, Huiwen; Wang, Yihui

doi:10.1007/s00521-020-04747-4

Original Article
Published: 29 January 2020

Neural Computing and Applications, volume 32, pages 13035–13049 (2020)

Abstract

The class imbalance problem is widely studied in the machine learning community and arises in many real-world applications such as spam filtering, anomaly detection and medical diagnosis. In this paper, we propose a density weighted fuzzy outlier clustering approach for class imbalanced learning. The method uses a novel fuzzy neighborhood relation with local density information to assign weights to the samples during clustering, and combines this weighting with the fuzzy outlier clustering approach to form a new fuzzy clustering method. In this way, the most representative majority class samples are selected while outlier samples are eliminated. The proposed method is tested on synthetic and real-world datasets, where it demonstrates superior performance compared with other clustering-based resampling schemes. Thus, the density weighted fuzzy outlier clustering approach can be used for real-life imbalanced problems.
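
The general idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm, whose exact formulation is not given on this page. It assumes a triangular fuzzy neighborhood membership max(0, 1 - d/dc) as the local density weight, a plain density-weighted fuzzy c-means, and a simple quantile cutoff on density to discard outliers. The function names, the cutoff distance dc and the outlier_quantile parameter are all hypothetical.

```python
import numpy as np


def fuzzy_density_weights(X, dc):
    """Local density of each sample (assumed form, not taken from the paper):
    sum over neighbors of the triangular membership max(0, 1 - d_ij / dc)."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    membership = np.clip(1.0 - d / dc, 0.0, None)
    np.fill_diagonal(membership, 0.0)  # a sample is not its own neighbor
    return membership.sum(axis=1)


def init_centers(X, w, k):
    """Deterministic seeding: densest sample first, then farthest-point picks."""
    idx = [int(np.argmax(w))]
    for _ in range(k - 1):
        d = np.linalg.norm(X[:, None, :] - X[idx][None, :, :], axis=2).min(axis=1)
        idx.append(int(np.argmax(d)))
    return X[idx].copy()


def weighted_fcm(X, w, k, m=2.0, n_iter=100):
    """Fuzzy c-means in which sample i contributes to every center update
    with weight w[i] * u[i, j] ** m. Returns (centers, memberships)."""
    centers = init_centers(X, w, k)
    for _ in range(n_iter):
        # Small constant avoids division by zero when a center sits on a sample.
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + 1e-12
        inv = d ** (-2.0 / (m - 1.0))
        u = inv / inv.sum(axis=1, keepdims=True)  # standard FCM membership update
        wu = w[:, None] * u ** m
        centers = (wu.T @ X) / wu.sum(axis=0)[:, None]
    return centers, u


def undersample_majority(X_major, n_keep, dc, outlier_quantile=0.05):
    """Return indices of majority samples kept as cluster representatives.
    Samples at or below the density quantile are treated as outliers and dropped."""
    w = fuzzy_density_weights(X_major, dc)
    keep = w > np.quantile(w, outlier_quantile)
    Xk, wk = X_major[keep], w[keep]
    centers, _ = weighted_fcm(Xk, wk, n_keep)
    # Representative of each cluster: the retained sample nearest its center.
    d = np.linalg.norm(Xk[:, None, :] - centers[None, :, :], axis=2)
    nearest = np.unique(d.argmin(axis=0))
    return np.flatnonzero(keep)[nearest]
```

Under these assumptions, calling undersample_majority on the majority class with n_keep set near the minority class size would return a reduced, roughly balanced set of majority indices. Isolated samples receive zero density weight and are removed before clustering.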

Acknowledgements

The work is supported by the National Natural Science Foundation of China (Grant No. 71420107025). The authors would like to thank the associate editor and anonymous referees for their helpful and constructive comments.

Author information

Authors and Affiliations

School of Economics and Management, Beihang University, Beijing, China
Xiaokang Wang & Huiwen Wang
Institute for Social and Economic Research and Policy, Columbia University, New York, 10027, USA
Yihui Wang

Corresponding author

Correspondence to Xiaokang Wang.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Wang, X., Wang, H. & Wang, Y. A density weighted fuzzy outlier clustering approach for class imbalanced learning. Neural Comput & Applic 32, 13035–13049 (2020). https://doi.org/10.1007/s00521-020-04747-4

Received: 29 July 2019
Accepted: 17 January 2020
Published: 29 January 2020
Issue Date: August 2020
DOI: https://doi.org/10.1007/s00521-020-04747-4
