This paper addresses the low accuracy of learning algorithms caused by severe imbalance in Internet of Things big data, and proposes a bidirectional self-adaptive resampling algorithm for imbalanced big data. Based on the data set sizes and imbalance ratios input by the user, the algorithm processes the data by combining oversampling of the minority class with distribution-sensitive undersampling of the majority class. According to the distribution of the samples, the majority and minority samples are divided into different categories, and samples with different distribution characteristics are processed differently, so that the resampled set keeps the characteristics of the original data set as far as possible. The algorithm emphasizes boundary samples: samples at the boundary between the majority and minority classes are more important to the learning algorithm than other samples. Boundary minority samples are copied, and boundary majority samples are retained. A real-world application is presented at the end, which shows that, compared with existing resampling algorithms for imbalanced data, this algorithm improves the accuracy of the learning algorithm, especially the accuracy and recall of the minority class.
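The abstract does not spell out the categorization rule or the sampling amounts, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' algorithm. It assumes a sample counts as "boundary" when its k nearest neighbours include some, but not all, samples of the other class; it duplicates each boundary minority sample once; and it keeps every boundary majority sample while randomly thinning the remaining majority samples toward a user-supplied majority-to-minority ratio. The names `knn_indices`, `is_boundary`, `bidirectional_resample`, `target_ratio`, and `k` are hypothetical.

```python
import math
import random


def knn_indices(points, i, k):
    """Indices of the k nearest neighbours of points[i] (brute force, Euclidean)."""
    dists = [(math.dist(points[i], p), j) for j, p in enumerate(points) if j != i]
    dists.sort()
    return [j for _, j in dists[:k]]


def is_boundary(points, labels, i, k):
    """Assumed rule: boundary if the k-neighbourhood mixes both classes."""
    opposite = sum(1 for j in knn_indices(points, i, k) if labels[j] != labels[i])
    return 0 < opposite < k


def bidirectional_resample(points, labels, minority, target_ratio, k=3, seed=0):
    """Resample toward a majority:minority ratio of at most target_ratio.

    Boundary majority samples are always kept, so the final ratio can
    exceed target_ratio when there are many of them.
    """
    rng = random.Random(seed)
    idx_min = [i for i, y in enumerate(labels) if y == minority]
    idx_maj = [i for i, y in enumerate(labels) if y != minority]
    boundary = {i for i in range(len(points)) if is_boundary(points, labels, i, k)}

    # Oversampling side: copy each boundary minority sample once (assumed amount).
    kept_min = idx_min + [i for i in idx_min if i in boundary]

    # Undersampling side: retain all boundary majority samples, then fill the
    # remaining budget with a random subset of the non-boundary majority samples.
    kept_maj = [i for i in idx_maj if i in boundary]
    rest = [i for i in idx_maj if i not in boundary]
    budget = max(0, int(target_ratio * len(kept_min)) - len(kept_maj))
    rng.shuffle(rest)
    kept_maj += rest[:budget]

    chosen = kept_min + kept_maj
    return [points[i] for i in chosen], [labels[i] for i in chosen]
```

Random thinning of the non-boundary majority samples is used here only for brevity; the distribution-sensitive treatment described in the abstract would replace that step.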
This work is partially funded by the National Key Research and Development Plan under Grant No. 2018YFB0803504 and by the National Natural Science Foundation of China under Grant Nos. 61871140, U1636215, 61872100, and 61572153.
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Cite this article
Han, W., Tian, Z., Huang, Z. et al. Bidirectional self-adaptive resampling in internet of things big data learning. Multimed Tools Appl 78, 30111–30126 (2019). https://doi.org/10.1007/s11042-018-6938-9
- Imbalanced big data
- Data learning