Bidirectional self-adaptive resampling in internet of things big data learning

  • Weihong Han
  • Zhihong Tian
  • Zizhong Huang
  • Shudong Li
  • Yan Jia


This paper addresses the low accuracy of learning algorithms caused by severe class imbalance in Internet of Things big data, and proposes a bidirectional self-adaptive resampling algorithm for imbalanced big data. Based on the data set sizes and imbalance ratios supplied by the user, the algorithm combines oversampling of the minority class with distribution-sensitive undersampling of the majority class. The resampling is distribution-sensitive: according to the distribution of samples, the majority and minority samples are divided into different categories, and samples with different distribution characteristics are processed differently. As a result, the resampled set keeps the characteristics of the original data set as far as possible. The algorithm emphasizes boundary samples, that is, samples at the boundary between the majority and minority classes, which matter more to a learning algorithm than other samples. Boundary minority samples are copied, and boundary majority samples are retained. A real-world application is presented at the end, which shows that, compared with existing resampling algorithms for imbalanced data, this algorithm improves the accuracy of the learning algorithm, especially the accuracy and recall of the minority class.
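The two-directional idea described above can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the abstract does not say how boundary samples are detected or how the sample categories are defined, so the k-nearest-neighbour boundary test, the `target_ratio` parameter, the random thinning of non-boundary majority samples, and both function names are assumptions made only for illustration.

```python
import numpy as np

def boundary_mask(X, y, k=5):
    """Assumed boundary test: a sample is 'boundary' if at least one of its
    k nearest neighbours belongs to the other class."""
    # Pairwise squared Euclidean distances (O(n^2) memory, toy data only).
    d = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    np.fill_diagonal(d, np.inf)          # a sample is not its own neighbour
    nn = np.argsort(d, axis=1)[:, :k]    # indices of the k nearest neighbours
    return (y[nn] != y[:, None]).any(axis=1)

def bidirectional_resample(X, y, minority=1, target_ratio=1.0, k=5, seed=0):
    """Hypothetical helper: copy boundary minority samples, keep boundary
    majority samples, and thin the remaining majority samples at random.
    target_ratio is the desired majority/minority ratio (an assumption)."""
    rng = np.random.default_rng(seed)
    is_min = y == minority
    border = boundary_mask(X, y, k)

    min_idx = np.flatnonzero(is_min)
    border_min = np.flatnonzero(is_min & border)
    border_maj = np.flatnonzero(~is_min & border)
    inner_maj = np.flatnonzero(~is_min & ~border)

    # Oversampling direction: every minority sample is kept, and each
    # boundary minority sample is copied once.
    new_min = np.concatenate([min_idx, border_min])

    # Undersampling direction: all boundary majority samples are retained;
    # non-boundary majority samples are dropped at random until the
    # requested ratio is reached (or no non-boundary samples remain).
    n_maj_target = int(round(len(new_min) * target_ratio))
    n_inner_keep = min(len(inner_maj), max(0, n_maj_target - len(border_maj)))
    kept_inner = rng.permutation(inner_maj)[:n_inner_keep]
    new_maj = np.concatenate([border_maj, kept_inner])

    idx = np.concatenate([new_min, new_maj])
    return X[idx], y[idx]

# Toy usage: 100 majority points followed by 10 minority points on a line.
X = np.column_stack([np.arange(110.0), np.zeros(110)])
y = np.array([0] * 100 + [1] * 10)
X_res, y_res = bidirectional_resample(X, y, minority=1, target_ratio=1.0, k=2)
```

In this sketch the boundary majority samples are never discarded, so the achieved ratio can exceed `target_ratio` when the class boundary is large. A distribution-sensitive method such as the one the abstract describes would presumably treat the non-boundary samples more carefully than uniform random selection does.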


Imbalanced big data, Resampling, Oversampling, Undersampling, Data learning



This work is partially funded by the National Key Research and Development Plan under Grant No. 2018YFB0803504 and by the National Natural Science Foundation of China under Grant Nos. 61871140, U1636215, 61872100 and 61572153.



Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  1. Cyberspace Institute of Advanced Technology, Guangzhou University, Guangzhou, China
  2. Computer School, National University of Defense Technology, Changsha, China
