Bidirectional self-adaptive resampling in internet of things big data learning

Abstract

This paper addresses the low accuracy of learning algorithms caused by severe class imbalance in Internet of Things big data, and proposes a bidirectional self-adaptive resampling algorithm for imbalanced big data. Based on the data set size and the imbalance ratio supplied by the user, the algorithm combines oversampling of the minority class with distribution-sensitive undersampling of the majority class. The resampling is distribution-sensitive: the majority and minority samples are divided into categories according to their distribution, and samples with different distribution characteristics are processed differently, so that the resampled set keeps the characteristics of the original data set as far as possible. The algorithm emphasizes boundary samples, that is, samples at the boundary between the majority and minority classes matter more to the learning algorithm than other samples. Boundary minority samples are copied, and boundary majority samples are retained. A real-world application is presented at the end, which shows that, compared with existing resampling algorithms for imbalanced data, the proposed algorithm improves the accuracy of the learning algorithm, especially the accuracy and recall of the minority class.
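The bidirectional idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the abstract does not state how samples are categorized, so the k-nearest-neighbour rule used here to flag boundary samples is an assumption, and the function names, the `k` parameter, and the `target_ratio` parameter (standing in for the user-supplied imbalance ratio) are hypothetical.

```python
import numpy as np


def opposite_neighbor_fraction(X, y, k=5):
    """Fraction of each sample's k nearest neighbours that carry a different label."""
    X = np.asarray(X, dtype=float)
    # Pairwise squared Euclidean distances; suitable for small illustrative data only.
    d = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    np.fill_diagonal(d, np.inf)          # a sample is not its own neighbour
    nn = np.argsort(d, axis=1)[:, :k]    # indices of the k nearest neighbours
    return (y[nn] != y[:, None]).mean(axis=1)


def bidirectional_resample(X, y, minority=1, k=5, target_ratio=1.0, seed=0):
    """Copy boundary minority samples, keep boundary majority samples, and
    randomly drop non-boundary majority samples toward the target ratio."""
    rng = np.random.default_rng(seed)
    frac = opposite_neighbor_fraction(X, y, k)
    # Assumed rule: "boundary" means some, but not all, neighbours are of the other class.
    boundary = (frac > 0) & (frac < 1)
    is_min = y == minority

    min_idx = np.flatnonzero(is_min)
    min_boundary = np.flatnonzero(is_min & boundary)
    maj_boundary = np.flatnonzero(~is_min & boundary)
    maj_other = np.flatnonzero(~is_min & ~boundary)

    # Oversampling: every minority sample once, boundary ones a second time.
    new_min = np.concatenate([min_idx, min_boundary])
    # Undersampling: boundary majority samples are always kept; the rest
    # are sampled at random to fill the remaining majority budget.
    budget = int(target_ratio * len(new_min)) - len(maj_boundary)
    n_keep = min(len(maj_other), max(budget, 0))
    kept_other = rng.choice(maj_other, size=n_keep, replace=False)

    idx = np.concatenate([new_min, maj_boundary, kept_other])
    return X[idx], y[idx]


# Usage on synthetic data: 200 majority points and 20 overlapping minority points.
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(0, 1, (200, 2)), rng.normal(2, 1, (20, 2))])
y = np.array([0] * 200 + [1] * 20)
X_res, y_res = bidirectional_resample(X, y, minority=1)
```

In this sketch the two directions are decoupled: the minority class only grows, the majority class only shrinks, and boundary samples of either class are never discarded. The full pairwise distance matrix is quadratic in the number of samples, so a big data setting would need an approximate or partitioned neighbour search instead.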

Acknowledgments

This work is partially funded by the National Key Research and Development Plan under Grant No. 2018YFB0803504 and by the National Natural Science Foundation of China under Grant Nos. 61871140, U1636215, 61872100, and 61572153.

Author information

Corresponding author

Correspondence to Zhihong Tian.

About this article

Cite this article

Han, W., Tian, Z., Huang, Z. et al. Bidirectional self-adaptive resampling in internet of things big data learning. Multimed Tools Appl 78, 30111–30126 (2019). https://doi.org/10.1007/s11042-018-6938-9

Keywords

  • Imbalanced big data
  • Resampling
  • Oversampling
  • Undersampling
  • Data learning