
Learnability and robustness of shallow neural networks learned by a performance-driven BP and a variant of PSO for edge decision-making

  • Original Article
  • Published:
Neural Computing and Applications

Abstract

Complex AI models can be difficult to implement on edge devices without strong computing capacity (e.g., a GPU). The Universal Approximation Theorem states that a shallow neural network (SNN) can approximate any nonlinear function. In this paper, we focus on the learnability and robustness of SNNs obtained by a greedy, tight-force heuristic algorithm (Performance-Driven Back-Propagation, PDBP) and a loose-force meta-heuristic algorithm (a variant of particle swarm optimization, VPSO). From an engineering perspective, every sensor is well justified for a specific task; hence, all sensor readings should be strongly correlated to the target, and the structure of an SNN should depend on the dimensionality of the problem space. The key findings of the research are summarized as follows: (1) the number of hidden neurons an SNN needs depends on the nonlinearity of the training data, and a hidden-layer width up to the dimensionality of the problem space can be sufficient; (2) the learnability of SNNs produced by error-driven PDBP is consistently better than that of SNNs optimized by error-driven VPSO; (3) the performance of SNNs obtained by PDBP and VPSO changes little across different training rates; and (4) compared with classic machine learning algorithms from the literature, such as C4.5, NB and NN, the SNNs obtained by accuracy-driven PDBP win on all tested data sets, with improvements of up to 32.86%. Hence, the research could provide valuable guidance for the implementation of edge intelligence.
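To make the abstract's setup concrete, the sketch below shows a one-hidden-layer sigmoid network whose hidden-layer width equals the input dimension, per finding (1), with weights fitted either by plain back-propagation or by a textbook global-best PSO over the flattened weight vector. This is a minimal, hypothetical illustration, not the authors' implementation: PDBP adds a performance-driven criterion and VPSO modifies the velocity update, neither of which is reproduced here, and the names (`ShallowNet`, `pso_fit`) and all constants are assumptions for illustration only.

```python
# Minimal sketch (not the authors' code): shallow net with hidden width == input dim.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class ShallowNet:
    """One-hidden-layer sigmoid network; hidden width equals the input dimension."""

    def __init__(self, dim, rng):
        self.W1 = rng.normal(scale=0.5, size=(dim, dim))
        self.b1 = np.zeros(dim)
        self.W2 = rng.normal(scale=0.5, size=dim)
        self.b2 = 0.0

    def forward(self, X):
        self.H = sigmoid(X @ self.W1 + self.b1)   # hidden activations, cached for BP
        return sigmoid(self.H @ self.W2 + self.b2)

    def backprop_step(self, X, y, lr=0.5):
        # one gradient-descent step on mean squared error (plain BP, not PDBP itself)
        out = self.forward(X)
        d_out = (out - y) * out * (1.0 - out)
        d_hid = np.outer(d_out, self.W2) * self.H * (1.0 - self.H)
        n = len(y)
        self.W2 -= lr * self.H.T @ d_out / n
        self.b2 -= lr * d_out.mean()
        self.W1 -= lr * X.T @ d_hid / n
        self.b1 -= lr * d_hid.mean(axis=0)

def unpack(net, v):
    # write a flat parameter vector back into the network
    d = net.b1.size
    net.W1 = v[:d * d].reshape(d, d)
    net.b1 = v[d * d:d * d + d]
    net.W2 = v[d * d + d:d * d + 2 * d]
    net.b2 = v[-1]

def pso_fit(net, X, y, swarm=30, iters=200, seed=0):
    # standard global-best PSO over the flattened weights; the paper's VPSO
    # changes the velocity update, which is not reproduced in this sketch
    rng = np.random.default_rng(seed)
    dim = net.b1.size * (net.b1.size + 2) + 1   # d*d + d + d + 1 parameters

    def loss(v):
        unpack(net, v)
        return np.mean((net.forward(X) - y) ** 2)

    pos = rng.uniform(-1.0, 1.0, (swarm, dim))
    vel = np.zeros((swarm, dim))
    pbest = pos.copy()
    pcost = np.array([loss(p) for p in pos])
    gbest = pbest[pcost.argmin()].copy()
    for _ in range(iters):
        r1, r2 = rng.random((2, swarm, dim))
        vel = 0.72 * vel + 1.49 * r1 * (pbest - pos) + 1.49 * r2 * (gbest - pos)
        pos = pos + vel
        cost = np.array([loss(p) for p in pos])
        improved = cost < pcost
        pbest[improved] = pos[improved]
        pcost[improved] = cost[improved]
        gbest = pbest[pcost.argmin()].copy()
    unpack(net, gbest)

# Usage on synthetic 2-D data (a hypothetical example, not one of the paper's data sets):
rng = np.random.default_rng(0)
X = rng.random((200, 2))
y = ((X[:, 0] > 0.5) ^ (X[:, 1] > 0.5)).astype(float)   # XOR-style labels
net = ShallowNet(dim=2, rng=rng)
for _ in range(5000):
    net.backprop_step(X, y)        # BP route; alternatively: pso_fit(net, X, y)
print("train accuracy:", ((net.forward(X) > 0.5) == y.astype(bool)).mean())
```

Note that the two hidden neurons here match the 2-D input, mirroring the paper's claim that a hidden-layer width up to the problem-space dimensionality can suffice; the inertia (0.72) and acceleration (1.49) constants are the commonly cited standard-PSO values, used here only as placeholders.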



Acknowledgements

This research is sponsored by the National Natural Science Foundation of China (61903002), the Key Project of Natural Science in Universities in Anhui Province, China (KJ2018A0111), and the Open Research Fund of the Anhui Key Laboratory of Detection Technology and Energy Saving Devices, Anhui Polytechnic University, China (2017070503B026-A01).

Author information


Corresponding author

Correspondence to Hongmei He.

Ethics declarations

Conflict of interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

He, H., Chen, M., Xu, G. et al. Learnability and robustness of shallow neural networks learned by a performance-driven BP and a variant of PSO for edge decision-making. Neural Comput & Applic 33, 13809–13830 (2021). https://doi.org/10.1007/s00521-021-06019-1

