
Backpropagation for Fully Connected Cascade Networks


Abstract

Fully connected cascade (FCC) networks are a recently proposed class of neural networks in which each layer has only one neuron and each neuron is connected to all the neurons in the preceding layers. In this paper we derive and describe in detail an efficient backpropagation algorithm (named BPFCC) for computing the gradient for FCC networks. At its core, the backpropagation in BPFCC is a carefully designed process for computing the derivative amplification coefficients, which are essential for gradient computation. The average time complexity of computing an entry of the gradient is O(1). BPFCC must be called by a training algorithm to do useful work, and we have written a program, FCCNET, for that purpose. Currently, FCCNET uses the Levenberg–Marquardt algorithm to train FCC networks, and its loss function for classification is designed based on a nonlinear extension of logistic regression. For two-class classification we derive a Gauss–Newton-like approximation for the Hessian of the loss function; when the number of classes is more than two, a numerical approximation of the Hessian is used. Experimental results confirm the efficiency of BPFCC and the validity of the companion techniques.
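
To make the cascade topology concrete, the following is a minimal NumPy sketch of an FCC network with n_in inputs and N cascaded neurons: neuron k receives the bias, all n_in inputs, and the outputs of neurons 0..k-1, so its weight vector has 1 + n_in + k entries. The tanh activation, the squared-error loss, and the names fcc_forward and fcc_gradient are illustrative assumptions; the gradient here is obtained by ordinary reverse-mode backpropagation through the cascade, not by the paper's BPFCC bookkeeping of derivative amplification coefficients.

    import numpy as np

    def fcc_forward(x, weights):
        """Forward pass through a fully connected cascade (FCC) network.
        x       : input vector with n_in entries.
        weights : list of N weight vectors; neuron k sees the bias, the
                  n_in inputs, and the outputs of neurons 0..k-1, so
                  weights[k] has 1 + n_in + k entries.
        Returns the list of neuron outputs (the last is the network output)."""
        signals = np.concatenate(([1.0], x))          # bias + inputs
        outputs = []
        for w in weights:
            y = np.tanh(np.dot(w, signals))           # weighted sum of all earlier signals
            outputs.append(y)
            signals = np.append(signals, y)           # cascade: feed this output forward
        return outputs

    def fcc_gradient(x, target, weights):
        """Gradient of 0.5*(output - target)**2 with respect to every weight,
        via plain reverse-mode backpropagation through the cascade."""
        n_in = len(x)
        outputs = fcc_forward(x, weights)
        N = len(weights)
        delta = np.zeros(N)                           # delta[k] = d(loss)/d(net_k)
        delta[N - 1] = (outputs[-1] - target) * (1.0 - outputs[-1] ** 2)
        for k in range(N - 2, -1, -1):
            # neuron k feeds every later neuron j through weight weights[j][1 + n_in + k]
            down = sum(delta[j] * weights[j][1 + n_in + k] for j in range(k + 1, N))
            delta[k] = down * (1.0 - outputs[k] ** 2)
        grads = []
        for k in range(N):
            inp = np.concatenate(([1.0], x, outputs[:k]))   # signals seen by neuron k
            grads.append(delta[k] * inp)
        return grads

For example, with two inputs and three cascaded neurons the weight vectors have lengths 3, 4 and 5, and fcc_gradient([0.2, -0.5], 1.0, weights) (hypothetical values) returns three gradient vectors of those same lengths.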

Notes

  1. To save space, the weights obtained in the experiments are not listed; they can be obtained by contacting the author of this paper.

  2. All three problems are regression problems, including Spirals, although Spirals is believed to have been designed by the Neuron by Neuron authors based on the two-spiral problem. The Two-Spiral problem treated by FCCNET in Sect. 5.3, however, is the original two-spiral classification problem.

Acknowledgements

The author would like to thank the anonymous reviewers whose suggestions greatly enhanced the technical quality of this paper.

Author information

Corresponding author

Correspondence to Yiping Cheng.

About this article

Cite this article

Cheng, Y. Backpropagation for Fully Connected Cascade Networks. Neural Process Lett 46, 293–311 (2017). https://doi.org/10.1007/s11063-017-9588-4
