Improving learning in Artificial Neural Networks using better weight initializations

  • Original Research
  • Published:
International Journal of Information Technology

Abstract

Artificial Neural Networks are a popular choice for optimization tasks in a number of applications such as approximation, regression, and classification. The training speed of Artificial Neural Networks is sensitive to weight initialization. A new interval-based weight initialization method, I-WT, is proposed to improve the convergence rate of artificial neural networks. Lower and upper bounds are used to model the uncertainty (such as noise in measured data) in the estimation of the free parameters, i.e., the weights. The novelty of this approach lies in exploiting the dependency of the weight update on the derivative of the activation function to determine an optimal interval for weight initialization that reduces the chance of saturation to only 6%, thus ensuring faster convergence. A thorough derivation of I-WT is presented and verified on a number of learning problems, including function approximation, regression, and classification. The simulations demonstrate that in most cases the performance (in terms of mean squared error) achieved using I-WT is better than that of other weight initialization methods. I-WT is also tested on Deep Neural Networks, where it outperforms the initializations of Glorot and Bengio and of He et al., the most popular choices for Deep Neural Networks.
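
To make the interval-based idea concrete, the sketch below shows one simple way such an initializer can be set up for a tanh hidden layer. It is a minimal illustration only, not the I-WT formula derived in the paper: the inputs are assumed to be scaled to \([-1, 1]\), the bias is omitted, and the parameters n_max and k are illustrative placeholders rather than the quantities obtained in the I-WT derivation.

    import numpy as np

    def interval_init(fan_in, n_max=2.0, k=2.0, rng=None):
        # Illustrative interval-based initializer for one tanh hidden node.
        # Weights are drawn uniformly from [-r, r], with r chosen so that the
        # net input n_j = sum_i w_ij * x_i stays within |n_j| <= n_max with
        # high probability, keeping the tanh derivative away from saturation.
        # Assumptions (not from the paper): inputs x_i ~ U(-1, 1), bias omitted,
        # and n_max, k are placeholder parameters.
        rng = np.random.default_rng() if rng is None else rng
        # Var(n_j) = fan_in * Var(w) * Var(x), with Var(w) = r^2 / 3 for
        # w ~ U(-r, r) and Var(x) = 1/3 for x ~ U(-1, 1), so
        # std(n_j) = r * sqrt(fan_in) / 3. Requiring k * std(n_j) <= n_max
        # gives the half-width r below.
        r = 3.0 * n_max / (k * np.sqrt(fan_in))
        return rng.uniform(-r, r, size=fan_in)

    # Example: incoming weights for a hidden node with 8 inputs.
    w = interval_init(fan_in=8)
    print(w.min(), w.max())

The actual I-WT interval is obtained differently, via the dependency of the weight update on the derivative of the activation function, but the role of the half-width is the same: keep the net input away from the saturated region of the activation so that learning does not stall.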


Data availability

All datasets used in this study are taken from secondary data sources and can be accessed at the following links: (1) Montreal Bike Lanes dataset for regression is available at https://www.kaggle.com/datasets/pablomonleon/montreal-bike-lanes; (2) Computer Hardware Dataset for regression is available at https://archive.ics.uci.edu/dataset/29/computer+hardware; (3) Automobile Dataset for regression is available at https://archive.ics.uci.edu/dataset/10/automobile; (4) Iris Dataset for classification is available at https://archive.ics.uci.edu/dataset/53/iris; (5) Wine Dataset for classification is available at https://archive.ics.uci.edu/dataset/109/wine; (6) Bike Sharing Dataset for DNN is available at https://archive.ics.uci.edu/dataset/275/bike+sharing+dataset; (7) Online News Popularity Dataset for DNN is available at https://archive.ics.uci.edu/dataset/332/online+news+popularity; (8) Superconductivity Dataset for DNN is available at https://archive.ics.uci.edu/dataset/464/superconductivty+dat.

Notes

  1. Resilient backpropagation is a variation of standard backpropagation learning designed for faster convergence.

  2. The number of nodes in the hidden layer is varied from 1 to 35, and the node count yielding the least error is reported.

  3. We obtain five \(5\times 5\) matrices, one for each learning problem, with entry 1 if the performance of method i is statistically better than that of method j, and 0 if the performance of method i is statistically similar to that of method j. These five matrices are superimposed in Table 5.

  4. https://www.kaggle.com/pablomonleon/montreal-bike-lanes.

  5. Bhatia et al. [23] used a logistic activation function at the hidden layer, whereas in this work we use a hyperbolic tangent activation function. The modifications to the method of Bhatia et al. are discussed in the Appendix.

References

  1. Gajjar P, Saxena A, Acharya K, Shah P, Bhatt C, Nguyen TT (2023) Liquidt: stock market analysis using liquid time-constant neural networks. Int J Inf Technol 16(10):1–12

  2. Singh N, Panda SP (2022) Artificial neural network on graphical processing unit and its emphasis on ground water level prediction. Int J Inf Technol 14(7):3659–3666

  3. Karthikeyan M, Mary Anita E, Mohana Geetha D (2023) Towards developing an automated technique for glaucomatous image classification and diagnosis (AT-GICD) using neural networks. Int J Inf Technol 15(7):3727–3739

  4. Hornik K, Stinchcombe M, White H (1989) Multilayer feedforward networks are universal approximators. Neural Netw 2(5):359–366

  5. Scarselli F, Tsoi AC (1998) Universal approximation using feedforward neural networks: a survey of some existing methods, and some new results. Neural Netw 11(1):15–37

  6. Rumelhart DE, Hinton GE, Williams RJ (1985) Learning internal representations by error propagation. Technical report, DTIC Document

  7. Deng L, Yu D et al (2014) Deep learning: methods and applications. Found Trends Signal Process 7(3–4):197–387

  8. Bengio Y et al (2009) Learning deep architectures for AI. Found Trends Mach Learn 2(1):1–127

  9. Kolen JF, Pollack JB (1991) Back propagation is sensitive to initial conditions. In: Advances in neural information processing systems, pp 860–867

  10. Riedmiller M, Braun H (1993) A direct adaptive method for faster backpropagation learning: the RPROP algorithm. In: IEEE international conference on neural networks. IEEE, pp 586–591

  11. Kim Y, Ra J (1991) Weight value initialization for improving training speed in the backpropagation network. In: 1991 IEEE international joint conference on neural networks, vol 16, no 10. IEEE, pp 2396–2401

  12. Drago GP, Ridella S (1992) Statistically controlled activation weight initialization (SCAWI). IEEE Trans Neural Netw 3(4):627–631

  13. Boers JW (1992) Biological metaphors and the design of modular artificial neural networks. Master’s thesis, Leiden University, the Netherlands

  14. Wessels LF, Barnard E (1992) Avoiding false local minima by proper initialization of connections. IEEE Trans Neural Netw 3(6):899–905

  15. Thimm G, Fiesler E (1997) High-order and multilayer perceptron initialization. IEEE Trans Neural Netw 8(2):349–359

  16. Yam JY, Chow TW (2001) Feedforward networks training speed enhancement by optimal initialization of the synaptic coefficients. IEEE Trans Neural Netw 12(2):430–434

  17. Erdogmus D, Fontenla-Romero O, Principe JC, Alonso-Betanzos A, Castillo E (2005) Linear-least-squares initialization of multilayer perceptrons through backpropagation of the desired response. IEEE Trans Neural Netw 16(2):325–337

  18. Timotheou S (2009) A novel weight initialization method for the random neural network. Neurocomputing 73(1–3):160–168

  19. Adam SP, Karras DA, Magoulas GD, Vrahatis MN (2014) Solving the linear interval tolerance problem for weight initialization of neural networks. Neural Netw 54:17–37

  20. Sodhi SS, Chandra P, Tanwar S (2014) A new weight initialization method for sigmoidal feedforward artificial neural networks. In: 2014 international joint conference on neural networks (IJCNN). IEEE, pp 291–298

  21. Qiao J, Li S, Li W (2016) Mutual information based weight initialization method for sigmoidal feedforward neural networks. Neurocomputing 207:676–683

  22. Mittal A, Singh AP, Chandra P (2017) A new weight initialization using statistically resilient method and Moore-Penrose inverse method for SFANN. Int J Recent Res Asp 4:98–105

  23. Bhatia M, Veenu, Chandra P (2018) A new weight initialization method for sigmoidal FFANN. J Intell Fuzzy Syst (Preprint):1–9

  24. Mittal A, Singh AP, Chandra P (2020) A modification to the Nguyen–Widrow weight initialization method. In: Intelligent systems, technologies and applications. Springer, Berlin, pp 141–153

  25. Glorot X, Bengio Y (2010) Understanding the difficulty of training deep feedforward neural networks. In: Proceedings of the thirteenth international conference on artificial intelligence and statistics, pp 249–256

  26. He K, Zhang X, Ren S, Sun J (2015) Delving deep into rectifiers: surpassing human-level performance on imagenet classification. In: Proceedings of the IEEE international conference on computer vision, pp 1026–1034

  27. Pinkus A (1999) Approximation theory of the MLP model in neural networks. Acta Numer 8:143–195

  28. Chandra P, Singh Y (2004) Feedforward sigmoidal networks-equicontinuity and fault-tolerance properties. IEEE Trans Neural Netw 15(6):1350–1366

  29. Bonamente M (2013) Statistics and analysis of scientific data. Springer, Berlin

  30. DasGupta A (2000) Best constants in Chebyshev inequalities with various applications. Metrika 51(3):185–200

  31. Cherkassky V, Mulier FM (2007) Learning from data: concepts, theory, and methods. Wiley, Berlin

  32. Chandra P, Ghose U, Sood A (2015) A non-sigmoidal activation function for feedforward artificial neural networks. In: 2015 international joint conference on neural networks (IJCNN). IEEE, pp 1–8

  33. Bache K, Lichman M (2013) UCI machine learning repository

  34. Ein-Dor P, Feldmesser J (1987) Attributes of the performance of central processing units: a relative performance prediction model. Commun ACM 30(4):308–318

  35. Kibler D, Aha DW, Albert MK (1989) Instance-based prediction of real-valued attributes. Comput Intell 5(2):51–57

  36. Fisher RA (1936) The use of multiple measurements in taxonomic problems. Ann Eugen 7(2):179–188

  37. Aeberhard S, Coomans D, De Vel O (1994) Comparative analysis of statistical pattern recognition methods in high dimensional settings. Pattern Recognit 27(8):1065–1077

  38. Sokolova M, Lapalme G (2009) A systematic analysis of performance measures for classification tasks. Inf Process Manag 45(4):427–437

  39. Kumar A, Jain S, Kumar M (2023) Face and gait biometrics authentication system based on simplified deep neural networks. Int J Inf Technol 15(2):1005–1014

  40. Collobert R, Weston J (2008) A unified architecture for natural language processing: deep neural networks with multitask learning. In: Proceedings of the 25th international conference on machine learning, pp 160–167

  41. Mishra AK, Roy P, Bandyopadhyay S, Das SK (2022) Achieving highly efficient breast ultrasound tumor classification with deep convolutional neural networks. Int J Inf Technol 14(7):3311–3320

  42. Lawrence S, Giles CL (2000) Overfitting and neural networks: conjugate gradient and backpropagation. In: Proceedings of the IEEE-INNS-ENNS international joint conference on neural networks. IJCNN 2000. Neural computing: new challenges and perspectives for the New Millennium, vol 1. IEEE, pp 114–119

  43. Liu W, Wang Z, Liu X, Zeng N, Liu Y, Alsaadi FE (2017) A survey of deep neural network architectures and their applications. Neurocomputing 234:11–26

  44. Zhang Q, Yang LT, Chen Z, Li P (2018) A survey on deep learning for big data. Inf Fusion 42:146–157

  45. Fanaee TH, Gama J (2014) Event labeling combining ensemble detectors and background knowledge. Prog Artif Intell 2(2–3):113–127

  46. Fernandes K, Vinagre P, Cortez P (2015) A proactive intelligent decision support system for predicting the popularity of online news. In: Portuguese conference on artificial intelligence. Springer, Berlin, pp 535–546

  47. Hamidieh K (2018) A data-driven statistical model for predicting the critical temperature of a superconductor. Comput Mater Sci 154:346–354

Funding

No funding was received to assist with the preparation of this manuscript.

Author information

Contributions

Both authors contributed to the study conception and design. Both authors read and approved the final manuscript.

Corresponding author

Correspondence to Apeksha Mittal.

Ethics declarations

Conflict of interest

The authors have no relevant financial or non-financial interests to disclose.

Appendix

  a.

    The central limit theorem [29] states that the distribution of the sum of a large number of independent random variables is approximately Gaussian. It can be noted from (1) that \(n_j\) is a sum of the random variables \(w_{ij}x_i\) and \(\theta _j\). Thus, the distribution of \(n_j\) is approximately Gaussian.

  b.

    DasGupta [30] stated an improved bound on Chebyshev's inequality for a normally distributed random variable X with finite expected value \(\mu\) and variance \(\delta ^2\):

    $$\begin{aligned} P \left\{ |X-\mu | \ge k\delta \right\} \le \frac{1}{3k^2} \end{aligned}$$
    (1)

    for any real number \(k>0\). A short worked example is given after this list.

  c.

    A confusion matrix is used to evaluate the performance of a classification task by comparing the actual output and the desired output of a trained network. Accuracy is the ratio of correctly predicted data to the total predicted data. Precision is the ratio of correctly predicted positive data to the total predicted positive data. Recall is the ratio of correctly predicted positive data to the actual positive data [38]. A brief sketch computing these measures is given after this list.
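
For concreteness, a worked instance of the bound in (1), with an illustrative choice of \(k\) that is not necessarily the one used in the I-WT derivation: taking \(k=2\) gives

    $$\begin{aligned} P \left\{ |X-\mu | \ge 2\delta \right\} \le \frac{1}{3\cdot 2^2} = \frac{1}{12} \approx 0.083, \end{aligned}$$

i.e. a tail probability of roughly 8.3%, which is a factor of 3 tighter than the \(1/k^2 = 0.25\) given by the standard Chebyshev inequality; the specific value of \(k\) used by I-WT is part of the derivation in the full text and is not reproduced here.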
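
As a minimal illustration of these three measures (a sketch only, assuming binary labels encoded as 1 = positive and 0 = negative; the helper name classification_metrics is purely illustrative and does not appear in the paper):

    import numpy as np

    def classification_metrics(y_true, y_pred):
        # Accuracy, precision and recall from actual vs. predicted binary labels,
        # following the definitions above and in [38]. Illustrative sketch only.
        y_true = np.asarray(y_true)
        y_pred = np.asarray(y_pred)
        tp = np.sum((y_pred == 1) & (y_true == 1))  # correctly predicted positives
        fp = np.sum((y_pred == 1) & (y_true == 0))  # negatives predicted as positive
        fn = np.sum((y_pred == 0) & (y_true == 1))  # positives predicted as negative
        tn = np.sum((y_pred == 0) & (y_true == 0))  # correctly predicted negatives
        accuracy = (tp + tn) / (tp + tn + fp + fn)  # correct predictions / all predictions
        precision = tp / (tp + fp)                  # correct positives / predicted positives
        recall = tp / (tp + fn)                     # correct positives / actual positives
        return accuracy, precision, recall

    # Example with five test patterns: accuracy 0.6, precision 2/3, recall 2/3.
    print(classification_metrics([1, 0, 1, 1, 0], [1, 0, 0, 1, 1]))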

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Mittal, A., Chandra, P. Improving learning in Artificial Neural Networks using better weight initializations. Int J Inf Technol (2024). https://doi.org/10.1007/s41870-024-01869-z
