Abstract
Artificial Neural Networks are a popular choice for optimization tasks in a number of applications such as approximation, regression, and classification. The training speed of Artificial Neural Networks is sensitive to weight initialization. A new interval-based weight initialization method, I-WT, is proposed to improve the convergence rate of artificial neural networks. A lower and an upper bound are used to model the uncertainty (such as noise in measured data) in the estimation of the free parameters, i.e., the weights. The novelty of this approach lies in exploiting the dependency of the weight update on the derivative of the activation function to determine the optimal interval for weight initialization, which reduces the chance of saturation to only 6%, thus ensuring faster convergence. A thorough derivation of I-WT is presented and verified on a number of learning problems such as function approximation, regression, and classification. The simulations demonstrate that in most cases the performance (in terms of mean squared error) achieved using I-WT improves upon other weight initialization methods. I-WT is also tested on Deep Neural Networks, where it outperforms the initializations of Glorot and Bengio, and He et al., the most popular choices for Deep Neural Networks.
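To illustrate the general idea of interval-based weight initialization (not the specific I-WT bounds, which are derived in the paper), a minimal Python sketch is given below; the interval half-width r and its fan-in-based scaling are illustrative assumptions only.

```python
import numpy as np

def interval_init(fan_in, fan_out, r=None, rng=None):
    """Draw a weight matrix uniformly from a symmetric interval [-r, r].

    The half-width r is chosen here (illustratively) so that, for roughly
    zero-mean unit-variance inputs, the net input to a sigmoidal node has
    a small spread and stays out of the saturated region of the activation
    function. This is NOT the exact I-WT interval; it only sketches the
    idea of interval-based initialization.
    """
    rng = np.random.default_rng() if rng is None else rng
    if r is None:
        r = 1.0 / np.sqrt(fan_in)  # illustrative choice of interval half-width
    return rng.uniform(-r, r, size=(fan_in, fan_out))

# Example: weight matrices for a 4-8-1 feedforward network
W1 = interval_init(4, 8)
W2 = interval_init(8, 1)
```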
Data availability
All datasets used in this study are taken from secondary data sources and can be accessed from the links below: (1) Montreal Bike Lanes dataset for regression is available at https://www.kaggle.com/datasets/pablomonleon/montreal-bike-lanes; (2) Computer Hardware Dataset for regression is available at https://archive.ics.uci.edu/dataset/29/computer+hardware; (3) Automobile Dataset for regression is available at https://archive.ics.uci.edu/dataset/10/automobile; (4) Iris Dataset for classification is available at https://archive.ics.uci.edu/dataset/53/iris; (5) Wine Dataset for classification is available at https://archive.ics.uci.edu/dataset/109/wine; (6) Bike Sharing Dataset for DNN is available at https://archive.ics.uci.edu/dataset/275/bike+sharing+dataset; (7) Online News Popularity Dataset for DNN is available at https://archive.ics.uci.edu/dataset/332/online+news+popularity; (8) Superconductivity Dataset for DNN is available at https://archive.ics.uci.edu/dataset/464/superconductivty+dat.
Notes
Resilient backpropagation is a variation of standard backpropagation designed for faster convergence.
The number of nodes in the hidden layer is varied from 1 to 35, and the configuration yielding the least error is reported.
We obtain five \(5\times 5\) matrices, one for each learning problem, with entries 1 (if the performance of method i is statistically better than that of method j) and 0 (if the performance of method i is statistically similar to that of method j). These five matrices are superimposed in Table 5.
References
Gajjar P, Saxena A, Acharya K, Shah P, Bhatt C, Nguyen TT (2023) Liquidt: stock market analysis using liquid time-constant neural networks. Int J Inf Technol 16(10):1–12
Singh N, Panda SP (2022) Artificial neural network on graphical processing unit and its emphasis on ground water level prediction. Int J Inf Technol 14(7):3659–3666
Karthikeyan M, Mary Anita E, Mohana Geetha D (2023) Towards developing an automated technique for glaucomatous image classification and diagnosis (AT-GICD) using neural networks. Int J Inf Technol 15(7):3727–3739
Hornik K, Stinchcombe M, White H (1989) Multilayer feedforward networks are universal approximators. Neural Netw 2(5):359–366
Scarselli F, Tsoi AC (1998) Universal approximation using feedforward neural networks: a survey of some existing methods, and some new results. Neural Netw 11(1):15–37
Rumelhart DE, Hinton GE, Williams RJ (1985) Learning internal representations by error propagation. Technical report, DTIC Document
Deng L, Yu D et al (2014) Deep learning: methods and applications. Found Trends Signal Process 7(3–4):197–387
Bengio Y et al (2009) Learning deep architectures for AI. Found Trends Mach Learn 2(1):1–127
Kolen JF, Pollack JB (1991) Back propagation is sensitive to initial conditions. In: Advances in neural information processing systems, pp 860–867
Riedmiller M, Braun H (1993) A direct adaptive method for faster backpropagation learning: the RPROP algorithm. In: IEEE international conference on neural networks. IEEE, pp 586–591
Kim Y, Ra J (1991) Weight value initialization for improving training speed in the backpropagation network. In: 1991 IEEE international joint conference on neural networks, vol 16, no 10. IEEE, pp 2396–2401
Drago GP, Ridella S (1992) Statistically controlled activation weight initialization (SCAWI). IEEE Trans Neural Netw 3(4):627–631
Boers JW (1992) Biological metaphors and the design of modular artificial neural networks. Master’s thesis, Leiden University, the Netherlands
Wessels LF, Barnard E (1992) Avoiding false local minima by proper initialization of connections. IEEE Trans Neural Netw 3(6):899–905
Thimm G, Fiesler E (1997) High-order and multilayer perceptron initialization. IEEE Trans Neural Netw 8(2):349–359
Yam JY, Chow TW (2001) Feedforward networks training speed enhancement by optimal initialization of the synaptic coefficients. IEEE Trans Neural Netw 12(2):430–434
Erdogmus D, Fontenla-Romero O, Principe JC, Alonso-Betanzos A, Castillo E (2005) Linear-least-squares initialization of multilayer perceptrons through backpropagation of the desired response. IEEE Trans Neural Netw 16(2):325–337
Timotheou S (2009) A novel weight initialization method for the random neural network. Neurocomputing 73(1–3):160–168
Adam SP, Karras DA, Magoulas GD, Vrahatis MN (2014) Solving the linear interval tolerance problem for weight initialization of neural networks. Neural Netw 54:17–37
Sodhi SS, Chandra P, Tanwar S (2014) A new weight initialization method for sigmoidal feedforward artificial neural networks. In: 2014 international joint conference on neural networks (IJCNN). IEEE, pp 291–298
Qiao J, Li S, Li W (2016) Mutual information based weight initialization method for sigmoidal feedforward neural networks. Neurocomputing 207:676–683
Mittal A, Singh AP, Chandra P (2017) A new weight initialization using statistically resilient method and Moore-Penrose inverse method for SFANN. Int J Recent Res Asp 4:98–105
Bhatia M, Veenu Chandra P (2018) A new weight initialization method for sigmoidal FFANN. J Intell Fuzzy Syst (Preprint):1–9
Mittal A, Singh AP, Chandra P (2020) A modification to the Nguyen–Widrow weight initialization method. In: Intelligent systems, technologies and applications. Springer, Berlin, pp 141–153
Glorot X, Bengio Y (2010) Understanding the difficulty of training deep feedforward neural networks. In: Proceedings of the thirteenth international conference on artificial intelligence and statistics, pp 249–256
He K, Zhang X, Ren S, Sun J (2015) Delving deep into rectifiers: surpassing human-level performance on imagenet classification. In: Proceedings of the IEEE international conference on computer vision, pp 1026–1034
Pinkus A (1999) Approximation theory of the MLP model in neural networks. Acta Numer 8:143–195
Chandra P, Singh Y (2004) Feedforward sigmoidal networks-equicontinuity and fault-tolerance properties. IEEE Trans Neural Netw 15(6):1350–1366
Bonamente M (2013) Statistics and analysis of scientific data. Springer, Berlin
DasGupta A (2000) Best constants in Chebyshev inequalities with various applications. Metrika 51(3):185–200
Cherkassky V, Mulier FM (2007) Learning from data: concepts, theory, and methods. Wiley, Berlin
Chandra P, Ghose U, Sood A (2015) A non-sigmoidal activation function for feedforward artificial neural networks. In: 2015 international joint conference on neural networks (IJCNN). IEEE, pp 1–8
Bache K, Lichman M (2013) UCI machine learning repository
Ein-Dor P, Feldmesser J (1987) Attributes of the performance of central processing units: a relative performance prediction model. Commun ACM 30(4):308–318
Kibler D, Aha DW, Albert MK (1989) Instance-based prediction of real-valued attributes. Comput Intell 5(2):51–57
Fisher RA (1936) The use of multiple measurements in taxonomic problems. Ann Eugen 7(2):179–188
Aeberhard S, Coomans D, De Vel O (1994) Comparative analysis of statistical pattern recognition methods in high dimensional settings. Pattern Recognit 27(8):1065–1077
Sokolova M, Lapalme G (2009) A systematic analysis of performance measures for classification tasks. Inf Process Manag 45(4):427–437
Kumar A, Jain S, Kumar M (2023) Face and gait biometrics authentication system based on simplified deep neural networks. Int J Inf Technol 15(2):1005–1014
Collobert R, Weston J (2008) A unified architecture for natural language processing: deep neural networks with multitask learning. In: Proceedings of the 25th international conference on machine learning, pp 160–167
Mishra AK, Roy P, Bandyopadhyay S, Das SK (2022) Achieving highly efficient breast ultrasound tumor classification with deep convolutional neural networks. Int J Inf Technol 14(7):3311–3320
Lawrence S, Giles CL (2000) Overfitting and neural networks: conjugate gradient and backpropagation. In: Proceedings of the IEEE-INNS-ENNS international joint conference on neural networks. IJCNN 2000. Neural computing: new challenges and perspectives for the New Millennium, vol 1. IEEE, pp 114–119
Liu W, Wang Z, Liu X, Zeng N, Liu Y, Alsaadi FE (2017) A survey of deep neural network architectures and their applications. Neurocomputing 234:11–26
Zhang Q, Yang LT, Chen Z, Li P (2018) A survey on deep learning for big data. Inf Fusion 42:146–157
Fanaee TH, Gama J (2014) Event labeling combining ensemble detectors and background knowledge. Prog Artif Intell 2(2–3):113–127
Fernandes K, Vinagre P, Cortez P (2015) A proactive intelligent decision support system for predicting the popularity of online news. In: Portuguese conference on artificial intelligence. Springer, Berlin, pp 535–546
Hamidieh K (2018) A data-driven statistical model for predicting the critical temperature of a superconductor. Comput Mater Sci 154:346–354
Funding
No funding was received to assist with the preparation of this manuscript.
Author information
Authors and Affiliations
Contributions
Both authors contributed to the study conception and design. Both authors read and approved the final manuscript.
Corresponding author
Ethics declarations
Conflict of interest
The authors have no relevant financial or non-financial interests to disclose.
Appendix
(a) The central limit theorem [29] states that the distribution of the sum of a large number of independent random variables is approximately Gaussian. It can be noted from (1) that \(n_j\) is a sum of the random variables \(w_{ij}\), \(x_i\) and \(\theta _j\). Thus, the distribution of \(n_j\) is approximately Gaussian.
(b) DasGupta [30] gives an improved form of Chebyshev's inequality for a normally distributed random variable \(X\) with finite expected value \(\mu\) and variance \(\delta ^2\):
$$\begin{aligned} P \left\{ |X-\mu | \ge k\delta \right\} \le \frac{1}{3k^2} \end{aligned}$$
for any real number \(k>0\). A short worked example of this bound is given below.
(c) A confusion matrix is used to evaluate the performance of a classification task by comparing the actual outputs of a trained network against the desired outputs. Accuracy is the ratio of correctly predicted samples to the total number of samples. Precision is the ratio of correctly predicted positive samples to all samples predicted positive. Recall is the ratio of correctly predicted positive samples to all actual positive samples [38]. A minimal computational sketch of these metrics is given below.
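As a short worked example of the bound in item (b), with values of \(k\) chosen here purely for illustration (the interval actually used by I-WT is derived in the main text):
$$\begin{aligned} P \left\{ |X-\mu | \ge 2\delta \right\} \le \frac{1}{3\cdot 2^{2}} = \frac{1}{12} \approx 8.3\%, \qquad P \left\{ |X-\mu | \ge 2.36\,\delta \right\} \le \frac{1}{3\cdot (2.36)^{2}} \approx 6\%. \end{aligned}$$
The second choice of \(k\) is shown only to illustrate how a tail probability of roughly 6%, as quoted in the abstract, can arise from a bound of this form.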
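To make the definitions in item (c) concrete, the following minimal Python sketch (the function name and example counts are illustrative assumptions) computes accuracy, precision and recall from binary confusion-matrix counts:

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision and recall from binary confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total   # correctly predicted samples / all samples
    precision = tp / (tp + fp)     # correct positives / predicted positives
    recall = tp / (tp + fn)        # correct positives / actual positives
    return accuracy, precision, recall

# Example: 40 true positives, 5 false positives, 10 false negatives, 45 true negatives
acc, prec, rec = classification_metrics(40, 5, 10, 45)  # 0.85, ~0.889, 0.80
```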
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Mittal, A., Chandra, P. Improving learning in Artificial Neural Networks using better weight initializations. Int. j. inf. tecnol. (2024). https://doi.org/10.1007/s41870-024-01869-z
DOI: https://doi.org/10.1007/s41870-024-01869-z