Abstract
Artificial Neural Networks are a popular choice for optimization tasks in a number of applications such as approximation, regression, and classification. The training speed of Artificial Neural Networks is sensitive to weight initialization. A new interval-based weight initialization method, I-WT, is proposed to improve the convergence rate of artificial neural networks. A lower and an upper bound are used to model the uncertainty (such as noise in measured data) in the estimation of the free parameters, i.e., the weights. The novelty of this approach lies in exploiting the dependency of the weight update on the derivative of the activation function to determine the optimal interval for weight initialization, which reduces the chance of saturation to only 6%, thus ensuring faster convergence. A thorough derivation of I-WT is presented and verified on a number of learning problems such as function approximation, regression, and classification. The simulations demonstrate that in most cases the performance (in terms of mean squared error) achieved using I-WT improves upon other weight initialization methods. I-WT is also tested on Deep Neural Networks, where it outperforms the initializations of Glorot and Bengio, and He et al., the most popular choices for Deep Neural Networks.
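To illustrate the general idea of interval-based weight initialization (not the specific I-WT bounds, which are derived in the paper), a minimal Python sketch is given below; the interval half-width r and its fan-in-based scaling are illustrative assumptions only.

```python
import numpy as np

def interval_init(fan_in, fan_out, r=None, rng=None):
    """Draw a weight matrix uniformly from a symmetric interval [-r, r].

    The half-width r is chosen here (illustratively) so that, for roughly
    zero-mean unit-variance inputs, the net input to a sigmoidal node has
    a small spread and stays out of the saturated region of the activation
    function. This is NOT the exact I-WT interval; it only sketches the
    idea of interval-based initialization.
    """
    rng = np.random.default_rng() if rng is None else rng
    if r is None:
        r = 1.0 / np.sqrt(fan_in)  # illustrative choice of interval half-width
    return rng.uniform(-r, r, size=(fan_in, fan_out))

# Example: weight matrices for a 4-8-1 feedforward network
W1 = interval_init(4, 8)
W2 = interval_init(8, 1)
```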
Data availability
All datasets used in this study are taken from secondary data sources and can be accessed from the links below: (1) Montreal Bike Lanes dataset for regression is available at https://www.kaggle.com/datasets/pablomonleon/montreal-bike-lanes; (2) Computer Hardware Dataset for regression is available at https://archive.ics.uci.edu/dataset/29/computer+hardware; (3) Automobile Dataset for regression is available at https://archive.ics.uci.edu/dataset/10/automobile; (4) Iris Dataset for classification is available at https://archive.ics.uci.edu/dataset/53/iris; (5) Wine Dataset for classification is available at https://archive.ics.uci.edu/dataset/109/wine; (6) Bike Sharing Dataset for DNN is available at https://archive.ics.uci.edu/dataset/275/bike+sharing+dataset; (7) Online News Popularity Dataset for DNN is available at https://archive.ics.uci.edu/dataset/332/online+news+popularity; (8) Superconductivity Dataset for DNN is available at https://archive.ics.uci.edu/dataset/464/superconductivty+dat.
Notes
Resilient backpropagation is a variation of standard backpropagation designed for faster convergence.
The number of nodes in the hidden layer is varied from 1 to 35, and the configuration yielding the least error is reported.
We obtain five \(5\times 5\) matrices, one for each learning problem, with entries 1 (if the performance of method i is statistically better than that of method j) and 0 (if the performance of method i is statistically similar to that of method j). These five matrices are superimposed in Table 5.
References
Gajjar P, Saxena A, Acharya K, Shah P, Bhatt C, Nguyen TT (2023) Liquidt: stock market analysis using liquid time-constant neural networks. Int J Inf Technol 16(10):1–12
Singh N, Panda SP (2022) Artificial neural network on graphical processing unit and its emphasis on ground water level prediction. Int J Inf Technol 14(7):3659–3666
Karthikeyan M, Mary Anita E, Mohana Geetha D (2023) Towards developing an automated technique for glaucomatous image classification and diagnosis (AT-GICD) using neural networks. Int J Inf Technol 15(7):3727–3739
Hornik K, Stinchcombe M, White H (1989) Multilayer feedforward networks are universal approximators. Neural Netw 2(5):359–366
Scarselli F, Tsoi AC (1998) Universal approximation using feedforward neural networks: a survey of some existing methods, and some new results. Neural Netw 11(1):15–37
Rumelhart DE, Hinton GE, Williams RJ (1985) Learning internal representations by error propagation. Technical report, DTIC Document
Deng L, Yu D et al (2014) Deep learning: methods and applications. Found Trends Signal Process 7(3–4):197–387
Bengio Y et al (2009) Learning deep architectures for AI. Found Trends Mach Learn 2(1):1–127
Kolen JF, Pollack JB (1991) Back propagation is sensitive to initial conditions. In: Advances in neural information processing systems, pp 860–867
Riedmiller M, Braun H (1993) A direct adaptive method for faster backpropagation learning: the RPROP algorithm. In: IEEE international conference on neural networks. IEEE, pp 586–591
Kim Y, Ra J (1991) Weight value initialization for improving training speed in the backpropagation network. In: 1991 IEEE international joint conference on neural networks, vol 16, no 10. IEEE, pp 2396–2401
Drago GP, Ridella S (1992) Statistically controlled activation weight initialization (SCAWI). IEEE Trans Neural Netw 3(4):627–631
Boers JW (1992) Biological metaphors and the design of modular artificial neural networks. Master’s thesis, Leiden University, the Netherlands
Wessels LF, Barnard E (1992) Avoiding false local minima by proper initialization of connections. IEEE Trans Neural Netw 3(6):899–905
Thimm G, Fiesler E (1997) High-order and multilayer perceptron initialization. IEEE Trans Neural Netw 8(2):349–359
Yam JY, Chow TW (2001) Feedforward networks training speed enhancement by optimal initialization of the synaptic coefficients. IEEE Trans Neural Netw 12(2):430–434
Erdogmus D, Fontenla-Romero O, Principe JC, Alonso-Betanzos A, Castillo E (2005) Linear-least-squares initialization of multilayer perceptrons through backpropagation of the desired response. IEEE Trans Neural Netw 16(2):325–337
Timotheou S (2009) A novel weight initialization method for the random neural network. Neurocomputing 73(1–3):160–168
Adam SP, Karras DA, Magoulas GD, Vrahatis MN (2014) Solving the linear interval tolerance problem for weight initialization of neural networks. Neural Netw 54:17–37
Sodhi SS, Chandra P, Tanwar S (2014) A new weight initialization method for sigmoidal feedforward artificial neural networks. In: 2014 international joint conference on neural networks (IJCNN). IEEE, pp 291–298
Qiao J, Li S, Li W (2016) Mutual information based weight initialization method for sigmoidal feedforward neural networks. Neurocomputing 207:676–683
Mittal A, Singh AP, Chandra P (2017) A new weight initialization using statistically resilient method and Moore-Penrose inverse method for SFANN. Int J Recent Res Asp 4:98–105
Bhatia M, Veenu Chandra P (2018) A new weight initialization method for sigmoidal FFANN. J Intell Fuzzy Syst (Preprint):1–9
Mittal A, Singh AP, Chandra P (2020) A modification to the Nguyen–Widrow weight initialization method. In: Intelligent systems, technologies and applications. Springer, Berlin, pp 141–153
Glorot X, Bengio Y (2010) Understanding the difficulty of training deep feedforward neural networks. In: Proceedings of the thirteenth international conference on artificial intelligence and statistics, pp 249–256
He K, Zhang X, Ren S, Sun J (2015) Delving deep into rectifiers: surpassing human-level performance on imagenet classification. In: Proceedings of the IEEE international conference on computer vision, pp 1026–1034
Pinkus A (1999) Approximation theory of the MLP model in neural networks. Acta Numer 8:143–195
Chandra P, Singh Y (2004) Feedforward sigmoidal networks-equicontinuity and fault-tolerance properties. IEEE Trans Neural Netw 15(6):1350–1366
Bonamente M (2013) Statistics and analysis of scientific data. Springer, Berlin
DasGupta A (2000) Best constants in Chebyshev inequalities with various applications. Metrika 51(3):185–200
Cherkassky V, Mulier FM (2007) Learning from data: concepts, theory, and methods. Wiley, Berlin
Chandra P, Ghose U, Sood A (2015) A non-sigmoidal activation function for feedforward artificial neural networks. In: 2015 international joint conference on neural networks (IJCNN). IEEE, pp 1–8
Bache K, Lichman M (2013) UCI machine learning repository
Ein-Dor P, Feldmesser J (1987) Attributes of the performance of central processing units: a relative performance prediction model. Commun ACM 30(4):308–318
Kibler D, Aha DW, Albert MK (1989) Instance-based prediction of real-valued attributes. Comput Intell 5(2):51–57
Fisher RA (1936) The use of multiple measurements in taxonomic problems. Ann Eugen 7(2):179–188
Aeberhard S, Coomans D, De Vel O (1994) Comparative analysis of statistical pattern recognition methods in high dimensional settings. Pattern Recognit 27(8):1065–1077
Sokolova M, Lapalme G (2009) A systematic analysis of performance measures for classification tasks. Inf Process Manag 45(4):427–437
Kumar A, Jain S, Kumar M (2023) Face and gait biometrics authentication system based on simplified deep neural networks. Int J Inf Technol 15(2):1005–1014
Collobert R, Weston J (2008) A unified architecture for natural language processing: deep neural networks with multitask learning. In: Proceedings of the 25th international conference on machine learning, pp 160–167
Mishra AK, Roy P, Bandyopadhyay S, Das SK (2022) Achieving highly efficient breast ultrasound tumor classification with deep convolutional neural networks. Int J Inf Technol 14(7):3311–3320
Lawrence S, Giles CL (2000) Overfitting and neural networks: conjugate gradient and backpropagation. In: Proceedings of the IEEE-INNS-ENNS international joint conference on neural networks. IJCNN 2000. Neural computing: new challenges and perspectives for the New Millennium, vol 1. IEEE, pp 114–119
Liu W, Wang Z, Liu X, Zeng N, Liu Y, Alsaadi FE (2017) A survey of deep neural network architectures and their applications. Neurocomputing 234:11–26
Zhang Q, Yang LT, Chen Z, Li P (2018) A survey on deep learning for big data. Inf Fusion 42:146–157
Fanaee TH, Gama J (2014) Event labeling combining ensemble detectors and background knowledge. Prog Artif Intell 2(2–3):113–127
Fernandes K, Vinagre P, Cortez P (2015) A proactive intelligent decision support system for predicting the popularity of online news. In: Portuguese conference on artificial intelligence. Springer, Berlin, pp 535–546
Hamidieh K (2018) A data-driven statistical model for predicting the critical temperature of a superconductor. Comput Mater Sci 154:346–354
Funding
No funding was received to assist with the preparation of this manuscript.
Author information
Authors and Affiliations
Contributions
Both authors contributed to the study conception and design. Both authors read and approved the final manuscript.
Corresponding author
Ethics declarations
Conflict of interest
The authors have no relevant financial or non-financial interests to disclose.
Appendix
(a) The central limit theorem [29] states that the distribution of the sum of a large number of independent random variables is approximately Gaussian. It can be noted from (1) that \(n_j\) is a sum of the random variables \(w_{ij}\), \(x_i\) and \(\theta _j\). Thus, the distribution of \(n_j\) is approximately Gaussian.
(b) DasGupta [30] gives an improved form of Chebyshev's inequality for a normally distributed random variable \(X\) with finite expected value \(\mu\) and variance \(\delta ^2\):
$$\begin{aligned} P \left\{ |X-\mu | \ge k\delta \right\} \le \frac{1}{3k^2} \end{aligned}$$
for any real number \(k>0\). A short worked example of this bound is given below.
(c) A confusion matrix is used to evaluate the performance of a classification task by comparing the actual outputs of a trained network against the desired outputs. Accuracy is the ratio of correctly predicted samples to the total number of samples. Precision is the ratio of correctly predicted positive samples to all samples predicted positive. Recall is the ratio of correctly predicted positive samples to all actual positive samples [38]. A minimal computational sketch of these metrics is given below.
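As a short worked example of the bound in item (b), with values of \(k\) chosen here purely for illustration (the interval actually used by I-WT is derived in the main text):
$$\begin{aligned} P \left\{ |X-\mu | \ge 2\delta \right\} \le \frac{1}{3\cdot 2^{2}} = \frac{1}{12} \approx 8.3\%, \qquad P \left\{ |X-\mu | \ge 2.36\,\delta \right\} \le \frac{1}{3\cdot (2.36)^{2}} \approx 6\%. \end{aligned}$$
The second choice of \(k\) is shown only to illustrate how a tail probability of roughly 6%, as quoted in the abstract, can arise from a bound of this form.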
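To make the definitions in item (c) concrete, the following minimal Python sketch (the function name and example counts are illustrative assumptions) computes accuracy, precision and recall from binary confusion-matrix counts:

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision and recall from binary confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total   # correctly predicted samples / all samples
    precision = tp / (tp + fp)     # correct positives / predicted positives
    recall = tp / (tp + fn)        # correct positives / actual positives
    return accuracy, precision, recall

# Example: 40 true positives, 5 false positives, 10 false negatives, 45 true negatives
acc, prec, rec = classification_metrics(40, 5, 10, 45)  # 0.85, ~0.889, 0.80
```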
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Mittal, A., Chandra, P. Improving learning in Artificial Neural Networks using better weight initializations. Int. j. inf. tecnol. (2024). https://doi.org/10.1007/s41870-024-01869-z
DOI: https://doi.org/10.1007/s41870-024-01869-z