# What are Extreme Learning Machines? Filling the Gap Between Frank Rosenblatt’s Dream and John von Neumann’s Puzzle

## Abstract

Extreme learning machines (ELMs), an emergent machine learning technique, have become a hot area of research over the past years, owing to the growing research activities and significant contributions of numerous researchers around the world. Recently, it has come to our attention that a number of misplaced notions and misunderstandings are being spread about the relationships between ELM and some earlier works. This paper wishes to clarify that: (1) ELM theories address the open problem that has puzzled the neural networks, machine learning and neuroscience communities for 60 years, namely whether hidden nodes/neurons need to be tuned in learning, and prove that, in contrast to common knowledge and conventional neural network learning tenets, hidden nodes/neurons do not need to be iteratively tuned in wide types of neural networks and learning models (Fourier series, biological learning, etc.). Unlike ELM theories, none of those earlier works provides theoretical foundations for feedforward neural networks with random hidden nodes; (2) ELM is proposed for both generalized single-hidden-layer feedforward networks and multi-hidden-layer feedforward networks (including biological neural networks); (3) homogeneous architecture-based ELM is proposed for feature learning, clustering, regression and (binary/multi-class) classification; (4) compared to ELM, SVM and LS-SVM tend to provide suboptimal solutions, and SVM and LS-SVM do not consider feature representations in hidden layers of multi-hidden-layer feedforward networks either.
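
The central claim above, that hidden nodes can be generated randomly and left untuned while only the output weights are learned, can be illustrated with a minimal sketch. It assumes a single-hidden-layer network with tanh hidden nodes, a scalar input, and output weights obtained by ridge-regularized least squares. The function names (`train_elm`, `predict`, `solve_linear_system`), the target `sin(x)`, and all hyperparameters are illustrative assumptions, not taken from the paper. Only the Python standard library is used, so the linear solve is a small hand-written Gaussian elimination.

```python
import math
import random


def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    # Work on an augmented copy so the inputs are left unmodified.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def hidden_outputs(x, weights, biases):
    """Map a scalar input to the outputs of the random hidden nodes."""
    return [math.tanh(w * x + b) for w, b in zip(weights, biases)]


def train_elm(xs, ys, n_hidden=20, ridge=1e-6, seed=0):
    """Fit output weights only; hidden nodes are random and never tuned."""
    rng = random.Random(seed)
    # Assumption: hidden parameters drawn uniformly from [-1, 1].
    weights = [rng.uniform(-1.0, 1.0) for _ in range(n_hidden)]
    biases = [rng.uniform(-1.0, 1.0) for _ in range(n_hidden)]
    h = [hidden_outputs(x, weights, biases) for x in xs]
    # Normal equations with a ridge term: (H^T H + ridge * I) beta = H^T y.
    gram = [[sum(row[i] * row[j] for row in h) + (ridge if i == j else 0.0)
             for j in range(n_hidden)] for i in range(n_hidden)]
    rhs = [sum(row[i] * y for row, y in zip(h, ys)) for i in range(n_hidden)]
    beta = solve_linear_system(gram, rhs)
    return weights, biases, beta


def predict(x, model):
    """Evaluate the trained network at a scalar input."""
    weights, biases, beta = model
    return sum(b * g for b, g in zip(beta, hidden_outputs(x, weights, biases)))


# Toy regression problem (illustrative): approximate sin(x) on [-3, 3].
xs = [-3.0 + 6.0 * i / 49 for i in range(50)]
ys = [math.sin(x) for x in xs]
model = train_elm(xs, ys)
mse = sum((predict(x, model) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)
print(f"training MSE: {mse:.2e}")
```

In this sketch only `beta` is computed from the data; `weights` and `biases` keep their random initial values, and training reduces to a single linear solve rather than an iterative procedure.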

## Keywords

- Extreme learning machine
- Random vector functional link
- QuickNet
- Radial basis function network
- Feedforward neural network
- Randomness

## Notes

### Acknowledgments

Minsky and Rosenblatt’s controversy may have indirectly inspired the revival of artificial neural network research in the 1980s. Their controversy turned out to show that hidden neurons are critical. Although mainstream research has focused on learning algorithms that tune hidden neurons since the 1980s, a few pioneers independently studied alternative solutions in the 1980s–1990s (e.g., Halbert White for QuickNet; Yao-Han Pao, Boris Igelnik, and C. L. Philip Chen for RVFL; D. S. Broomhead and David Lowe for RBF networks; Corinna Cortes and Vladimir Vapnik for SVM; J. A. K. Suykens and J. Vandewalle for LS-SVM; and E. Baum, Wouter F. Schmidt, Martin A. Kraaijveld and Robert P. W. Duin for Random Weights Sigmoid Network). Although a few of those attempts ultimately did not take off, for various reasons and constraints, they have played significant and irreplaceable roles in the relevant research history. We appreciate their historical contributions. As discussed in Huang [6], without BP and SVM/LS-SVM, the research and applications on neural networks would never have been so intensive and extensive. We would like to thank Bernard Widrow, Stanford University, USA, for sharing his vision of neural networks and his precious historical experiences with us over the past years. It is always pleasant to discuss biological learning, neuroscience and Rosenblatt’s work with him. We both feel confident that we have never been so close to natural biological learning. We would like to thank C. L. Philip Chen, University of Macau, China, and Boris Igelnik, BMI Research, Inc., USA, for invaluable discussions on RVFL; Johan Suykens, Katholieke Universiteit Leuven, Belgium, for invaluable discussions on LS-SVM; Stefano Fusi, Columbia University, USA, for the discussion on the links between biological learning and ELM; and Wouter F. Schmidt and Robert P. W. Duin for their kind, constructive feedback on the discussions of the relationship between ELM and their 1992 work.
We would also like to thank Jose Principe, University of Florida, USA, for helpful discussions on neuroscience (especially on neuron layers/slices) and his invaluable suggestions on potential mathematical problems of ELM, and M. Brandon Westover, Harvard Medical School, USA, for constructive comments and suggestions on the potential links between ELM and biological learning in local receptive fields.

## References

- 1. Schmidt WF, Kraaijveld MA, Duin RPW. Feed forward neural networks with random weights. In: Proceedings of 11th IAPR international conference on pattern recognition methodology and systems, Hague, Netherlands, p. 1–4, 1992.
- 2. Pao Y-H, Park G-H, Sobajic DJ. Learning and generalization characteristics of the random vector functional-link net. Neurocomputing. 1994;6:163–80.
- 3. Huang G-B. Reply to comments on ‘the extreme learning machine’. IEEE Trans Neural Netw. 2008;19(8):1495–6.
- 4. Huang G-B, Li M-B, Chen L, Siew C-K. Incremental extreme learning machine with fully complex hidden nodes. Neurocomputing. 2008;71:576–83.
- 5. Huang G-B, Chen L. Enhanced random search based incremental extreme learning machine. Neurocomputing. 2008;71:3460–8.
- 6. Huang G-B. An insight into extreme learning machines: random neurons, random features and kernels. Cogn Comput. 2014;6(3):376–90.
- 7. Huang G, Song S, Gupta JND, Wu C. Semi-supervised and unsupervised extreme learning machines. IEEE Trans Cybern. 2014;44(12):2405–17.
- 8. Huang G-B, Bai Z, Kasun LLC, Vong CM. Local receptive fields based extreme learning machine. IEEE Comput Intell Mag. 2015;10(2):18–29.
- 9. Rosenblatt F. The perceptron: a probabilistic model for information storage and organization in the brain. Psychol Rev. 1958;65(6):386–408.
- 10. Rahimi A, Recht B. Random features for large-scale kernel machines. In: Proceedings of the 2007 neural information processing systems (NIPS2007), p. 1177–1184, 3–6 Dec 2007.
- 11. Le Q, Sarlós T, Smola A. Fastfood approximating kernel expansions in loglinear time. In: Proceedings of the 30th international conference on machine learning, Atlanta, USA, p. 16–21, June 2013.
- 12. Huang P-S, Deng L, Hasegawa-Johnson M, He X. Random features for kernel deep convex network. In: Proceedings of the 38th international conference on acoustics, speech, and signal processing (ICASSP 2013), Vancouver, Canada, p. 26–31, May 2013.
- 13. Widrow B, Greenblatt A, Kim Y, Park D. The no-prop algorithm: a new learning algorithm for multilayer neural networks. Neural Netw. 2013;37:182–8.
- 14. Bartlett PL. The sample complexity of pattern classification with neural networks: the size of the weights is more important than the size of the network. IEEE Trans Inform Theory. 1998;44(2):525–36.
- 15. Cortes C, Vapnik V. Support vector networks. Mach Learn. 1995;20(3):273–97.
- 16. Suykens JAK, Vandewalle J. Least squares support vector machine classifiers. Neural Process Lett. 1999;9(3):293–300.
- 17. Minsky M, Papert S. Perceptrons: an introduction to computational geometry. Cambridge: MIT Press; 1969.
- 18. Huang G-B. Learning capability of neural networks. Ph.D. thesis, Nanyang Technological University, Singapore, 1998.
- 19. von Neumann J. Probabilistic logics and the synthesis of reliable organisms from unreliable components. In: Shannon CE, McCarthy J, editors. Automata studies. Princeton: Princeton University Press; 1956. p. 43–98.
- 20. von Neumann J. The general and logical theory of automata. In: Jeffress LA, editor. Cerebral mechanisms in behavior. New York: Wiley; 1951. p. 1–41.
- 21. Park J, Sandberg IW. Universal approximation using radial-basis-function networks. Neural Comput. 1991;3:246–57.
- 22. Leshno M, Lin VY, Pinkus A, Schocken S. Multilayer feedforward networks with a nonpolynomial activation function can approximate any function. Neural Netw. 1993;6:861–7.
- 23. Huang G-B, Zhou H, Ding X, Zhang R. Extreme learning machine for regression and multiclass classification. IEEE Trans Syst Man Cybern B. 2012;42(2):513–29.
- 24. Huang G-B, Zhu Q-Y, Siew C-K. Extreme learning machine: a new learning scheme of feedforward neural networks. In: Proceedings of international joint conference on neural networks (IJCNN2004), vol. 2, Budapest, Hungary, p. 985–990, 25–29 July 2004.
- 25. Huang G-B, Chen L, Siew C-K. Universal approximation using incremental constructive feedforward networks with random hidden nodes. IEEE Trans Neural Netw. 2006;17(4):879–92.
- 26. Huang G-B, Chen L. Convex incremental extreme learning machine. Neurocomputing. 2007;70:3056–62.
- 27. Sosulski DL, Bloom ML, Cutforth T, Axel R, Datta SR. Distinct representations of olfactory information in different cortical centres. Nature. 2011;472:213–6.
- 28. Eliasmith C, Stewart TC, Choo X, Bekolay T, DeWolf T, Tang Y, Rasmussen D. A large-scale model of the functioning brain. Science. 2012;338:1202–5.
- 29. Barak O, Rigotti M, Fusi S. The sparseness of mixed selectivity neurons controls the generalization–discrimination trade-off. J Neurosci. 2013;33(9):3844–56.
- 30. Rigotti M, Barak O, Warden MR, Wang X-J, Daw ND, Miller EK, Fusi S. The importance of mixed selectivity in complex cognitive tasks. Nature. 2013;497:585–90.
- 31. Baum E. On the capabilities of multilayer perceptrons. J Complex. 1988;4:193–215.
- 32. Igelnik B, Pao Y-H. Stochastic choice of basis functions in adaptive function approximation and the functional-link net. IEEE Trans Neural Netw. 1995;6(6):1320–9.
- 33. Tamura S, Tateishi M. Capabilities of a four-layered feedforward neural network: four layers versus three. IEEE Trans Neural Netw. 1997;8(2):251–5.
- 34. Principe J, Chen B. Universal approximation with convex optimization: gimmick or reality? IEEE Comput Intell Mag. 2015;10(2):68–77.
- 35. Lowe D. Adaptive radial basis function nonlinearities and the problem of generalisation. In: Proceedings of first IEE international conference on artificial neural networks, p. 171–175, 1989.
- 36. Huang G-B, Zhu Q-Y, Mao KZ, Siew C-K, Saratchandran P, Sundararajan N. Can threshold networks be trained directly? IEEE Trans Circuits Syst II. 2006;53(3):187–91.
- 37. Li M-B, Huang G-B, Saratchandran P, Sundararajan N. Fully complex extreme learning machine. Neurocomputing. 2005;68:306–14.
- 38. Tang J, Deng C, Huang G-B. Extreme learning machine for multilayer perceptron. IEEE Trans Neural Netw Learn Syst. 2015. doi: 10.1109/TNNLS.2015.2424995.
- 39. Kasun LLC, Zhou H, Huang G-B, Vong CM. Representational learning with extreme learning machine for big data. IEEE Intell Syst. 2013;28(6):31–4.
- 40. Jarrett K, Kavukcuoglu K, Ranzato M, LeCun Y. What is the best multi-stage architecture for object recognition. In: Proceedings of the 2009 IEEE 12th international conference on computer vision, Kyoto, Japan, 29 Sept–2 Oct 2009.
- 41. Saxe AM, Koh PW, Chen Z, Bhand M, Suresh B, Ng AY. On random weights and unsupervised feature learning. In: Proceedings of the 28th international conference on machine learning, Bellevue, USA, 28 June–2 July 2011.
- 42. Cox D, Pinto N. Beyond simple features: a large-scale feature search approach to unconstrained face recognition. In: IEEE international conference on automatic face and gesture recognition and workshops. IEEE, p. 8–15, 2011.
- 43. McDonnell MD, Vladusich T. Enhanced image classification with a fast-learning shallow convolutional neural network. In: Proceedings of international joint conference on neural networks (IJCNN’2015), Killarney, Ireland, 12–17 July 2015.
- 44. Zeng Y, Xu X, Fang Y, Zhao K. Traffic sign recognition using extreme learning classifier with deep convolutional features. In: The 2015 international conference on intelligence science and big data engineering (IScIDE 2015), Suzhou, China, June 14–16, 2015.
- 45. Suykens JAK, Gestel TV, Brabanter JD, Moor BD, Vandewalle J. Least squares support vector machines. Singapore: World Scientific; 2002.
- 46. Rahimi A, Recht B. Uniform approximation of functions with random bases. In: Proceedings of the 2008 46th annual allerton conference on communication, control, and computing, p. 555–561, 23–26 Sept 2008.
- 47. Daubechies I. Orthonormal bases of compactly supported wavelets. Commun Pure Appl Math. 1988;41:909–96.
- 48. Daubechies I. The wavelet transform, time-frequency localization and signal analysis. IEEE Trans Inform Theory. 1990;36(5):961–1005.
- 49. Miche Y, Sorjamaa A, Bas P, Simula O, Jutten C, Lendasse A. OP-ELM: optimally pruned extreme learning machine. IEEE Trans Neural Netw. 2010;21(1):158–62.
- 50. Kim T, Adali T. Approximation by fully complex multilayer perceptrons. Neural Comput. 2003;15:1641–66.
- 51. Chen CLP. A rapid supervised learning neural network for function interpolation and approximation. IEEE Trans Neural Netw. 1996;7(5):1220–30.
- 52. Chen CLP, Wan JZ. A rapid learning and dynamic stepwise updating algorithm for flat neural networks and the applications to time-series prediction. IEEE Trans Syst Man Cybern B Cybern. 1999;29(1):62–72.
- 53. Huang G-B, Zhu Q-Y, Siew C-K. Extreme learning machine: theory and applications. Neurocomputing. 2006;70:489–501.
- 54. White H. An additional hidden unit test for neglected nonlinearity in multilayer feedforward networks. In: Proceedings of the international conference on neural networks, p. 451–455, 1989.
- 55. Poggio T, Mukherjee S, Rifkin R, Rakhlin A, Verri A. “\(b\)”, A.I. Memo No. 2001–011, CBCL Memo 198, Artificial Intelligence Laboratory, Massachusetts Institute of Technology, 2001.
- 56. Steinwart I, Hush D, Scovel C. Training SVMs without offset. J Mach Learn Res. 2011;12(1):141–202.
- 57. Luo J, Vong C-M, Wong P-K. Sparse bayesian extreme learning machine for multi-classification. IEEE Trans Neural Netw Learn Syst. 2014;25(4):836–43.
- 58. Decherchi S, Gastaldo P, Leoncini A, Zunino R. Efficient digital implementation of extreme learning machines for classification. IEEE Trans Circuits Syst II. 2012;59(8):496–500.
- 59. Bai Z, Huang G-B, Wang D, Wang H, Westover MB. Sparse extreme learning machine for classification. IEEE Trans Cybern. 2014;44(10):1858–70.
- 60. Frénay B, van Heeswijk M, Miche Y, Verleysen M, Lendasse A. Feature selection for nonlinear models with extreme learning machines. Neurocomputing. 2013;102:111–24.
- 61. Broomhead DS, Lowe D. Multivariable functional interpolation and adaptive networks. Complex Syst. 1988;2:321–55.
- 62. Ferrari S, Stengel RF. Smooth function approximation using neural networks. IEEE Trans Neural Netw. 2005;16(1):24–38.
- 63. Wang LP, Wan CR. Comments on ‘the extreme learning machine’. IEEE Trans Neural Netw. 2008;19(8):1494–5.
- 64. Chen S, Cowan CFN, Grant PM. Orthogonal least squares learning algorithm for radial basis function networks. IEEE Trans Neural Netw. 1991;2(2):302–9.