Cognitive Computation

, Volume 6, Issue 3, pp 376–390 | Cite as

An Insight into Extreme Learning Machines: Random Neurons, Random Features and Kernels

Article

Abstract

Extreme learning machines (ELMs) basically give answers to two fundamental learning problems: (1) Can fundamentals of learning (i.e., feature learning, clustering, regression and classification) be made without tuning hidden neurons (including biological neurons) even when the output shapes and function modeling of these neurons are unknown? (2) Does there exist unified framework for feedforward neural networks and feature space methods? ELMs that have built some tangible links between machine learning techniques and biological learning mechanisms have recently attracted increasing attention of researchers in widespread research areas. This paper provides an insight into ELMs in three aspects, viz: random neurons, random features and kernels. This paper also shows that in theory ELMs (with the same kernels) tend to outperform support vector machine and its variants in both regression and classification applications with much easier implementation.

Keywords

Extreme learning machine Support vector machine Least square support vector machine ELM kernel Random neuron Random feature Randomized matrix 

References

  1. 1.
    Cortes C, Vapnik V. Support vector networks. Mach Learn. 1995;20(3):273–97.Google Scholar
  2. 2.
    Suykens JAK, Vandewalle J. Least squares support vector machine classifiers. Neural Process Lett. 1999;9(3):293–300.CrossRefGoogle Scholar
  3. 3.
    Huang G-B, Zhu Q-Y, Siew C-K. Extreme learning machine: a new learning scheme of feedforward neural networks. In: Proceedings of international joint conference on neural networks (IJCNN2004), vol. 2, (Budapest, Hungary); 2004. p. 985–990, 25–29 July.Google Scholar
  4. 4.
    Li M-B, Huang G-B, Saratchandran P, Sundararajan N. Fully complex extreme learning machine. Neurocomputing 2005;68:306–14.CrossRefGoogle Scholar
  5. 5.
    Huang G-B, Zhu Q-Y, Siew C-K. Extreme learning machine: theory and applications. Neurocomputing. 2006;70:489–501.CrossRefGoogle Scholar
  6. 6.
    Huang G-B, Chen L, Siew C-K. Universal approximation using incremental constructive feedforward networks with random hidden nodes. IEEE Trans Neural Netw. 2006;17(4):879–92.PubMedCrossRefGoogle Scholar
  7. 7.
    Huang G-B, Chen L. Convex incremental extreme learning machine. Neurocomputing. 2007;70:3056–62.CrossRefGoogle Scholar
  8. 8.
    Miche Y, Sorjamaa A, Bas P, Simula O, Jutten C, Lendasse A. OP-ELM: optimally pruned extreme learning machine. IEEE Trans Neural Netw. 2010;21(1):158–62.PubMedCrossRefGoogle Scholar
  9. 9.
    Frénay B, Verleysen M. Using SVMs with randomised feature spaces: an extreme learning approach. In: Proceedings of the 18th European symposium on artificial neural networks (ESANN), (Bruges, Belgium); 2010. pp. 315–320, 28–30 April.Google Scholar
  10. 10.
    Frénay B, Verleysen M. Parameter-insensitive kernel in extreme learning for non-linear support vector regression. Neurocomputing. 2011;74:2526–31.CrossRefGoogle Scholar
  11. 11.
    Cho JS, White H. Testing correct model specification using extreme learning machines. Neurocomputing. 2011;74(16):2552–65.CrossRefGoogle Scholar
  12. 12.
    Soria-Olivas E, Gomez-Sanchis J, Martin JD, Vila-Frances J, Martinez M, Magdalena JR, Serrano AJ. BELM: Bayesian extreme learning machine. IEEE Trans Neural Netw. 2011;22(3):505–9.PubMedCrossRefGoogle Scholar
  13. 13.
    Xu Y, Dong ZY, Meng K, Zhang R, Wong KP. Real-time transient stability assessment model using extreme learning machine. IET Gener Transm Distrib. 2011;5(3):314–22.CrossRefGoogle Scholar
  14. 14.
    Saxe AM, Koh PW, Chen Z, Bhand M, Suresh B, Ng AY. On random weights and unsupervised feature learning. In: Proceedings of the 28th international conference on machine learning, (Bellevue, USA); 2011. 28 June–2 July.Google Scholar
  15. 15.
    Saraswathi S, Sundaram S, Sundararajan N, Zimmermann M, Nilsen-Hamilton M. ICGA-PSO-ELM approach for accurate multiclass cancer classification resulting in reduced gene sets in which genes encoding secreted proteins are highly represented. IEEE-ACM Trans Comput Biol Bioinform. 2011;6(2):452–63.CrossRefGoogle Scholar
  16. 16.
    Minhas R, Mohammed AA, Wu QMJ. Incremental learning in human action recognition based on snippets. IEEE Trans Circuits Syst Video Technol. 2012;22(11):1529–41.CrossRefGoogle Scholar
  17. 17.
    Decherchi S, Gastaldo P, Leoncini A, Zunino R. Efficient digital implementation of extreme learning machines for classification. IEEE Trans Circuits Syst II. 2012;59(8):496–500.CrossRefGoogle Scholar
  18. 18.
    Gastaldo P, Zunino R, Cambria E, Decherchi S. Combining ELMs with random projections. IEEE Intell Syst. 2013;28(6):46–8.Google Scholar
  19. 19.
    Lin J, Yin J, Cai Z, Liu Q, Li K, Leung VC. A secure and practical mechanism for outsourcing ELMs in cloud computing. IEEE Intell Syst. 2013;28(6):35–8.Google Scholar
  20. 20.
    Akusok A, Lendasse A, Corona F, Nian R, Miche Y. ELMVIS: a nonlinear visualization technique using random permutations and ELMs. IEEE Intell Syst. 2013;28(6):41–6.Google Scholar
  21. 21.
    Fletcher R. Practical methods of optimization: volume 2 constrained optimization.  New York:Wiley; 1981.Google Scholar
  22. 22.
    Werbos PJ. Beyond regression: New tools for prediction and analysis in the behavioral sciences. Ph.D. thesis, Harvord University; 1974.Google Scholar
  23. 23.
    Rumelhart DE, Hinton GE, Williams RJ. Learning internal representations by error propagation. In: Rumelhart DE, McClelland JL, editors. Parallel distributed processing: explorations in the microstructures of cognition, vol: foundations. Cambridge, MA: MIT Press; 1986. p. 318–62.Google Scholar
  24. 24.
    Rumelhart DE, Hinton GE, Williams RJ. Learning representations by back-propagation errors. Nature. 1986;323:533–6.CrossRefGoogle Scholar
  25. 25.
    Werbos PJ. The roots of backpropagation : from ordered derivatives to neural networks and political forecasting. New York:Wiley; 1994.Google Scholar
  26. 26.
    Huang G-B, Chen L. Enhanced random search based incremental extreme learning machine. Neurocomputing. 2008;71:3460–8.CrossRefGoogle Scholar
  27. 27.
    Sosulski DL, Bloom ML, Cutforth T, Axel R, Datta SR. Distinct representations of olfactory information in different cortical centres. Nature. 2011;472:213–6.PubMedCentralPubMedCrossRefGoogle Scholar
  28. 28.
    Eliasmith C, Stewart TC, Choo X, Bekolay T, DeWolf T, Tang Y, Rasmussen D. A large-scale model of the functioning brain. Science. 2012;338:1202–5.PubMedCrossRefGoogle Scholar
  29. 29.
    Barak O, Rigotti M, Fusi S. The sparseness of mixed selectivity neurons controls the generalization–discrimination trade-off. J Neurosci. 2013;33(9):3844–56.PubMedCrossRefGoogle Scholar
  30. 30.
    Rigotti M, Barak O, Warden MR, Wang X-J, Daw ND, Miller EK, Fusi S. The importance of mixed selectivity in complex cognitive tasks. Nature. 2013;497:585–90.PubMedCrossRefGoogle Scholar
  31. 31.
    Igelnik B, Pao Y-H. Stochastic choice of basis functions in adaptive function approximation and the functional-link net. IEEE Trans Neural Netw. 1995;6(6):1320–9.PubMedCrossRefGoogle Scholar
  32. 32.
    Huang G-B, Zhou H, Ding X, Zhang R. Extreme learning machine for regression and multiclass classification. IEEE Trans Syst Man Cybern Part B. 2012;42(2):513–29.CrossRefGoogle Scholar
  33. 33.
    Rahimi A, Recht B. Uniform approximation of functions with random bases. In: Proceedings of the 2008 46th annual allerton conference on communication, control, and computing, p. 555–561, 23–26 Sept 2008.Google Scholar
  34. 34.
    Huang G-B, Zhu Q-Y, Mao KZ, Siew C-K, Saratchandran P, Sundararajan N. Can threshold networks be trained directly? IEEE Trans Circuits Syst II. 2006;53(3):187–91.CrossRefGoogle Scholar
  35. 35.
    Bartlett PL. The sample complexity of pattern classification with neural networks: the size of the weights is more important than the size of the network. IEEE Trans Inform Theory. 1998;44(2):525–36.CrossRefGoogle Scholar
  36. 36.
    Rosenblatt F. The perceptron: a probabilistic model for information storage and organization in the brain. Psychol Rev. 1958;65(6):386–408.PubMedCrossRefGoogle Scholar
  37. 37.
    Rosenblatt F. Principles of Neurodynamics: perceptrons and the theory of brain mechanisms. New York:Spartan Books; 1962.Google Scholar
  38. 38.
    Block HD. The perceptron: a model for brain function. I. Rev Modern Phys. 1962;34(1):123–35.CrossRefGoogle Scholar
  39. 39.
    Block HD, Knight JBW, Rosenblatt F. Analysis of a four-layer series-coupled perceptron. II. Rev Modern Phys. 1962;34(1):135–42.CrossRefGoogle Scholar
  40. 40.
    Schmidt WF, Kraaijveld MA, Duin RP. Feed forward neural networks with random weights. In: Proceedings of 11th IAPR international conference on pattern recognition methodology and systems, (Hague, Netherlands); 1992. p. 1–4.Google Scholar
  41. 41.
    White H. An additional hidden unit test for neglected nonlinearity in multilayer feedforward networks. In: Proceedings of the international conference on neural networks. 1989. p. 451–455.Google Scholar
  42. 42.
    White H. Approxiate nonlinear forecasting methods. In: Elliott G, Granger CWJ, Timmermann A, editors. Handbook of economics forecasting. New York: Elsevier; 2006. p. 460–512.Google Scholar
  43. 43.
    Loone SM, Irwin GW. Improving neural network training solutions using regularisation. Neurocomputing. 2001;37:71–90.CrossRefGoogle Scholar
  44. 44.
    Serre D. Matrices: theory and applications.  New York:Springer; 2002.Google Scholar
  45. 45.
    Rao CR, Mitra SK. Generalized Inverse of matrices and its applications. New York:Wiley; 1971.Google Scholar
  46. 46.
    Fernández-Delgado M, Cernadas E, Barro S, Ribeiro J, Nevesb J. Direct kernel perceptron (DKP): Ultra-fast kernel elm-based classification with non-iterative closed-form weight calculation. Neural Netw. 2014;50(1):60–71.PubMedCrossRefGoogle Scholar
  47. 47.
    Widrow B, Greenblatt A, Kim Y, Park D. The no-prop algorithm: A new learning algorithm for multilayer neural networks. Neural Netw. 2013;37:182–8.PubMedCrossRefGoogle Scholar
  48. 48.
    Toms DJ. Training binary node feedforward neural networks by backpropagation of error. Electron Lett. 1990;26(21):1745–6.CrossRefGoogle Scholar
  49. 49.
    Corwin EM, Logar AM, Oldham WJB. An iterative method for training multilayer networks with threshold function. IEEE Trans Neural Netw. 1994;5(3):507–8.PubMedCrossRefGoogle Scholar
  50. 50.
    Goodman RM, Zeng Z. A learning algorithm for multi-layer perceptrons with hard-limiting threshold units. In: Proceedings of the 1994 IEEE workshop of neural networks for signal processing. 1994. p. 219–228.Google Scholar
  51. 51.
    Plagianakos VP, Magoulas GD, Nousis NK, Vrahatis MN. Training multilayer networks with discrete activation functions. In: Proceedings of the IEEE international joint conference on neural networks (IJCNN’2001), Washington D.C., U.S.A.; 2001.Google Scholar
  52. 52.
    Huang G-B, Ding X, Zhou H. Optimization method based extreme learning machine for classification. Neurocomputing. 2010;74:155–63.CrossRefGoogle Scholar
  53. 53.
    Bai Z, Huang G-B, Wang D, Wang H, Westover MB. Sparse extreme learning machine for classification. IEEE Trans Cybern. 2014. doi:10.1109/TCYB.2014.2298235.
  54. 54.
    Pao Y-H, Park G-H, Sobajic DJ. Learning and generalization characteristics of the random vector functional-link net. Neurocomputing. 1994;6:163–80.CrossRefGoogle Scholar
  55. 55.
    Huang G, Song S, Gupta JND, Wu C. Semi-supervised and unsupervised extreme learning machines. IEEE Trans Cybern. 2014. doi:10.1109/TCYB.2014.2307349.
  56. 56.
    Huang G-B, Li M-B, Chen L, Siew C-K. Incremental extreme learning machine with fully complex hidden nodes. Neurocomputing. 2008;71:576–83.CrossRefGoogle Scholar
  57. 57.
    Lee T-H, White H, Granger CWJ. Testing for neglected nonlinearity in time series modes: a comparison of neural network methods and standard tests. J Econ. 1993;56:269–90.CrossRefGoogle Scholar
  58. 58.
    Stinchcombe MB, White H. Consistent specification testing with nuisance parameters present only under the alternative. Econ Theory. 1998;14:295–324.CrossRefGoogle Scholar
  59. 59.
    Baum E. On the capabilities of multilayer perceptrons. J Complexity. 1988;4:193–215.CrossRefGoogle Scholar
  60. 60.
    Le Q, Sarlós T, Smola A. Fastfood approximating kernel expansions in loglinear time. In: Proceedings of the 30th international conference on machine learning, (Atlanta, USA), 16–21 June 2013.Google Scholar
  61. 61.
    Huang P-S, Deng L, Hasegawa-Johnson M, He X. Random features for kernel deep convex network. In: Proceedings of the 38th international conference on acoustics, speech, and signal processing (ICASSP 2013), Vancouver, Canada, 26–31 May 2013.Google Scholar
  62. 62.
    Lin J, Yin J, Cai Z, Liu Q, Li K, Leung VC. A secure and practical mechanism for outsourcing elms in cloud computing. IEEE Intell Syst. 2013;28(6):7–10.Google Scholar
  63. 63.
    Rahimi A, Recht B. Random features for large-scale kernel machines. In: Proceedings of the 2007 neural information processing systems (NIPS2007), 3–6 Dec 2007. p. 1177–1184.Google Scholar
  64. 64.
    Kasun LLC, Zhou H, Huang G-B, Vong CM. Representational learning with extreme learning machine for big data. IEEE Intell Syst 2013;28(6):31–4.Google Scholar
  65. 65.
    Fung G, Mangasarian OL. Proximal support vector machine classifiers. In: International conference on knowledge discovery and data mining, San Francisco, California, USA, 2001. p. 77–86.Google Scholar
  66. 66.
    Daubechies I. Orthonormal bases of compactly supported wavelets. Commun Pure Appl Math. 1988;41:909–96.CrossRefGoogle Scholar
  67. 67.
    Daubechies I. The wavelet transform, time-frequency localization and signal analysis. IEEE Trans Inform Theory. 1990;36(5):961–1005.CrossRefGoogle Scholar
  68. 68.
    Suykens JAK, Gestel TV, Brabanter JD, Moor BD, Vandewalle J. Least squares support vector machines. Singapore: World Scientific; 2002.CrossRefGoogle Scholar
  69. 69.
    Poggio T, Mukherjee S, Rifkin R, Rakhlin A, Verri A. “b,” (A.I. Memo No. 2001–011, CBCL Memo 198, Artificial Intelligence Laboratory, Massachusetts Institute of Technology), 2001.Google Scholar
  70. 70.
    Steinwart I, Hush D, Scovel C. Training SVMs without offset. J Mach Learn Res .2011;12(1):141–202.Google Scholar
  71. 71.
    Hoerl AE, Kennard RW. Ridge regression: biased estimation for nonorthogonal problems. Technometrics. 1970;12(1):55–67.CrossRefGoogle Scholar
  72. 72.
    Kaski S. Dimensionality reduction by random mapping: fast similarity computation for clustering. In: Proceedings of the 1998 IEEE international joint conference on neural networks, Anchorage, USA, 4–9 May 1998.Google Scholar
  73. 73.
    Pearson K. On lines and planes of closest fit to systems of points in space. Philos Mag. 1901;2:559–72.CrossRefGoogle Scholar
  74. 74.
    von Neumann J. The general and logical theory of automata. In: Jeffress LA, editor. Cerebral mechanisms in behavior. New York: Wiley; 1951. p. 1–41. 1951.Google Scholar
  75. 75.
    von Neumann J. Probabilistic logics and the synthesis of reliable organisms from unreliable components. In: Shannon CE, McCarthy J, editors. Automata studies. Princeton: Princeton University Press; 1956. p. 43–98.Google Scholar
  76. 76.
    Minhas R, Baradarani A, Seifzadeh S, Wu QMJ. Human action recognition using extreme learning machine based on visual vocabularies. Neurocomputing. 2010;73:1906–17.CrossRefGoogle Scholar
  77. 77.
    Wang J, Kumar S, Chang S-F. Semi-supervised hashing for large-scale search. IEEE Trans Pattern Anal Mach Intell. 2012;34(12):2393–406.PubMedCrossRefGoogle Scholar
  78. 78.
    He Q, Jin X, Du C, Zhuang F, Shi Z. Clustering in extreme learning machine feature space. Neurocomputing. 2014;128:88–95.CrossRefGoogle Scholar
  79. 79.
    Jarrett K, Kavukcuoglu K, Ranzato M, LeCun Y. What is the best multi-stage architecture for object recognition. In: Proceedings of the 2009 IEEE 12th international conference on computer vision, Kyoto, Japan, 29 Sept–2 Oct 2009.Google Scholar
  80. 80.
    Pinto N, Doukhan D, DiCarlo JJ, Cox DD. A high-throughput screening approach to discovering good forms of biologically inspired visual representation. PLoS Comput Biol. 2009;5(11):1–12.CrossRefGoogle Scholar

Copyright information

© Springer Science+Business Media New York 2014

Authors and Affiliations

  1. 1.School of Electrical and Electronic EngineeringNanyang Technological UniversitySingaporeSingapore

Personalised recommendations