Artificial Intelligence Review, Volume 49, Issue 2, pp 281–299

A review of adaptive online learning for artificial neural networks

  • Beatriz Pérez-Sánchez
  • Oscar Fontenla-Romero
  • Bertha Guijarro-Berdiñas


In real applications, learning algorithms must address several issues: huge amounts of data, samples that arrive continuously, and underlying data-generation processes that evolve over time. Classical learning is not always appropriate in these environments, since it assumes independent and identically distributed data. To meet the requirements of the learning process, systems should be able to modify both their structure and their parameters. In this survey, our aim is to review the methodologies developed for adaptive learning with artificial neural networks, analyzing the strategies that have traditionally been applied over the years. We focus on sequential learning, the handling of the concept drift problem, and the determination of the network structure. Despite the research in this field, there are currently no standard methods for dealing with these environments, and several issues remain open problems.
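To make the setting concrete, the following is a minimal, illustrative sketch (not an algorithm from the survey) of sequential learning under concept drift: a linear classifier is updated one sample at a time with a fixed learning rate, so it can keep tracking a data stream whose underlying concept changes abruptly. All names and parameters here are hypothetical choices for illustration.

```python
import random

class OnlineLinearClassifier:
    """Perceptron-style learner updated sequentially, one sample at a time."""

    def __init__(self, n_features, lr=0.1):
        self.w = [0.0] * n_features
        self.b = 0.0
        self.lr = lr  # a fixed (non-decaying) rate lets the model keep adapting

    def predict(self, x):
        s = sum(wi * xi for wi, xi in zip(self.w, x)) + self.b
        return 1 if s >= 0 else 0

    def update(self, x, y):
        # Update weights only when the current prediction is wrong.
        err = y - self.predict(x)
        if err != 0:
            self.w = [wi + self.lr * err * xi for wi, xi in zip(self.w, x)]
            self.b += self.lr * err

# Synthetic stream whose concept drifts halfway: the decisive feature flips sign.
random.seed(0)
model = OnlineLinearClassifier(n_features=2)
correct = 0
n_samples = 2000
for t in range(n_samples):
    x = [random.uniform(-1, 1), random.uniform(-1, 1)]
    y = int(x[0] > 0) if t < 1000 else int(x[0] < 0)  # abrupt drift at t = 1000
    correct += int(model.predict(x) == y)  # test-then-train evaluation
    model.update(x, y)                     # sequential update after each sample

print(f"online accuracy over the stream: {correct / n_samples:.2f}")
```

Because the prediction is scored before each update (prequential, "test-then-train" evaluation), the accuracy reflects both the initial learning phase and the recovery after the drift point.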


Keywords: Artificial neural networks, Online learning, Concept drift, Adaptive topology



The authors would like to acknowledge support for this work from the Xunta de Galicia (Grant GRC2014/035) and the Secretaría de Estado de Investigación of the Spanish Government (Grant TIN2015-65069), both partially supported by European Union ERDF funds.



Copyright information

© Springer Science+Business Media Dordrecht 2016

Authors and Affiliations

  • Beatriz Pérez-Sánchez (1)
  • Oscar Fontenla-Romero (1)
  • Bertha Guijarro-Berdiñas (1)

  1. Department of Computer Science, Faculty of Informatics, University of A Coruña, A Coruña, Spain
