Memetic Computing, 1:241

Architecture for development of adaptive on-line prediction models

Special Issue - Regular Research Paper

Abstract

This work presents an architecture for the development of on-line prediction models. The architecture defines a unified modular environment based on three concepts from machine learning: (i) ensemble methods, (ii) local learning, and (iii) meta-learning. The three concepts are organised in a three-layer hierarchy within the architecture. For the actual prediction making, any data-driven predictive method, such as artificial neural networks or support vector machines, can be implemented and plugged in. In addition to the predictive methods, data pre-processing methods can also be implemented as plug-ins. Models developed according to the architecture can be trained and operated in different modes. With regard to training, the architecture supports building initial models from a batch of training data; if such data is not available, the models can also be trained in incremental mode. In a scenario where correct target values are (occasionally) available during run-time, the architecture supports life-long learning by providing several adaptation mechanisms across the three hierarchical levels. In order to demonstrate its practicality, we show how the issues of current soft sensor development and maintenance can be effectively dealt with by using the architecture as a construction plan for the development of adaptive soft sensing algorithms.
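To make the layered design more concrete, the following Python sketch shows one way the three levels could interact. It is an illustration of the concepts in the abstract, not the paper's implementation; all names (LocalExpert, Ensemble, MetaLevel, forgetting) and the choice of recursive least squares as the plug-in predictor are assumptions made for the example.

```python
# Minimal sketch (not the authors' code) of the three-layer idea:
# plug-in local experts, ensembles that weight them, and a meta-level
# that routes queries and adapts everything when true targets arrive.
import numpy as np


class LocalExpert:
    """Bottom layer (local learning): a plug-in predictive method that
    can be trained incrementally, one sample at a time."""

    def __init__(self, n_features, ridge=1.0):
        self.A = ridge * np.eye(n_features)  # regularised scatter matrix
        self.b = np.zeros(n_features)

    def predict(self, x):
        return float(np.linalg.solve(self.A, self.b) @ x)

    def update(self, x, y):
        # incremental training: no batch of historical data required
        self.A += np.outer(x, x)
        self.b += y * x


class Ensemble:
    """Middle layer (ensemble methods): combines local experts, weighting
    each one by the inverse of its recent absolute error."""

    def __init__(self, experts, forgetting=0.9):
        self.experts = experts
        self.forgetting = forgetting
        self.errors = np.ones(len(experts))

    def predict(self, x):
        preds = np.array([e.predict(x) for e in self.experts])
        w = 1.0 / (self.errors + 1e-12)
        return float((w / w.sum()) @ preds)

    def adapt(self, x, y):
        # life-long learning: when a true target arrives, adapt both the
        # combination weights and the underlying experts
        for i, e in enumerate(self.experts):
            err = abs(e.predict(x) - y)
            self.errors[i] = (self.forgetting * self.errors[i]
                              + (1 - self.forgetting) * err)
            e.update(x, y)


class MetaLevel:
    """Top layer (meta-learning): routes each query to the ensemble with
    the best recent performance and keeps all ensembles up to date."""

    def __init__(self, ensembles, forgetting=0.9):
        self.ensembles = ensembles
        self.forgetting = forgetting
        self.errors = np.ones(len(ensembles))

    def predict(self, x):
        return self.ensembles[int(np.argmin(self.errors))].predict(x)

    def adapt(self, x, y):
        for i, ens in enumerate(self.ensembles):
            err = abs(ens.predict(x) - y)
            self.errors[i] = (self.forgetting * self.errors[i]
                              + (1 - self.forgetting) * err)
            ens.adapt(x, y)


# On-line operation: predict first, adapt whenever the target is revealed.
rng = np.random.default_rng(0)
model = MetaLevel([Ensemble([LocalExpert(3) for _ in range(4)])
                   for _ in range(2)])
for _ in range(200):
    x = rng.normal(size=3)
    y = 2.0 * x[0] - x[1]      # hidden target the soft sensor should learn
    y_hat = model.predict(x)
    model.adapt(x, y)
```

The exponentially forgotten error traces are just one simple way to realise the run-time adaptation mechanisms mentioned above; the architecture itself leaves the choice of plug-in methods and adaptation strategies at each level open.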

Keywords

Adaptive systems · Local learning · Meta-learning · Ensemble methods · Industrial applications · Soft sensors · Life-long learning

Copyright information

© Springer-Verlag 2009

Authors and Affiliations

Petr Kadlec, Bogdan Gabrys
Smart Technology Research Centre, Bournemouth University, Fern Barrow, Poole, UK
