We present a tutorial survey of some recent approaches to unsupervised machine learning in the context of statistical pattern recognition. In statistical pattern recognition, there are two classical categories of unsupervised learning methods and models: first, variations of Principal Component Analysis and Factor Analysis, and second, learning vector coding or clustering methods. These are the starting point of this article. The more recent trend in unsupervised learning is to treat the problem in the framework of probabilistic generative models. If a model can be built and estimated that explains the data in terms of some latent variables, key insights may be obtained into the true nature and structure of the data. This approach is also reviewed, with examples such as linear and nonlinear independent component analysis and topological maps.





Copyright information

© Springer-Verlag Berlin Heidelberg 2004

Authors and Affiliations

  • Erkki Oja (1)
  1. Neural Networks Research Centre, Helsinki University of Technology, Finland
