Multi-modal Learning

  • Danijel Skočaj
  • Matej Kristan
  • Alen Vrečko
  • Aleš Leonardis
  • Mario Fritz
  • Michael Stark
  • Bernt Schiele
  • Somboon Hongeng
  • Jeremy L. Wyatt
Part of the Cognitive Systems Monographs book series (COSMOS, volume 8)


The main topic of this chapter is learning, more specifically, multimodal learning.

In biological systems, learning occurs in various forms and at various developmental stages, facilitating adaptation to an ever-changing environment. Learning is also one of the most fundamental capabilities of an artificial cognitive system; significant effort in CoSy has therefore been dedicated to researching a variety of issues related to it.


Keywords (added by machine, not by the authors): Action Context, Continuous Learning, Visual Concept, Integrate Square Error, Human Tutor





Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • Danijel Skočaj (1)
  • Matej Kristan (1)
  • Alen Vrečko (1)
  • Aleš Leonardis (1)
  • Mario Fritz (2)
  • Michael Stark (2)
  • Bernt Schiele (2)
  • Somboon Hongeng (3)
  • Jeremy L. Wyatt (3)

  1. VICOS Lab, University of Ljubljana, Ljubljana, Slovenia
  2. Technische Universität Darmstadt, Darmstadt, Germany
  3. Intelligent Robotics Laboratory, School of Computer Science, University of Birmingham, Birmingham, UK
