Learning viewpoint invariant object representations using a temporal coherence principle

Abstract

Invariant object recognition is arguably one of the major challenges for contemporary machine vision systems. In contrast, the mammalian visual system performs this task virtually effortlessly. How can we exploit our knowledge of the biological system to improve artificial systems? Our understanding of the mammalian early visual system has been augmented by the discovery that general coding principles can explain many aspects of neuronal response properties. How can such schemes be transferred to system-level performance? In the present study we train cells on a particular variant of the general principle of temporal coherence, the “stability” objective. These cells are trained on unlabeled real-world images without a teaching signal. We show that after training, the cells form a representation that is largely independent of the viewpoint from which the stimulus is seen. This invariance generalizes to previously unseen viewpoints. The resulting representation is better suited for viewpoint-invariant object classification than the cells’ input patterns. This advantage is maintained even if training and classification take place in the presence of a distractor object that is also unlabeled. In summary, we show that unsupervised learning using a general coding principle facilitates the classification of real-world objects that are not segmented from the background and undergo complex, non-isomorphic transformations.
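To make the idea of a temporal coherence objective concrete, the following is a minimal sketch of one common way to score how slowly a cell's response varies over time: the negative mean squared frame-to-frame difference, normalized by the response variance. The abstract does not state the exact form of the “stability” objective, so this formula, the function name `stability`, and the two synthetic cells are illustrative assumptions, not the authors' implementation.

```python
import math


def stability(responses):
    """Score temporal stability of one cell's activity (assumed formulation).

    responses: list of floats, the cell's activity on consecutive frames.
    Returns the negative mean squared temporal difference divided by the
    variance. Values closer to 0 indicate a more slowly varying response.
    """
    n = len(responses)
    if n < 2:
        raise ValueError("need at least two time steps")
    mean = sum(responses) / n
    var = sum((r - mean) ** 2 for r in responses) / n
    if var == 0.0:
        # A constant response is trivially "stable" but carries no
        # information; the normalized objective is undefined for it.
        raise ValueError("constant response: objective undefined")
    diffs = [b - a for a, b in zip(responses, responses[1:])]
    mean_sq_diff = sum(d * d for d in diffs) / len(diffs)
    return -mean_sq_diff / var


# Two hypothetical cells responding to the same 200-frame sequence.
frames = range(200)
slow_cell = [math.sin(2 * math.pi * t / 200) for t in frames]  # 1 cycle
fast_cell = [math.sin(2 * math.pi * t / 10) for t in frames]   # 20 cycles

print("slow cell:", stability(slow_cell))
print("fast cell:", stability(fast_cell))
```

The variance normalization is what keeps the objective from being satisfied trivially by a cell that does not respond at all. Under this assumed score, a cell whose output changes slowly as an object rotates in front of it rates higher than one whose output flickers from frame to frame.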

Author information

Corresponding author

Correspondence to Wolfgang Einhäuser.

About this article

Cite this article

Einhäuser, W., Hipp, J., Eggert, J. et al. Learning viewpoint invariant object representations using a temporal coherence principle. Biol Cybern 93, 79–90 (2005). https://doi.org/10.1007/s00422-005-0585-8

Keywords

  • Machine Vision
  • Object Representation
  • Unsupervised Learning
  • Temporal Coherence
  • Machine Vision System