Action Recognition Using a Bio-Inspired Feedforward Spiking Network

  • Maria-Jose Escobar
  • Guillaume S. Masson
  • Thierry Vieville
  • Pierre Kornprobst

Abstract

We propose a bio-inspired feedforward spiking network that models two brain areas dedicated to motion processing (V1 and MT), and we show how its spiking output can be exploited in a computer vision application: action recognition. To analyze the spike trains, we consider two characteristics of the neural code: the mean firing rate of each neuron and the synchrony between neurons. We show that both carry information relevant to action recognition. We compare our results with those of Jhuang et al. (Proceedings of the 11th international conference on computer vision, pp. 1–8, 2007) on the Weizmann database. In conclusion, we are convinced that spiking networks represent a powerful alternative framework for real vision applications, one that will benefit from recent advances in computational neuroscience.
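
The two spike-train characteristics named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify how synchrony is measured, so the binned-count correlation used here is an assumption chosen for simplicity, and the function names (`mean_firing_rate`, `binned_counts`, `synchrony`) and the 10 ms bin width are hypothetical.

```python
from math import sqrt


def mean_firing_rate(spike_times, duration):
    """Mean firing rate in spikes per second over a window of `duration` seconds."""
    if duration <= 0:
        raise ValueError("duration must be positive")
    return len(spike_times) / duration


def binned_counts(spike_times, duration, bin_width):
    """Count spikes in consecutive bins of `bin_width` seconds covering [0, duration)."""
    n_bins = max(1, int(round(duration / bin_width)))
    counts = [0] * n_bins
    for t in spike_times:
        if 0 <= t < duration:
            # Clamp the index to guard against floating-point rounding at the edge.
            counts[min(int(t / bin_width), n_bins - 1)] += 1
    return counts


def synchrony(train_a, train_b, duration, bin_width=0.01):
    """Pearson correlation of binned spike counts (an assumed stand-in measure).

    Returns 0.0 when either train has constant binned counts, e.g. no spikes.
    """
    a = binned_counts(train_a, duration, bin_width)
    b = binned_counts(train_b, duration, bin_width)
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / sqrt(var_a * var_b)


if __name__ == "__main__":
    neuron_1 = [0.005, 0.105, 0.205]  # spike times in seconds (made-up data)
    neuron_2 = [0.015, 0.115, 0.215]
    print(mean_firing_rate(neuron_1, 0.3))
    print(synchrony(neuron_1, neuron_1, 0.3))
    print(synchrony(neuron_1, neuron_2, 0.3))
```

In this sketch the firing rate summarizes each neuron independently, while the synchrony value depends on the relative timing of two neurons, which is why the two quantities can carry complementary information.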

Keywords

Spiking networks, Bio-inspired model, Motion analysis, V1, MT, Action recognition

References

  1. Adelson, E., & Bergen, J. (1985). Spatiotemporal energy models for the perception of motion. Journal of the Optical Society of America A, 2, 284–299.
  2. Bayerl, P., & Neumann, H. (2007). Disambiguating visual motion by form–motion interaction: a computational model. International Journal of Computer Vision, 72(1), 27–45.
  3. Beintema, J., & Lappe, M. (2002). Perception of biological motion without local image motion. Proceedings of the National Academy of Sciences of the USA, 99(8), 5661–5663.
  4. Berzhanskaya, J., Grossberg, S., & Mingolla, E. (2007). Laminar cortical dynamics of visual form and motion interactions during coherent object motion perception. Spatial Vision, 20(4), 337–395.
  5. Biederlack, J., Castelo-Branco, M., Neuenschwander, S., Wheeler, D. W., Singer, W., & Nikolić, D. (2006). Brightness induction: rate enhancement and neuronal synchronization as complementary codes. Neuron, 52(6), 1073–1083.
  6. Blake, R., & Shiffrar, M. (2007). Perception of human motion. Annual Review of Psychology, 58, 12.1–12.27.
  7. Blank, M., Gorelick, L., Shechtman, E., Irani, M., & Basri, R. (2005). Actions as space-time shapes. In Proceedings of the 10th international conference on computer vision (Vol. 2, pp. 1395–1402).
  8. Bobick, A., & Davis, J. (2001). The recognition of human movement using temporal templates. IEEE Transactions on Pattern Analysis and Machine Intelligence, 23(3), 257–267.
  9. Born, R. T. (2000). Center-surround interactions in the middle temporal visual area of the owl monkey. Journal of Neurophysiology, 84, 2658–2669.
  10. Born, R., & Bradley, D. (2005). Structure and function of visual area MT. Annual Review of Neuroscience, 28, 157–189.
  11. Buracas, G. T., & Albright, T. D. (1996). Contribution of area MT to perception of three-dimensional shape: a computational study. Vision Research, 36(6), 869–887.
  12. Casile, A., & Giese, M. (2003). Roles of motion and form in biological motion recognition. In Lecture notes in computer science: Vol. 2714. Artificial networks and neural information processing (pp. 854–862). Berlin: Springer.
  13. Casile, A., & Giese, M. (2005). Critical features for the recognition of biological motion. Journal of Vision, 5, 348–360.
  14. Cessac, B., Rostro-Gonzalez, H., Vasquez, J., & Vieville, T. (2008). To which extend is the “neural code” a metric? In Deuxième conférence française de neurosciences computationnelles.
  15. Collins, R., Gross, R., & Shi, J. (2002). Silhouette-based human identification from body shape and gait. In 5th intl. conf. on automatic face and gesture recognition (p. 366).
  16. Conway, B., & Livingstone, M. (2003). Space-time maps and two-bar interactions of different classes of direction-selective cells in macaque V1. Journal of Neurophysiology, 89, 2726–2742.
  17. Cutler, R., & Davis, L. (2000). Robust real-time periodic motion detection, analysis, and applications. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(8).
  18. Dayan, P., & Abbott, L. F. (2001). Theoretical neuroscience: computational and mathematical modeling of neural systems. Cambridge: MIT Press.
  19. De Valois, R., & Cottaris, N. (2000). Spatial and temporal receptive fields of geniculate and cortical cells and directional selectivity. Vision Research, 40, 3685–3702.
  20. Destexhe, A., Rudolph, M., & Paré, D. (2003). The high-conductance state of neocortical neurons in vivo. Nature Reviews Neuroscience, 4, 739–751.
  21. Dollar, P., Rabaud, V., Cottrell, G., & Belongie, S. (2005). Behavior recognition via sparse spatio-temporal features. In VS-PETS (pp. 65–72).
  22. Efros, A., Berg, A., Mori, G., & Malik, J. (2003). Recognizing action at a distance. In Proceedings of the 9th international conference on computer vision (Vol. 2, pp. 726–734).
  23. Escobar, M. J., & Kornprobst, P. (2008). Action recognition with a bio-inspired feedforward motion processing model: The richness of center-surround interactions. In Lecture notes in computer science. Proceedings of the 10th European conference on computer vision. Berlin: Springer.
  24. Escobar, M. J., Wohrer, A., Kornprobst, P., & Vieville, T. (2006). Biological motion recognition using an MT-like model. In Proceedings of 3rd Latin American robotic symposium.
  25. Felleman, D., & Van Essen, D. (1991). Distributed hierarchical processing in the primate cerebral cortex. Cerebral Cortex, 1, 1–47.
  26. Fellous, J. M., Tiesinga, P. H. E., Thomas, P. J., & Sejnowski, T. J. (2004). Discovering spike patterns in neural responses. The Journal of Neuroscience, 24(12), 2989–3001.
  27. Fries, P., Neuenschwander, S., Engel, A. K., Goebel, R., & Singer, W. (2001). Rapid feature selective neuronal synchronization through correlated latency shifting. Nature Neuroscience, 4(2), 194–200.
  28. Gautrais, J., & Thorpe, S. (1998). Rate coding vs temporal order coding: a theoretical approach. Biosystems, 48, 57–65.
  29. Gavrila, D. (1999). The visual analysis of human movement: A survey. Computer Vision and Image Understanding, 73(1), 82–98.
  30. Gavrila, D., & Davis, L. (1996). 3-D model-based tracking of humans in action: a multi-view approach. In Proceedings of the international conference on computer vision and pattern recognition. San Francisco: IEEE.
  31. Gerstner, W., & Kistler, W. (2002). Spiking neuron models. Cambridge: Cambridge University Press.
  32. Giese, M., & Poggio, T. (2003). Neural mechanisms for the recognition of biological movements and actions. Nature Reviews Neuroscience, 4, 179–192.
  33. Gollisch, T., & Meister, M. (2008). Rapid neural coding in the retina with relative spike latencies. Science, 319, 1108–1111.
  34. Goncalves, L., DiBernardo, E., Ursella, E., & Perona, P. (1995). Monocular tracking of the human arm in 3D. In Proceedings of the 5th international conference on computer vision (pp. 764–770).
  35. Grzywacz, N., & Yuille, A. (1990). A model for the estimate of local image velocity by cells on the visual cortex. Proceedings of the Royal Society London B: Biological Sciences, 239(1295), 129–161.
  36. Hiris, E., Humphrey, D., & Stout, A. (2005). Temporal properties in masking biological motion. Perception and Psychophysics, 67(3), 435–443.
  37. Hogg, D. (1983). Model-based vision: a paradigm to see a walking person. Image and Vision Computing, 1(1), 5–20.
  38. Hubel, D., & Wiesel, T. (1962). Receptive fields, binocular interaction and functional architecture in the cat visual cortex. Journal of Physiology, 160, 106–154.
  39. Izhikevich, E. (2004). Which model to use for cortical spiking neurons? IEEE Transactions on Neural Networks, 15(5), 1063–1070.
  40. Jhuang, H., Serre, T., Wolf, L., & Poggio, T. (2007). A biologically inspired system for action recognition. In Proceedings of the 11th international conference on computer vision (pp. 1–8).
  41. Kreuz, T., Haas, J. S., Morelli, A., Abarbanel, H. D., & Politi, A. (2007). Measuring spike train synchrony. Journal of Neuroscience Methods, 165, 151–161.
  42. Laptev, I., Caputo, B., Schuldt, C., & Lindeberg, T. (2007). Local velocity-adapted motion events for spatio-temporal recognition. Computer Vision and Image Understanding, 108(3), 207–229.
  43. Lui, L. L., Bourne, J. A., & Rosa, M. G. P. (2007). Spatial summation, end inhibition and side inhibition in the middle temporal visual area MT. Journal of Neurophysiology, 97(2), 1135.
  44. Maldonado, P., Babul, C., Singer, W., Rodriguez, E., Berger, D., & Grün, S. (2008). Synchronization of neuronal responses in primary visual cortex of monkeys viewing natural images. Journal of Neurophysiology, 100, 1523–1532.
  45. Mestre, D. R., Masson, G. S., & Stone, L. S. (2001). Spatial scale of motion segmentation from speed cues. Vision Research, 41(21), 2697–2713.
  46. Michels, L., Lappe, M., & Vaina, L. (2005). Visual areas involved in the perception of human movement from dynamic analysis. Brain Imaging, 16(10), 1037–1041.
  47. Mokhber, A., Achard, C., & Milgram, M. (2008). Recognition of human behavior by space-time silhouette characterization. Pattern Recognition Letters, 29(1), 81–89.
  48. Mutch, J., & Lowe, D. G. (2006). Multiclass object recognition with sparse, localized features. In Proceedings of the international conference on computer vision and pattern recognition (pp. 11–18).
  49. Neuenschwander, S., Castelo-Branco, M., & Singer, W. (1999). Synchronous oscillations in the cat retina. Vision Research, 39(15), 2485–2497.
  50. Niebles, J. C., Wang, H., & Fei-Fei, L. (2006). Unsupervised learning of human action categories using spatial-temporal words. In British machine vision conference.
  51. Nowak, L., & Bullier, J. (1997). The timing of information transfer in the visual system. In Cerebral cortex (Vol. 12, pp. 205–241). New York: Plenum Press. Chap. 5.
  52. Nowlan, S., & Sejnowski, T. (1995). A selection model for motion processing in area MT of primates. Journal of Neuroscience, 15, 1195–1214.
  53. Pack, C. C., Hunter, J. N., & Born, R. T. (2005). Contrast dependence of suppressive influences in cortical area MT of alert macaque. Journal of Neurophysiology, 93(3), 1809–1815.
  54. Perge, J., Borghuis, B., Bours, R., Lankheet, M., & van Wezel, R. (2005). Temporal dynamics of direction tuning in motion-sensitive macaque area MT. Journal of Neurophysiology, 93, 2104–2116.
  55. Perkel, D. H., & Bullock, T. H. (1968). Neural coding. Neurosciences Research Program Bulletin, 6, 221–348.
  56. Pinto, N., Cox, D. D., & DiCarlo, J. J. (2008). Why is real-world visual object recognition hard? PLoS Computational Biology, 4(1), e27.
  57. Polana, R., & Nelson, R. (1997). Detection and recognition of periodic, non-rigid motion. International Journal of Computer Vision, 23(3), 261–282.
  58. Riehle, A., Grün, S., Diesmann, M., & Aertsen, A. (1997). Spike synchronization and rate modulation differentially involved in motor cortical function. Science, 278, 1950–1953.
  59. Rieke, F., Warland, D., de Ruyter van Steveninck, R., & Bialek, W. (1997). Spikes: Exploring the neural code. Cambridge: Bradford Books.
  60. Robson, J. (1966). Spatial and temporal contrast-sensitivity functions of the visual system. Journal of the Optical Society of America, 56, 1141–1142.
  61. Roelfsema, P. R., Lamme, V. A. F., & Spekreijse, H. (2004). Synchrony and covariation of firing rates in the primary visual cortex during contour grouping. Nature Neuroscience, 7(9), 982–991.
  62. Rohr, K. (1994). Toward model-based recognition of human movements in image sequences. CVGIP, Image Understanding, 1, 94–115.
  63. Rust, N., Mante, V., Simoncelli, E., & Movshon, J. (2006). How MT cells analyze the motion of visual patterns. Nature Neuroscience, 9(11), 1421–1431.
  64. Saul, A., Carras, P., & Humphrey, A. (2005). Temporal properties of inputs to direction-selective neurons in monkey V1. Journal of Neurophysiology, 94, 282–294.
  65. Seitz, S., & Dyer, C. (1997). View-invariant analysis of cyclic motion. The International Journal of Computer Vision, 25(3), 231–251.
  66. Sereno, M. E., & Sereno, M. L. (1999). 2-D center-surround effects on 3-D structure-from-motion. Journal of Experimental Psychology: Human Perception and Performance, 25(6), 1834–1854.
  67. Serre, T. (2006). Learning a dictionary of shape-components in visual cortex: Comparison with neurons, humans and machines. Ph.D. thesis, Massachusetts Institute of Technology, Cambridge, MA.
  68. Serre, T., Wolf, L., & Poggio, T. (2005). Object recognition with features inspired by visual cortex. In Proceedings of the international conference on computer vision and pattern recognition (pp. 994–1000).
  69. Shah, M., & Jain, R. (1997). Motion-based recognition. Computational imaging and vision series. Dordrecht: Kluwer Academic.
  70. Sigala, R., Serre, T., Poggio, T., & Giese, M. (2005). Learning features of intermediate complexity for the recognition of biological motion. In LNCS: Vol. 3696. ICANN 2005 (pp. 241–246). Berlin: Springer.
  71. Simoncelli, E. P., & Heeger, D. (1998). A model of neuronal responses in visual area MT. Vision Research, 38, 743–761.
  72. Smith, M., Majaj, N., & Movshon, A. (2005). Dynamics of motion signaling by neurons in macaque area MT. Nature Neuroscience, 8(2), 220–228.
  73. Snowden, R. J., Treue, S., Erickson, R. G., & Andersen, R. A. (1991). The response of area MT and V1 neurons to transparent motion. The Journal of Neuroscience, 11(9), 2768–2785.
  74. Thorpe, S. (1990). Spike arrival times: A highly efficient coding scheme for neural networks. In Parallel processing in neural systems and computers (pp. 91–94).
  75. Thorpe, S. (2002). Ultra-rapid scene categorization with a wave of spikes. In Lecture notes in computer science: Vol. 2525. Biologically motivated computer vision (pp. 1–15). Berlin: Springer.
  76. Thorpe, S., Fize, D., & Marlot, C. (1996). Speed of processing in the human visual system. Nature, 381, 520–522.
  77. Topsoe, F. (2000). Some inequalities for information divergence and related measures of discrimination. IEEE Transactions on Information Theory, 46(4), 1602–1609.
  78. Tsotsos, J., Liu, Y., Martinez-Trujillo, J., Pomplun, M., Simine, E., & Zhou, K. (2005). Attending to visual motion. Computer Vision and Image Understanding, 100, 3–40.
  79. VanRullen, R., & Thorpe, S. J. (2002). Surfing a spike wave down the ventral stream. Vision Research, 42, 2593–2615.
  80. Victor, J., & Purpura, K. (1996). Nature and precision of temporal coding in visual cortex: a metric-space analysis. Journal of Neurophysiology, 76, 1310–1326.
  81. Wang, L., & Suter, D. (2007). Recognizing human activities from silhouettes: Motion subspace and factorial discriminative graphical model. In Proceedings CVPR.
  82. Wang, D. L., & Terman, D. (1995). Locally excitatory globally inhibitory oscillator networks. IEEE Transactions on Neural Networks, 6, 283–286.
  83. Watson, A., & Ahumada, A. (1983). A look at motion in the frequency domain (NASA Tech. Memo).
  84. Wielaard, D. J., Shelley, M., McLaughlin, D., & Shapley, R. (2001). How simple cells are made in a nonlinear network model of the visual cortex. The Journal of Neuroscience, 21(14), 5203–5211.
  85. Wohrer, A., & Kornprobst, P. (2008). Virtual Retina: A biological retina model and simulator, with contrast gain control. Journal of Computational Neuroscience. doi:10.1007/s10827-008-0108-4.
  86. Wong, S. F., Kim, T. K., & Cipolla, R. (2007). Learning motion categories using both semantic and structural information. In Proceedings of the international conference on computer vision and pattern recognition (pp. 1–6).
  87. Xiao, D., Raiguel, S., Marcar, V., Koenderink, J., & Orban, G. A. (1995). Spatial heterogeneity of inhibitory surrounds in the middle temporal visual area. Proceedings of the National Academy of Sciences, 92(24), 11303–11306.
  88. Xiao, D. K., Raiguel, S., Marcar, V., & Orban, G. A. (1997). The spatial distribution of the antagonistic surround of MT/V5 neurons. Cerebral Cortex, 7(7), 662–677.
  89. Zelnik-Manor, L., & Irani, M. (2001). Event-based analysis of video. In Proceedings of CVPR’01 (Vol. 2, pp. 123–128).

Copyright information

© Springer Science+Business Media, LLC 2009

Authors and Affiliations

  • Maria-Jose Escobar (1)
  • Guillaume S. Masson (2)
  • Thierry Vieville (1)
  • Pierre Kornprobst (1)

  1. INRIA Sophia-Antipolis, Sophia-Antipolis, France
  2. Institut de Neurosciences Cognitives de la Méditerranée, CNRS, Université d’Aix-Marseille, UMR6193, Marseille, France
