Computational Scene Analysis

  • Chapter

Part of the book series: Studies in Computational Intelligence (SCI, volume 63)

Summary

A remarkable achievement of the perceptual system is its scene analysis capability, which involves two basic perceptual processes: the segmentation of a scene into a set of coherent patterns (objects) and the recognition of memorized ones. Although the perceptual system performs scene analysis with apparent ease, computational scene analysis remains a tremendous challenge as foreseen by Frank Rosenblatt. This chapter discusses scene analysis in the field of computational intelligence, particularly visual and auditory scene analysis. The chapter first addresses the question of the goal of computational scene analysis. A main reason why scene analysis is difficult in computational intelligence is the binding problem, which refers to how a collection of features comprising an object in a scene is represented in a neural network. In this context, temporal correlation theory is introduced as a biologically plausible representation for addressing the binding problem. The LEGION network lays a computational foundation for oscillatory correlation, which is a special form of temporal correlation. Recent results on visual and auditory scene analysis are described in the oscillatory correlation framework, with emphasis on real-world scenes. Also discussed are the issues of attention, feature-based versus model-based analysis, and representation versus learning. Finally, the chapter points out that the time dimension and David Marr's framework for understanding perception are essential for computational scene analysis.
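
As a rough illustration of how oscillatory correlation can represent binding, the sketch below reads out objects from a snapshot of oscillator phases: units whose oscillators are synchronized (nearly equal phase) are treated as one object, and desynchronized groups as different objects. This is not the LEGION network itself, which is not specified on this page; the feature names, phase values, `tolerance` parameter, and the `bind_by_phase` helper are all hypothetical and chosen only for the example.

```python
import math


def circular_distance(a, b):
    """Smallest absolute difference between two phases, in radians."""
    d = abs(a - b) % (2 * math.pi)
    return min(d, 2 * math.pi - d)


def bind_by_phase(phases, tolerance=0.3):
    """Group feature names whose phase lies within `tolerance` of a group's first member.

    `phases` maps a feature name to its oscillator phase (radians).
    Illustrative readout only; both the data layout and the threshold are assumptions.
    """
    groups = []  # list of (reference_phase, [feature names])
    for name, phase in sorted(phases.items(), key=lambda kv: kv[1]):
        for ref, members in groups:
            if circular_distance(ref, phase) <= tolerance:
                members.append(name)  # synchronized with this group
                break
        else:
            groups.append((phase, [name]))  # desynchronized: start a new group
    return [sorted(members) for _, members in groups]


# Hypothetical snapshot: six feature units responding to two objects in a scene.
phases = {
    "red": 0.05, "round": 0.10, "left": 6.25,      # near phase 0 (6.25 wraps around)
    "blue": 3.10, "square": 3.20, "right": 3.15,   # near phase pi
}
objects = bind_by_phase(phases)
print(objects)
```

The point of the sketch is that no dedicated "red round object on the left" unit is needed: which features belong together is carried by the timing of activity rather than by extra units, which is the sense in which temporal correlation addresses the binding problem.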

Copyright information

© 2007 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Wang, D. (2007). Computational Scene Analysis. In: Duch, W., Mańdziuk, J. (eds) Challenges for Computational Intelligence. Studies in Computational Intelligence, vol 63. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-71984-7_8

  • DOI: https://doi.org/10.1007/978-3-540-71984-7_8

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-71983-0

  • Online ISBN: 978-3-540-71984-7

  • eBook Packages: Engineering, Engineering (R0)
