Language Label Learning for Visual Concepts Discovered from Video Sequences

  • Conference paper
Attention in Cognitive Systems. Theories and Systems from an Interdisciplinary Viewpoint (WAPCV 2007)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4840)

Abstract

Computational models of grounded language learning have been based on the premise that words and concepts are learned simultaneously. Given the mounting cognitive evidence for concept formation in infants, we argue that the availability of pre-lexical concepts (learned from image sequences) leads to considerable computational efficiency in word acquisition. Key to the process is a model of bottom-up visual attention in dynamic scenes. Background learning and foreground segmentation are used to generate robust tracking and detect occlusion events. Trajectories are clustered to obtain motion event concepts. The object concepts (image schemas) are abstracted from the combined appearance and motion data. The set of acquired concepts under visual attentive focus is then correlated with contemporaneous commentary to learn the grounded semantics of words and multi-word phrasal concatenations from the narrative. We demonstrate that even from a mere half hour of video (of a scene involving many objects and activities), a number of rudimentary concepts can be discovered. When these concepts are associated with unedited English commentary, we find that several words emerge: approximately half the identified concepts from the video are associated with the correct concepts. Thus, the computational model reflects the beginning of language comprehension, based on attentional parsing of the visual data. Finally, the emergence of multi-word phrasal concatenations, a precursor to syntax, is observed where they are more salient referents than single words.
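
The word-concept association step described in the abstract can be illustrated with a minimal sketch. It assumes (the abstract does not say this) that each commentary utterance has already been paired with the set of concept labels under attentive focus while it was spoken, and it scores each word-concept pair by the product of the two conditional probabilities P(word | concept) and P(concept | word), estimated from co-occurrence counts. The concept labels, the toy utterances, and the function names are hypothetical, and the paper's actual association measure may differ.

```python
from collections import Counter

def association_scores(episodes):
    """Score (word, concept) pairs by P(word | concept) * P(concept | word).

    `episodes` is a list of (words, concepts) pairs: the words of one
    utterance and the concept labels attended to while it was spoken.
    This data layout is an assumption made for illustration.
    """
    word_count = Counter()
    concept_count = Counter()
    pair_count = Counter()
    for words, concepts in episodes:
        # Count each word and concept at most once per episode.
        words, concepts = set(words), set(concepts)
        word_count.update(words)
        concept_count.update(concepts)
        pair_count.update((w, c) for w in words for c in concepts)
    # joint/concept_count = P(w | c); joint/word_count = P(c | w).
    return {
        (w, c): (joint / concept_count[c]) * (joint / word_count[w])
        for (w, c), joint in pair_count.items()
    }

def best_word(scores, concept):
    """Return the highest-scoring word for `concept`."""
    candidates = {w: s for (w, c), s in scores.items() if c == concept}
    return max(candidates, key=candidates.get)

# Hypothetical toy data: utterance words paired with attended concepts.
episodes = [
    (["the", "car", "moves", "left"], ["CAR", "MOVE_LEFT"]),
    (["a", "car", "turns"], ["CAR"]),
    (["the", "man", "walks", "left"], ["PERSON", "MOVE_LEFT"]),
    (["the", "man", "stops"], ["PERSON"]),
]
scores = association_scores(episodes)
```

On this toy data, `best_word(scores, "CAR")` returns `"car"`. A function word such as "the" co-occurs with every concept but scores lower for each of them, because it is not specific to any one concept.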

Copyright information

© 2007 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Guha, P., Mukerjee, A. (2007). Language Label Learning for Visual Concepts Discovered from Video Sequences. In: Paletta, L., Rome, E. (eds) Attention in Cognitive Systems. Theories and Systems from an Interdisciplinary Viewpoint. WAPCV 2007. Lecture Notes in Computer Science, vol 4840. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-77343-6_6

  • DOI: https://doi.org/10.1007/978-3-540-77343-6_6

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-77342-9

  • Online ISBN: 978-3-540-77343-6

  • eBook Packages: Computer Science, Computer Science (R0)
