Language Label Learning for Visual Concepts Discovered from Video Sequences

Guha, Prithwijit; Mukerjee, Amitabha

doi:10.1007/978-3-540-77343-6_6

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4840)

Included in the following conference series:

International Workshop on Attention in Cognitive Systems

Abstract

Computational models of grounded language learning have been based on the premise that words and concepts are learned simultaneously. Given the mounting cognitive evidence for concept formation in infants, we argue that the availability of pre-lexical concepts (learned from image sequences) leads to considerable computational efficiency in word acquisition. Key to the process is a model of bottom-up visual attention in dynamic scenes. Background learning and foreground segmentation are used to generate robust tracking and detect occlusion events. Trajectories are clustered to obtain motion event concepts. The object concepts (image schemas) are abstracted from the combined appearance and motion data. The set of acquired concepts under visual attentive focus is then correlated with contemporaneous commentary to learn the grounded semantics of words and multi-word phrasal concatenations from the narrative. We demonstrate that even from a mere half hour of video (of a scene involving many objects and activities), a number of rudimentary concepts can be discovered. When these concepts are associated with unedited English commentary, we find that several words emerge: approximately half the identified concepts from the video are associated with the correct concepts. Thus, the computational model reflects the beginning of language comprehension, based on attentional parsing of the visual data. Finally, the emergence of multi-word phrasal concatenations, a precursor to syntax, is observed where they are more salient referents than single words.
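The step of correlating attended concepts with contemporaneous commentary can be illustrated with a minimal sketch. The abstract does not state which association measure the authors use, so the mutual-information-style score below is an assumption, and the function name `associate_words`, the episode format, and the toy data are all hypothetical.

```python
from collections import Counter
from math import log


def associate_words(episodes):
    """Pick, for each concept, the word that co-occurs with it most informatively.

    episodes: list of (concept_label, words) pairs, where concept_label is the
    concept in attentional focus and words is the commentary uttered at that time.
    The scoring rule is an illustrative assumption, not the paper's method.
    """
    n = len(episodes)
    concept_counts = Counter()
    word_counts = Counter()
    joint_counts = Counter()

    for concept, words in episodes:
        concept_counts[concept] += 1
        for word in set(words):  # count each word once per episode
            word_counts[word] += 1
            joint_counts[(concept, word)] += 1

    best = {}
    for (concept, word), joint in joint_counts.items():
        # One term of the mutual information: p(c, w) * log(p(c, w) / (p(c) p(w)))
        ratio = joint * n / (concept_counts[concept] * word_counts[word])
        score = (joint / n) * log(ratio)
        if concept not in best or score > best[concept][1]:
            best[concept] = (word, score)

    return {concept: word for concept, (word, _) in best.items()}


# Toy commentary paired with hypothetical motion-event concepts.
episodes = [
    ("chase", ["the", "dog", "chases", "ball"]),
    ("chase", ["it", "chases", "the", "cat"]),
    ("roll", ["the", "ball", "rolls"]),
    ("roll", ["the", "wheel", "rolls", "away"]),
]
labels = associate_words(episodes)
```

Words that appear in every episode, such as "the", score zero under this measure because they carry no information about which concept is in focus, which is one reason a frequency-normalised score may be preferable to raw co-occurrence counts.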

Author information

Authors and Affiliations

Department of Electrical Engineering, Indian Institute of Technology, Kanpur, Kanpur - 208016, Uttar Pradesh, India
Prithwijit Guha
Department of Computer Science & Engineering, Indian Institute of Technology, Kanpur, Kanpur - 208016, Uttar Pradesh, India
Amitabha Mukerjee

Editor information

Editors and Affiliations

Joanneum Research Forschungsgesellschaft mbH, Computational Perception Group, Institute of Digital Image Processing, Wastiangasse 6, 8010, Graz, Austria
Lucas Paletta
Autonomous Intelligent Systems (AIS), Autonomous Robots Department, Fraunhofer Institute, Schloss Birlinghoven, 53754, Sankt Augustin, Germany
Erich Rome

Cite this paper

Guha, P., Mukerjee, A. (2007). Language Label Learning for Visual Concepts Discovered from Video Sequences. In: Paletta, L., Rome, E. (eds) Attention in Cognitive Systems. Theories and Systems from an Interdisciplinary Viewpoint. WAPCV 2007. Lecture Notes in Computer Science, vol 4840. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-77343-6_6

DOI: https://doi.org/10.1007/978-3-540-77343-6_6
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-77342-9
Online ISBN: 978-3-540-77343-6
eBook Packages: Computer Science, Computer Science (R0)