Pragmatic Inference and Visual Abstraction Enable Contextual Flexibility During Visual Communication

Fan, Judith E.; Hawkins, Robert D.; Wu, Mike; Goodman, Noah D.

doi:10.1007/s42113-019-00058-7

Original Paper
Published: 05 September 2019

Computational Brain & Behavior, volume 3, pages 86–101 (2020)

Judith E. Fan¹ (ORCID: orcid.org/0000-0002-0097-3254),
Robert D. Hawkins²,
Mike Wu³ &
Noah D. Goodman²,³

Abstract

Visual modes of communication are ubiquitous in modern life—from maps to data plots to political cartoons. Here, we investigate drawing, the most basic form of visual communication. Participants were paired in an online environment to play a drawing-based reference game. On each trial, both participants were shown the same four objects, but in different locations. The sketcher’s goal was to draw one of these objects so that the viewer could select it from the array. On “close” trials, objects belonged to the same basic-level category, whereas on “far” trials objects belonged to different categories. We found that people exploited shared information to efficiently communicate about the target object: on far trials, sketchers achieved high recognition accuracy while applying fewer strokes, using less ink, and spending less time on their drawings than on close trials. We hypothesized that humans succeed in this task by recruiting two core faculties: visual abstraction, the ability to perceive the correspondence between an object and a drawing of it; and pragmatic inference, the ability to judge what information would help a viewer distinguish the target from distractors. To evaluate this hypothesis, we developed a computational model of the sketcher that embodied both faculties, instantiated as a deep convolutional neural network nested within a probabilistic program. We found that this model fit human data well and outperformed lesioned variants. Together, this work provides the first algorithmically explicit theory of how visual perception and social cognition jointly support contextual flexibility in visual communication.
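The contrast between close and far trials can be illustrated with a small sketch of a cost-sensitive pragmatic sketcher. This is not the authors' model: the similarity scores below are hand-picked numbers standing in for the output of a visual encoder, and the names `sketcher_probs`, `alpha`, and `cost_weight` are hypothetical. The sketcher simulates a viewer's choice among the four objects, then favors sketches that make the target identifiable at low production cost.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def sketcher_probs(similarity, costs, target=0, alpha=5.0, cost_weight=1.0):
    """Probability of producing each candidate sketch.

    similarity[s][o] is a hypothetical score for how well sketch s
    matches object o; costs[s] is a hypothetical production cost.
    """
    utilities = []
    for sims, cost in zip(similarity, costs):
        viewer = softmax(sims)                    # simulated viewer's choice over objects
        informativity = math.log(viewer[target])  # log-probability the viewer picks the target
        utilities.append(alpha * (informativity - cost_weight * cost))
    return softmax(utilities)

# Two candidate sketches of the target (object 0): sparse (cheap) and detailed (costly).
costs = [0.1, 0.6]

# Far context: distractors come from other categories, so even a sparse sketch matches only the target.
far = [[3.0, 0.0, 0.0, 0.0],
       [4.0, 0.0, 0.0, 0.0]]

# Close context: distractors share the target's category, so a sparse sketch matches all four objects.
close = [[3.0, 2.8, 2.8, 2.8],
         [4.0, 1.0, 1.0, 1.0]]

far_probs = sketcher_probs(far, costs)
close_probs = sketcher_probs(close, costs)
print("far   (sparse, detailed):", far_probs)
print("close (sparse, detailed):", close_probs)
```

Under these made-up numbers the sparse sketch is preferred in the far context and the detailed sketch in the close context, which mirrors the qualitative pattern described in the abstract.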

Code and Data Availability

All code and data used to produce the results in this article are publicly available in a GitHub repository at: https://github.com/judithfan/visual_communication_in_context. The code used to train the visual encoder module is available at: https://github.com/judithfan/visual-modules-for-sketch-communication-public.

Notes

As a property of the input domain, the gradients with respect to adaptor parameters are very small (1.51e-4 ± 2.61e-4), inevitably resulting in poor learning (we can reproduce this effect from several initializations). We found that naively increasing the learning rate led to unstable optimization, but that multiplying the loss by a large constant C led to much smoother learning trajectories and good test generalization. Critically, increasing the learning rate and multiplying the loss by a constant are not equivalent for second-moment gradient methods. In practice, we set C = 1e4.
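The non-equivalence can be seen in a minimal sketch of a single update under the standard Adam rule, applied to one scalar gradient. The function name `adam_first_step` and the default hyperparameters (learning rate 1e-3, eps 1e-8) are assumptions for illustration, not the settings used in the article; only the gradient magnitude and C are taken from the note above.

```python
import math

def adam_first_step(grad, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """Size of the first Adam update for a single scalar gradient."""
    m = (1 - beta1) * grad        # first-moment estimate
    v = (1 - beta2) * grad ** 2   # second-moment estimate
    m_hat = m / (1 - beta1)       # bias correction at step t = 1
    v_hat = v / (1 - beta2)
    return lr * m_hat / (math.sqrt(v_hat) + eps)

g = 1.51e-4  # mean gradient magnitude reported in the note
C = 1e4      # loss multiplier reported in the note

baseline = adam_first_step(g)
scaled_loss = adam_first_step(C * g)          # multiplying the loss by C multiplies the gradient by C
scaled_lr = adam_first_step(g, lr=1e-3 * C)   # multiplying the learning rate by C instead

print("baseline step:   ", baseline)
print("loss scaled by C:", scaled_loss)
print("lr scaled by C:  ", scaled_lr)
```

Because the update divides by the square root of the second-moment estimate, scaling the loss leaves the step size almost unchanged and mainly shrinks the relative influence of eps, whereas scaling the learning rate multiplies the step by C. This shows only that the two interventions differ; it does not by itself account for the smoother training reported in the note.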

Acknowledgments

Thanks to Dan Yamins and the Stanford CoCo Lab for helpful comments and discussion.

Funding

RXDH was supported by the Stanford Graduate Fellowship and the National Science Foundation Graduate Research Fellowship under Grant No. DGE-114747.

Author information

Authors and Affiliations

Department of Psychology, University of California, San Diego, 9500 Gilman Drive MC 0109, La Jolla, CA, 92093, USA
Judith E. Fan
Department of Psychology, Stanford University, 450 Serra Mall, Stanford, CA, 94305, USA
Robert D. Hawkins & Noah D. Goodman
Department of Computer Science, Stanford University, 353 Serra Mall, Stanford, CA, 94305, USA
Mike Wu & Noah D. Goodman

Contributions

J.E.F. and R.X.D.H. designed and conducted human experiments; J.E.F., R.X.D.H., and M.W. analyzed data and performed computational modeling; and J.E.F., R.X.D.H., M.W., and N.D.G. formulated models, interpreted results, and wrote the paper.

Corresponding author

Correspondence to Judith E. Fan.

Ethics declarations

In all experiments, participants provided informed consent in accordance with the Stanford IRB.

Conflict of Interest

The authors declare that they have no conflict of interest.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Fan, J.E., Hawkins, R.D., Wu, M. et al. Pragmatic Inference and Visual Abstraction Enable Contextual Flexibility During Visual Communication. Comput Brain Behav 3, 86–101 (2020). https://doi.org/10.1007/s42113-019-00058-7

Issue Date: March 2020
