Pragmatic Inference and Visual Abstraction Enable Contextual Flexibility During Visual Communication

  • Original Paper
Computational Brain & Behavior

Abstract

Visual modes of communication are ubiquitous in modern life—from maps to data plots to political cartoons. Here, we investigate drawing, the most basic form of visual communication. Participants were paired in an online environment to play a drawing-based reference game. On each trial, both participants were shown the same four objects, but in different locations. The sketcher’s goal was to draw one of these objects so that the viewer could select it from the array. On “close” trials, objects belonged to the same basic-level category, whereas on “far” trials objects belonged to different categories. We found that people exploited shared information to efficiently communicate about the target object: on far trials, sketchers achieved high recognition accuracy while applying fewer strokes, using less ink, and spending less time on their drawings than on close trials. We hypothesized that humans succeed in this task by recruiting two core faculties: visual abstraction, the ability to perceive the correspondence between an object and a drawing of it; and pragmatic inference, the ability to judge what information would help a viewer distinguish the target from distractors. To evaluate this hypothesis, we developed a computational model of the sketcher that embodied both faculties, instantiated as a deep convolutional neural network nested within a probabilistic program. We found that this model fit human data well and outperformed lesioned variants. Together, this work provides the first algorithmically explicit theory of how visual perception and social cognition jointly support contextual flexibility in visual communication.
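To make the idea of a sketcher that weighs informativeness against effort more concrete, the following is a minimal sketch in the style of a generic Rational Speech Act speaker. It is not the authors' model: the similarity scores stand in for what a visual encoder might produce, and the function names, the two candidate sketches, their costs, and the parameters `alpha` and `cost_weight` are all hypothetical values chosen for illustration.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def listener(similarities):
    """P(object | sketch): sketch-object similarities normalized over the context."""
    return softmax(similarities)

def sketcher(sim_table, costs, target, alpha=5.0, cost_weight=1.0):
    """P(sketch | target, context), proportional to
    exp(alpha * log P_listener(target | sketch) - cost_weight * cost)."""
    utilities = []
    for sims, cost in zip(sim_table, costs):
        informativity = math.log(listener(sims)[target])
        utilities.append(alpha * informativity - cost_weight * cost)
    return softmax(utilities)

# Hypothetical candidates: index 0 = sparse sketch, index 1 = detailed sketch.
costs = [0.2, 1.0]  # assumed effort (strokes, ink, time); made-up numbers
target = 0          # the target is the first of four objects in the context

# Made-up similarities of each sketch to [target, distractor, distractor, distractor].
far_context = [
    [3.0, 0.0, 0.0, 0.0],  # sparse sketch already separates the target
    [4.0, 0.0, 0.0, 0.0],  # detailed sketch adds little
]
close_context = [
    [3.0, 2.8, 2.8, 2.8],  # sparse sketch resembles every same-category object
    [4.0, 1.0, 1.0, 1.0],  # detailed sketch picks out the target
]

far_probs = sketcher(far_context, costs, target)
close_probs = sketcher(close_context, costs, target)
print("far   (sparse, detailed):", [round(p, 3) for p in far_probs])
print("close (sparse, detailed):", [round(p, 3) for p in close_probs])
```

Under these assumed numbers the simulated sketcher favors the cheaper sketch when distractors come from different categories and the costlier, more diagnostic sketch when they come from the same category, which is the qualitative pattern the abstract reports for human sketchers.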

Code and Data Availability

All code and data used to produce the results in this article are publicly available in a GitHub repository at: https://github.com/judithfan/visual_communication_in_context. The code used to train the visual encoder module is available at: https://github.com/judithfan/visual-modules-for-sketch-communication-public.

Notes

  1. As a property of the input domain, the gradients with respect to the adaptor parameters are very small (1.51e-4 ± 2.61e-4), inevitably resulting in poor learning (we can reproduce this effect from several initializations). We found that naively increasing the learning rate led to unstable optimization, but that multiplying the loss by a large constant C led to much smoother learning trajectories and good test generalization. Critically, increasing the learning rate and multiplying the loss by a constant are not equivalent for second-moment gradient methods. In practice, we set C = 1e4.
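The non-equivalence stated in this note can be illustrated with a small numerical sketch. It assumes an Adam-style update with commonly used default hyperparameters (`lr`, `beta1`, `beta2`, `eps` below are assumptions, not values taken from the article) and looks only at the first step on a single scalar gradient. It shows that the two interventions differ; it does not attempt to reproduce or explain the training behavior the authors observed.

```python
import math

def adam_first_step(grad, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """Update size produced by the first Adam-style step for a scalar gradient."""
    m = (1 - beta1) * grad        # first-moment estimate after one step
    v = (1 - beta2) * grad ** 2   # second-moment estimate after one step
    m_hat = m / (1 - beta1)       # bias correction for t = 1
    v_hat = v / (1 - beta2)
    return lr * m_hat / (math.sqrt(v_hat) + eps)

def sgd_step(grad, lr=1e-3):
    """Update size for plain gradient descent, shown for comparison."""
    return lr * grad

g = 1.51e-4  # mean gradient magnitude reported in the note
C = 1e4      # loss multiplier reported in the note

# Multiplying the loss by C multiplies the gradient by C.
base = adam_first_step(g)
scaled_loss = adam_first_step(C * g)
scaled_lr = adam_first_step(g, lr=1e-3 * C)

print("Adam, baseline:        ", base)
print("Adam, loss times C:    ", scaled_loss)
print("Adam, learning rate * C:", scaled_lr)

# For plain gradient descent the two interventions give the same update.
print("SGD,  loss times C:    ", sgd_step(C * g))
print("SGD,  learning rate * C:", sgd_step(g, lr=1e-3 * C))
```

In this sketch, plain gradient descent treats the two changes identically, whereas the Adam-style step normalizes by the square root of the second moment, so scaling the loss changes the update only through the size of the gradient relative to `eps`, while scaling the learning rate multiplies the update directly.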

Acknowledgments

Thanks to Dan Yamins and the Stanford CoCo Lab for helpful comments and discussion.

Funding

RXDH was supported by the Stanford Graduate Fellowship and the National Science Foundation Graduate Research Fellowship under Grant No. DGE-114747.

Author information

Contributions

J.E.F. and R.X.D.H. designed and conducted human experiments. J.E.F., R.X.D.H., and M.W. analyzed data and performed computational modeling. J.E.F., R.X.D.H., M.W., and N.D.G. formulated models, interpreted results, and wrote the paper.

Corresponding author

Correspondence to Judith E. Fan.

Ethics declarations

In all experiments, participants provided informed consent in accordance with the Stanford IRB.

Conflict of Interest

The authors declare that they have no conflict of interest.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Fan, J.E., Hawkins, R.D., Wu, M. et al. Pragmatic Inference and Visual Abstraction Enable Contextual Flexibility During Visual Communication. Comput Brain Behav 3, 86–101 (2020). https://doi.org/10.1007/s42113-019-00058-7

  • DOI: https://doi.org/10.1007/s42113-019-00058-7
