Abstract
Visual modes of communication are ubiquitous in modern life—from maps to data plots to political cartoons. Here, we investigate drawing, the most basic form of visual communication. Participants were paired in an online environment to play a drawing-based reference game. On each trial, both participants were shown the same four objects, but in different locations. The sketcher’s goal was to draw one of these objects so that the viewer could select it from the array. On “close” trials, objects belonged to the same basic-level category, whereas on “far” trials objects belonged to different categories. We found that people exploited shared information to efficiently communicate about the target object: on far trials, sketchers achieved high recognition accuracy while applying fewer strokes, using less ink, and spending less time on their drawings than on close trials. We hypothesized that humans succeed in this task by recruiting two core faculties: visual abstraction, the ability to perceive the correspondence between an object and a drawing of it; and pragmatic inference, the ability to judge what information would help a viewer distinguish the target from distractors. To evaluate this hypothesis, we developed a computational model of the sketcher that embodied both faculties, instantiated as a deep convolutional neural network nested within a probabilistic program. We found that this model fit human data well and outperformed lesioned variants. Together, this work provides the first algorithmically explicit theory of how visual perception and social cognition jointly support contextual flexibility in visual communication.
Code and Data Availability
All code and data used to produce the results in this article are publicly available in a GitHub repository at: https://github.com/judithfan/visual_communication_in_context. The code used to train the visual encoder module is available at: https://github.com/judithfan/visual-modules-for-sketch-communication-public.
Notes
Owing to a property of the input domain, the gradients with respect to the adaptor parameters are very small (1.51e-4 ± 2.61e-4), inevitably resulting in poor learning (we reproduced this effect from several initializations). We found that naively increasing the learning rate led to unstable optimization, but that multiplying the loss by a large constant C led to much smoother learning trajectories and good test generalization. Critically, increasing the learning rate and multiplying the loss by a constant are not equivalent for second-moment gradient methods. In practice, we set C = 1e4.
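The non-equivalence described in this note can be illustrated with a minimal sketch. It assumes a textbook Adam-style update with common default hyperparameters (lr = 1e-3, eps = 1e-8); the note does not name the optimizer or its settings, so the function `adam_first_step` and every value other than the gradient magnitude and C are hypothetical. Under this assumption, multiplying the learning rate by C multiplies the step by C, whereas multiplying the loss by C acts like dividing eps by C, which only changes the step noticeably for gradients near or below eps.

```python
import math


def adam_first_step(grad, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """Parameter change from the first Adam-style update on a scalar gradient.

    Hypothetical helper for illustration; hyperparameters are assumed defaults.
    """
    m = (1 - beta1) * grad          # first-moment (mean) estimate
    v = (1 - beta2) * grad ** 2     # second-moment (uncentered variance) estimate
    m_hat = m / (1 - beta1)         # bias correction at step t = 1
    v_hat = v / (1 - beta2)
    return -lr * m_hat / (math.sqrt(v_hat) + eps)


g = 1.51e-4   # mean gradient magnitude quoted in the note
C = 1e4       # loss multiplier quoted in the note

base = adam_first_step(g)
scaled_loss = adam_first_step(C * g)           # loss * C  =>  gradient * C
scaled_lr = adam_first_step(g, lr=1e-3 * C)    # learning rate * C

# Scaling the loss is (up to rounding) the same as shrinking eps by C.
smaller_eps = adam_first_step(g, eps=1e-8 / C)

# For a gradient well below eps (assumed value), loss scaling does enlarge the step.
g_tiny = 1e-10
tiny_base = adam_first_step(g_tiny)
tiny_scaled_loss = adam_first_step(C * g_tiny)

print(f"step ratio, lr * C:            {scaled_lr / base:.4g}")
print(f"step ratio, loss * C:          {scaled_loss / base:.6f}")
print(f"step ratio, loss * C (tiny g): {tiny_scaled_loss / tiny_base:.4g}")
```

In this sketch the learning-rate change rescales every parameter's step uniformly, while the loss multiplier mainly affects parameters whose gradients are small relative to eps. This is one plausible reading of why the two interventions behaved differently, not a reconstruction of the training code in the repository.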
Acknowledgments
Thanks to Dan Yamins and the Stanford CoCo Lab for helpful comments and discussion.
Funding
RXDH was supported by the Stanford Graduate Fellowship and the National Science Foundation Graduate Research Fellowship under Grant No. DGE-114747.
Author information
Contributions
J.E.F. and R.X.D.H. designed and conducted the human experiments. J.E.F., R.X.D.H., and M.W. analyzed data and performed computational modeling. J.E.F., R.X.D.H., M.W., and N.D.G. formulated models, interpreted results, and wrote the paper.
Ethics declarations
In all experiments, participants provided informed consent in accordance with the Stanford IRB.
Conflict of Interest
The authors declare that they have no conflict of interest.
Additional information
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Fan, J.E., Hawkins, R.D., Wu, M. et al. Pragmatic Inference and Visual Abstraction Enable Contextual Flexibility During Visual Communication. Comput Brain Behav 3, 86–101 (2020). https://doi.org/10.1007/s42113-019-00058-7
DOI: https://doi.org/10.1007/s42113-019-00058-7