Abstract
Over the last decade, Computer Vision, the branch of Artificial Intelligence aimed at understanding the visual world, has evolved from simply recognizing objects in images to describing pictures, answering questions about images, helping robots maneuver around physical spaces, and even generating novel visual content. As these tasks and applications have modernized, so too has the reliance on data grown, whether for model training or for evaluation. In this chapter, we demonstrate that novel interaction strategies can enable new forms of data collection and evaluation for Computer Vision. First, we present a crowdsourcing interface that speeds up paid data collection by an order of magnitude, feeding the data-hungry nature of modern vision models. Second, we explore a method to increase volunteer contributions using automated social interventions. Third, we develop a system to ensure that human evaluation of generative vision models is reliable, affordable, and grounded in psychophysics theory. We conclude with future opportunities for Human–Computer Interaction to aid Computer Vision.
Notes
1. Applications can be found at https://taptapsee.com/, https://www.bemyeyes.com/, and https://camfindapp.com/.
2. The dataset of social media posts and social strategies for training the reinforcement learning model, as well as the trained contextual bandit model, is publicly available at http://cs.stanford.edu/people/ranjaykrishna/socialstrategies.
3. We explicitly reveal this ratio to evaluators. Otherwise, Amazon Mechanical Turk forums would let evaluators discuss and learn about this distribution over time, altering how different evaluators approach the task. Making the ratio explicit gives all evaluators the same prior when entering the task.
4. Hyper-realism is relative to the real dataset on which a model is trained. Some datasets already look less realistic because of lower resolution and/or lower diversity of images.
References
Adadi A, Berrada M (2018) Peeking inside the black-box: a survey on explainable artificial intelligence (xai). IEEE Access 6:52138–52160
Ambati V, Vogel S, Carbonell J (2011) Towards task recommendation in micro-task markets
Antol S, Agrawal A, Lu J, Mitchell M, Batra D, Lawrence Zitnick C, Parikh D (2015) Vqa: visual question answering. In: Proceedings of the IEEE international conference on computer vision, pp 2425–2433
Bahdanau D, Cho K, Bengio Y (2014) Neural machine translation by jointly learning to align and translate. arXiv:1409.0473
Banerjee S, Lavie A (2005) Meteor: an automatic metric for mt evaluation with improved correlation with human judgments. In: Proceedings of the acl workshop on intrinsic and extrinsic evaluation measures for machine translation and/or summarization, pp 65–72
Barratt S, Sharma R (2018) A note on the inception score. arXiv:1801.01973
Bernstein MS, Brandt J, Miller RC, Karger DR (2011) Crowds in two seconds: enabling realtime crowd-powered interfaces. In: Proceedings of the 24th annual ACM symposium on User interface software and technology. ACM, pp 33–42
Bernstein MS, Little G, Miller RC, Hartmann B, Ackerman MS, Karger DR, Crowell D, Panovich K (2010) Soylent: a word processor with a crowd inside. In: Proceedings of the 23nd annual ACM symposium on user interface software and technology. ACM, pp 313–322
Berthelot D, Schumm T, Metz L (2017) Began: boundary equilibrium generative adversarial networks. arXiv:1703.10717
Bigham JP, Jayant C, Ji H, Little G, Miller A, Miller RC, Miller R, Tatarowicz A, White B, White S, et al (2010) Vizwiz: nearly real-time answers to visual questions. In: Proceedings of the 23nd annual ACM symposium on User interface software and technology. ACM, pp 333–342
Bińkowski M, Sutherland DJ, Arbel M, Gretton A (2018) Demystifying mmd gans. arXiv:1801.01401
Bishop CM (2006) Pattern recognition and machine learning. Springer
Biswas A, Parikh D (2013) Simultaneous active learning of classifiers & attributes via relative feedback. In: 2013 Ieee conference on computer vision and pattern recognition (CVPR). IEEE, pp 644–651
Bohus D, Rudnicky AI (2009) The ravenclaw dialog management framework: architecture and systems. Comput Speech Lang 23(3):332–361
Borji A (2018) Pros and cons of gan evaluation measures. In: Computer vision and image understanding
Brady E, Morris MR, Bigham JP (2015) Gauging receptiveness to social microvolunteering. In: Proceedings of the 33rd annual ACM conference on human factors in computing systems, CHI ’15. ACM, New York, NY, USA, pp 1055–1064
Brady EL, Zhong Y, Morris MR, Bigham JP (2013) Investigating the appropriateness of social network question asking as a resource for blind users. In: Proceedings of the 2013 conference on computer supported cooperative work. ACM, pp 1225–1236
Bragg J, Daniel M, Weld DS (2013) Crowdsourcing multi-label classification for taxonomy creation. In: First AAAI conference on human computation and crowdsourcing
Branson S, Hjorleifsson KE, Perona P (2014) Active annotation translation. In: 2014 IEEE conference on computer vision and pattern recognition (CVPR). IEEE, pp 3702–3709
Branson S, Wah C, Schroff F, Babenko B, Welinder P, Perona P, Belongie S (2010) Visual recognition with humans in the loop. In: Computer vision–ECCV 2010. Springer, pp 438–451
Broadbent DE, Broadbent MHP (1987) From detection to identification: response to multiple targets in rapid serial visual presentation. Percept Psychophys 42(2):105–113
Brock A, Donahue J, Simonyan K (2018) Large scale gan training for high fidelity natural image synthesis. arXiv:1809.11096
Buçinca Z, Lin P, Gajos KZ, Glassman EL (2020) Proxy tasks and subjective measures can be misleading in evaluating explainable ai systems. In: Proceedings of the 25th international conference on intelligent user interfaces, pp 454–464
Buolamwini J, Gebru T (2018) Gender shades: intersectional accuracy disparities in commercial gender classification. In: Conference on fairness, accountability and transparency, pp 77–91
Burke M, Kraut RE, Joyce E (2014) Membership claims and requests: some newcomer socialization strategies in online communities. Small Group Research
Burke M, Kraut R (2013) Using facebook after losing a job: Differential benefits of strong and weak ties. In: Proceedings of the 2013 conference on computer supported cooperative work. ACM, pp 1419–1430
Card SK, Newell A, Moran TP (1983) The psychology of human-computer interaction
Carroll M, Shah R, Ho MK, Griffiths T, Seshia S, Abbeel P, Dragan A (2019) On the utility of learning about humans for human-ai coordination. In: Advances in neural information processing systems, pp 5174–5185
Cassell J, Thórisson KR (1999) The power of a nod and a glance: envelope vs. emotional feedback in animated conversational agents. Appl Artif Intell 13:519–538
Cerrato L, Ekeklint S (2002) Different ways of ending human-machine dialogues
Chaiken S (1989) Heuristic and systematic information processing within and beyond the persuasion context. In: Unintended thought, pp 212–252
Chellappa R, Sinha P, Jonathon Phillips P (2010) Face recognition by computers and humans. Computer 43(2):46–55
Cheng J, Teevan J, Bernstein MS (2015) Measuring crowdsourcing effort with error-time curves. In: Proceedings of the 33rd annual ACM conference on human factors in computing systems. ACM, pp 1365–1374
Chidambaram V, Chiang Y-H, Mutlu B (2012) Designing persuasive robots: how robots might persuade people using vocal and nonverbal cues. In: Proceedings of the seventh annual ACM/IEEE international conference on human-robot interaction. ACM, pp 293–300
Chilton LB, Little G, Edge D, Weld DS, Landay JA (2013) Cascade: crowdsourcing taxonomy creation. In: Proceedings of the SIGCHI conference on human factors in computing systems. ACM, pp 1999–2008
Cialdini R (2016) Pre-suasion: a revolutionary way to influence and persuade. Simon and Schuster
Colligan L, Potts HWW, Finn CT, Sinkin RA (2015) Cognitive workload changes for nurses transitioning from a legacy system with paper documentation to a commercial electronic health record. Int J Med Inform 84(7):469–476
Cornsweet TN (1962) The staircrase-method in psychophysics
Corti K, Gillespie A (2016) Co-constructing intersubjectivity with artificial conversational agents: people are more likely to initiate repairs of misunderstandings with agents represented as human. Comput Hum Behav 58:431–442
Dakin SC, Omigie D (2009) Psychophysical evidence for a non-linear representation of facial identity. Vis Res 49(18):2285–2296
Dalal N, Triggs B (2005) Histograms of oriented gradients for human detection. In: 2005 IEEE computer society conference on computer vision and pattern recognition (CVPR’05), vol 1, pp 886–893
Darley JM, Latané B (1968) Bystander intervention in emergencies: diffusion of responsibility. J Personal Soc Psychol 8(4p1):377
Deng J, Dong W, Socher R, Li L-J, Li K, Fei-Fei L (2009) Imagenet: a large-scale hierarchical image database. In: 2009 IEEE conference on computer vision and pattern recognition. Ieee, pp 248–255
Deng J, Russakovsky O, Krause J, Bernstein MS, Berg A, Fei-Fei L (2014) Scalable multi-label annotation. In: Proceedings of the SIGCHI conference on human factors in computing systems. ACM, pp 3099–3102
Denton EL, Chintala S, Fergus R, et al (2015) Deep generative image models using a laplacian pyramid of adversarial networks. In: Advances in neural information processing systems, pp 1486–1494
Devlin J, Chang M-W, Lee K, Toutanova K (2018) Bert: pre-training of deep bidirectional transformers for language understanding. arXiv:1810.04805
Difallah DE, Demartini G, Cudré-Mauroux P (2013) Pick-a-crowd: tell me what you like, and i’ll tell you what to do. In: Proceedings of the 22nd international conference on world wide web, WWW ’13. ACM, New York, NY, USA, pp 367–374
Dragan AD, Lee KCT, Srinivasa SS (2013) Legibility and predictability of robot motion. In: 2013 8th ACM/IEEE international conference on human-robot interaction (HRI). IEEE, pp 301–308
Fast E, Chen B, Mendelsohn J, Bassen J, Bernstein MS (2018) Iris: a conversational agent for complex tasks. In: Proceedings of the 2018 CHI conference on human factors in computing systems. ACM, p 473
Fast E, Steffee D, Wang L, Brandt JR, Bernstein MS (2014) Emergent, crowd-scale programming practice in the ide. In: Proceedings of the 32nd annual ACM conference on Human factors in computing systems. ACM, pp 2491–2500
Fei-Fei L, Iyer A, Koch C, Perona P (2007) What do we perceive in a glance of a real-world scene? J Vis 7(1):10
Felsenstein J (1985) Confidence limits on phylogenies: an approach using the bootstrap. Evolution 39(4):783–791
Ferrara E, Varol O, Davis C, Menczer F, Flammini A (2016) The rise of social bots. Commun ACM 59(7):96–104
Fraisse P (1984) Perception and estimation of time. Ann Rev Psychol 35(1):1–37
Geiger D, Schader M (2014) Personalized task recommendation in crowdsourcing information systems – current state of the art. Decis Support Syst 65:3–16. Crowdsourcing and Social Networks Analysis
Gilbert E, Karahalios K (2009) Predicting tie strength with social media. In: Proceedings of the SIGCHI conference on human factors in computing systems. ACM, pp 211–220
Gillund G, Shiffrin RM (1984) A retrieval model for both recognition and recall. Psychol Rev 91(1):1
Girshick R, Donahue J, Darrell T, Malik J (2014) Rich feature hierarchies for accurate object detection and semantic segmentation. In: 2014 IEEE conference on computer vision and pattern recognition (CVPR). IEEE, pp 580–587
Goodfellow I, Pouget-Abadie J, Mirza M, Xu B, Warde-Farley D, Ozair S, Courville A, Bengio Y (2014) Generative adversarial nets. In: Advances in neural information processing systems, pp 2672–2680
Gray M, Suri S (2019) Ghost work: how to stop silicon valley from building a new global underclass. Eamon Dolan
Greene MR, Oliva A (2009) The briefest of glances: the time course of natural scene understanding. Psychol Sci 20(4):464–472
Gulrajani I, Ahmed F, Arjovsky M, Dumoulin V, Courville AC (2017) Improved training of wasserstein gans. In: Advances in neural information processing systems, pp 5767–5777
Haque A, Milstein A, Fei-Fei L (2020) Illuminating the dark spaces of healthcare with ambient intelligence. Nature 585(7824):193–202
Hashimoto TB, Zhang H, Liang P (2019) Unifying human and statistical evaluation for natural language generation. arXiv:1904.02792
Hata K, Krishna R, Fei-Fei L, Bernstein MS (2017) A glimpse far into the future: understanding long-term crowd worker quality. In: Proceedings of the 2017 ACM conference on computer supported cooperative work and social computing. ACM, pp 889–901
Healy K, Schussman A (2003) The ecology of open-source software development. Technical report, Technical report, University of Arizona, USA
Hempel J (2015) Facebook launches m, its bold answer to siri and cortana. In: Wired. Retrieved January 1:2017
Heusel M, Ramsauer H, Unterthiner T, Nessler B, Hochreiter S (2017) Gans trained by a two time-scale update rule converge to a local nash equilibrium. In: Advances in neural information processing systems, pp 6626–6637
Hill BM (2013) Almost wikipedia: eight early encyclopedia projects and the mechanisms of collective action. Massachusetts institute of technology, pp 1–38
Hinton GE (2002) Training products of experts by minimizing contrastive divergence. Neural Comput 14(8):1771–1800
Hochreiter S, Schmidhuber J (1997) Long short-term memory. Neural Comput 9(8):1735–1780
Hoffman ML (1981) Is altruism part of human nature? J Personal Soc Psychol 40(1):121
Horvitz E (1999) Principles of mixed-initiative user interfaces. In: Proceedings of the SIGCHI conference on human factors in computing systems. ACM, pp 159–166
Huang F, Canny JF (2019) Sketchforme: composing sketched scenes from text descriptions for interactive applications. In: Proceedings of the 32nd annual ACM symposium on user interface software and technology, pp 209–220
Huang T-HK, Chang J, Bigham J (2018) Evorus: a crowd-powered conversational assistant built to automate itself over time. In: Proceedings of the 2018 CHI conference on human factors in computing systems. ACM, p 295
Hutto CJ, Gilbert E (2014) Vader: a parsimonious rule-based model for sentiment analysis of social media text. In: Eighth international AAAI conference on weblogs and social media
Iordan MC, Greene MR, Beck DM, Fei-Fei L (2015) Basic level category structure emerges gradually across human ventral visual cortex. In: Journal of cognitive neuroscience
Ipeirotis PG (2010) Analyzing the amazon mechanical turk marketplace. XRDS: Crossroads. The ACM Mag Stud 17(2):16–21
Ipeirotis PG, Provost F, Wang J (2010) Quality management on amazon mechanical turk. In: Proceedings of the ACM SIGKDD workshop on human computation. ACM, pp 64–67
Irani LC, Silberman M (2013) Turkopticon: interrupting worker invisibility in amazon mechanical turk. In: Proceedings of the SIGCHI conference on human factors in computing systems. ACM, pp 611–620
Jain SD, Grauman K (2013) Predicting sufficient annotation strength for interactive foreground segmentation. In: 2013 IEEE international conference on computer vision (ICCV). IEEE, pp 1313–1320
Jain U, Weihs L, Kolve E, Farhadi A, Lazebnik S, Kembhavi A, Schwing A (2020) A cordial sync: Going beyond marginal policies for multi-agent embodied tasks. In: European conference on computer vision. Springer, pp 471–490
Jean N, Burke M, Xie M, Davis WM, Lobell DB, Ermon S (2016) Combining satellite imagery and machine learning to predict poverty. Science 353(6301):790–794
Josephy T, Lease M, Paritosh P (2013) Crowdscale 2013: crowdsourcing at scale workshop report
Kamar E, Hacker S, Horvitz E (2012) Combining human and machine intelligence in large-scale crowdsourcing. In: Proceedings of the 11th international conference on autonomous agents and multiagent systems-volume 1. International Foundation for Autonomous Agents and Multiagent Systems, pp 467–474
Karger DR, Oh S, Shah D (2011) Budget-optimal crowdsourcing using low-rank matrix approximations. In: 2011 49th annual allerton conference on communication, control, and computing (allerton). IEEE, pp 284–291
Karger DR, Oh S (2014) Shah D Budget-optimal task allocation for reliable crowdsourcing systems. Oper Res 62(1):1–24
Karras T, Aila T, Laine S, Lehtinen J (2017) Progressive growing of gans for improved quality, stability, and variation. arXiv:1710.10196
Karras T, Laine S, Aila T (2018) A style-based generator architecture for generative adversarial networks. arXiv:1812.04948
Karras T, Laine S, Aila T (2019) A style-based generator architecture for generative adversarial networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 4401–4410
Khadpe P, Krishna R, Fei-Fei L, Hancock JT, Bernstein MS (2020) Conceptual metaphors impact perceptions of human-ai collaboration. Proc ACM Hum-Comput Interact 4(CSCW2):1–26
Kittur A, Chi EH, Suh B (2008) Crowdsourcing user studies with mechanical turk. In: Proceedings of the SIGCHI conference on human factors in computing systems. ACM, pp 453–456
Klein SA (2001) Measuring, estimating, and understanding the psychometric function: a commentary. Percept Psychophys 63(8):1421–1455
Kramer ADI, Guillory JE, Hancock JT (2014) Experimental evidence of massive-scale emotional contagion through social networks. Proc Natl Acad Sci 111(24):8788–8790
Kraut RE, Resnick P (2011) Encouraging contribution to online communities. Building successful online communities: evidence-based social design, pp 21–76
Krishna R, Bernstein M, Fei-Fei L (2019) Information maximizing visual question generation. In: IEEE conference on computer vision and pattern recognition
Krishna R, Hata K, Ren F, Fei-Fei L, Niebles JC (2017) Dense-captioning events in videos. In: Proceedings of the IEEE international conference on computer vision, pp 706–715
Krishna R, Zhu Y, Groth O, Johnson J, Hata K, Kravitz J, Chen S, Kalantidis Y, Li L-J, Shamma DA et al (2017) Visual genome: connecting language and vision using crowdsourced dense image annotations. Int J Comput Vis 123(1):32–73
Krishna RA, Hata K, Chen S, Kravitz J, Shamma DA, Fei-Fei L, Bernstein MS (2016) Embracing error to enable rapid crowdsourcing. In: Proceedings of the 2016 CHI conference on human factors in computing systems. ACM, pp 3167–3179
Krizhevsky A, Hinton G (2009) Learning multiple layers of features from tiny images. Technical report, Citeseer
Krizhevsky A, Sutskever I, Hinton GE (2012) Imagenet classification with deep convolutional neural networks. In: Advances in neural information processing systems, pp 1097–1105
Krizhevsky A, Sutskever I, Hinton GE (2012) Imagenet classification with deep convolutional neural networks. In: Pereira F, Burges CJC, Bottou L, Weinberger KQ (eds) Advances in neural information processing systems 25. Curran Associates, Inc., pp 1097–1105
Krueger GP (1989) Sustained work, fatigue, sleep loss and performance: a review of the issues. Work Stress 3(2):129–141
Kumar R, Satyanarayan A, Torres C, Lim M, Ahmad S, Klemmer SR, Talton JO (2013) Webzeitgeist: design mining the web. In: Proceedings of the SIGCHI conference on human factors in computing systems. ACM, pp 3083–3092
Kurakin A, Goodfellow I, Bengio S (2016) Adversarial examples in the physical world. arXiv:1607.02533
Kwon M, Biyik E, Talati A, Bhasin K, Losey DP, Sadigh D (2020) When humans aren’t optimal: robots that collaborate with risk-aware humans. In: Proceedings of the 2020 ACM/IEEE international conference on human-robot interaction, pp 43–52
Laielli M, Smith J, Biamby G, Darrell T, Hartmann B (2019) Labelar: a spatial guidance interface for fast computer vision image collection. In: Proceedings of the 32nd annual ACM symposium on user interface software and technology, pp 987–998
Langer EJ, Blank A, Chanowitz B (1978) The mindlessness of ostensibly thoughtful action: the role of “placebic’’ information in interpersonal interaction. J Personal Soc Psychol 36(6):635
Laput G, Lasecki WS, Wiese J, Xiao R, Bigham JP, Harrison C (2015) Zensors: adaptive, rapidly deployable, human-intelligent sensor feeds. In: Proceedings of the 33rd annual ACM conference on human factors in computing systems. ACM, pp 1935–1944
Lasecki W, Miller C, Sadilek A, Abumoussa A, Borrello D, Kushalnagar R, Bigham J (2012) Real-time captioning by groups of non-experts. In: Proceedings of the 25th annual ACM symposium on user interface software and technology. ACM, pp 23–34
Lasecki WS, Murray KI, White S, Miller RC, Bigham JP (2011) Real-time crowd control of existing interfaces. In: Proceedings of the 24th annual ACM symposium on User interface software and technology. ACM, pp 23–32
Lasecki WS, Wesley R, Nichols J, Kulkarni A, Allen JF, Bigham JP (2013) Chorus: a crowd-powered conversational assistant. In: Proceedings of the 26th annual ACM symposium on User interface software and technology. ACM, pp 151–162
Law E, Yin M, Goh J, Chen K, Terry MA, Gajos KZ (2016) Curiosity killed the cat, but makes crowdwork better. In: Proceedings of the 2016 CHI conference on human factors in computing systems. ACM, pp 4098–4110
Le J, Edmonds A, Hester V, Biewald L (2010) Ensuring quality in crowdsourced search relevance evaluation: the effects of training question distribution. In: SIGIR 2010 workshop on crowdsourcing for search evaluation, vol 2126, pp 22–32
Levitt HCCH (1971) Transformed up-down methods in psychoacoustics. J Acoust Soc Am 49(2B):467–477
Lewis DD, Hayes PJ (1994) Guest editorial. ACM Trans Inf Syst 12(3):231 July
Li FF, VanRullen R, Koch C, Perona P (2002) Rapid natural scene categorization in the near absence of attention. Proc Natl Acad Sci 99(14):9596–9601
Li L, Chu W, Langford J, Schapire RE (2010) A contextual-bandit approach to personalized news article recommendation. In: Proceedings of the 19th international conference on world wide web. ACM, pp 661–670
Li T, Ogihara M (2003) Detecting emotion in music. In: ISMIR, vol 3, pp 239–240
Liang L, Grauman K (2014) Beyond comparing image pairs: setwise active learning for relative attributes. In: 2014 IEEE conference on computer vision and pattern recognition (CVPR). IEEE, pp 208–215
Lin C, Kamar E, Horvitz E (2014) Signals in the silence: models of implicit feedback in a recommendation system for crowdsourcing
Lin T-Y, Maire M, Belongie S, Hays J, Perona P, Ramanan D, Dollár P, Lawrence Zitnick C (2014) Microsoft coco: common objects in context. In: Computer vision–ECCV 2014. Springer, pp 740–755
Lintott CJ, Schawinski K, Slosar A, Land K, Bamford S, Thomas D, Raddick MJ, Nichol RC, Szalay A, Andreescu D et al (2008) Galaxy zoo: morphologies derived from visual inspection of galaxies from the sloan digital sky survey. Mon Not R Astron Soc 389(3):1179–1189
Liu A, Soderland S, Bragg J, Lin CH, Ling X, Weld DS (2016) Effective crowd annotation for relation extraction. In: Proceedings of the 2016 conference of the North American chapter of the association for computational linguistics: human language technologies, pp 897–906
Liu Z, Luo P, Wang X, Tang X (2015) Deep learning face attributes in the wild. In: Proceedings of international conference on computer vision (ICCV)
Lowe DG (1999) Object recognition from local scale-invariant features. In: Proceedings of the seventh IEEE international conference on computer vision. Ieee, vol 2, pp 1150–1157
Lu C, Krishna R, Bernstein M, Fei-Fei L (2016) Visual relationship detection with language priors. In: European conference on computer vision. Springer, pp 852–869
Lucic M, Kurach K, Michalski M, Gelly S, Bousquet O (2018) Are gans created equal? a large-scale study. In: Advances in neural information processing systems, pp 698–707
Mani I (1999) Advances in automatic text summarization. MIT press
Marcus A, Parameswaran A (2015) Crowdsourced data management: industry and academic perspectives. Foundations and Trends in Databases
Markey PM (2000) Bystander intervention in computer-mediated communication. Comput Hum Behav 16(2):183–188
Martin D, Hanrahan BV, O’Neill J, Gupta N (2014) Being a turker. In: Proceedings of the 17th ACM conference on computer supported cooperative work & social computing. ACM, pp 224–235
Mason W, Suri S (2012) Conducting behavioral research on amazon’s mechanical turk. Behav Res Methods 44(1):1–23
Mildenhall B, Srinivasan PP, Tancik M, Barron JT, Ramamoorthi R, Ng R (2020) Nerf: representing scenes as neural radiance fields for view synthesis. arXiv:2003.08934
Miller GA, Charles WG (1991) Contextual correlates of semantic similarity. Lang Cogn Process 6(1):1–28
Mitchell M, Wu S, Zaldivar A, Barnes P, Vasserman L, Hutchinson B, Spitzer E, Raji ID, Gebru T (2019) Model cards for model reporting. In: Proceedings of the conference on fairness, accountability, and transparency, pp 220–229
Mitra T, Hutto CJ, Gilbert E (2015) Comparing person-and process-centric strategies for obtaining quality data on amazon mechanical turk. In: Proceedings of the 33rd annual ACM conference on human factors in computing systems. ACM, pp 1345–1354
Miyato T, Kataoka T, Koyama M, Yoshida Y (2018) Spectral normalization for generative adversarial networks. arXiv:1802.05957
Nass C, Brave S (2007) Wired for speech: how voice activates and advances the human-computer relationship. The MIT Press
Niebles JC, Wang H, Fei-Fei L (2008) Unsupervised learning of human action categories using spatial-temporal words. Int J Comput Vis 79(3):299–318
Olsson C, Bhupatiraju S, Brown T, Odena A, Goodfellow I (2018) Skill rating for generative models. arXiv:1808.04888
Pang B, Lee L (2008) Opinion mining and sentiment analysis. Found Trends Inf Retr 2(1–2):1–135
Papineni K, Roukos S, Ward T, Zhu W-J (2002) Bleu: a method for automatic evaluation of machine translation. In: Proceedings of the 40th annual meeting on association for computational linguistics. Association for Computational Linguistics, pp 311–318
Park J, Krishna R, Khadpe P, Fei-Fei L, Bernstein M (2019) Ai-based request augmentation to increase crowdsourcing participation. Proc AAAI Conf Hum Comput Crowdsourcing 7:115–124
Parkash A, Parikh D (2012) Attributes for classifier feedback. In: Computer vision–ECCV 2012. Springer, pp 354–368
Peng Dai MD, Weld S (2010) Decision-theoretic control of crowd-sourced workflows. In: In the 24th AAAI conference on artificial intelligence (AAAI’10. Citeseer
Portilla J, Simoncelli EP (2000) A parametric texture model based on joint statistics of complex wavelet coefficients. Int J Comput Vis 40(1):49–70
Potter MC (1976) Short-term conceptual memory for pictures. J Exp Psychol Hum Learn Mem 2(5):509
Potter MC, Levy EI (1969) Recognition memory for a rapid sequence of pictures. J Exp Psychol 81(1):10
Radford A, Metz L, Chintala S (2015) Unsupervised representation learning with deep convolutional generative adversarial networks. arXiv:1511.06434
Ravuri S, Mohamed S, Rosca M, Vinyals O (2018) Learning implicit generative models with the method of learned moments. arXiv:1806.11006
Rayner K, Smith TJ, Malcolm GL, Henderson JM (2009) Eye movements and visual encoding during scene perception. Psychol Sci 20(1):6–10
Reeves A, Sperling G (1986) Attention gating in short-term visual memory. Psychol Rev 93(2):180
Reeves B, Nass CI (1996) The media equation: how people treat computers, television, and new media like real people and places. Cambridge university press
Reich J, Murnane R, Willett J (2012) The state of wiki usage in us k–12 schools: Leveraging web 2.0 data warehouses to assess quality and equity in online learning environments. Educ Res 41(1):7–15
Robert C (1984) Influence: the psychology of persuasion. William Morrow and Company, Nowy Jork
Rosca M, Lakshminarayanan B, Warde-Farley D, Mohamed S (2017) Variational approaches for auto-encoding generative adversarial networks. arXiv:1706.04987
Rössler A, Cozzolino D, Verdoliva L, Riess C, Thies J, Nießner M (2019) Faceforensics++: learning to detect manipulated facial images. arXiv:1901.08971
Russakovsky O, Deng J, Su H, Krause J, Satheesh S, Ma S, Huang Z, Karpathy A, Khosla A, Bernstein M, Berg AC, Li F-F (2014) Imagenet large scale visual recognition challenge. In: International Journal of Computer Vision, pp 1–42
Russakovsky O, Li L-J, Fei-Fei L (2015) Best of both worlds: human-machine collaboration for object annotation. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 2121–2131
Rzeszotarski JM, Chi E, Paritosh P, Dai P (2013) Inserting micro-breaks into crowdsourcing workflows. In: First AAAI conference on human computation and crowdsourcing
Sajjadi MSM, Bachem O, Lucic M, Bousquet O, Gelly S (2018) Assessing generative models via precision and recall. In: Advances in neural information processing systems, pp 5228–5237
Salehi N, Irani LC, Bernstein MS (2015) We are dynamo: overcoming stalling and friction in collective action for crowd workers. In: Proceedings of the 33rd annual ACM conference on human factors in computing systems. ACM, pp 1621–1630
Salimans T, Goodfellow I, Zaremba W, Cheung V, Radford A, Chen X (2016) Improved techniques for training gans. In: Advances in neural information processing systems, pp 2234–2242
Sardar A, Joosse M, Weiss A, Evers V (2012) Don’t stand so close to me: users’ attitudinal and behavioral responses to personal space invasion by robots. In: Proceedings of the seventh annual ACM/IEEE international conference on human-robot interaction. ACM, pp 229–230
Schapire RE, Singer Y (2000) Boostexter: a boosting-based system for text categorization. Mach Learn 39(2):135–168
Seetharaman P, Pardo B (2014) Crowdsourcing a reverberation descriptor map. In: Proceedings of the ACM international conference on multimedia. ACM, pp 587–596
Sheng VS, Provost F, Ipeirotis PG (2008) Get another label? improving data quality and data mining using multiple, noisy labelers. In: Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, pp 614–622
Sheshadri A, Lease M (2013) Square: a benchmark for research on computing crowd consensus. In: First AAAI conference on human computation and crowdsourcing
Shneiderman B, Maes P (1997) Direct manipulation vs. interface agents. Interactions 4(6):42–61 November
Simonyan K, Zisserman A (2014) Very deep convolutional networks for large-scale image recognition. CoRR, abs/1409.1556
Smyth P, Burl MC, Fayyad UM, Perona P (1994) Knowledge discovery in large image databases: dealing with uncertainties in ground truth. In: KDD workshop, pp 109–120
Smyth P, Fayyad U, Burl M, Perona P, Baldi P (1995) Inferring ground truth from subjective labelling of venus images
Snow R, O’Connor B, Jurafsky D, Ng AY (2008) Cheap and fast—but is it good?: evaluating non-expert annotations for natural language tasks. In: Proceedings of the conference on empirical methods in natural language processing. Association for Computational Linguistics, pp 254–263
Song Z, Chen Q, Huang Z, Hua Y, Yan S (2011) Contextualizing object detection and classification. In: 2011 IEEE conference on computer vision and pattern recognition (CVPR). IEEE, pp 1585–1592
Sperling G (1963) A model for visual memory tasks. Hum Factors 5(1):19–31
Su H, Deng J, Fei-Fei L (2012) Crowdsourcing annotations for visual object detection. In: Workshops at the twenty-sixth AAAI conference on artificial intelligence
Suchman LA (1987) Plans and situated actions: the problem of human-machine communication. Cambridge University Press, Cambridge
Szegedy C, Vanhoucke V, Ioffe S, Shlens J, Wojna Z (2016) Rethinking the inception architecture for computer vision. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 2818–2826
Tamuz O, Liu C, Belongie S, Shamir O, Kalai AT (2011) Adaptively learning the crowd kernel. arXiv:1105.1033
Taylor PJ, Thomas S (2008) Linguistic style matching and negotiation outcome. Negot Confl Manag Res 1(3):263–281
Theis L, van den Oord A, Bethge M (2015) A note on the evaluation of generative models. arXiv:1511.01844
Thomaz AL, Breazeal C (2008) Teachable robots: understanding human teaching behavior to build more effective robot learners. Artif Intell 172(6–7):716–737
Thomee B, Shamma DA, Friedland G, Elizalde B, Ni K, Poland D, Borth D, Li L-J (2016) YFCC100M: the new data in multimedia research. Commun ACM 59(2)
Vedantam R, Zitnick CL, Parikh D (2015) CIDEr: consensus-based image description evaluation. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 4566–4575
Vijayanarasimhan S, Jain P, Grauman K (2010) Far-sighted active learning on a budget for image and video recognition. In: 2010 IEEE conference on computer vision and pattern recognition (CVPR). IEEE, pp 3035–3042
Vinyals O, Toshev A, Bengio S, Erhan D (2014) Show and tell: a neural image caption generator. arXiv:1411.4555
Vinyals O, Toshev A, Bengio S, Erhan D (2015) Show and tell: a neural image caption generator. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 3156–3164
von Ahn L, Dabbish L (2004) Labeling images with a computer game. In: Proceedings of the SIGCHI conference on human factors in computing systems. ACM, pp 319–326
Vondrick C, Patterson D, Ramanan D (2013) Efficiently scaling up crowdsourced video annotation. Int J Comput Vis 101(1):184–204
Wah C, Branson S, Perona P, Belongie S (2011) Multiclass recognition and part localization with humans in the loop. In: 2011 IEEE international conference on computer vision (ICCV). IEEE, pp 2524–2531
Wah C, Van Horn G, Branson S, Maji S, Perona P, Belongie S (2014) Similarity comparisons for interactive fine-grained categorization. In: 2014 IEEE conference on computer vision and pattern recognition (CVPR). IEEE, pp 859–866
Wang Y-C, Kraut RE, Levine JM (2015) Eliciting and receiving online support: using computer-aided content analysis to examine the dynamics of online social support. J Med Internet Res 17(4):e99
Warde-Farley D, Bengio Y (2016) Improving generative adversarial networks with denoising feature matching
Warncke-Wang M, Ranjan V, Terveen L, Hecht B (2015) Misalignment between supply and demand of quality content in peer production communities. In: Ninth international AAAI conference on web and social media
Weichselgartner E, Sperling G (1987) Dynamics of automatic and controlled visual attention. Science 238(4828):778–780
Weld DS, Lin CH, Bragg J (2015) Artificial intelligence and collective intelligence. In: Handbook of collective intelligence, pp 89–114
Welinder P, Branson S, Perona P, Belongie SJ (2010) The multidimensional wisdom of crowds. In: Advances in neural information processing systems, pp 2424–2432
Whitehill J, Wu T-f, Bergsma J, Movellan JR, Ruvolo PL (2009) Whose vote should count more: optimal integration of labels from labelers of unknown expertise. In: Advances in neural information processing systems, pp 2035–2043
Wichmann FA, Hill NJ (2001) The psychometric function: I. Fitting, sampling, and goodness of fit. Percept Psychophys 63(8):1293–1313
Willis CG, Law E, Williams AC, Franzone BF, Bernardos R, Bruno L, Hopkins C, Schorn C, Weber E, Park DS et al (2017) CrowdCurio: an online crowdsourcing platform to facilitate climate change studies using herbarium specimens. New Phytol 215(1):479–488
Wobbrock JO, Forlizzi J, Hudson SE, Myers BA (2002) WebThumb: interaction techniques for small-screen browsers. In: Proceedings of the 15th annual ACM symposium on user interface software and technology. ACM, pp 205–208
Xia H, Jacobs J, Agrawala M (2020) Crosscast: adding visuals to audio travel podcasts. In: Proceedings of the 33rd annual ACM symposium on user interface software and technology, pp 735–746
Yang D, Kraut RE (2017) Persuading teammates to give: systematic versus heuristic cues for soliciting loans. Proc ACM Hum-Comput Interact 1(CSCW):114:1–114:21
Yue Y-T, Yang Y-L, Ren G, Wang W (2017) Scenectrl: mixed reality enhancement via efficient scene editing. In: Proceedings of the 30th annual ACM symposium on user interface software and technology, pp 427–436
Zhang H, Sciutto C, Agrawala M, Fatahalian K (2020) Vid2player: controllable video sprites that behave and appear like professional tennis players. arXiv:2008.04524
Zhang T (2004) Solving large scale linear prediction problems using stochastic gradient descent algorithms. In: Proceedings of the twenty-first international conference on machine learning. ACM, p 116
Zhou D, Basu S, Mao Y, Platt JC (2012) Learning from the wisdom of crowds by minimax entropy. In: Advances in neural information processing systems, pp 2195–2203
Zhou S, Gordon M, Krishna R, Narcomey A, Fei-Fei L, Bernstein M (2019) HYPE: a benchmark for human eye perceptual evaluation of generative models. In: Advances in neural information processing systems, pp 3449–3461
Acknowledgements
The first project was supported by the National Science Foundation award 1351131. The second project was partially funded by the Brown Institute for Media Innovation and by Toyota Research Institute (“TRI”). The third project was partially funded by a Junglee Corporation Stanford Graduate Fellowship, an Alfred P. Sloan Fellowship, and TRI. This chapter solely reflects the opinions and conclusions of its authors and not those of TRI or any other Toyota entity.
Copyright information
© 2021 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this chapter
Cite this chapter
Krishna, R., Gordon, M., Fei-Fei, L., Bernstein, M. (2021). Visual Intelligence through Human Interaction. In: Li, Y., Hilliges, O. (eds) Artificial Intelligence for Human Computer Interaction: A Modern Approach. Human–Computer Interaction Series. Springer, Cham. https://doi.org/10.1007/978-3-030-82681-9_9
DOI: https://doi.org/10.1007/978-3-030-82681-9_9
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-82680-2
Online ISBN: 978-3-030-82681-9
eBook Packages: Computer Science, Computer Science (R0)