
Visual Intelligence through Human Interaction

Artificial Intelligence for Human Computer Interaction: A Modern Approach

Part of the book series: Human–Computer Interaction Series (HCIS)

Abstract

Over the last decade, Computer Vision, the branch of Artificial Intelligence aimed at understanding the visual world, has evolved from simply recognizing objects in images to describing pictures, answering questions about images, helping robots maneuver around physical spaces, and even generating novel visual content. As these tasks and applications have modernized, their reliance on data has grown, whether for model training or for evaluation. In this chapter, we demonstrate that novel interaction strategies can enable new forms of data collection and evaluation for Computer Vision. First, we present a crowdsourcing interface that speeds up paid data collection by an order of magnitude, feeding the data-hungry nature of modern vision models. Second, we explore a method to increase volunteer contributions using automated social interventions. Third, we develop a system to ensure that human evaluation of generative vision models is reliable, affordable, and grounded in psychophysics theory. We conclude with future opportunities for Human–Computer Interaction to aid Computer Vision.


Notes

  1. Applications can be found at https://taptapsee.com/, https://www.bemyeyes.com/, and https://camfindapp.com/.

  2. The dataset of social media posts and social strategies for training the reinforcement learning model, as well as the trained contextual bandit model, is publicly available at http://cs.stanford.edu/people/ranjaykrishna/socialstrategies.

  3. We explicitly reveal this ratio to evaluators. Otherwise, Amazon Mechanical Turk forums would let evaluators discuss and learn about this distribution over time, altering how different evaluators approach the task. Making the ratio explicit gives all evaluators the same prior entering the task.

  4. Hyper-realism is relative to the real dataset on which a model is trained. Some datasets already look less realistic because of lower resolution and/or lower diversity of images.

References

  1. Adadi A, Berrada M (2018) Peeking inside the black-box: a survey on explainable artificial intelligence (xai). IEEE Access 6:52138–52160

    Article  Google Scholar 

  2. Ambati V, Vogel S, Carbonell J (2011) Towards task recommendation in micro-task markets

    Google Scholar 

  3. Antol S, Agrawal A, Lu J, Mitchell M, Batra D, Lawrence Zitnick C, Parikh D (2015) Vqa: visual question answering. In: Proceedings of the IEEE international conference on computer vision, pp 2425–2433

    Google Scholar 

  4. Bahdanau D, Cho K, Bengio Y (2014) Neural machine translation by jointly learning to align and translate. arXiv:1409.0473

  5. Banerjee S, Lavie A (2005) Meteor: an automatic metric for mt evaluation with improved correlation with human judgments. In: Proceedings of the acl workshop on intrinsic and extrinsic evaluation measures for machine translation and/or summarization, pp 65–72

    Google Scholar 

  6. Barratt S, Sharma R (2018) A note on the inception score. arXiv:1801.01973

  7. Bernstein MS, Brandt J, Miller RC, Karger DR (2011) Crowds in two seconds: enabling realtime crowd-powered interfaces. In: Proceedings of the 24th annual ACM symposium on User interface software and technology. ACM, pp 33–42

    Google Scholar 

  8. Bernstein MS, Little G, Miller RC, Hartmann B, Ackerman MS, Karger DR, Crowell D, Panovich K (2010) Soylent: a word processor with a crowd inside. In: Proceedings of the 23nd annual ACM symposium on user interface software and technology. ACM, pp 313–322

    Google Scholar 

  9. Berthelot D, Schumm T, Metz L (2017) Began: boundary equilibrium generative adversarial networks. arXiv:1703.10717

  10. Bigham JP, Jayant C, Ji H, Little G, Miller A, Miller RC, Miller R, Tatarowicz A, White B, White S, et al (2010) Vizwiz: nearly real-time answers to visual questions. In: Proceedings of the 23nd annual ACM symposium on User interface software and technology. ACM, pp 333–342

    Google Scholar 

  11. Bińkowski M, Sutherland DJ, Arbel M, Gretton A (2018) Demystifying mmd gans. arXiv:1801.01401

  12. Bishop CM (2006) Pattern recognition and machine learning. Springer

    Google Scholar 

  13. Biswas A, Parikh D (2013) Simultaneous active learning of classifiers & attributes via relative feedback. In: 2013 Ieee conference on computer vision and pattern recognition (CVPR). IEEE, pp 644–651

    Google Scholar 

  14. Bohus D, Rudnicky AI (2009) The ravenclaw dialog management framework: architecture and systems. Comput Speech Lang 23(3):332–361

    Article  Google Scholar 

  15. Borji A (2018) Pros and cons of gan evaluation measures. In: Computer vision and image understanding

    Google Scholar 

  16. Brady E, Morris MR, Bigham JP (2015) Gauging receptiveness to social microvolunteering. In: Proceedings of the 33rd annual ACM conference on human factors in computing systems, CHI ’15. ACM, New York, NY, USA, pp 1055–1064

    Google Scholar 

  17. Brady EL, Zhong Y, Morris MR, Bigham JP (2013) Investigating the appropriateness of social network question asking as a resource for blind users. In: Proceedings of the 2013 conference on computer supported cooperative work. ACM, pp 1225–1236

    Google Scholar 

  18. Bragg J, Daniel M, Weld DS (2013) Crowdsourcing multi-label classification for taxonomy creation. In: First AAAI conference on human computation and crowdsourcing

    Google Scholar 

  19. Branson S, Hjorleifsson KE, Perona P (2014) Active annotation translation. In: 2014 IEEE conference on computer vision and pattern recognition (CVPR). IEEE, pp 3702–3709

    Google Scholar 

  20. Branson S, Wah C, Schroff F, Babenko B, Welinder P, Perona P, Belongie S (2010) Visual recognition with humans in the loop. In: Computer vision–ECCV 2010. Springer, pp 438–451

    Google Scholar 

  21. Broadbent DE, Broadbent MHP (1987) From detection to identification: response to multiple targets in rapid serial visual presentation. Percept Psychophys 42(2):105–113

    Article  Google Scholar 

  22. Brock A, Donahue J, Simonyan K (2018) Large scale gan training for high fidelity natural image synthesis. arXiv:1809.11096

  23. Buçinca Z, Lin P, Gajos KZ, Glassman EL (2020) Proxy tasks and subjective measures can be misleading in evaluating explainable ai systems. In: Proceedings of the 25th international conference on intelligent user interfaces, pp 454–464

    Google Scholar 

  24. Buolamwini J, Gebru T (2018) Gender shades: intersectional accuracy disparities in commercial gender classification. In: Conference on fairness, accountability and transparency, pp 77–91

    Google Scholar 

  25. Burke M, Kraut RE, Joyce E (2014) Membership claims and requests: some newcomer socialization strategies in online communities. Small Group Research

    Google Scholar 

  26. Burke M, Kraut R (2013) Using facebook after losing a job: Differential benefits of strong and weak ties. In: Proceedings of the 2013 conference on computer supported cooperative work. ACM, pp 1419–1430

    Google Scholar 

  27. Card SK, Newell A, Moran TP (1983) The psychology of human-computer interaction

    Google Scholar 

  28. Carroll M, Shah R, Ho MK, Griffiths T, Seshia S, Abbeel P, Dragan A (2019) On the utility of learning about humans for human-ai coordination. In: Advances in neural information processing systems, pp 5174–5185

    Google Scholar 

  29. Cassell J, Thórisson KR (1999) The power of a nod and a glance: envelope vs. emotional feedback in animated conversational agents. Appl Artif Intell 13:519–538

    Article  Google Scholar 

  30. Cerrato L, Ekeklint S (2002) Different ways of ending human-machine dialogues

    Google Scholar 

  31. Chaiken S (1989) Heuristic and systematic information processing within and beyond the persuasion context. In: Unintended thought, pp 212–252

    Google Scholar 

  32. Chellappa R, Sinha P, Jonathon Phillips P (2010) Face recognition by computers and humans. Computer 43(2):46–55

    Article  Google Scholar 

  33. Cheng J, Teevan J, Bernstein MS (2015) Measuring crowdsourcing effort with error-time curves. In: Proceedings of the 33rd annual ACM conference on human factors in computing systems. ACM, pp 1365–1374

    Google Scholar 

  34. Chidambaram V, Chiang Y-H, Mutlu B (2012) Designing persuasive robots: how robots might persuade people using vocal and nonverbal cues. In: Proceedings of the seventh annual ACM/IEEE international conference on human-robot interaction. ACM, pp 293–300

    Google Scholar 

  35. Chilton LB, Little G, Edge D, Weld DS, Landay JA (2013) Cascade: crowdsourcing taxonomy creation. In: Proceedings of the SIGCHI conference on human factors in computing systems. ACM, pp 1999–2008

    Google Scholar 

  36. Cialdini R (2016) Pre-suasion: a revolutionary way to influence and persuade. Simon and Schuster

    Google Scholar 

  37. Colligan L, Potts HWW, Finn CT, Sinkin RA (2015) Cognitive workload changes for nurses transitioning from a legacy system with paper documentation to a commercial electronic health record. Int J Med Inform 84(7):469–476

    Article  Google Scholar 

  38. Cornsweet TN (1962) The staircrase-method in psychophysics

    Google Scholar 

  39. Corti K, Gillespie A (2016) Co-constructing intersubjectivity with artificial conversational agents: people are more likely to initiate repairs of misunderstandings with agents represented as human. Comput Hum Behav 58:431–442

    Article  Google Scholar 

  40. Dakin SC, Omigie D (2009) Psychophysical evidence for a non-linear representation of facial identity. Vis Res 49(18):2285–2296

    Article  Google Scholar 

  41. Dalal N, Triggs B (2005) Histograms of oriented gradients for human detection. In: 2005 IEEE computer society conference on computer vision and pattern recognition (CVPR’05), vol 1, pp 886–893

    Google Scholar 

  42. Darley JM, Latané B (1968) Bystander intervention in emergencies: diffusion of responsibility. J Personal Soc Psychol 8(4p1):377

    Article  Google Scholar 

  43. Deng J, Dong W, Socher R, Li L-J, Li K, Fei-Fei L (2009) Imagenet: a large-scale hierarchical image database. In: 2009 IEEE conference on computer vision and pattern recognition. Ieee, pp 248–255

    Google Scholar 

  44. Deng J, Russakovsky O, Krause J, Bernstein MS, Berg A, Fei-Fei L (2014) Scalable multi-label annotation. In: Proceedings of the SIGCHI conference on human factors in computing systems. ACM, pp 3099–3102

    Google Scholar 

  45. Denton EL, Chintala S, Fergus R, et al (2015) Deep generative image models using a laplacian pyramid of adversarial networks. In: Advances in neural information processing systems, pp 1486–1494

    Google Scholar 

  46. Devlin J, Chang M-W, Lee K, Toutanova K (2018) Bert: pre-training of deep bidirectional transformers for language understanding. arXiv:1810.04805

  47. Difallah DE, Demartini G, Cudré-Mauroux P (2013) Pick-a-crowd: tell me what you like, and i’ll tell you what to do. In: Proceedings of the 22nd international conference on world wide web, WWW ’13. ACM, New York, NY, USA, pp 367–374

    Google Scholar 

  48. Dragan AD, Lee KCT, Srinivasa SS (2013) Legibility and predictability of robot motion. In: 2013 8th ACM/IEEE international conference on human-robot interaction (HRI). IEEE, pp 301–308

    Google Scholar 

  49. Fast E, Chen B, Mendelsohn J, Bassen J, Bernstein MS (2018) Iris: a conversational agent for complex tasks. In: Proceedings of the 2018 CHI conference on human factors in computing systems. ACM, p 473

    Google Scholar 

  50. Fast E, Steffee D, Wang L, Brandt JR, Bernstein MS (2014) Emergent, crowd-scale programming practice in the ide. In: Proceedings of the 32nd annual ACM conference on Human factors in computing systems. ACM, pp 2491–2500

    Google Scholar 

  51. Fei-Fei L, Iyer A, Koch C, Perona P (2007) What do we perceive in a glance of a real-world scene? J Vis 7(1):10

    Article  Google Scholar 

  52. Felsenstein J (1985) Confidence limits on phylogenies: an approach using the bootstrap. Evolution 39(4):783–791

    Article  Google Scholar 

  53. Ferrara E, Varol O, Davis C, Menczer F, Flammini A (2016) The rise of social bots. Commun ACM 59(7):96–104

    Article  Google Scholar 

  54. Fraisse P (1984) Perception and estimation of time. Ann Rev Psychol 35(1):1–37

    Article  Google Scholar 

  55. Geiger D, Schader M (2014) Personalized task recommendation in crowdsourcing information systems – current state of the art. Decis Support Syst 65:3–16. Crowdsourcing and Social Networks Analysis

    Google Scholar 

  56. Gilbert E, Karahalios K (2009) Predicting tie strength with social media. In: Proceedings of the SIGCHI conference on human factors in computing systems. ACM, pp 211–220

    Google Scholar 

  57. Gillund G, Shiffrin RM (1984) A retrieval model for both recognition and recall. Psychol Rev 91(1):1

    Article  Google Scholar 

  58. Girshick R, Donahue J, Darrell T, Malik J (2014) Rich feature hierarchies for accurate object detection and semantic segmentation. In: 2014 IEEE conference on computer vision and pattern recognition (CVPR). IEEE, pp 580–587

    Google Scholar 

  59. Goodfellow I, Pouget-Abadie J, Mirza M, Xu B, Warde-Farley D, Ozair S, Courville A, Bengio Y (2014) Generative adversarial nets. In: Advances in neural information processing systems, pp 2672–2680

    Google Scholar 

  60. Gray M, Suri S (2019) Ghost work: how to stop silicon valley from building a new global underclass. Eamon Dolan

    Google Scholar 

  61. Greene MR, Oliva A (2009) The briefest of glances: the time course of natural scene understanding. Psychol Sci 20(4):464–472

    Article  Google Scholar 

  62. Gulrajani I, Ahmed F, Arjovsky M, Dumoulin V, Courville AC (2017) Improved training of wasserstein gans. In: Advances in neural information processing systems, pp 5767–5777

    Google Scholar 

  63. Haque A, Milstein A, Fei-Fei L (2020) Illuminating the dark spaces of healthcare with ambient intelligence. Nature 585(7824):193–202

    Article  Google Scholar 

  64. Hashimoto TB, Zhang H, Liang P (2019) Unifying human and statistical evaluation for natural language generation. arXiv:1904.02792

  65. Hata K, Krishna R, Fei-Fei L, Bernstein MS (2017) A glimpse far into the future: understanding long-term crowd worker quality. In: Proceedings of the 2017 ACM conference on computer supported cooperative work and social computing. ACM, pp 889–901

    Google Scholar 

  66. Healy K, Schussman A (2003) The ecology of open-source software development. Technical report, Technical report, University of Arizona, USA

    Google Scholar 

  67. Hempel J (2015) Facebook launches m, its bold answer to siri and cortana. In: Wired. Retrieved January 1:2017

    Google Scholar 

  68. Heusel M, Ramsauer H, Unterthiner T, Nessler B, Hochreiter S (2017) Gans trained by a two time-scale update rule converge to a local nash equilibrium. In: Advances in neural information processing systems, pp 6626–6637

    Google Scholar 

  69. Hill BM (2013) Almost wikipedia: eight early encyclopedia projects and the mechanisms of collective action. Massachusetts institute of technology, pp 1–38

    Google Scholar 

  70. Hinton GE (2002) Training products of experts by minimizing contrastive divergence. Neural Comput 14(8):1771–1800

    Article  MATH  Google Scholar 

  71. Hochreiter S, Schmidhuber J (1997) Long short-term memory. Neural Comput 9(8):1735–1780

    Article  Google Scholar 

  72. Hoffman ML (1981) Is altruism part of human nature? J Personal Soc Psychol 40(1):121

    Article  Google Scholar 

  73. Horvitz E (1999) Principles of mixed-initiative user interfaces. In: Proceedings of the SIGCHI conference on human factors in computing systems. ACM, pp 159–166

    Google Scholar 

  74. Huang F, Canny JF (2019) Sketchforme: composing sketched scenes from text descriptions for interactive applications. In: Proceedings of the 32nd annual ACM symposium on user interface software and technology, pp 209–220

    Google Scholar 

  75. Huang T-HK, Chang J, Bigham J (2018) Evorus: a crowd-powered conversational assistant built to automate itself over time. In: Proceedings of the 2018 CHI conference on human factors in computing systems. ACM, p 295

    Google Scholar 

  76. Hutto CJ, Gilbert E (2014) Vader: a parsimonious rule-based model for sentiment analysis of social media text. In: Eighth international AAAI conference on weblogs and social media

    Google Scholar 

  77. Iordan MC, Greene MR, Beck DM, Fei-Fei L (2015) Basic level category structure emerges gradually across human ventral visual cortex. In: Journal of cognitive neuroscience

    Google Scholar 

  78. Ipeirotis PG (2010) Analyzing the amazon mechanical turk marketplace. XRDS: Crossroads. The ACM Mag Stud 17(2):16–21

    Google Scholar 

  79. Ipeirotis PG, Provost F, Wang J (2010) Quality management on amazon mechanical turk. In: Proceedings of the ACM SIGKDD workshop on human computation. ACM, pp 64–67

    Google Scholar 

  80. Irani LC, Silberman M (2013) Turkopticon: interrupting worker invisibility in amazon mechanical turk. In: Proceedings of the SIGCHI conference on human factors in computing systems. ACM, pp 611–620

    Google Scholar 

  81. Jain SD, Grauman K (2013) Predicting sufficient annotation strength for interactive foreground segmentation. In: 2013 IEEE international conference on computer vision (ICCV). IEEE, pp 1313–1320

    Google Scholar 

  82. Jain U, Weihs L, Kolve E, Farhadi A, Lazebnik S, Kembhavi A, Schwing A (2020) A cordial sync: Going beyond marginal policies for multi-agent embodied tasks. In: European conference on computer vision. Springer, pp 471–490

    Google Scholar 

  83. Jean N, Burke M, Xie M, Davis WM, Lobell DB, Ermon S (2016) Combining satellite imagery and machine learning to predict poverty. Science 353(6301):790–794

    Article  Google Scholar 

  84. Josephy T, Lease M, Paritosh P (2013) Crowdscale 2013: crowdsourcing at scale workshop report

    Google Scholar 

  85. Kamar E, Hacker S, Horvitz E (2012) Combining human and machine intelligence in large-scale crowdsourcing. In: Proceedings of the 11th international conference on autonomous agents and multiagent systems-volume 1. International Foundation for Autonomous Agents and Multiagent Systems, pp 467–474

    Google Scholar 

  86. Karger DR, Oh S, Shah D (2011) Budget-optimal crowdsourcing using low-rank matrix approximations. In: 2011 49th annual allerton conference on communication, control, and computing (allerton). IEEE, pp 284–291

    Google Scholar 

  87. Karger DR, Oh S (2014) Shah D Budget-optimal task allocation for reliable crowdsourcing systems. Oper Res 62(1):1–24

    Article  MATH  Google Scholar 

  88. Karras T, Aila T, Laine S, Lehtinen J (2017) Progressive growing of gans for improved quality, stability, and variation. arXiv:1710.10196

  89. Karras T, Laine S, Aila T (2018) A style-based generator architecture for generative adversarial networks. arXiv:1812.04948

  90. Karras T, Laine S, Aila T (2019) A style-based generator architecture for generative adversarial networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 4401–4410

    Google Scholar 

  91. Khadpe P, Krishna R, Fei-Fei L, Hancock JT, Bernstein MS (2020) Conceptual metaphors impact perceptions of human-ai collaboration. Proc ACM Hum-Comput Interact 4(CSCW2):1–26

    Article  Google Scholar 

  92. Kittur A, Chi EH, Suh B (2008) Crowdsourcing user studies with mechanical turk. In: Proceedings of the SIGCHI conference on human factors in computing systems. ACM, pp 453–456

    Google Scholar 

  93. Klein SA (2001) Measuring, estimating, and understanding the psychometric function: a commentary. Percept Psychophys 63(8):1421–1455

    Article  Google Scholar 

  94. Kramer ADI, Guillory JE, Hancock JT (2014) Experimental evidence of massive-scale emotional contagion through social networks. Proc Natl Acad Sci 111(24):8788–8790

    Article  Google Scholar 

  95. Kraut RE, Resnick P (2011) Encouraging contribution to online communities. Building successful online communities: evidence-based social design, pp 21–76

    Google Scholar 

  96. Krishna R, Bernstein M, Fei-Fei L (2019) Information maximizing visual question generation. In: IEEE conference on computer vision and pattern recognition

    Google Scholar 

  97. Krishna R, Hata K, Ren F, Fei-Fei L, Niebles JC (2017) Dense-captioning events in videos. In: Proceedings of the IEEE international conference on computer vision, pp 706–715

    Google Scholar 

  98. Krishna R, Zhu Y, Groth O, Johnson J, Hata K, Kravitz J, Chen S, Kalantidis Y, Li L-J, Shamma DA et al (2017) Visual genome: connecting language and vision using crowdsourced dense image annotations. Int J Comput Vis 123(1):32–73

    Article  MathSciNet  Google Scholar 

  99. Krishna RA, Hata K, Chen S, Kravitz J, Shamma DA, Fei-Fei L, Bernstein MS (2016) Embracing error to enable rapid crowdsourcing. In: Proceedings of the 2016 CHI conference on human factors in computing systems. ACM, pp 3167–3179

    Google Scholar 

  100. Krizhevsky A, Hinton G (2009) Learning multiple layers of features from tiny images. Technical report, Citeseer

    Google Scholar 

  101. Krizhevsky A, Sutskever I, Hinton GE (2012) Imagenet classification with deep convolutional neural networks. In: Advances in neural information processing systems, pp 1097–1105

    Google Scholar 

  102. Krizhevsky A, Sutskever I, Hinton GE (2012) Imagenet classification with deep convolutional neural networks. In: Pereira F, Burges CJC, Bottou L, Weinberger KQ (eds) Advances in neural information processing systems 25. Curran Associates, Inc., pp 1097–1105

    Google Scholar 

  103. Krueger GP (1989) Sustained work, fatigue, sleep loss and performance: a review of the issues. Work Stress 3(2):129–141

    Article  Google Scholar 

  104. Kumar R, Satyanarayan A, Torres C, Lim M, Ahmad S, Klemmer SR, Talton JO (2013) Webzeitgeist: design mining the web. In: Proceedings of the SIGCHI conference on human factors in computing systems. ACM, pp 3083–3092

    Google Scholar 

  105. Kurakin A, Goodfellow I, Bengio S (2016) Adversarial examples in the physical world. arXiv:1607.02533

  106. Kwon M, Biyik E, Talati A, Bhasin K, Losey DP, Sadigh D (2020) When humans aren’t optimal: robots that collaborate with risk-aware humans. In: Proceedings of the 2020 ACM/IEEE international conference on human-robot interaction, pp 43–52

    Google Scholar 

  107. Laielli M, Smith J, Biamby G, Darrell T, Hartmann B (2019) Labelar: a spatial guidance interface for fast computer vision image collection. In: Proceedings of the 32nd annual ACM symposium on user interface software and technology, pp 987–998

    Google Scholar 

  108. Langer EJ, Blank A, Chanowitz B (1978) The mindlessness of ostensibly thoughtful action: the role of “placebic’’ information in interpersonal interaction. J Personal Soc Psychol 36(6):635

    Article  Google Scholar 

  109. Laput G, Lasecki WS, Wiese J, Xiao R, Bigham JP, Harrison C (2015) Zensors: adaptive, rapidly deployable, human-intelligent sensor feeds. In: Proceedings of the 33rd annual ACM conference on human factors in computing systems. ACM, pp 1935–1944

    Google Scholar 

  110. Lasecki W, Miller C, Sadilek A, Abumoussa A, Borrello D, Kushalnagar R, Bigham J (2012) Real-time captioning by groups of non-experts. In: Proceedings of the 25th annual ACM symposium on user interface software and technology. ACM, pp 23–34

    Google Scholar 

  111. Lasecki WS, Murray KI, White S, Miller RC, Bigham JP (2011) Real-time crowd control of existing interfaces. In: Proceedings of the 24th annual ACM symposium on User interface software and technology. ACM, pp 23–32

    Google Scholar 

  112. Lasecki WS, Wesley R, Nichols J, Kulkarni A, Allen JF, Bigham JP (2013) Chorus: a crowd-powered conversational assistant. In: Proceedings of the 26th annual ACM symposium on User interface software and technology. ACM, pp 151–162

    Google Scholar 

  113. Law E, Yin M, Goh J, Chen K, Terry MA, Gajos KZ (2016) Curiosity killed the cat, but makes crowdwork better. In: Proceedings of the 2016 CHI conference on human factors in computing systems. ACM, pp 4098–4110

    Google Scholar 

  114. Le J, Edmonds A, Hester V, Biewald L (2010) Ensuring quality in crowdsourced search relevance evaluation: the effects of training question distribution. In: SIGIR 2010 workshop on crowdsourcing for search evaluation, vol 2126, pp 22–32

    Google Scholar 

  115. Levitt HCCH (1971) Transformed up-down methods in psychoacoustics. J Acoust Soc Am 49(2B):467–477

    Article  Google Scholar 

  116. Lewis DD, Hayes PJ (1994) Guest editorial. ACM Trans Inf Syst 12(3):231 July

    Google Scholar 

  117. Li FF, VanRullen R, Koch C, Perona P (2002) Rapid natural scene categorization in the near absence of attention. Proc Natl Acad Sci 99(14):9596–9601

    Article  Google Scholar 

  118. Li L, Chu W, Langford J, Schapire RE (2010) A contextual-bandit approach to personalized news article recommendation. In: Proceedings of the 19th international conference on world wide web. ACM, pp 661–670

    Google Scholar 

  119. Li T, Ogihara M (2003) Detecting emotion in music. In: ISMIR, vol 3, pp 239–240

    Google Scholar 

  120. Liang L, Grauman K (2014) Beyond comparing image pairs: setwise active learning for relative attributes. In: 2014 IEEE conference on computer vision and pattern recognition (CVPR). IEEE, pp 208–215

    Google Scholar 

  121. Lin C, Kamar E, Horvitz E (2014) Signals in the silence: models of implicit feedback in a recommendation system for crowdsourcing

    Google Scholar 

  122. Lin T-Y, Maire M, Belongie S, Hays J, Perona P, Ramanan D, Dollár P, Lawrence Zitnick C (2014) Microsoft coco: common objects in context. In: Computer vision–ECCV 2014. Springer, pp 740–755

    Google Scholar 

  123. Lintott CJ, Schawinski K, Slosar A, Land K, Bamford S, Thomas D, Raddick MJ, Nichol RC, Szalay A, Andreescu D et al (2008) Galaxy zoo: morphologies derived from visual inspection of galaxies from the sloan digital sky survey. Mon Not R Astron Soc 389(3):1179–1189

    Article  Google Scholar 

  124. Liu A, Soderland S, Bragg J, Lin CH, Ling X, Weld DS (2016) Effective crowd annotation for relation extraction. In: Proceedings of the 2016 conference of the North American chapter of the association for computational linguistics: human language technologies, pp 897–906

    Google Scholar 

  125. Liu Z, Luo P, Wang X, Tang X (2015) Deep learning face attributes in the wild. In: Proceedings of international conference on computer vision (ICCV)

    Google Scholar 

  126. Lowe DG (1999) Object recognition from local scale-invariant features. In: Proceedings of the seventh IEEE international conference on computer vision. Ieee, vol 2, pp 1150–1157

    Google Scholar 

  127. Lu C, Krishna R, Bernstein M, Fei-Fei L (2016) Visual relationship detection with language priors. In: European conference on computer vision. Springer, pp 852–869

    Google Scholar 

  128. Lucic M, Kurach K, Michalski M, Gelly S, Bousquet O (2018) Are gans created equal? a large-scale study. In: Advances in neural information processing systems, pp 698–707

    Google Scholar 

  129. Mani I (1999) Advances in automatic text summarization. MIT press

    Google Scholar 

  130. Marcus A, Parameswaran A (2015) Crowdsourced data management: industry and academic perspectives. Foundations and Trends in Databases

    Google Scholar 

  131. Markey PM (2000) Bystander intervention in computer-mediated communication. Comput Hum Behav 16(2):183–188

    Article  Google Scholar 

  132. Martin D, Hanrahan BV, O’Neill J, Gupta N (2014) Being a turker. In: Proceedings of the 17th ACM conference on computer supported cooperative work & social computing. ACM, pp 224–235

    Google Scholar 

  133. Mason W, Suri S (2012) Conducting behavioral research on amazon’s mechanical turk. Behav Res Methods 44(1):1–23

    Article  Google Scholar 

  134. Mildenhall B, Srinivasan PP, Tancik M, Barron JT, Ramamoorthi R, Ng R (2020) Nerf: representing scenes as neural radiance fields for view synthesis. arXiv:2003.08934

  135. Miller GA, Charles WG (1991) Contextual correlates of semantic similarity. Lang Cogn Process 6(1):1–28

    Article  Google Scholar 

  136. Mitchell M, Wu S, Zaldivar A, Barnes P, Vasserman L, Hutchinson B, Spitzer E, Raji ID, Gebru T (2019) Model cards for model reporting. In: Proceedings of the conference on fairness, accountability, and transparency, pp 220–229

    Google Scholar 

  137. Mitra T, Hutto CJ, Gilbert E (2015) Comparing person-and process-centric strategies for obtaining quality data on amazon mechanical turk. In: Proceedings of the 33rd annual ACM conference on human factors in computing systems. ACM, pp 1345–1354

    Google Scholar 

  138. Miyato T, Kataoka T, Koyama M, Yoshida Y (2018) Spectral normalization for generative adversarial networks. arXiv:1802.05957

  139. Nass C, Brave S (2007) Wired for speech: how voice activates and advances the human-computer relationship. The MIT Press

    Google Scholar 

  140. Niebles JC, Wang H, Fei-Fei L (2008) Unsupervised learning of human action categories using spatial-temporal words. Int J Comput Vis 79(3):299–318

    Article  Google Scholar 

  141. Olsson C, Bhupatiraju S, Brown T, Odena A, Goodfellow I (2018) Skill rating for generative models. arXiv:1808.04888

  142. Pang B, Lee L (2008) Opinion mining and sentiment analysis. Found Trends Inf Retr 2(1–2):1–135

    Article  Google Scholar 

  143. Papineni K, Roukos S, Ward T, Zhu W-J (2002) Bleu: a method for automatic evaluation of machine translation. In: Proceedings of the 40th annual meeting on association for computational linguistics. Association for Computational Linguistics, pp 311–318

    Google Scholar 

  144. Park J, Krishna R, Khadpe P, Fei-Fei L, Bernstein M (2019) Ai-based request augmentation to increase crowdsourcing participation. Proc AAAI Conf Hum Comput Crowdsourcing 7:115–124

    Google Scholar 

  145. Parkash A, Parikh D (2012) Attributes for classifier feedback. In: Computer vision–ECCV 2012. Springer, pp 354–368

    Google Scholar 

  146. Peng Dai MD, Weld S (2010) Decision-theoretic control of crowd-sourced workflows. In: In the 24th AAAI conference on artificial intelligence (AAAI’10. Citeseer

    Google Scholar 

  147. Portilla J, Simoncelli EP (2000) A parametric texture model based on joint statistics of complex wavelet coefficients. Int J Comput Vis 40(1):49–70

    Article  MATH  Google Scholar 

  148. Potter MC (1976) Short-term conceptual memory for pictures. J Exp Psychol Hum Learn Mem 2(5):509

    Article  MathSciNet  Google Scholar 

  149. Potter MC, Levy EI (1969) Recognition memory for a rapid sequence of pictures. J Exp Psychol 81(1):10

  150. Radford A, Metz L, Chintala S (2015) Unsupervised representation learning with deep convolutional generative adversarial networks. arXiv:1511.06434

  151. Ravuri S, Mohamed S, Rosca M, Vinyals O (2018) Learning implicit generative models with the method of learned moments. arXiv:1806.11006

  152. Rayner K, Smith TJ, Malcolm GL, Henderson JM (2009) Eye movements and visual encoding during scene perception. Psychol Sci 20(1):6–10

  153. Reeves A, Sperling G (1986) Attention gating in short-term visual memory. Psychol Rev 93(2):180

  154. Reeves B, Nass CI (1996) The media equation: how people treat computers, television, and new media like real people and places. Cambridge university press

  155. Reich J, Murnane R, Willett J (2012) The state of wiki usage in US K–12 schools: leveraging web 2.0 data warehouses to assess quality and equity in online learning environments. Educ Res 41(1):7–15

  156. Cialdini RB (1984) Influence: the psychology of persuasion. William Morrow and Company, New York

  157. Rosca M, Lakshminarayanan B, Warde-Farley D, Mohamed S (2017) Variational approaches for auto-encoding generative adversarial networks. arXiv:1706.04987

  158. Rössler A, Cozzolino D, Verdoliva L, Riess C, Thies J, Nießner M (2019) Faceforensics++: learning to detect manipulated facial images. arXiv:1901.08971

  159. Russakovsky O, Deng J, Su H, Krause J, Satheesh S, Ma S, Huang Z, Karpathy A, Khosla A, Bernstein M, Berg AC, Li F-F (2014) Imagenet large scale visual recognition challenge. In: International Journal of Computer Vision, pp 1–42

  160. Russakovsky O, Li L-J, Fei-Fei L (2015) Best of both worlds: human-machine collaboration for object annotation. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 2121–2131

  161. Rzeszotarski JM, Chi E, Paritosh P, Dai P (2013) Inserting micro-breaks into crowdsourcing workflows. In: First AAAI conference on human computation and crowdsourcing

  162. Sajjadi MSM, Bachem O, Lucic M, Bousquet O, Gelly S (2018) Assessing generative models via precision and recall. In: Advances in neural information processing systems, pp 5228–5237

  163. Salehi N, Irani LC, Bernstein MS (2015) We are dynamo: overcoming stalling and friction in collective action for crowd workers. In: Proceedings of the 33rd annual ACM conference on human factors in computing systems. ACM, pp 1621–1630

  164. Salimans T, Goodfellow I, Zaremba W, Cheung V, Radford A, Chen X (2016) Improved techniques for training gans. In: Advances in neural information processing systems, pp 2234–2242

  165. Sardar A, Joosse M, Weiss A, Evers V (2012) Don’t stand so close to me: users’ attitudinal and behavioral responses to personal space invasion by robots. In: Proceedings of the seventh annual ACM/IEEE international conference on human-robot interaction. ACM, pp 229–230

  166. Schapire RE, Singer Y (2000) Boostexter: a boosting-based system for text categorization. Mach Learn 39(2):135–168

  167. Seetharaman P, Pardo B (2014) Crowdsourcing a reverberation descriptor map. In: Proceedings of the ACM international conference on multimedia. ACM, pp 587–596

  168. Sheng VS, Provost F, Ipeirotis PG (2008) Get another label? improving data quality and data mining using multiple, noisy labelers. In: Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, pp 614–622

  169. Sheshadri A, Lease M (2013) Square: a benchmark for research on computing crowd consensus. In: First AAAI conference on human computation and crowdsourcing

  170. Shneiderman B, Maes P (1997) Direct manipulation vs. interface agents. Interactions 4(6):42–61

  171. Simonyan K, Zisserman A (2014) Very deep convolutional networks for large-scale image recognition. CoRR, abs/1409.1556

  172. Smyth P, Burl MC, Fayyad UM, Perona P (1994) Knowledge discovery in large image databases: dealing with uncertainties in ground truth. In: KDD workshop, pp 109–120

  173. Smyth P, Fayyad U, Burl M, Perona P, Baldi P (1995) Inferring ground truth from subjective labelling of Venus images

  174. Snow R, O’Connor B, Jurafsky D, Ng AY (2008) Cheap and fast—but is it good?: evaluating non-expert annotations for natural language tasks. In: Proceedings of the conference on empirical methods in natural language processing. Association for Computational Linguistics, pp 254–263

  175. Song Z, Chen Q, Huang Z, Hua Y, Yan S (2011) Contextualizing object detection and classification. In: 2011 IEEE conference on computer vision and pattern recognition (CVPR). IEEE, pp 1585–1592

  176. Sperling G (1963) A model for visual memory tasks. Hum Factors 5(1):19–31

  177. Su H, Deng J, Fei-Fei L (2012) Crowdsourcing annotations for visual object detection. In: Workshops at the twenty-sixth AAAI conference on artificial intelligence

  178. Suchman LA (1987) Plans and situated actions: the problem of human-machine communication. Cambridge University Press, Cambridge

  179. Szegedy C, Vanhoucke V, Ioffe S, Shlens J, Wojna Z (2016) Rethinking the inception architecture for computer vision. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 2818–2826

  180. Tamuz O, Liu C, Belongie S, Shamir O, Kalai AT (2011) Adaptively learning the crowd kernel. arXiv:1105.1033

  181. Taylor PJ, Thomas S (2008) Linguistic style matching and negotiation outcome. Negot Confl Manag Res 1(3):263–281

  182. Theis L, van den Oord A, Bethge M (2015) A note on the evaluation of generative models. arXiv:1511.01844

  183. Thomaz AL, Breazeal C (2008) Teachable robots: understanding human teaching behavior to build more effective robot learners. Artif Intell 172(6–7):716–737

  184. Thomee B, Shamma DA, Friedland G, Elizalde B, Ni K, Poland D, Borth D, Li L-J (2016) Yfcc100m: the new data in multimedia research. Commun ACM 59(2). To Appear

  185. Vedantam R, Zitnick CL, Parikh D (2015) Cider: consensus-based image description evaluation. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 4566–4575

  186. Vijayanarasimhan S, Jain P, Grauman K (2010) Far-sighted active learning on a budget for image and video recognition. In: 2010 IEEE conference on computer vision and pattern recognition (CVPR). IEEE, pp 3035–3042

  187. Vinyals O, Toshev A, Bengio S, Erhan D (2014) Show and tell: a neural image caption generator. arXiv:1411.4555

  188. Vinyals O, Toshev A, Bengio S, Erhan D (2015) Show and tell: a neural image caption generator. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 3156–3164

  189. von Ahn L, Dabbish L (2004) Labeling images with a computer game. In: Proceedings of the SIGCHI conference on Human factors in computing systems. ACM, pp 319–326

  190. von Ahn L, Dabbish L (2004) Labeling images with a computer game, pp 319–326

  191. Vondrick C, Patterson D, Ramanan D (2013) Efficiently scaling up crowdsourced video annotation. Int J Comput Vis 101(1):184–204

  192. Wah C, Branson S, Perona P, Belongie S (2011) Multiclass recognition and part localization with humans in the loop. In: 2011 IEEE international conference on computer vision (ICCV). IEEE, pp 2524–2531

  193. Wah C, Van Horn G, Branson S, Maji S, Perona P, Belongie S (2014) Similarity comparisons for interactive fine-grained categorization. In: 2014 IEEE conference on computer vision and pattern recognition (CVPR). IEEE, pp 859–866

  194. Wang Y-C, Kraut RE, Levine JM (2015) Eliciting and receiving online support: using computer-aided content analysis to examine the dynamics of online social support. J Med Internet Res 17(4):e99

  195. Warde-Farley D, Bengio Y (2016) Improving generative adversarial networks with denoising feature matching

  196. Warncke-Wang M, Ranjan V, Terveen L, Hecht B (2015) Misalignment between supply and demand of quality content in peer production communities. In: Ninth international AAAI conference on web and social media

  197. Weichselgartner E, Sperling G (1987) Dynamics of automatic and controlled visual attention. Science 238(4828):778–780

  198. Weld DS, Lin CH, Bragg J (2015) Artificial intelligence and collective intelligence. In: Handbook of collective intelligence, pp 89–114

  199. Welinder P, Branson S, Perona P, Belongie SJ (2010) The multidimensional wisdom of crowds. In: Advances in neural information processing systems, pp 2424–2432

  200. Whitehill J, Wu T-f, Bergsma J, Movellan JR, Ruvolo PL (2009) Whose vote should count more: optimal integration of labels from labelers of unknown expertise. In: Advances in neural information processing systems, pp 2035–2043

  201. Wichmann FA, Hill NJ (2001) The psychometric function: I. Fitting, sampling, and goodness of fit. Percept Psychophys 63(8):1293–1313

  202. Willis CG, Law E, Williams AC, Franzone BF, Bernardos R, Bruno L, Hopkins C, Schorn C, Weber E, Park DS et al (2017) Crowdcurio: an online crowdsourcing platform to facilitate climate change studies using herbarium specimens. New Phytol 215(1):479–488

  203. Wobbrock JO, Forlizzi J, Hudson SE, Myers BA (2002) Webthumb: interaction techniques for small-screen browsers. In: Proceedings of the 15th annual ACM symposium on User interface software and technology. ACM, pp 205–208

  204. Xia H, Jacobs J, Agrawala M (2020) Crosscast: adding visuals to audio travel podcasts. In: Proceedings of the 33rd annual ACM symposium on user interface software and technology, pp 735–746

  205. Yang D, Kraut RE (2017) Persuading teammates to give: systematic versus heuristic cues for soliciting loans. Proc ACM Hum-Comput Interact 1(CSCW):114:1–114:21

  206. Yue Y-T, Yang Y-L, Ren G, Wang W (2017) Scenectrl: mixed reality enhancement via efficient scene editing. In: Proceedings of the 30th annual ACM symposium on user interface software and technology, pp 427–436

  207. Zhang H, Sciutto C, Agrawala M, Fatahalian K (2020) Vid2player: controllable video sprites that behave and appear like professional tennis players. arXiv:2008.04524

  208. Zhang T (2004) Solving large scale linear prediction problems using stochastic gradient descent algorithms. In: Proceedings of the twenty-first international conference on Machine learning. ACM, p 116

  209. Zhou D, Basu S, Mao Y, Platt JC (2012) Learning from the wisdom of crowds by minimax entropy. In: Advances in neural information processing systems, pp 2195–2203

  210. Zhou S, Gordon M, Krishna R, Narcomey A, Fei-Fei L, Bernstein M (2019) Hype: a benchmark for human eye perceptual evaluation of generative models. In: Advances in neural information processing systems, pp 3449–3461

Acknowledgements

The first project was supported by National Science Foundation award 1351131. The second project was partially funded by the Brown Institute for Media Innovation and by Toyota Research Institute (“TRI”). The third project was partially funded by a Junglee Corporation Stanford Graduate Fellowship, an Alfred P. Sloan fellowship, and TRI. This chapter solely reflects the opinions and conclusions of its authors and not those of TRI or any other Toyota entity.

Author information

Corresponding author

Correspondence to Ranjay Krishna.

Copyright information

© 2021 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this chapter

Cite this chapter

Krishna, R., Gordon, M., Fei-Fei, L., Bernstein, M. (2021). Visual Intelligence through Human Interaction. In: Li, Y., Hilliges, O. (eds) Artificial Intelligence for Human Computer Interaction: A Modern Approach. Human–Computer Interaction Series. Springer, Cham. https://doi.org/10.1007/978-3-030-82681-9_9

  • DOI: https://doi.org/10.1007/978-3-030-82681-9_9

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-82680-2

  • Online ISBN: 978-3-030-82681-9

  • eBook Packages: Computer Science, Computer Science (R0)
