
Utilizing Unsupervised Crowdsourcing to Develop a Machine Learning Model for Virtual Human Animation Prediction

  • Reference work entry
  • First Online:
Handbook of Human Motion

Abstract

One type of experiential learning in the medical domain is a chat interaction with a virtual human. These virtual humans play the role of a patient and allow students to practice skills such as communication and empathy in a safe but realistic sandbox. These interactions last 10–15 min, and a typical virtual human has approximately 200 responses. Part of the realism of a virtual human’s response is its associated animation. These animations can be time-consuming to create and to associate with each response.

We turned to crowdsourcing to assist with this problem. We decomposed the process of creating basic animations into a simple task that nonexpert workers can complete. We provided workers with a set of predefined basic animations: six focused on head animation and nine focused on body animation. These animations could be mixed and matched for each question/response pair. We then used the data from this unsupervised process to train machine learning models for animation prediction: one for head animation and one for body animation. Multiple model types were evaluated and their prediction performance compared.
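The process described above (collect multiple nonexpert labels per response, aggregate them, then train a classifier on the aggregated labels) can be sketched in Python. This is a minimal illustration under assumptions, not the chapter's implementation: the sample responses, the animation label names, and the choice of a bag-of-words Naive Bayes classifier are hypothetical stand-ins chosen to keep the example self-contained. In the chapter's setup, a separate model of this kind would be trained for each channel (head and body).

```python
from collections import Counter, defaultdict
import math

# Hypothetical crowdsourced data: each nonexpert worker tags a virtual human
# response with one of the predefined basic animations (head channel shown).
raw_labels = [
    ("I have a burning pain in my chest", "head_nod"),
    ("I have a burning pain in my chest", "head_nod"),
    ("I have a burning pain in my chest", "head_shake"),
    ("No, I don't smoke", "head_shake"),
    ("No, I don't smoke", "head_shake"),
    ("Yes, it gets worse after meals", "head_nod"),
]

def majority_vote(labels):
    """Aggregate multiple worker labels per response by majority vote."""
    by_text = defaultdict(list)
    for text, label in labels:
        by_text[text].append(label)
    return {t: Counter(ls).most_common(1)[0][0] for t, ls in by_text.items()}

class NaiveBayes:
    """Minimal multinomial Naive Bayes over bag-of-words features."""

    def fit(self, pairs):
        self.word_counts = defaultdict(Counter)
        self.class_counts = Counter()
        self.vocab = set()
        for text, label in pairs:
            self.class_counts[label] += 1
            for w in text.lower().split():
                self.word_counts[label][w] += 1
                self.vocab.add(w)
        return self

    def predict(self, text):
        best, best_lp = None, float("-inf")
        total = sum(self.class_counts.values())
        for c, count in self.class_counts.items():
            lp = math.log(count / total)  # class prior
            denom = sum(self.word_counts[c].values()) + len(self.vocab)
            for w in text.lower().split():
                # Laplace smoothing so unseen words do not zero out a class
                lp += math.log((self.word_counts[c][w] + 1) / denom)
            if lp > best_lp:
                best, best_lp = c, lp
        return best

# Aggregate the crowd's labels, then train the animation predictor on them.
model = NaiveBayes().fit(majority_vote(raw_labels).items())
print(model.predict("No, I never smoke"))
```

The majority-vote step is one simple way to resolve worker disagreement; the trained model can then predict an animation for new responses the crowd never saw.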

In an experiment, we evaluated participant perception of multiple versions of a virtual human suffering from dyspepsia (heartburn-like symptoms). For the version that utilized our machine learning approach, participants rated the character’s animation on par with that of a commercial expert. Head animation in particular was rated as more natural and more typical than in the other versions. Additionally, analysis of time and cost shows the machine learning approach to be quicker and cheaper than the expert alternative.



Author information

Correspondence to Michael Borish.


Copyright information

© 2018 Springer International Publishing AG, part of Springer Nature

About this entry

Cite this entry

Borish, M., Lok, B. (2018). Utilizing Unsupervised Crowdsourcing to Develop a Machine Learning Model for Virtual Human Animation Prediction. In: Handbook of Human Motion. Springer, Cham. https://doi.org/10.1007/978-3-319-14418-4_21
