Speech Recognition Approach for Motion-Enhanced Display in ARM-COMS System

HCI International 2020 - Late Breaking Papers: Multimodality and Intelligence (HCII 2020)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 12424)

Abstract

This research proposes a motion-enhanced display, in which the display itself serves as a communication medium that mimics the motion of a human head to enhance the sense of presence in remote communication. The idea has been implemented as an augmented telepresence system called ARM-COMS (ARm-supported eMbodied COmmunication Monitor System). ARM-COMS detects the orientation of a subject's face with a face-detection tool based on image processing and mimics the head motion of the remote partner. In addition, ARM-COMS uses the audio signal during a conversation to react appropriately when the communication partner speaks, even when the video shows no significant motion.
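The behaviour described above can be illustrated with a minimal sketch. It is not the ARM-COMS implementation: the `Pose` type, the function `next_display_pose`, and all thresholds and gains are hypothetical values chosen only for illustration. The sketch assumes that a face-detection step already supplies head angles in degrees and that the audio arrives as normalized samples. When the head moves, the display follows it; when the head is still but speech energy is present, the display makes a small nod.

```python
import math
from dataclasses import dataclass


@dataclass
class Pose:
    """Orientation in degrees (hypothetical representation)."""
    yaw: float
    pitch: float
    roll: float


# All constants below are illustrative assumptions, not values from the paper.
MAX_ANGLE = 30.0    # mechanical limit of the display, per axis
SMOOTHING = 0.3     # fraction of the remaining distance covered per step
MOTION_EPS = 1.0    # head change (degrees) treated as significant motion
SPEECH_RMS = 0.05   # audio RMS level treated as speech
NOD_PITCH = 8.0     # size of the speech-triggered nod


def clamp(value: float, limit: float) -> float:
    """Restrict value to the range [-limit, limit]."""
    return max(-limit, min(limit, value))


def rms(samples: list[float]) -> float:
    """Root-mean-square level of normalized audio samples."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def next_display_pose(current: Pose, head: Pose, previous_head: Pose,
                      audio: list[float]) -> Pose:
    """Return the next display pose from head motion and audio."""
    moved = max(abs(head.yaw - previous_head.yaw),
                abs(head.pitch - previous_head.pitch),
                abs(head.roll - previous_head.roll)) > MOTION_EPS
    if moved:
        # Mimic the detected head orientation within the display limits.
        target = Pose(clamp(head.yaw, MAX_ANGLE),
                      clamp(head.pitch, MAX_ANGLE),
                      clamp(head.roll, MAX_ANGLE))
    elif rms(audio) > SPEECH_RMS:
        # No significant motion, but the partner is speaking: nod.
        target = Pose(current.yaw,
                      clamp(current.pitch + NOD_PITCH, MAX_ANGLE),
                      current.roll)
    else:
        target = current
    # Exponential smoothing toward the target to avoid abrupt motion.
    return Pose(current.yaw + SMOOTHING * (target.yaw - current.yaw),
                current.pitch + SMOOTHING * (target.pitch - current.pitch),
                current.roll + SMOOTHING * (target.roll - current.roll))
```

Calling `next_display_pose` once per video frame with the latest head estimate and the most recent audio window would move the display gradually toward the target. The smoothing factor trades responsiveness against jerkiness and would need tuning on real hardware.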

This paper covers two topics. The first is a new design of the ARM-COMS robotic arm, which uses a six-axis servo motor configuration to enable smoother motion. In addition to the hardware configuration, the software configuration, which is based on the ROS framework, is also presented.

The second topic is a camera stabilizer-based experimental configuration. This study examined the feasibility of this experimental configuration for ARM-COMS. If the approach proves feasible, it could be applied to the redesigned ARM-COMS robotic arm system.

Acknowledgement

This work was partly supported by JSPS KAKENHI Grant Number JP19K12082 and the Original Research Grant 2019 of Okayama Prefectural University. The author would like to thank Risa Tanaka and all members of the Kansei Information Engineering Labs at Okayama Prefectural University for their cooperation in conducting the experiments.

Author information

Corresponding author

Correspondence to Teruaki Ito.

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Ito, T., Oyama, T., Watanabe, T. (2020). Speech Recognition Approach for Motion-Enhanced Display in ARM-COMS System. In: Stephanidis, C., Kurosu, M., Degen, H., Reinerman-Jones, L. (eds) HCI International 2020 - Late Breaking Papers: Multimodality and Intelligence. HCII 2020. Lecture Notes in Computer Science, vol 12424. Springer, Cham. https://doi.org/10.1007/978-3-030-60117-1_10

  • DOI: https://doi.org/10.1007/978-3-030-60117-1_10

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-60116-4

  • Online ISBN: 978-3-030-60117-1

  • eBook Packages: Computer Science, Computer Science (R0)
