Speech Recognition Approach for Motion-Enhanced Display in ARM-COMS System

Ito, Teruaki; Oyama, Takashi; Watanabe, Tomio

doi:10.1007/978-3-030-60117-1_10


Part of the book series: Lecture Notes in Computer Science (LNISA, volume 12424)

Included in the following conference series:

International Conference on Human-Computer Interaction


Abstract

This research proposes a motion-enhanced display: a display that itself serves as the communication medium by mimicking the motion of a human head to enhance presence in remote communication. The idea has been implemented as an augmented telepresence system called ARM-COMS (ARm-supported eMbodied COmmunication Monitor System). ARM-COMS detects the orientation of a subject's face with a face-detection tool based on image processing and mimics the head motion of the remote partner. In addition, ARM-COMS uses the audio signal of the conversation to react appropriately when the communication partner speaks, even if no significant motion appears in the video.

This paper covers two topics. The first is a new design of the ARM-COMS robotic arm, with a configuration of six-axis servo motors to enable smoother motion. Along with the hardware configuration, the software configuration, which is based on the ROS framework, is also presented.

The second topic is a camera stabilizer-based experimental configuration. This study examined the feasibility of this experimental configuration for ARM-COMS. If it proves feasible, the approach could be applied to the redesigned ARM-COMS robotic arm system.
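
The behaviour described in the abstract (follow the partner's head orientation, and still react when the partner speaks without moving) can be illustrated with a minimal sketch. This is not the authors' implementation: the names `HeadPose` and `display_target`, the 30-degree limit, the 2-degree stillness threshold, and the 5-degree nod are all hypothetical values chosen for illustration, and face detection and speech detection are assumed to happen elsewhere and arrive as plain inputs.

```python
from dataclasses import dataclass


@dataclass
class HeadPose:
    """Head orientation in degrees (hypothetical representation)."""
    yaw: float
    pitch: float
    roll: float


def clamp(value: float, low: float, high: float) -> float:
    """Restrict value to the closed interval [low, high]."""
    return max(low, min(high, value))


def display_target(prev: HeadPose, curr: HeadPose, speaking: bool,
                   limit: float = 30.0, still_threshold: float = 2.0,
                   nod_depth: float = 5.0) -> tuple:
    """Return the (yaw, pitch, roll) target for the display.

    The display follows the detected head pose, clamped to an assumed
    mechanical range. If the partner is speaking but the pose changed
    by less than still_threshold since the previous frame, a small
    downward nod is added so the display still reacts.
    """
    yaw = clamp(curr.yaw, -limit, limit)
    pitch = clamp(curr.pitch, -limit, limit)
    roll = clamp(curr.roll, -limit, limit)

    # Largest per-axis change between the two frames.
    moved = max(abs(curr.yaw - prev.yaw),
                abs(curr.pitch - prev.pitch),
                abs(curr.roll - prev.roll))

    if speaking and moved < still_threshold:
        # Assumed reaction: tilt the display down slightly.
        pitch = clamp(pitch - nod_depth, -limit, limit)

    return (yaw, pitch, roll)
```

In this sketch, a large head turn is simply clamped to the assumed range, while a nearly static pose during speech produces a small pitch offset. A real system would smooth these targets over time before sending them to the servo motors.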



Acknowledgement

This work was partly supported by JSPS KAKENHI Grant Number JP19K12082 and the Original Research Grant 2019 of Okayama Prefectural University. The authors would like to acknowledge Risa Tanaka and all members of the Kansei Information Engineering Labs at Okayama Prefectural University for their cooperation in conducting the experiments.

Author information

Authors and Affiliations

Faculty of Computer Science and System Engineering, Okayama Prefectural University, 111 Kuboki, Soja, Okayama, 719-1197, Japan
Teruaki Ito, Takashi Oyama & Tomio Watanabe


Corresponding author

Correspondence to Teruaki Ito.

Editor information

Editors and Affiliations

University of Crete and Foundation for Research and Technology – Hellas (FORTH), Heraklion, Crete, Greece
Constantine Stephanidis
The Open University of Japan, Chiba, Japan
Masaaki Kurosu
Siemens Corporation, Princeton, NJ, USA
Helmut Degen
University of Central Florida, Orlando, FL, USA
Lauren Reinerman-Jones


About this paper

Cite this paper

Ito, T., Oyama, T., Watanabe, T. (2020). Speech Recognition Approach for Motion-Enhanced Display in ARM-COMS System. In: Stephanidis, C., Kurosu, M., Degen, H., Reinerman-Jones, L. (eds) HCI International 2020 - Late Breaking Papers: Multimodality and Intelligence. HCII 2020. Lecture Notes in Computer Science, vol 12424. Springer, Cham. https://doi.org/10.1007/978-3-030-60117-1_10


DOI: https://doi.org/10.1007/978-3-030-60117-1_10
Published: 17 October 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-60116-4
Online ISBN: 978-3-030-60117-1
eBook Packages: Computer Science, Computer Science (R0)
