
An audiovisual interface-based drumming system for multimodal human–robot interaction


This paper presents a study of an audiovisual interface-based drumming system for multimodal human–robot interaction. An interactive multimodal drumming game is used together with humanoid robots to establish an audiovisual interactive interface. The study is part of a project to design robot and avatar assistants for education and therapy, especially for children with special needs. It focuses on evaluating robot and virtual avatar tutors, tangible interaction devices, and mobile multimedia devices within a simple drumming-based interactive music tutoring game. The parameters of interest included the effect of the embodiment of the tutor/interface and the presence of feedback and training mechanisms. For that purpose, we created an interactive drumming game based on turn-taking and imitation, in which a human user plays drums with a humanoid robot (Robotic Drum Mate). Three interactive scenarios with different experimental setups for humanoid robots and mobile devices were developed and tested. As part of those scenarios, a system that automatically detects and recognizes drum strokes from auditory cues was implemented and incorporated into the experimental framework. We verified the applicability and effectiveness of the proposed system in a drum-playing game with adult test subjects, evaluating it both objectively and subjectively. The results showed that the physical robot tutor and the feedback and training mechanisms had a positive effect on the subjects’ performance. As expected, although the physical medium was preferred, the virtual medium for drumming caused fewer failures.
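To make the turn-taking and imitation idea concrete, the following is a minimal Python sketch of how one round of such a game could be scored. It is an illustration only: the function names `imitation_score` and `play_round`, the position-by-position matching rule, and the 0.8 pass threshold are all assumptions introduced here and are not taken from the paper, which does not describe its scoring in this abstract.

```python
from typing import Dict, Sequence, Union


def imitation_score(reference: Sequence[str], attempt: Sequence[str]) -> float:
    """Return the fraction of strokes reproduced at the same position.

    Hypothetical scoring rule: strokes are compared position by position,
    and missing or extra strokes lower the score because the longer of the
    two sequences is used as the denominator.
    """
    if not reference:
        raise ValueError("reference pattern must not be empty")
    matches = sum(1 for ref, got in zip(reference, attempt) if ref == got)
    return matches / max(len(reference), len(attempt))


def play_round(
    reference: Sequence[str],
    attempt: Sequence[str],
    pass_threshold: float = 0.8,  # assumed value, chosen for illustration
) -> Dict[str, Union[float, bool]]:
    """Score one turn: the tutor plays `reference`, the user answers with `attempt`."""
    score = imitation_score(reference, attempt)
    return {"score": score, "passed": score >= pass_threshold}


if __name__ == "__main__":
    tutor_pattern = ["snare", "cymbal", "snare", "snare"]
    user_pattern = ["snare", "cymbal", "cymbal", "snare"]
    print(play_round(tutor_pattern, user_pattern))
```

In this sketch the stroke labels would come from whatever recognizer classifies the recorded sound; here they are plain strings so that the example stays self-contained.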




  1. In our work, the term “drum stroke” encompasses the motion of hitting a certain type of drum unit (e.g., snare drum or cymbal) as well as the sound stemming from that unit.




This work is supported by the Scientific Research Project Unit (BAP) of Istanbul Technical University, Project Number: MOA-2019-42321.

Author information



Corresponding author

Correspondence to Gökhan Ince.


Cite this article

Ince, G., Yorganci, R., Ozkul, A. et al. An audiovisual interface-based drumming system for multimodal human–robot interaction. J Multimodal User Interfaces 15, 413–428 (2021).



  • Human–robot-interaction (HRI)
  • Audio processing
  • Drum game
  • Non-verbal communication
  • Audiovisual interaction
  • Multimodality
  • Human–computer-interaction (HCI)