Journal on Multimodal User Interfaces, Volume 7, Issue 1–2, pp 157–170

A multi-modal dance corpus for research into interaction between humans in virtual environments

  • Slim Essid
  • Xinyu Lin
  • Marc Gowing
  • Georgios Kordelas
  • Anil Aksay
  • Philip Kelly
  • Thomas Fillon
  • Qianni Zhang
  • Alfred Dielmann
  • Vlado Kitanovski
  • Robin Tournemenne
  • Aymeric Masurelle
  • Ebroul Izquierdo
  • Noel E. O’Connor
  • Petros Daras
  • Gaël Richard
Original Paper

Abstract

We present a new, freely available, multimodal corpus for research into, among other areas, real-time realistic interaction between humans in online virtual environments. The corpus targets an online dance class scenario in which students, with avatars driven by whatever 3D capture technology is locally available to them, learn choreographies with teacher guidance in an online virtual dance studio. To support this scenario, the corpus consists of student/teacher dance choreographies captured concurrently at two different sites using a variety of media modalities, including synchronised audio rigs, multiple cameras, wearable inertial measurement devices and depth sensors. Each dancer performs a number of fixed choreographies, which are graded according to specific evaluation criteria, and ground-truth dance choreography annotations are provided. Furthermore, for the sensor modalities that are not synchronised at capture time, the corpus includes distinctive events to support data stream synchronisation. The total duration of the recorded content is 1 h and 40 min per sensor, amounting to 55 h of recordings across all sensors. Although the corpus is tailored specifically to the online dance class scenario, the data is free to download and use for any research and development purpose.
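As an illustration of how the distinctive synchronisation events mentioned above might be used, below is a minimal sketch of event-based alignment of two audio streams via cross-correlation of their envelopes. It assumes both recordings share a sample rate and capture the event within the first few seconds of a take; the file names are hypothetical, since the corpus's actual file layout is not described in this abstract.

    import numpy as np
    from scipy.io import wavfile
    from scipy.signal import correlate

    def mono_envelope(x):
        """Collapse to mono and take the magnitude envelope."""
        x = x.astype(np.float64)
        if x.ndim > 1:
            x = x.mean(axis=1)
        return np.abs(x)

    def estimate_offset_seconds(path_a, path_b, window_s=10.0):
        """Lag of stream B relative to stream A, from a shared impulsive event."""
        rate_a, a = wavfile.read(path_a)
        rate_b, b = wavfile.read(path_b)
        assert rate_a == rate_b, "resample first if the sample rates differ"
        n = int(window_s * rate_a)  # search window around the session start
        a, b = mono_envelope(a[:n]), mono_envelope(b[:n])
        # The peak of the full cross-correlation marks the relative delay.
        xc = correlate(a, b, mode="full")
        lag = np.argmax(xc) - (len(b) - 1)
        return lag / rate_a

    # Hypothetical file names; a positive offset means stream B started
    # recording that many seconds after stream A, so adding the offset to
    # B's timestamps places them on A's timeline.
    # offset = estimate_offset_seconds("site1_cam_audio.wav", "site1_rig_audio.wav")

Non-audio modalities could be aligned in the same spirit by correlating motion-energy signals (e.g. from the inertial sensors) around the same event.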

Keywords

Dance · Multimodal data · Multiview video processing · Audio · Depth maps · Motion · Inertial sensors · Synchronisation · Activity recognition · Virtual reality · Computer vision · Machine listening

Copyright information

© OpenInterface Association 2012

Authors and Affiliations

  • Slim Essid (1)
  • Xinyu Lin (2)
  • Marc Gowing (3)
  • Georgios Kordelas (2, 4)
  • Anil Aksay (2)
  • Philip Kelly (3)
  • Thomas Fillon (1)
  • Qianni Zhang (2)
  • Alfred Dielmann (1)
  • Vlado Kitanovski (2)
  • Robin Tournemenne (1)
  • Aymeric Masurelle (1)
  • Ebroul Izquierdo (2)
  • Noel E. O’Connor (3)
  • Petros Daras (4)
  • Gaël Richard (1)

  1. Institut Telecom/Telecom ParisTech, CNRS-LTCI, Paris, France
  2. Multimedia and Vision Group (MMV), Queen Mary University, London, UK
  3. CLARITY, Centre for Sensor Web Technologies, Dublin City University, Dublin, Ireland
  4. Centre for Research and Technology-Hellas, Informatics and Telematics Institute, Thessaloníki, Greece
