Multimedia Tools and Applications, Volume 78, Issue 14, pp 19201–19227

Crowdsourcing authoring of sensory effects on videos

  • Marcello Novaes de Amorim
  • Estêvão Bissoli Saleme
  • Fábio Ribeiro de Assis Neto
  • Celso A. S. Santos
  • Gheorghita Ghinea


Human perception is inherently multi-sensorial, involving the five traditional senses: sight, hearing, touch, taste, and smell. In contrast to traditional multimedia, which is based on audio and visual stimuli, mulsemedia seeks to stimulate all the human senses. One way to produce multi-sensorial content is to author videos with sensory effects. These effects are represented as metadata attached to the video content, which are processed and rendered into the user’s environment through physical devices. However, creating sensory effect metadata is not a trivial activity, because authors have to carefully identify different details in a scene, such as the exact points where each effect starts and finishes, as well as its presentation features, such as intensity and direction. It is a subjective task that requires accurate human perception and time. In this article, we aim to find out whether a crowdsourcing approach is suitable for authoring coherent sensory effects associated with video content. Our belief is that combining the collective common sense of the crowd, used to indicate the time intervals of sensory effects, with expert fine-tuning is a viable way to generate sensory effects from the point of view of users. To carry out the experiment, we selected three videos from a public mulsemedia dataset and sent them to the crowd through a cascading microtask approach. The results showed that the crowd can indicate intervals in which users agree that there should be insertions of sensory effects, revealing a way of sharing authoring between the author and the crowd.
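The core technical idea of the abstract, that agreement among many non-expert annotators can localize the time intervals of sensory effects, which an expert then fine-tunes, can be illustrated with a short sketch. The code below is not the authors' implementation; it is a minimal, hypothetical slot-based majority vote over worker-supplied intervals, in which the time step and quorum threshold are arbitrary illustrative choices.

```python
from collections import Counter

def aggregate_intervals(annotations, video_length_s, step=0.5, quorum=0.5):
    """Aggregate per-worker time intervals into consensus intervals.

    annotations: one list of (start_s, end_s) tuples per worker.
    Returns the intervals that at least `quorum` of workers marked.
    """
    n_workers = len(annotations)
    n_slots = int(video_length_s / step)

    # Count how many workers marked each fixed-size time slot.
    votes = Counter()
    for worker in annotations:
        for start, end in worker:
            first = int(start / step)
            last = min(int(end / step) + 1, n_slots)
            for slot in range(first, last):
                votes[slot] += 1

    # Merge consecutive slots that reach the quorum into intervals.
    consensus, current = [], None
    for slot in range(n_slots):
        if votes[slot] >= quorum * n_workers:
            if current is None:
                current = [slot * step, (slot + 1) * step]
            else:
                current[1] = (slot + 1) * step
        elif current is not None:
            consensus.append(tuple(current))
            current = None
    if current is not None:
        consensus.append(tuple(current))
    return consensus

# Example: three workers marking a wind effect in a 10-second clip.
workers = [
    [(1.0, 3.5)],
    [(1.2, 3.0), (7.0, 8.0)],  # the second interval gets only one vote
    [(0.8, 3.2)],
]
print(aggregate_intervals(workers, video_length_s=10.0))
# [(1.0, 3.5)] -- only the interval most workers agree on survives
```

An expert would then attach presentation features (effect type, intensity, direction) to each surviving interval; the cascading microtask approach mentioned above could likewise split this refinement across further crowd tasks.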


Keywords: Mulsemedia content · Sensory effects · MPEG-V metadata · Crowdsourcing · Multimedia authoring · Multimedia annotation



This study was financed in part by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior - Brasil (CAPES), the Brazilian National Council for Scientific and Technological Development (CNPq), and the Fundação de Amparo à Pesquisa e Inovação do Espírito Santo (FAPES). Estêvão Bissoli Saleme thankfully acknowledges support from the Federal Institute of Espírito Santo. Prof. Gheorghita Ghinea gratefully acknowledges funding from the European Union’s Horizon 2020 Research and Innovation programme under Grant Agreement no. 688503 for the NEWTON project.



Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2019

Authors and Affiliations

  1. Federal University of Espírito Santo, Vitória-ES, Brazil
  2. Brunel University London, Uxbridge, England
