Large-scale Data Collection and Analysis via a Gamified Intelligent Crowdsourcing Platform

Hantke, Simone; Olenyi, Tobias; Hausner, Christoph; Appel, Tobias; Schuller, Björn

doi:10.1007/s11633-019-1180-0

Research Article
Published: 06 June 2019

Volume 16, pages 427–436 (2019)
International Journal of Automation and Computing

Simone Hantke¹,² (ORCID: orcid.org/0000-0002-9606-2913),
Tobias Olenyi¹,
Christoph Hausner³,
Tobias Appel³ &
Björn Schuller¹,⁴

Abstract

In this contribution, we present iHEARu-PLAY, an online, multi-player platform for crowdsourced database collection and labelling, including the voice analysis application (VoiLA), a free web-based speech classification tool designed to educate iHEARu-PLAY users about state-of-the-art speech analysis paradigms. Through this speech analysis web interface, VoiLA also encourages users to take an active role in improving the service by providing labelled speech data. The platform allows users to record and upload voice samples directly from their browser. The samples are then analysed in a state-of-the-art classification pipeline that employs a set of pre-trained models targeting a range of speaker states and traits, such as gender, valence, arousal, dominance, and 24 different discrete emotions. The analysis results are visualised so that laypeople can easily interpret them, giving users unique insights into how their voice sounds. We assess the effectiveness of iHEARu-PLAY and its integrated VoiLA feature via a series of user evaluations, which indicate that the platform is fun and easy to use and that it provides accurate and informative results.
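The abstract describes uploaded recordings being passed through several pre-trained models, one per speaker state or trait. The pipeline's internals are not given here; the acknowledgements credit audEERING's sensAI. The following is therefore only a minimal Python sketch of the general "extract features once, query many task models" pattern. It assumes a toy two-feature front end. `extract_features`, `analyse`, `MODELS` and all weights and thresholds are hypothetical stand-ins, not the platform's actual code or models.

```python
import math
from typing import Callable, Dict, List, Sequence

Features = List[float]


def extract_features(samples: Sequence[float], frame_len: int = 400) -> Features:
    """Hypothetical toy front end: mean RMS energy and mean zero-crossing
    rate over non-overlapping frames (real systems use far richer sets)."""
    if frame_len < 2:
        raise ValueError("frame_len must be at least 2")
    frames = [samples[i:i + frame_len]
              for i in range(0, len(samples) - frame_len + 1, frame_len)]
    if not frames:
        raise ValueError("signal is shorter than one frame")
    # Root-mean-square energy per frame, averaged over frames.
    rms = sum(math.sqrt(sum(x * x for x in f) / len(f)) for f in frames) / len(frames)
    # Fraction of adjacent sample pairs whose sign differs, averaged over frames.
    zcr = sum(
        sum(1 for a, b in zip(f, f[1:]) if (a >= 0) != (b >= 0)) / (len(f) - 1)
        for f in frames
    ) / len(frames)
    return [rms, zcr]


def analyse(samples: Sequence[float],
            models: Dict[str, Callable[[Features], object]]) -> Dict[str, object]:
    """Extract features once, then query every task model on the same vector."""
    feats = extract_features(samples)
    return {task: model(feats) for task, model in models.items()}


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical stand-ins for pre-trained models: the weights are made up and
# only illustrate the interface (dimensional scores vs. categorical labels).
MODELS: Dict[str, Callable[[Features], object]] = {
    "arousal": lambda f: sigmoid(8.0 * f[0] - 1.0),  # score in (0, 1)
    "valence": lambda f: sigmoid(2.0 * f[1] - 0.5),  # score in (0, 1)
    "gender": lambda f: "female" if f[1] > 0.05 else "male",
}

if __name__ == "__main__":
    sr = 16000  # assumed sample rate in Hz
    tone = [0.5 * math.sin(2 * math.pi * 200 * n / sr) for n in range(sr)]  # 1 s, 200 Hz
    print(analyse(tone, MODELS))
```

Sharing one feature vector across all task models avoids repeating feature extraction for each task. A categorical task such as the 24 discrete emotions would plug in the same way, returning the label with the highest score.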

Acknowledgements

This work was supported by the European Community’s Seventh Framework Programme (ERC Starting Grant iHEARu, No. 338164). We thank audEERING for providing sensAI and all iHEARu-PLAY players for taking part in our evaluation.

Author information

Authors and Affiliations

ZD.B Chair of Embedded Intelligence for Health Care and Wellbeing, University of Augsburg, Augsburg, Germany
Simone Hantke, Tobias Olenyi & Björn Schuller
Machine Intelligence & Signal Processing Group, Technische Universität München, München, Germany
Simone Hantke
audEERING GmbH, Gilching, Germany
Christoph Hausner & Tobias Appel
Group on Language, Audio & Music (GLAM), Department of Computing, Imperial College, London, SW7 2AZ, UK
Björn Schuller

Corresponding author

Correspondence to Simone Hantke.

Additional information

Recommended by Associate Editor Jian-Hua Tao

Simone Hantke received her Diploma in media technology from the Technische Hochschule Deggendorf, Germany, in 2011, and the M.Sc. degree from the Technische Universität München (TUM), one of Germany’s Excellence Universities, in 2014. She is currently a Ph.D. candidate at TUM, Germany, and works at the ZD.B Chair of Embedded Intelligence for Health Care and Wellbeing, University of Augsburg, Germany. Her doctoral thesis is in the field of affective computing and speech processing. Her research focuses on data collection and new machine learning approaches for robust automatic speech recognition and speaker characterisation. Her main involvement has been with the EU FP7 ERC project iHEARu. Within this project she leads the development of crowdsourced data collection and annotation for speech processing, and she is the lead author of iHEARu-PLAY.

Tobias Olenyi received the B.Sc. degree in computer science from the University of Passau, Germany, in 2017. He is currently a master’s student in informatics at the Technische Universität München, Germany, where he focuses on artificial intelligence and machine learning. In his Bachelor’s thesis “Classifying Voice Likability with Instruments of Machine Learning”, supervised by Simone Hantke, he explored different approaches to vocal emotion analysis based on feature mapping and developed the initial version of VoiLA. He also integrated the emotion analysis capabilities of audEERING’s sensAI into the newly developed tool.

Christoph Hausner received the M.Sc. degree in computer science from the University of Passau, Germany, in 2017. His main interests lie in software engineering and real-world applications of machine learning and signal processing methods. He previously worked as a student assistant at the Chair for Complex and Intelligent Systems and contributed to the development of the iHEARu-PLAY platform. For his master’s thesis, he developed a custom-tailored noise classification system for one of the world’s largest manufacturers in the automotive industry. He is currently a software engineer at audEERING GmbH, where he leads the development of the feature extraction toolkit openSMILE and the sensAI web service.

Tobias Appel received the M.Sc. degree from the Technische Universität München, Germany, in 2015. His master’s thesis “Crowdsourcing- and Games-Concepts for Data Annotation” was supervised by Simone Hantke. Within this thesis, he and Simone Hantke developed the fundamental concept and the basic technological framework for iHEARu-PLAY. He now works part-time at audEERING GmbH and continues to contribute to iHEARu-PLAY. He is also working on his doctoral thesis as a member of the Munich Network Management Team at the Ludwig Maximilian University of Munich, Germany.

Björn Schuller received the Ph.D. degree for his work on automatic speech and emotion recognition in 2006 and his habilitation in the subject area of signal processing and machine intelligence in 2012. Both are in electrical engineering and information technology from TUM, Germany. He is a professor of artificial intelligence in the Department of Computing at Imperial College London, UK, full professor and head of the ZD.B Chair of Embedded Intelligence for Health Care and Wellbeing at the University of Augsburg, Germany, and CEO of audEERING, an audio intelligence company. He has (co-)authored 6 books and more than 800 publications in peer-reviewed books, journals, and conference proceedings, which have received more than 22000 citations overall (h-index = 69). Professor Schuller is co-Program Chair of Interspeech 2019 and has repeatedly served as Area Chair of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). He also holds numerous further Associate and Guest Editor roles and functions in technical and organisational committees.

About this article

Cite this article

Hantke, S., Olenyi, T., Hausner, C. et al. Large-scale Data Collection and Analysis via a Gamified Intelligent Crowdsourcing Platform. Int. J. Autom. Comput. 16, 427–436 (2019). https://doi.org/10.1007/s11633-019-1180-0

Received: 08 September 2018
Accepted: 26 March 2019
Published: 06 June 2019
Issue Date: August 2019
DOI: https://doi.org/10.1007/s11633-019-1180-0
