Abstract
One of the principal reasons for the success of machine learning is the use of large amounts of labeled data to train learning models. The availability of annotated data depends, to a large extent, on the nature of the domain and on how easy it is to obtain labeled data points. One area that we believe still lacks substantial labeled data is audio. This is not surprising, since labeling audio segments can be tedious and time-consuming, mainly because of the temporal nature of audio. In this paper, we present a free, open-source, web-based platform that we developed, which allows individuals and research teams to crowdsource large numbers of labeled audio segments efficiently and effectively. Once an individual or a team signs up to use the platform as researchers, they are granted administrative access that enables them to upload their own audio files and customize the labeling and data collection process according to their study needs. Examples of such customization include listing the labels of interest, specifying the duration of the audio segments and how they should be extracted from the audio file(s), and dictating how labelers are prompted with the audio segments according to a set of predefined, user-specified rules. The system automatically generates the audio segments from the audio files, presents labelers with an intuitive interface that follows the rules specified by the study administrators, and records the labelers’ responses, providing them to the study administrators in a readable, easy-to-access format.
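The abstract does not say how segments are cut or how prompting rules are evaluated, so the following is only a minimal sketch under stated assumptions, not the platform's implementation. It assumes segments are fixed-length windows that start every `hop` seconds, that administrators supply a fixed label set, and it uses one hypothetical prompting rule (serve the least-labeled segment the labeler has not seen, up to a target number of labels per segment). All names (`Segment`, `Response`, `make_segments`, `record_response`, `next_segment`, `export_csv`) are illustrative.

```python
import csv
import io
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Segment:
    """A time window of one audio file, in seconds (hypothetical model)."""
    file_id: str
    start: float
    end: float


@dataclass(frozen=True)
class Response:
    """One labeler's answer for one segment (hypothetical model)."""
    labeler: str
    segment: Segment
    label: str


def make_segments(file_id: str, file_duration: float,
                  segment_length: float, hop: float) -> list[Segment]:
    """Cut a file into fixed-length windows that start every `hop` seconds.

    hop == segment_length gives back-to-back segments; hop < segment_length
    gives overlapping ones. A trailing window shorter than segment_length is
    dropped (an assumption; a real study might pad or keep it).
    """
    if segment_length <= 0 or hop <= 0:
        raise ValueError("segment_length and hop must be positive")
    segments = []
    i = 0
    while True:
        start = i * hop  # multiply instead of accumulating to limit float drift
        end = start + segment_length
        if end > file_duration + 1e-9:
            break
        segments.append(Segment(file_id, round(start, 3), round(end, 3)))
        i += 1
    return segments


def record_response(responses: list[Response], labeler: str, segment: Segment,
                    label: str, allowed_labels: set[str]) -> None:
    """Store a response, rejecting labels the administrator did not list."""
    if label not in allowed_labels:
        raise ValueError(f"unknown label: {label!r}")
    responses.append(Response(labeler, segment, label))


def next_segment(segments: list[Segment], responses: list[Response],
                 labeler: str, labels_per_segment: int) -> Optional[Segment]:
    """Hypothetical prompting rule: serve the least-labeled segment this
    labeler has not seen yet, until every segment has `labels_per_segment`
    responses. Returns None when nothing is left for this labeler."""
    counts = {s: 0 for s in segments}
    seen_by_labeler = set()
    for r in responses:
        counts[r.segment] = counts.get(r.segment, 0) + 1
        if r.labeler == labeler:
            seen_by_labeler.add(r.segment)
    candidates = [s for s in segments
                  if s not in seen_by_labeler and counts[s] < labels_per_segment]
    if not candidates:
        return None
    # min() returns the first of several equally labeled segments,
    # so ties are broken by segment order.
    return min(candidates, key=lambda s: counts[s])


def export_csv(responses: list[Response]) -> str:
    """Flatten responses into a CSV table for the study administrators."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["file_id", "start_s", "end_s", "labeler", "label"])
    for r in responses:
        writer.writerow([r.segment.file_id, r.segment.start, r.segment.end,
                         r.labeler, r.label])
    return buf.getvalue()
```

For a 10-second file, a 3-second segment length and a 2-second hop yield segments from 0 to 3, 2 to 5, 4 to 7 and 6 to 9 seconds; the final second is not covered, which is why a real study might prefer padding. Keeping the prompting rule in its own function mirrors the abstract's separation between automatic segment generation and the administrator-defined rules that decide what each labeler sees.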
Keywords
- Crowdsourcing
- Labeling
- Human computation
- Web application
- Speech analysis
Acknowledgments
This work was partially supported by the Research Center of the Polish-Japanese Academy of Information Technology, supported by the Ministry of Science and Higher Education in Poland.
Copyright information
© 2018 Springer Nature Switzerland AG
About this paper
Cite this paper
Hajja, A., Hiers, G.P., Arbajian, P., Raś, Z.W., Wieczorkowska, A.A. (2018). Multipurpose Web-Platform for Labeling Audio Segments Efficiently and Effectively. In: Ceci, M., Japkowicz, N., Liu, J., Papadopoulos, G., Raś, Z. (eds) Foundations of Intelligent Systems. ISMIS 2018. Lecture Notes in Computer Science, vol 11177. Springer, Cham. https://doi.org/10.1007/978-3-030-01851-1_18
DOI: https://doi.org/10.1007/978-3-030-01851-1_18
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-01850-4
Online ISBN: 978-3-030-01851-1
eBook Packages: Computer Science; Computer Science (R0)