Multipurpose Web-Platform for Labeling Audio Segments Efficiently and Effectively

Hajja, Ayman; Hiers, Griffin P.; Arbajian, Pierre; Raś, Zbigniew W.; Wieczorkowska, Alicja A.

doi:10.1007/978-3-030-01851-1_18

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 11177)

Included in the following conference series:

International Symposium on Methodologies for Intelligent Systems

Abstract

One of the principal reasons for the success of machine learning is the availability of large amounts of labeled data for training learning models. The availability of annotated data depends, to a large extent, on the nature of the domain and on how easy it is to obtain labeled data points. One area that we believe still lacks substantial labeled data is audio. This is not surprising, since labeling audio segments can be tedious and time-consuming, mainly because of the temporal nature of audio. In this paper, we present a free and open-source web-based platform that we developed, which allows individuals and research teams to crowdsource large numbers of labeled audio segments efficiently and effectively. Once an individual or a team signs up to use the platform as researchers, they are granted administrative access that enables them to upload their own audio files and to customize the labeling and data collection process according to the needs of their study. Examples of customization include listing the labels of interest, specifying the duration of audio segments and how they should be extracted from the audio file(s), and dictating how labelers should be prompted with the audio segments based on a set of predetermined user-defined rules. Our system automatically generates the audio segments from the audio files, presents them to labelers through an intuitive interface that follows the rules specified by the study administrators, and finally records the labelers' responses and provides them to the administrators in a readable and easy-to-access format.
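
The abstract does not state how segments are cut from an audio file, so the following is only a minimal sketch of one plausible scheme: fixed-length windows with an optional overlap, expressed in milliseconds. The function name `segment_bounds` and the parameters `hop_ms` and `keep_partial` are hypothetical and are not taken from the platform.

```python
from typing import List, Optional, Tuple


def segment_bounds(
    total_ms: int,
    segment_ms: int,
    hop_ms: Optional[int] = None,
    keep_partial: bool = False,
) -> List[Tuple[int, int]]:
    """Return (start, end) offsets in milliseconds for fixed-length segments.

    Assumption (not from the paper): segments are fixed-length windows.
    hop_ms is the distance between consecutive segment starts; when it is
    omitted, segments are back-to-back with no overlap.
    """
    if segment_ms <= 0:
        raise ValueError("segment_ms must be positive")
    hop = segment_ms if hop_ms is None else hop_ms
    if hop <= 0:
        raise ValueError("hop_ms must be positive")

    bounds: List[Tuple[int, int]] = []
    start = 0
    while start < total_ms:
        end = start + segment_ms
        if end > total_ms:
            # The last window would run past the end of the file:
            # either keep a shortened final segment or drop it.
            if keep_partial:
                bounds.append((start, total_ms))
            break
        bounds.append((start, end))
        start += hop
    return bounds


if __name__ == "__main__":
    # A 10-second file cut into 4-second segments, without and with overlap.
    print(segment_bounds(10_000, 4_000))
    print(segment_bounds(10_000, 4_000, hop_ms=2_000))
```

Working in integer milliseconds avoids floating-point drift when offsets are accumulated; whether a trailing partial segment should be kept is a study-design choice, which is why it is exposed as a flag in this sketch.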

Acknowledgments

This work was partially supported by the Research Center of the Polish-Japanese Academy of Information Technology, supported by the Ministry of Science and Higher Education in Poland.

Author information

Authors and Affiliations

Department of Computer Science, College of Charleston, 66 George Street, Charleston, SC, 29424, USA
Ayman Hajja, Griffin P. Hiers & Zbigniew W. Raś
Department of Computer Science, University of North Carolina, 9201 University City Blvd., Charlotte, NC, 28223, USA
Pierre Arbajian
Polish-Japanese Academy of Information Technology, Koszykowa 86, 02-008, Warsaw, Poland
Zbigniew W. Raś & Alicja A. Wieczorkowska

Corresponding author

Correspondence to Ayman Hajja.

Editor information

Editors and Affiliations

Università degli Studi di Bari Aldo Moro, Bari, Italy
Michelangelo Ceci
American University, Washington, DC, USA
Nathalie Japkowicz
Hong Kong Baptist University, Kowloon, Hong Kong
Jiming Liu
University of Cyprus, Nicosia, Cyprus
George A. Papadopoulos
University of North Carolina, Charlotte, NC, USA
Zbigniew W. Raś

Cite this paper

Hajja, A., Hiers, G.P., Arbajian, P., Raś, Z.W., Wieczorkowska, A.A. (2018). Multipurpose Web-Platform for Labeling Audio Segments Efficiently and Effectively. In: Ceci, M., Japkowicz, N., Liu, J., Papadopoulos, G., Raś, Z. (eds) Foundations of Intelligent Systems. ISMIS 2018. Lecture Notes in Computer Science, vol 11177. Springer, Cham. https://doi.org/10.1007/978-3-030-01851-1_18

DOI: https://doi.org/10.1007/978-3-030-01851-1_18
Published: 07 October 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-01850-4
Online ISBN: 978-3-030-01851-1
eBook Packages: Computer Science, Computer Science (R0)
