Multipurpose Web-Platform for Labeling Audio Segments Efficiently and Effectively

Part of the Lecture Notes in Computer Science book series (LNAI, volume 11177)


One of the principal reasons for the success of machine learning can be attributed to the large amounts of labeled data used to train learning models. The availability of annotated data depends, to a large extent, on the nature of the domain and on how easy it is to obtain labeled data points. One area that we believe still lacks substantial labeled data is audio. This is not surprising, since labeling audio segments can be tedious and time-consuming, mainly because of the temporal nature of audio. In this paper, we present a free and open-source web-based platform that we developed, which allows individuals and research teams to crowdsource large numbers of labeled audio segments efficiently and effectively. Once an individual or a team signs up to use the platform as researchers, they are granted administrative access that enables them to upload their own audio files and customize the labeling and data collection process according to the needs of their study. Examples of such customization include listing the labels of interest, specifying the duration of audio segments and how they should be extracted from the audio file(s), and dictating how labelers are prompted with the audio segments based on a set of predetermined, user-defined rules. Our system automatically generates the audio segments from the audio files, presents labelers with an intuitive interface following the rules specified by the study administrators, and records the labelers’ responses, providing them to the administrators in a readable and easy-to-access format.
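As an illustration of what "specifying the duration of audio segments and how they should be extracted" could mean in practice, the following is a minimal sketch that computes segment boundaries for a single audio file. It is not the platform's actual implementation: the function name `segment_bounds` and the parameters `segment_ms`, `hop_ms`, and `keep_partial` are hypothetical, and the fixed-length windowing scheme is an assumption made for the example.

```python
from typing import List, Optional, Tuple


def segment_bounds(
    total_ms: int,
    segment_ms: int,
    hop_ms: Optional[int] = None,
    keep_partial: bool = False,
) -> List[Tuple[int, int]]:
    """Return (start, end) boundaries in milliseconds for fixed-length segments.

    Hypothetical helper: all names and the windowing scheme are assumptions.
    hop_ms is the distance between consecutive segment starts; it defaults to
    segment_ms, which yields back-to-back, non-overlapping segments.
    """
    if segment_ms <= 0:
        raise ValueError("segment_ms must be positive")
    hop = segment_ms if hop_ms is None else hop_ms
    if hop <= 0:
        raise ValueError("hop_ms must be positive")

    bounds: List[Tuple[int, int]] = []
    start = 0
    # Emit every full-length segment that fits inside the file.
    while start + segment_ms <= total_ms:
        bounds.append((start, start + segment_ms))
        start += hop
    # Optionally keep the shorter remainder at the end of the file.
    if keep_partial and start < total_ms:
        bounds.append((start, total_ms))
    return bounds


if __name__ == "__main__":
    # A 10-second file cut into 4-second segments with a 2-second hop.
    for begin, end in segment_bounds(10_000, 4_000, hop_ms=2_000):
        print(begin, end)
```

Working in integer milliseconds avoids floating-point drift when many boundaries are accumulated. A hop smaller than the segment length gives overlapping segments, which is one plausible way a study could control how segments are extracted.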


  • Crowdsourcing
  • Labeling
  • Human computation
  • Web application
  • Speech analysis




This work was partially supported by the Research Center of the Polish-Japanese Academy of Information Technology, supported by the Ministry of Science and Higher Education in Poland.

Author information

Corresponding author

Correspondence to Ayman Hajja.

Copyright information

© 2018 Springer Nature Switzerland AG

About this paper

Cite this paper

Hajja, A., Hiers, G.P., Arbajian, P., Raś, Z.W., Wieczorkowska, A.A. (2018). Multipurpose Web-Platform for Labeling Audio Segments Efficiently and Effectively. In: Ceci, M., Japkowicz, N., Liu, J., Papadopoulos, G., Raś, Z. (eds) Foundations of Intelligent Systems. ISMIS 2018. Lecture Notes in Computer Science, vol 11177. Springer, Cham.

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-01850-4

  • Online ISBN: 978-3-030-01851-1

  • eBook Packages: Computer Science, Computer Science (R0)