Datasets and Evaluation

Abstract

Developing computational systems requires methods for evaluating their performance, both to guide development and to compare alternative approaches. A reliable evaluation procedure for a classification or recognition system involves a standard dataset of example inputs along with the intended target outputs, and well-defined metrics that compare the system's outputs against this ground truth. This chapter examines the important factors in the design and construction of evaluation datasets, reviews the metrics commonly used in system evaluation, and compares their properties. We include a survey of currently available datasets for environmental sound scene and event recognition and conclude with advice for designing evaluation protocols.
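
To make the comparison against ground truth concrete, the sketch below computes micro-averaged segment-based precision, recall, and F-score for sound event detection, one of the metric families this chapter covers. It is a minimal illustration, not code from the chapter: the (onset, offset, label) event format, the one-second segment length, and all function names are assumptions made for the example.

import math
from collections import defaultdict

def to_segment_activity(events, duration, seg_len=1.0):
    # Mark, for each fixed-length segment, which labels are active in it.
    # Events are assumed to be (onset_sec, offset_sec, label) tuples.
    n_segs = math.ceil(duration / seg_len)
    active = defaultdict(set)  # segment index -> set of active labels
    for onset, offset, label in events:
        first = int(onset // seg_len)
        last = min(math.ceil(offset / seg_len) - 1, n_segs - 1)
        for i in range(first, last + 1):
            active[i].add(label)
    return active, n_segs

def segment_based_scores(reference, estimated, duration, seg_len=1.0):
    # Micro-averaged precision, recall, and F-score over all segments.
    ref, n_segs = to_segment_activity(reference, duration, seg_len)
    est, _ = to_segment_activity(estimated, duration, seg_len)
    tp = fp = fn = 0
    for i in range(n_segs):
        tp += len(ref[i] & est[i])   # labels correctly detected in this segment
        fp += len(est[i] - ref[i])   # labels reported but not in the reference
        fn += len(ref[i] - est[i])   # reference labels the system missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f

# Hypothetical 10-second clip with overlapping (polyphonic) events.
reference = [(0.0, 4.0, "speech"), (2.5, 8.0, "car")]
estimated = [(0.5, 4.5, "speech"), (6.0, 9.0, "car")]
print(segment_based_scores(reference, estimated, duration=10.0))

Segment-based scoring of this kind rewards temporal overlap at a fixed resolution; event-based metrics, by contrast, match whole events (typically with an onset tolerance), so the two can rank systems differently.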

Keywords

  • Audio datasets
  • Reference annotation
  • Sound scene labels
  • Sound event labels
  • Manual audio annotation
  • Annotation process design
  • Evaluation setup
  • Evaluation metrics
  • Evaluation protocol
  • Event-based metrics
  • Segment-based metrics

Author information

Correspondence to Annamaria Mesaros.

Copyright information

© 2018 Springer International Publishing AG

About this chapter

Cite this chapter

Mesaros, A., Heittola, T., Ellis, D. (2018). Datasets and Evaluation. In: Virtanen, T., Plumbley, M., Ellis, D. (eds) Computational Analysis of Sound Scenes and Events. Springer, Cham. https://doi.org/10.1007/978-3-319-63450-0_6

  • DOI: https://doi.org/10.1007/978-3-319-63450-0_6

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-63449-4

  • Online ISBN: 978-3-319-63450-0

  • eBook Packages: Engineering, Engineering (R0)