
Capacity Estimation from Environmental Audio Signals Using Deep Learning

Part of the Lecture Notes in Computer Science book series (LNCS, volume 13258)


Abstract

Estimating the capacity of a room or venue, that is, the number of people it currently holds, is essential to avoid overcrowding that could compromise people’s safety. Ensuring enough free space to keep a minimum safety distance between people is also important for health reasons, as in the current COVID-19 pandemic. Existing systems for automatic crowd counting are mostly based on image or video data, and some of them use deep learning architectures. In this paper, we study the viability of existing deep learning crowd counting systems and propose alternatives based on new network architectures with convolutional layers that rely exclusively on environmental audio signals. The proposed architecture infers the actual capacity with higher accuracy than previous proposals. Finally, we draw conclusions from the accuracy obtained with our approach and discuss the possible scope of deep-learning-based crowd counting systems.
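The abstract describes the pipeline only at a high level: environmental audio goes in, convolutional layers process it, and a head count is regressed out. The block below is a minimal NumPy sketch of that kind of pipeline under stated assumptions, not the chapter's architecture, which this page does not describe. It assumes log-power STFT spectrograms with a Hann window as input features, and a hypothetical `TinyCountRegressor` made of one convolutional layer with fixed random kernels, ReLU, global average pooling and a least-squares linear head. All function names, parameters (`n_fft`, `hop`, `n_kernels`) and the synthetic data are illustrative.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def log_spectrogram(signal, n_fft=512, hop=256):
    """Assumed input features: log-power STFT of a mono signal, Hann window.

    Returns an array of shape (n_fft // 2 + 1, n_frames), i.e. frequency x time.
    """
    signal = np.asarray(signal, dtype=float)
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack([signal[i * hop:i * hop + n_fft] * window
                       for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    return np.log(power + 1e-10).T  # small offset avoids log(0) on silence


def conv2d_valid(x, kernel):
    """Single-channel 2-D 'valid' cross-correlation, as computed by a CNN layer."""
    windows = sliding_window_view(x, kernel.shape)  # (H-kh+1, W-kw+1, kh, kw)
    return np.einsum("ijkl,kl->ij", windows, kernel)


class TinyCountRegressor:
    """Hypothetical toy model: conv -> ReLU -> global average pooling -> linear head.

    The convolution kernels are random and fixed; only the linear head is fitted,
    by ordinary least squares, so the example stays short and dependency-free.
    """

    def __init__(self, n_kernels=8, k=3, seed=0):
        rng = np.random.default_rng(seed)
        self.kernels = rng.normal(0.0, 0.1, size=(n_kernels, k, k))
        self.w = np.zeros(n_kernels)
        self.b = 0.0

    def features(self, spec):
        # One pooled activation per kernel: mean of the ReLU'd feature map.
        return np.array([np.maximum(conv2d_valid(spec, k), 0.0).mean()
                         for k in self.kernels])

    def fit_head(self, specs, counts):
        X = np.stack([self.features(s) for s in specs])
        X = np.hstack([X, np.ones((len(X), 1))])  # bias column
        coef, *_ = np.linalg.lstsq(X, np.asarray(counts, dtype=float), rcond=None)
        self.w, self.b = coef[:-1], float(coef[-1])

    def predict(self, spec):
        # Clip at zero: a head count cannot be negative.
        return max(float(self.features(spec) @ self.w + self.b), 0.0)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    counts = [0, 5, 10, 20, 40, 60]
    # Synthetic stand-in for recordings (not real data): louder noise for larger "crowds".
    specs = [log_spectrogram(rng.normal(0.0, 0.01 * (1 + c), 16000)) for c in counts]
    model = TinyCountRegressor()
    model.fit_head(specs, counts)
    print([round(model.predict(s), 1) for s in specs])
```

Fixing the convolution kernels keeps the sketch free of a training loop; a practical system would instead learn the kernels, and stack several such layers, with gradient descent in a deep learning framework.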


  • Automated Crowd Counting
  • Capacity control
  • Convolutional Neural Networks
  • Regression
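The abstract reports higher accuracy, but this page does not say which metric was used. Crowd counting is usually evaluated as a regression problem with the mean absolute error (MAE) and a squared-error measure, so the sketch below assumes those. It also adds a hypothetical `over_capacity` rule to illustrate how a count estimate could feed the capacity-control use case named in the keywords. The names `mae`, `rmse`, `over_capacity`, `max_capacity` and `margin` are illustrative assumptions, not taken from the chapter.

```python
import numpy as np


def mae(pred, true):
    """Mean absolute error between predicted and ground-truth head counts."""
    pred, true = np.asarray(pred, dtype=float), np.asarray(true, dtype=float)
    return float(np.mean(np.abs(pred - true)))


def rmse(pred, true):
    """Root mean squared error; penalises large misses more than MAE does."""
    pred, true = np.asarray(pred, dtype=float), np.asarray(true, dtype=float)
    return float(np.sqrt(np.mean((pred - true) ** 2)))


def over_capacity(estimated_count, max_capacity, margin=0.0):
    """Hypothetical alarm rule: trigger when the estimate exceeds the limit
    reduced by a safety margin (for example, the estimator's own MAE)."""
    return estimated_count > max_capacity - margin


if __name__ == "__main__":
    pred, true = [10, 12, 30], [12, 12, 27]  # illustrative numbers, not results
    err = mae(pred, true)
    print(err, rmse(pred, true))
    print(over_capacity(48, max_capacity=50, margin=err))
```

Using the estimator's error as the margin is one possible design choice: it makes the alarm fire earlier for a less accurate model, trading false alarms for fewer missed overcrowding events.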

  • DOI: 10.1007/978-3-031-06242-1_12
  • Chapter length: 11 pages
Figs. 1–4




Acknowledgements

This work was supported by projects PGC2018-098813-B-C32 (Spanish “Ministerio de Ciencia, Innovación y Universidades”), UMA20-FEDERJA-086 (Consejería de Economía y Conocimiento, Junta de Andalucía) and by European Regional Development Funds (ERDF), as well as by the BioSiP (TIC-251) research group. Work by F.J.M.M. was supported by the MICINN “Juan de la Cierva - Incorporación” IJC2019-038835-I Fellowship.

Author information


Corresponding author

Correspondence to C. Reyes-Daneri.

Copyright information

© 2022 Springer Nature Switzerland AG

About this paper

Cite this paper

Reyes-Daneri, C., Martínez-Murcia, F.J., Ortiz, A. (2022). Capacity Estimation from Environmental Audio Signals Using Deep Learning. In: Ferrández Vicente, J.M., Álvarez-Sánchez, J.R., de la Paz López, F., Adeli, H. (eds) Artificial Intelligence in Neuroscience: Affective Analysis and Health Applications. IWINAC 2022. Lecture Notes in Computer Science, vol 13258. Springer, Cham.

  • DOI: 10.1007/978-3-031-06242-1_12

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-06241-4

  • Online ISBN: 978-3-031-06242-1

  • eBook Packages: Computer Science, Computer Science (R0)