Capacity Estimation from Environmental Audio Signals Using Deep Learning

  • Conference paper
Artificial Intelligence in Neuroscience: Affective Analysis and Health Applications (IWINAC 2022)

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 13258))

Abstract

Estimating the capacity of a room or venue is essential to avoid overcrowding that could compromise people’s safety. Having enough free space to guarantee a minimum safety distance between people is also essential for health reasons, as in the current COVID-19 pandemic. Existing systems for automatic crowd counting are mostly based on image or video data, and some of them use deep learning architectures. In this paper, we study the viability of existing deep learning crowd counting systems and propose new alternatives, based on network architectures containing convolutional layers, that rely exclusively on environmental audio signals. The proposed architecture is able to infer the actual capacity with higher accuracy than previous proposals. Finally, conclusions are drawn from the accuracy obtained with our approach, and the possible scope of deep learning based crowd counting systems is discussed.

Acknowledgements

This work was supported by projects PGC2018-098813-B-C32 (Spanish “Ministerio de Ciencia, Innovación y Universidades”), UMA20-FEDERJA-086 (Consejería de Economía y Conocimiento, Junta de Andalucía) and by European Regional Development Funds (ERDF), as well as the BioSiP (TIC-251) research group. Work by F.J.M.M. was supported by the MICINN “Juan de la Cierva - Incorporación” IJC2019-038835-I Fellowship.

Author information

Corresponding author

Correspondence to C. Reyes-Daneri.

Copyright information

© 2022 Springer Nature Switzerland AG

About this paper

Cite this paper

Reyes-Daneri, C., Martínez-Murcia, F.J., Ortiz, A. (2022). Capacity Estimation from Environmental Audio Signals Using Deep Learning. In: Ferrández Vicente, J.M., Álvarez-Sánchez, J.R., de la Paz López, F., Adeli, H. (eds) Artificial Intelligence in Neuroscience: Affective Analysis and Health Applications. IWINAC 2022. Lecture Notes in Computer Science, vol 13258. Springer, Cham. https://doi.org/10.1007/978-3-031-06242-1_12

  • DOI: https://doi.org/10.1007/978-3-031-06242-1_12

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-06241-4

  • Online ISBN: 978-3-031-06242-1

  • eBook Packages: Computer Science, Computer Science (R0)
