Large Scale Audio-Visual Video Analytics Platform for Forensic Investigations of Terroristic Attacks

  • Alexander Schindler
  • Martin Boyer
  • Andrew Lindley
  • David Schreiber
  • Thomas Philipp
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11296)


The forensic investigation of a terrorist attack poses a huge challenge to the investigative authorities, as several thousand hours of video footage need to be reviewed. To assist law enforcement agencies (LEAs) in identifying suspects and securing evidence, we present a platform that fuses information from surveillance cameras and video uploads from eyewitnesses. The platform integrates analytical modules for different input modalities on a scalable architecture. Videos are analyzed according to their acoustic and visual content. Specifically, audio event detection is applied to index the content according to attack-specific acoustic concepts. Audio similarity search is used to identify similar video sequences recorded from different perspectives. Visual object detection and tracking are used to index the content according to relevant concepts. The heterogeneous results of the analytical modules are fused into a distributed index of visual and acoustic concepts to facilitate a rapid start of investigations, following trails, and investigating witness reports.
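The fusion step described in the abstract can be pictured as an inverted index keyed by concept, holding detections from both modalities. The following is only a minimal in-memory sketch under stated assumptions, not the platform's implementation: the names `Detection`, `ConceptIndex`, `co_occurring`, and `build_demo_index`, the field layout, the score threshold, and the demo data are all hypothetical, and a plain Python dictionary stands in for the distributed index.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    """One analysis result (hypothetical record layout, not from the paper)."""
    video_id: str   # identifier of the analyzed video
    modality: str   # "audio" or "visual"
    concept: str    # e.g. "gunshot", "person"
    start: float    # segment start in seconds
    end: float      # segment end in seconds
    score: float    # detector confidence in [0, 1]


class ConceptIndex:
    """In-memory stand-in for a fused index: concept -> detections."""

    def __init__(self):
        self._by_concept = defaultdict(list)

    def add(self, detection):
        # Audio and visual results share one index, keyed by concept.
        self._by_concept[detection.concept].append(detection)

    def search(self, concept, min_score=0.5):
        # Return detections of a concept, most confident first.
        hits = [d for d in self._by_concept.get(concept, [])
                if d.score >= min_score]
        return sorted(hits, key=lambda d: d.score, reverse=True)

    def co_occurring(self, concept_a, concept_b, min_score=0.5):
        # Pairs of detections in the same video with overlapping time ranges.
        pairs = []
        for a in self.search(concept_a, min_score):
            for b in self.search(concept_b, min_score):
                same_video = a.video_id == b.video_id
                overlap = a.start < b.end and b.start < a.end
                if same_video and overlap:
                    pairs.append((a, b))
        return pairs


def build_demo_index():
    # Made-up detections purely for illustration.
    index = ConceptIndex()
    for d in [
        Detection("upload_17", "audio", "gunshot", 12.0, 13.5, 0.91),
        Detection("cam_03", "audio", "gunshot", 40.0, 41.0, 0.62),
        Detection("upload_17", "visual", "person", 10.0, 20.0, 0.88),
        Detection("cam_03", "visual", "vehicle", 39.0, 45.0, 0.75),
        Detection("cam_03", "audio", "gunshot", 90.0, 91.0, 0.30),
    ]:
        index.add(d)
    return index


if __name__ == "__main__":
    index = build_demo_index()
    for hit in index.search("gunshot"):
        print(hit.video_id, hit.start, hit.score)
```

Keying on the concept rather than on the modality lets one query span acoustic and visual results, for example finding segments where a gunshot is heard while a person is visible in the same video.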


Audio event detection; Audio similarity; Visual object detection; Large scale computing; Ethics of security; Ethics of technology



This article was made possible in part by funding from the European Union’s Horizon 2020 research and innovation program in the context of the VICTORIA project under grant agreement no. SEC-740754 and by the project FLORIDA, FFG Kooperative F&E Projekte 2015, project no. 854768.



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  • Alexander Schindler (1), email author
  • Martin Boyer (1)
  • Andrew Lindley (1)
  • David Schreiber (1)
  • Thomas Philipp (2)
  1. Center for Digital Safety and Security, AIT Austrian Institute of Technology GmbH, Vienna, Austria
  2. LIquA - Linzer Institut für qualitative Analysen, Linz, Austria
