DIRAC: Detection and Identification of Rare Audio-Visual Events

  • Jörn Anemüller
  • Barbara Caputo
  • Hynek Hermansky
  • Frank W. Ohl
  • Tomas Pajdla
  • Misha Pavel
  • Luc van Gool
  • Rufin Vogels
  • Stefan Wabnik
  • Daphna Weinshall
Part of the Studies in Computational Intelligence book series (SCI, volume 384)


The DIRAC project was an integrated project carried out from January 1st, 2006 to December 31st, 2010. It was funded by the European Commission within the Sixth Framework Research Programme (FP6) under contract number IST-027787. Ten partners joined forces to investigate the concept of rare events in machine and cognitive systems, and developed multi-modal technology to identify such events and deal with them in audio-visual applications.

This document summarizes the project and its achievements. In Section 2 we present the research and engineering problem that the project set out to tackle, and discuss why we believe that advances in solving it will bring us closer to the general objective of building artificial systems with cognitive capabilities. We describe the approach we took to the problem and detail the theoretical framework we developed. We further describe how the interdisciplinary nature of our research, together with evidence collected from biological and cognitive systems, gave us the insights and support needed for the proposed approach. In Section 3 we describe our efforts towards a system design that follows the principles identified in our theoretical investigation. In Section 4 we describe a variety of algorithms, developed in the context of different applications, that implement the theoretical framework described in Section 2. In Section 5 we describe algorithmic progress on a variety of questions concerning the learning of rare events as defined in Section 2. Finally, in Section 6 we describe our application scenarios and an integrated test-bed developed to evaluate our algorithms jointly.







Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Jörn Anemüller (1)
  • Barbara Caputo (2)
  • Hynek Hermansky (3)
  • Frank W. Ohl (4)
  • Tomas Pajdla (5)
  • Misha Pavel (6)
  • Luc van Gool (7)
  • Rufin Vogels (8)
  • Stefan Wabnik (9)
  • Daphna Weinshall (10)

  1. Carl von Ossietzky University Oldenburg, Germany
  2. Fondation de l’Institut Dalle Molle d’Intelligence Artificielle Perceptive, Martigny, Switzerland
  3. Brno University of Technology, Czech Republic
  4. Leibniz Institut für Neurobiologie, Magdeburg, Germany
  5. Czech Technical University in Prague, Czech Republic
  6. Oregon Health and Science University, Portland, USA
  7. Eidgenössische Technische Hochschule Zürich, Switzerland
  8. Katholieke Universiteit Leuven, Belgium
  9. Fraunhofer Institut Digitale Medientechnologie, Oldenburg, Germany
  10. University of Jerusalem, Jerusalem, Israel
