Integration of Audio and Video Clues for Source Localization by a Robotic Head

Part of the Smart Innovation, Systems and Technologies book series (SIST, volume 37)


This work presents the first step toward integrating audio and video information for localizing speakers in closed environments. The proposed method is based on binaural source localization followed by face recognition and tracking, and was implemented in a real environment. Preliminary results demonstrate the effectiveness of the approach.


Binaural source localization; face detection and tracking; audio and video integration





Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  1. DIET Dept., University of Rome “Sapienza”, Rome, Italy
