Personal and Ubiquitous Computing, Volume 13, Issue 1, pp 3–14

Robust multimodal audio–visual processing for advanced context awareness in smart spaces

  • A. Pnevmatikakis (Email author)
  • J. Soldatos
  • F. Talantzis
  • L. Polymenakos
Original Article


Identifying people and tracking their locations is a key prerequisite to achieving context awareness in smart spaces. Moreover, in realistic context-aware applications, these tasks have to be carried out in a non-obtrusive fashion. In this paper we present a set of robust person-identification and tracking algorithms based on audio and visual processing. A main characteristic of these algorithms is that they operate on far-field and unconstrained audio–visual streams, which makes them non-intrusive. We also illustrate that their outputs can be combined into composite multimodal tracking components, which are suitable for supporting a broad range of context-aware services. To combine the audio–visual processing results, we exploit a context-modeling approach based on a graph of situations. Finally, we discuss the implementation of realistic prototype applications that make use of the full range of audio, visual and multimodal algorithms.


Keywords: Linear Discriminant Analysis; Gaussian Mixture Model; Time Delay Estimation; Smart Space; Perceptual Component



This work is sponsored by the European Union under the integrated project CHIL, contract number 506909.



Copyright information

© Springer-Verlag London Limited 2007

Authors and Affiliations

  • A. Pnevmatikakis (1), Email author
  • J. Soldatos (1)
  • F. Talantzis (1)
  • L. Polymenakos (1)
  1. Athens Information Technology, Athens, Greece
