
Combining Audio and Video in Perceptive Spaces

  • Conference paper
Managing Interactions in Smart Environments

Abstract

Virtual environments have great potential in applications such as entertainment, animation by example, design interfaces, information browsing, and even expressive performance. In this paper we describe an approach to unencumbered, natural interfaces called Perceptive Spaces, with a particular focus on efforts to build truly multi-modal interfaces: interfaces that attend to both the speech and gestures of the user. The spaces are unencumbered because they use passive sensors that require no special clothing, and large-format displays that do not isolate the user from their environment. The spaces are natural because the open environment encourages active participation. Several applications illustrate the expressive power of this approach, as well as the challenges of designing such interfaces.
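The multi-modal interfaces the abstract describes pair speech with gesture in the spirit of Bolt's classic "put-that-there" interface. As a rough illustration only (not the authors' implementation — the function, data types, and timing heuristic below are all hypothetical), the following Python sketch resolves deictic words in a timed speech stream against the pointing gesture nearest in time:

```python
import bisect
from dataclasses import dataclass

@dataclass
class Gesture:
    t: float       # timestamp in seconds
    target: str    # object or location the user is pointing at

def resolve_deictics(words, gestures):
    """Replace deictic words ('that', 'there') in a timed word stream
    with the target of the pointing gesture closest in time.

    words:    list of (timestamp, word) pairs from a speech recognizer
    gestures: list of Gesture, sorted by timestamp, from a body tracker
    """
    times = [g.t for g in gestures]
    resolved = []
    for t, w in words:
        if w in ("that", "there") and gestures:
            i = bisect.bisect_left(times, t)
            # compare the two neighbouring gestures, keep the closer one
            candidates = gestures[max(0, i - 1): i + 1]
            best = min(candidates, key=lambda g: abs(g.t - t))
            resolved.append(best.target)
        else:
            resolved.append(w)
    return resolved

words = [(0.2, "put"), (0.5, "that"), (1.4, "there")]
gestures = [Gesture(0.6, "blue block"), Gesture(1.5, "table corner")]
print(resolve_deictics(words, gestures))
# ['put', 'blue block', 'table corner']
```

The key design point is that neither channel alone carries the full command: speech supplies the verb, gesture supplies the referents, and fusion happens by temporal alignment.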





Copyright information

© 2000 Springer-Verlag London Limited

About this paper

Cite this paper

Wren, C.R., Basu, S., Sparacino, F., Pentland, A.P. (2000). Combining Audio and Video in Perceptive Spaces. In: Nixon, P., Lacey, G., Dobson, S. (eds) Managing Interactions in Smart Environments. Springer, London. https://doi.org/10.1007/978-1-4471-0743-9_5

  • DOI: https://doi.org/10.1007/978-1-4471-0743-9_5

  • Publisher Name: Springer, London

  • Print ISBN: 978-1-85233-228-0

  • Online ISBN: 978-1-4471-0743-9
