The IntelliMedia WorkBench - An Environment for Building Multimodal Systems

  • Conference paper
Cooperative Multimodal Communication (CMC 1998)

Abstract

Intelligent MultiMedia (IntelliMedia) focuses on the computer processing and understanding of signal and symbol input from at least speech, text and visual images in terms of semantic representations. We have developed a general suite of tools in the form of a software and hardware platform called “Chameleon” that can be tailored to conducting IntelliMedia in various application domains. Chameleon has an open distributed processing architecture and currently includes ten agent modules: blackboard, dialogue manager, domain model, gesture recogniser, laser system, microphone array, speech recogniser, speech synthesiser, natural language processor, and a distributed Topsy learner. Most of the modules are programmed in C and C++ and are glued together using the Dacs communications system. In effect, the blackboard, dialogue manager and Dacs form the kernel of Chameleon. Modules can communicate with each other and with the blackboard, which keeps a record of interactions over time via semantic representations in frames. Inputs to Chameleon can include synchronised spoken dialogue and images, and outputs include synchronised laser pointing and spoken dialogue. An initial prototype application of Chameleon is an IntelliMedia WorkBench where a user will be able to ask for information about things (e.g. 2D/3D models, pictures, objects, gadgets, people, or whatever) on a physical table. The current domain is a Campus Information System for 2D building plans, which provides information about tenants, rooms and routes and can answer, in real time, questions like “Whose office is this?” and “Show me the route from Paul Mc Kevitt’s office to Paul Dalsgaard’s office.” Chameleon and the IntelliMedia WorkBench are ideal for testing integrated signal and symbol processing of language and vision for the future of SuperinformationhighwayS.
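To make the blackboard idea in the abstract more concrete, the following is a minimal illustrative sketch in Python, not Chameleon's actual implementation (which the abstract says is written mostly in C and C++ on top of Dacs). It assumes a frame is a small record with a module name, a kind, and a dictionary of slots, and that the blackboard simply stores frames in arrival order. The names `Frame`, `Blackboard`, `answer_whose_office`, the slot names, and the room identifier are all hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass
class Frame:
    """A semantic frame posted by one module (field names are hypothetical)."""
    module: str            # e.g. "speech_recogniser" or "gesture_recogniser"
    kind: str              # "input", "integration" or "output"
    slots: Dict[str, Any]  # semantic content carried by the frame
    time: int              # logical timestamp assigned by the blackboard


class Blackboard:
    """Keeps a time-ordered record of the frames exchanged between modules."""

    def __init__(self) -> None:
        self._frames: List[Frame] = []
        self._clock = 0

    def post(self, module: str, kind: str, slots: Dict[str, Any]) -> Frame:
        """Store a new frame and stamp it with the next logical time."""
        self._clock += 1
        frame = Frame(module, kind, dict(slots), self._clock)
        self._frames.append(frame)
        return frame

    def history(self, module: Optional[str] = None) -> List[Frame]:
        """Return all frames, or only those posted by one module."""
        if module is None:
            return list(self._frames)
        return [f for f in self._frames if f.module == module]


def answer_whose_office(bb: Blackboard, domain_model: Dict[str, str]) -> Frame:
    """Toy dialogue-manager step: fuse the latest speech and gesture frames."""
    speech = bb.history("speech_recogniser")[-1]
    gesture = bb.history("gesture_recogniser")[-1]
    room = gesture.slots["pointed_at"]
    tenant = domain_model.get(room, "unknown")
    # Record the fused interpretation so later turns can refer back to it.
    bb.post("dialogue_manager", "integration",
            {"query": speech.slots["utterance"], "room": room, "tenant": tenant})
    return bb.post("speech_synthesiser", "output",
                   {"utterance": f"This is {tenant}'s office."})


# Hypothetical domain model: room identifier -> tenant.
domain_model = {"room_A": "Paul Mc Kevitt"}

bb = Blackboard()
bb.post("speech_recogniser", "input", {"utterance": "Whose office is this?"})
bb.post("gesture_recogniser", "input", {"pointed_at": "room_A"})
reply = answer_whose_office(bb, domain_model)
print(reply.slots["utterance"])
```

In this sketch the dialogue manager reads from and writes to the same blackboard as every other module, so the full interaction history stays available in one place. In a distributed setting, the `post` and `history` calls would be replaced by messages sent over a communications layer.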

Paul Mc Kevitt was also a British Engineering and Physical Sciences Research Council (EPSRC) Advanced Fellow at the University of Sheffield, England for five years under grant B/94/AF/1833 for the Integration of Natural Language, Speech and Vision Processing and recently took up appointment as Chair in Digital MultiMedia at The University of Ulster (Magee), Northern Ireland (p.mckevitt@ulst.ac.uk).

Copyright information

© 2001 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Brøndsted, T. et al. (2001). The IntelliMedia WorkBench - An Environment for Building Multimodal Systems. In: Bunt, H., Beun, R.J. (eds) Cooperative Multimodal Communication. CMC 1998. Lecture Notes in Computer Science, vol 2155. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45520-5_13

  • DOI: https://doi.org/10.1007/3-540-45520-5_13

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-42806-0

  • Online ISBN: 978-3-540-45520-2

  • eBook Packages: Springer Book Archive
