The IntelliMedia WorkBench - An Environment for Building Multimodal Systems

  • Tom Brøndsted
  • Paul Dalsgaard
  • Lars Bo Larsen
  • Michael Manthey
  • Paul Mc Kevitt
  • Thomas B. Moeslund
  • Kristian G. Olesen
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2155)

Abstract

Intelligent MultiMedia (IntelliMedia) focuses on the computer processing and understanding of signal and symbol input from at least speech, text and visual images in terms of semantic representations. We have developed a general suite of tools in the form of a software and hardware platform called “Chameleon” that can be tailored to IntelliMedia applications in various domains. Chameleon has an open distributed processing architecture and currently includes ten agent modules: blackboard, dialogue manager, domain model, gesture recogniser, laser system, microphone array, speech recogniser, speech synthesiser, natural language processor, and a distributed Topsy learner. Most of the modules are programmed in C and C++ and are glued together using the Dacs communications system. In effect, the blackboard, dialogue manager and Dacs form the kernel of Chameleon. Modules can communicate with each other and with the blackboard, which keeps a record of interactions over time via semantic representations in frames. Inputs to Chameleon can include synchronised spoken dialogue and images, and outputs include synchronised laser pointing and spoken dialogue. An initial prototype application of Chameleon is an IntelliMedia WorkBench where a user can ask for information about things (e.g. 2D/3D models, pictures, objects, gadgets, people, or whatever) on a physical table. The current domain is a Campus Information System for 2D building plans which provides information about tenants, rooms and routes and can answer questions such as “Whose office is this?” and “Show me the route from Paul Mc Kevitt’s office to Paul Dalsgaard’s office” in real time. Chameleon and the IntelliMedia WorkBench are ideal for testing integrated signal and symbol processing of language and vision for the future of SuperinformationhighwayS.



Copyright information

© Springer-Verlag Berlin Heidelberg 2001

Authors and Affiliations

  • Tom Brøndsted (1)
  • Paul Dalsgaard (1)
  • Lars Bo Larsen (1)
  • Michael Manthey (1)
  • Paul Mc Kevitt (1)
  • Thomas B. Moeslund (1)
  • Kristian G. Olesen (1)
  1. Institute for Electronic Systems (IES), Aalborg University, Aalborg, Denmark
