Developing Intelligent Multimedia Applications

  • Chapter
Multimodality in Language and Speech Systems

Abstract

Intelligent multimedia (IntelliMedia) involves the computer processing and understanding of perceptual input from at least speech, text and visual images, and then reacting to it. It is complex, drawing on signal and symbol processing techniques not just from engineering and computer science but also from artificial intelligence and cognitive science (Mc Kevitt, 1994, 1995/96, 1997). With IntelliMedia systems, people can interact with machines in spoken dialogue and ask about what is being presented, and even their gestures and body language can be interpreted.

References

  • Andersen, Ove, C. Hoequist, C. Nielsen. “Danish Research Ministry’s Initiative on Text-to-Speech Synthesis”. In: Proceedings of Nordic Signal Processing Symposium, Kolmården, Sweden, 2000.

  • André, Elisabeth, G. Herzog, T. Rist. “On the simultaneous interpretation of real-world image sequences and their natural language description: the system SOCCER”. In: Proceedings of the 8th European Conference on Artificial Intelligence, 449–454, Munich, Germany, 1988.

  • André, Elisabeth, Thomas Rist. “The design of illustrated documents as a planning task”. In: Intelligent multimedia interfaces. M. Maybury (Ed.), 75–93 Menlo Park, CA: AAAI Press, 1993.

  • Bakman, Lau, Mads Blidegaard, Thomas Dorf Nielsen, Susana Carrasco Gonzalez. NIVICO - Natural Interface for VIdeo COnferencing. Project Report (8th Semester), Department of Communication Technology, Institute for Electronic Systems, Aalborg University, Denmark, 1997.

  • Bech, A. “Description of the EUROTRA framework”. In: The Eurotra Formal Specifications, Studies in Machine Translation and Natural Language Processing. C. Copeland, J. Durand, S. Krauwer, B. Maegaard (Eds), Vol. 2, 7–40 Luxembourg: Office for Official Publications of the Commission of the European Community, 1991.

  • Brøndsted, Tom. “The CPK NLP Suite for Spoken Language Understanding”. In: Eurospeech, 6th European Conference on Speech Communication and Technology, Budapest, 1999a.

  • Brøndsted, Tom. “The Natural Language Processing Modules in REWARD and IntelliMedia 2000+”. In: LAMBDA 25, S. Kirchmeier-Andersen, H. Erdman Thomsen (Eds.). Copenhagen Business School, Dep. of Computational Linguistics, 1999b.

  • Brøndsted, Tom. “Reference Problems in Chameleon”. In: ESCA Tutorial and Research Workshop: Interactive Dialogue in Multi-Modal Systems. Kloster Irsee, 1999c.

  • Brøndsted, Tom, P. Dalsgaard, L.B. Larsen, M. Manthey, P. Mc Kevitt, T.B. Moeslund, K.G. Olesen. A platform for developing Intelligent MultiMedia applications. Technical Report R-98-1004, Center for PersonKommunikation (CPK), Institute for Electronic Systems (IES), Aalborg University, Denmark, May, 1998.

  • Carenini, G., F. Pianesi, M. Ponzi, O. Stock. Natural language generation and hypertext access. IRST Technical Report 9201–06, Istituto per la Ricerca Scientifica e Tecnologica, Loc. Pantè di Povo, I-38100 Trento, Italy, 1992.

  • Christensen, Heidi, Børge Lindberg, Pall Steingrimsson. Functional specification of the CPK Spoken LANGuage recognition research system (SLANG). Center for PersonKommunikation, Aalborg University, Denmark, March, 1998.

  • Denis, M., M. Carfantan (Eds.). Images et langages: multimodalité et modélisation cognitive. Actes du Colloque Interdisciplinaire du Comité National de la Recherche Scientifique, Salle des Conférences, Siège du CNRS, Paris, April, 1993.

  • Dennett, Daniel. Consciousness explained. Harmondsworth: Penguin, 1991.

  • Fink, G.A., N. Jungclaus, H. Ritter, G. Sagerer. “A communication framework for heterogeneous distributed pattern analysis”. In: Proc. International Conference on Algorithms and Applications for Parallel Processing. V. L. Narasimhan (Ed.), 881–890 IEEE, Brisbane, Australia, 1995.

  • Fink, Gernot A., Nils Jungclaus, Franz Kummert, Helge Ritter, Gerhard Sagerer. “A distributed system for integrated speech and image understanding”. In: Proceedings of the International Symposium on Artificial Intelligence. Rogelio Soto (Ed.), 117–126 Cancun, Mexico, 1996.

  • Herzog, G., C.-K. Sung, E. André, W. Enkelmann, H.-H. Nagel, T. Rist, W. Wahlster. “Incremental natural language description of dynamic imagery”. In: Wissensbasierte Systeme. 3. Internationaler GI-Kongress, C. Freksa, W. Brauer (Eds.), 153–162 Berlin: Springer-Verlag, 1989.

  • Herzog, G., G. Retz-Schmidt. “Das System SOCCER: Simultane Interpretation und natürlichsprachliche Beschreibung zeitveränderlicher Szenen”. In: Sport und Informatik, J. Perl (Ed.), 95–119 Schorndorf: Hofmann, 1990.

  • Infovox. INFOVOX: Text-to-speech converter user’s manual (version 3.4). Solna, Sweden: Telia Promotor Infovox AB, 1994.

  • Jensen, Finn V. An introduction to Bayesian Networks. London, England: UCL Press, 1996.

  • Jensen, Frank. “Bayesian belief network technology and the HUGIN system”. In: Proceedings of UNICOM seminar on Intelligent Data Management. Alex Gammerman (Ed.), 240–248 Chelsea Village, London, England, April, 1996.

  • Kosslyn, S.M., J.R. Pomerantz. “Imagery, propositions and the form of internal representations”. In: Cognitive Psychology, 9, 52–76, 1977.

  • Leth-Espensen, P., B. Lindberg. “Separation of speech signals using eigenfiltering in a dual beamforming system”. In: Proc. IEEE Nordic Signal Processing Symposium (NORSIG). Espoo, Finland, September, 235–238, 1996.

  • Lindberg, Børge. “The Danish SpeechDat(II) Corpus - a Spoken Language Resource”. In: Datalingvistisk Forenings Årsmøde 1999 i København. Proceedings. CST Working Papers. Report No. 3, B. Maegaard, C. Povlsen, J. Wedekind (Eds), 1999.

  • Maaß, Wolfgang, Peter Wizinski, Gerd Herzog. VITRA GUIDE: Multimodal route descriptions for computer assisted vehicle navigation. Bereich Nr. 93, Universität des Saarlandes, FB 14 Informatik IV, Im Stadtwald 15, D-6600, Saarbrücken 11, Germany, February, 1993.

  • Manthey, Michael J. “The Phase Web Paradigm”. In: International Journal of General Systems, special issue on General Physical Systems Theories. K. Bowden (Ed.), 1998.

  • Maybury, Mark. “Planning multimedia explanations using communicative acts”. In: Proceedings of the Ninth American National Conference on Artificial Intelligence (AAAI-91), Anaheim, CA, July 14–19, 1991.

  • Maybury, Mark (Ed.). Intelligent multimedia interfaces. Menlo Park, CA: AAAI Press, 1993.

  • Maybury, Mark, Wolfgang Wahlster (Eds.). Readings in intelligent user interfaces. Los Altos, CA: Morgan Kaufmann Publishers, 1998.

  • Mc Kevitt, Paul. “Visions for language”. In: Proceedings of the Workshop on Integration of Natural Language and Vision processing. Twelfth American National Conference on Artificial Intelligence (AAAI-94), Seattle, Washington, USA, August, 47–57, 1994.

  • Mc Kevitt, Paul (Ed.). Integration of Natural Language and Vision Processing (Vols. I-IV). Dordrecht, The Netherlands: Kluwer Academic Publishers, 1995/1996.

  • Mc Kevitt, Paul. “SuperinformationhighwayS”. In: Sprog og Multimedier. Tom Brøndsted, Inger Lytje (Eds.), 166–183, Aalborg, Denmark: Aalborg University Press, April, 1997.

  • Mc Kevitt, Paul, Paul Dalsgaard. “A frame semantics for an IntelliMedia TourGuide”. In: Proceedings of the Eighth Ireland Conference on Artificial Intelligence (AI-97), Volume 1, 104–111, University of Ulster, Magee College, Derry, Northern Ireland, September, 1997.

  • Minsky, Marvin. “A framework for representing knowledge”. In: The Psychology of Computer Vision. P.H. Winston (Ed.), 211–217 New York: McGraw-Hill, 1975.

  • Neumann, B., H.-J. Novak. “NAOS: Ein System zur natürlichsprachlichen Beschreibung zeitveränderlicher Szenen”. In: Informatik Forschung und Entwicklung, 1(1): 83–92, 1986.

  • Okada, Naoyuki. “Integrating vision, motion and language through mind”. In: Integration of Natural Language and Vision Processing, Volume IV, Recent Advances. Mc Kevitt, Paul (Ed.), 55–80 Dordrecht, The Netherlands: Kluwer Academic Publishers, 1996.

  • Okada, Naoyuki. “Integrating vision, motion and language through mind”. In: Proceedings of the Eighth Ireland Conference on Artificial Intelligence (AI-97), Volume 1, 7–16, University of Ulster, Magee, Derry, Northern Ireland, September, 1997.

  • Olsen, Jesper. The SLANG Platform: Design and Philosophy, v. 1. Technical Report, Center for PersonKommunikation, Aalborg University, September, 2000.

  • Partridge, Derek. A new guide to Artificial Intelligence. Norwood, New Jersey: Ablex Publishing Corporation, 1991.

  • Pentland, Alex (Ed.). Looking at people: recognition and interpretation of human action. IJCAI-93 Workshop (W28) at The 13th International Conference on Artificial Intelligence (IJCAI-93), Chambéry, France, August, 1993.

  • Power, Kevin, Caroline Matheson, Dave Ollason, Rachel Morton. The grapHvite book (version 1.0), Cambridge, England: Entropic Cambridge Research Laboratory Ltd., 1997.

  • Pylyshyn, Zenon. “What the mind’s eye tells the mind’s brain: a critique of mental imagery”. In: Psychological Bulletin, 80, 1–24, 1973.

  • Rich, Elaine, Kevin Knight. Artificial Intelligence. New York: McGraw-Hill, 1991.

  • Rickheit, Gert, Ipke Wachsmuth. “Collaborative Research Centre ‘Situated Artificial Communicators’ at the University of Bielefeld, Germany”. In: Integration of Natural Language and Vision Processing, Volume IV, Recent Advances. Mc Kevitt, Paul (Ed.), 11–16, Dordrecht, The Netherlands: Kluwer Academic Publishers, 1996.

  • Retz-Schmidt, Gudula. “Recognizing intentions, interactions, and causes of plan failures”. In: User Modelling and User-Adapted Interaction 1: 173–202, 1991.

  • Retz-Schmidt, Gudula, Markus Tetzlaff. Methods for the intentional description of image sequences. Bereich Nr. 80, Universität des Saarlandes, FB 14 Informatik IV, Im Stadtwald 15, D-6600, Saarbrücken 11, Germany, August, 1991.

  • Stock, Oliviero. “Natural language and exploration of an information space: the ALFresco Interactive system”. In: Proceedings of the 12th International Joint Conference on Artificial Intelligence (IJCAI-91), 972–978, Darling Harbour, Sydney, Australia, August, 1991.

  • Thórisson, Kris R. Communicative humanoids: a computational model of psychosocial dialogue skills. Ph.D. thesis, Massachusetts Institute of Technology, 1996.

  • Thórisson, Kris R. “Layered action control in communicative humanoids”. In: Proceedings of Computer Graphics Europe ’97, June 5–7, Geneva, Switzerland, 1997.

  • Thórisson, Kris R. This book, 2001.

  • Wahlster, Wolfgang. One word says more than a thousand pictures: On the automatic verbalization of the results of image sequence analysis. Bereich Nr. 25, Universität des Saarlandes, FB 14 Informatik IV, Im Stadtwald 15, D-6600, Saarbrücken 11, Germany, February, 1988.

  • Wahlster, Wolfgang, Elisabeth André, Wolfgang Finkler, Hans-Jürgen Profitlich, Thomas Rist. “Plan-based integration of natural language and graphics generation”. In: Artificial Intelligence, Special issue on natural language generation, 63, 387–427, 1993.

  • Wahlster, Wolfgang, Norbert Reithinger, Anselm Blocher. “SmartKom: Multimodal Communication with a Life-Like Character”. In: Eurospeech, 7th European Conference on Speech Communication and Technology, Aalborg, 2001.

  • Waibel, Alex, Minh Tue Vo, Paul Duchnowski, Stefan Manke. “Multimodal interfaces”. In: Integration of Natural Language and Vision Processing, Volume IV, Recent Advances, Mc Kevitt, Paul (Ed.), 145–165, Dordrecht, The Netherlands: Kluwer Academic Publishers, 1996.

  • Waltz, David. “Understanding line drawings of scenes with shadows”. In: The psychology of computer vision, Winston, P.H. (Ed.), 19–91 New York: McGraw-Hill, 1975.

Copyright information

© 2002 Springer Science+Business Media Dordrecht

About this chapter

Cite this chapter

Brøndsted, T., Larsen, L.B., Manthey, M., Mc Kevitt, P., Moeslund, T.B., Olesen, K.G. (2002). Developing Intelligent Multimedia Applications. In: Granström, B., House, D., Karlsson, I. (eds) Multimodality in Language and Speech Systems. Text, Speech and Language Technology, vol 19. Springer, Dordrecht. https://doi.org/10.1007/978-94-017-2367-1_7

  • DOI: https://doi.org/10.1007/978-94-017-2367-1_7

  • Publisher Name: Springer, Dordrecht

  • Print ISBN: 978-90-481-6024-2

  • Online ISBN: 978-94-017-2367-1

  • eBook Packages: Springer Book Archive
