Developing Intelligent Multimedia Applications

Brøndsted, Tom; Larsen, Lars Bo; Manthey, Michael; Kevitt, Paul Mc; Moeslund, Thomas B.; Olesen, Kristian G.

doi:10.1007/978-94-017-2367-1_7


Part of the book series: Text, Speech and Language Technology (TLTB, volume 19)


Abstract

Intelligent multimedia (IntelliMedia) involves the computer processing and understanding of perceptual input from at least speech, text and visual images, and then reacting to it. It is complex and draws on signal and symbol processing techniques not just from engineering and computer science but also from artificial intelligence and cognitive science (Mc Kevitt, 1994, 1995/96, 1997). With IntelliMedia systems, people can interact with machines in spoken dialogue, asking about what is being presented, and even their gestures and body language can be interpreted.


References

Andersen, Ove, C. Hoequist, C. Nielsen. “Danish Research Ministry’s Initiative on Text-to-Speech Synthesis”. In: Proceedings of Nordic Signal Processing Symposium, Kolmården, Sweden, 2000.
André, Elisabeth, G. Herzog, T. Rist. “On the simultaneous interpretation of real-world image sequences and their natural language description: the system SOCCER”. In: Proceedings of the 8th European Conference on Artificial Intelligence, 449–454, Munich, Germany, 1988.
André, Elisabeth, Thomas Rist. “The design of illustrated documents as a planning task”. In: Intelligent multimedia interfaces. M. Maybury (Ed.), 75–93 Menlo Park, CA: AAAI Press, 1993.
Bakman, Lau, Mads Blidegaard, Thomas Dorf Nielsen, Susana Carrasco Gonzalez. NIVICO - Natural Interface for VIdeo COnferencing. Project Report (8th Semester), Department of Communication Technology, Institute for Electronic Systems, Aalborg University, Denmark, 1997.
Bech, A. “Description of the EUROTRA framework”. In: The Eurotra Formal Specifications, Studies in Machine Translation and Natural Language Processing. C. Copeland, J. Durand, S. Krauwer, B. Maegaard (Eds), Vol. 2, 7–40 Luxembourg: Office for Official Publications of the Commission of the European Community, 1991.
Brøndsted, Tom. “The CPK NLP Suite for Spoken Language Understanding”. In: Eurospeech, 6th European Conference on Speech Communication and Technology, Budapest, 1999a.
Brøndsted, Tom. “The Natural Language Processing Modules in REWARD and IntelliMedia 2000+”. In: LAMBDA 25, S. Kirchmeier-Andersen, H. Erdman Thomsen (Eds.). Copenhagen Business School, Dep. of Computational Linguistics, 1999b.
Brøndsted, Tom. “Reference Problems in Chameleon”. In: ESCA Tutorial and Research Workshop: Interactive Dialogue in Multi-Modal Systems. Kloster Irsee, 1999c.
Brøndsted, Tom, P. Dalsgaard, L.B. Larsen, M. Manthey, P. Mc Kevitt, T.B. Moeslund, K.G. Olesen. A platform for developing Intelligent MultiMedia applications. Technical Report R-98-1004, Center for PersonKommunikation (CPK), Institute for Electronic Systems (IES), Aalborg University, Denmark, May, 1998.
Carenini, G., F. Pianesi, M. Ponzi, O. Stock. Natural language generation and hypertext access. IRST Technical Report 9201-06, Istituto per la Ricerca Scientifica e Tecnologica, Loc. Pantè di Povo, I-38100 Trento, Italy, 1992.
Christensen, Heidi, Børge Lindberg, Pall Steingrimsson. Functional specification of the CPK Spoken LANGuage recognition research system (SLANG). Center for PersonKommunikation, Aalborg University, Denmark, March, 1998.
Denis, M., M. Carfantan (Eds.). Images et langages: multimodalité et modélisation cognitive. Actes du Colloque Interdisciplinaire du Comité National de la Recherche Scientifique, Salle des Conférences, Siège du CNRS, Paris, April, 1993.
Dennett, Daniel. Consciousness explained. Harmondsworth: Penguin, 1991.
Fink, G.A., N. Jungclaus, H. Ritter, G. Sagerer. “A communication framework for heterogeneous distributed pattern analysis”. In: Proc. International Conference on Algorithms and Applications for Parallel Processing. V. L. Narasimhan (Ed.), 881–890 IEEE, Brisbane, Australia, 1995.
Fink, Gernot A., Nils Jungclaus, Franz Kummert, Helge Ritter, Gerhard Sagerer. “A distributed system for integrated speech and image understanding”. In: Proceedings of the International Symposium on Artificial Intelligence. Rogelio Soto (Ed.), 117–126 Cancun, Mexico, 1996.
Herzog, G., C.-K. Sung, E. André, W. Enkelmann, H.-H. Nagel, T. Rist, W. Wahlster. “Incremental natural language description of dynamic imagery”. In: Wissensbasierte Systeme. 3. Internationaler GI-Kongress, C. Freksa, W. Brauer (Eds.), 153–162 Berlin: Springer-Verlag, 1989.
Herzog, G., G. Retz-Schmidt. “Das System SOCCER: Simultane Interpretation und natürlich-sprachliche Beschreibung zeitveränderlicher Szenen”. In: Sport und Informatik, J. Perl (Ed.), 95–119 Schorndorf: Hofmann, 1990.
Infovox. INFOVOX: Text-to-speech converter user’s manual (version 3.4). Solna, Sweden: Telia Promotor Infovox AB, 1994.
Jensen, Finn V. An introduction to Bayesian Networks. London, England: UCL Press, 1996.
Jensen, Frank. “Bayesian belief network technology and the HUGIN system”. In: Proceedings of UNICOM seminar on Intelligent Data Management. Alex Gammerman (Ed.), 240–248 Chelsea Village, London, England, April, 1996.
Kosslyn, S.M., J.R. Pomerantz. Imagery, propositions and the form of internal representations. In Cognitive Psychology, 9, 52–76, 1977.
Leth-Espensen, P., B. Lindberg. “Separation of speech signals using eigenfiltering in a dual beamforming system”. In: Proc. IEEE Nordic Signal Processing Symposium (NORSIG). Espoo, Finland, September, 235–238, 1996.
Lindberg, Børge. “The Danish SpeechDat(II) Corpus - a Spoken Language Resource”. In: Datalingvistisk Forenings Årsmøde 1999 i København. Proceedings. CST Working Papers. Report No. 3, B. Maegaard, C. Povlsen, J. Wedekind (Eds), 1999.
Maaß, Wolfgang, Peter Wizinski, Gerd Herzog. VITRA GUIDE: Multimodal route descriptions for computer assisted vehicle navigation. Bereich Nr. 93, Universität des Saarlandes, FB 14 Informatik IV, Im Stadtwald 15, D-6600, Saarbrücken 11, Germany, February, 1993.
Manthey, Michael J. “The Phase Web Paradigm”. In: International Journal of General Systems, special issue on General Physical Systems Theories. K. Bowden (Ed.), 1998.
Maybury, Mark. “Planning multimedia explanations using communicative acts”. In: Proceedings of the Ninth American National Conference on Artificial Intelligence (AAAI-91), Anaheim, CA, July 14–19, 1991.
Maybury, Mark (Ed.). Intelligent multimedia interfaces. Menlo Park, CA: AAAI Press, 1993.
Maybury, Mark, Wolfgang Wahlster (Eds.). Readings in intelligent user interfaces. Los Altos, CA: Morgan Kaufmann Publishers, 1998.
Mc Kevitt, Paul. “Visions for language”. In: Proceedings of the Workshop on Integration of Natural Language and Vision processing. Twelfth American National Conference on Artificial Intelligence (AAAI-94), Seattle, Washington, USA, August, 47–57, 1994.
Mc Kevitt, Paul (Ed.). Integration of Natural Language and Vision Processing (Vols. I-IV). Dordrecht, The Netherlands: Kluwer Academic Publishers, 1995/1996.
Mc Kevitt, Paul. “SuperinformationhighwayS”. In: Sprog og Multimedier. Tom Brøndsted, Inger Lytje (Eds.), 166–183, Aalborg, Denmark: Aalborg University Press, April, 1997.
Mc Kevitt, Paul, Paul Dalsgaard. “A frame semantics for an IntelliMedia TourGuide”. In: Proceedings of the Eighth Ireland Conference on Artificial Intelligence (AI-97), Volume 1, 104–111, University of Ulster, Magee College, Derry, Northern Ireland, September, 1997.
Minsky, Marvin. “A framework for representing knowledge”. In: The Psychology of Computer Vision. P.H. Winston (Ed.), 211–217 New York: McGraw-Hill, 1975.
Neumann, B., H.-J. Novak. “NAOS: Ein System zur natürlichsprachlichen Beschreibung zeitveränderlicher Szenen”. In: Informatik. Forschung und Entwicklung, 1(1): 83–92, 1986.
Okada, Naoyuki. “Integrating vision, motion and language through mind”. In: Integration of Natural Language and Vision Processing, Volume IV, Recent Advances. Mc Kevitt, Paul (Ed.), 55–80 Dordrecht, The Netherlands: Kluwer Academic Publishers, 1996.
Okada, Naoyuki. “Integrating vision, motion and language through mind”. In: Proceedings of the Eighth Ireland Conference on Artificial Intelligence (AI-97), Volume 1, 7–16 University of Ulster, Magee, Derry, Northern Ireland, September, 1997.
Olsen, Jesper. The SLANG Platform: Design and Philosophy, v. 1. Technical Report, Center for Person-Kommunikation, Aalborg University, September, 2000.
Partridge, Derek. A new guide to Artificial Intelligence. Norwood, New Jersey: Ablex Publishing Corporation, 1991.
Pentland, Alex (Ed.). Looking at people: recognition and interpretation of human action. IJCAI-93 Workshop (W28) at The 13th International Conference on Artificial Intelligence (IJCAI-93), Chambery, France, August, 1993.
Power, Kevin, Caroline Matheson, Dave Ollason, Rachel Morton. The grapHvite book (version 1.0), Cambridge, England: Entropic Cambridge Research Laboratory Ltd., 1997.
Pylyshyn, Zenon. “What the mind’s eye tells the mind’s brain: a critique of mental imagery”. In: Psychological Bulletin, 80, 1–24, 1973.
Rich, Elaine, Kevin Knight. Artificial Intelligence. New York: McGraw-Hill, 1991.
Rickheit, Gert, Ipke Wachsmuth. “Collaborative Research Centre ‘Situated Artificial Communicators’ at the University of Bielefeld, Germany”. In: Integration of Natural Language and Vision Processing, Volume IV, Recent Advances. Mc Kevitt, Paul (Ed.), 11–16, Dordrecht, The Netherlands: Kluwer Academic Publishers, 1996.
Retz-Schmidt, Gudula. “Recognizing intentions, interactions, and causes of plan failures”. In: User Modelling and User-Adapted Interaction 1: 173–202, 1991.
Retz-Schmidt, Gudula, Markus Tetzlaff. Methods for the intentional description of image sequences. Bereich Nr. 80, Universität des Saarlandes, FB 14 Informatik IV, Im Stadtwald 15, D-6600, Saarbrücken 11, Germany, August, 1991.
Stock, Oliviero. “Natural language and exploration of an information space: the ALFresco Interactive system”. In: Proceedings of the 12th International Joint Conference on Artificial Intelligence (IJCAI-91), 972–978, Darling Harbour, Sydney, Australia, August, 1991.
Thórisson, Kris R. Communicative humanoids: a computational model of psychosocial dialogue skills. Ph.D. thesis, Massachusetts Institute of Technology, 1996.
Thórisson, Kris R. “Layered action control in communicative humanoids”. In: Proceedings of Computer Graphics Europe ’97, June 5–7, Geneva, Switzerland, 1997.
Thórisson, Kris R. This book, 2001.
Wahlster, Wolfgang. One word says more than a thousand pictures: On the automatic verbalization of the results of image sequence analysis. Bereich Nr. 25, Universität des Saarlandes, FB 14 Informatik IV, Im Stadtwald 15, D-6600, Saarbrücken 11, Germany, February, 1988.
Wahlster, Wolfgang, Elisabeth André, Wolfgang Finkler, Hans-Jürgen Profitlich, Thomas Rist. “Plan-based integration of natural language and graphics generation”. In: Artificial Intelligence, Special issue on natural language generation, 63, 387–427, 1993.
Wahlster, Wolfgang, Norbert Reithinger, Anselm Blocher. “SmartKom: Multimodal Communication with a Life-Like Character”. In: Eurospeech, 7th European Conference on Speech Communication and Technology, Aalborg, 2001.
Waibel, Alex, Minh Tue Vo, Paul Duchnowski, Stefan Manke. “Multimodal interfaces”. In: Integration of Natural Language and Vision Processing, Volume IV, Recent Advances, Mc Kevitt, Paul (Ed.), 145–165, Dordrecht, The Netherlands: Kluwer Academic Publishers, 1996.
Waltz, David. “Understanding line drawings of scenes with shadows”. In: The psychology of computer vision, Winston, P.H. (Ed.), 19–91 New York: McGraw-Hill, 1975.


Author information

Authors and Affiliations

Institute for Electronic Systems (IES), Aalborg University, Fredrik Bajers Vej 7, DK-9220, Aalborg, Denmark
Tom Brøndsted, Lars Bo Larsen, Michael Manthey, Paul Mc Kevitt (Chair in Intelligent MultiMedia), Thomas B. Moeslund & Kristian G. Olesen
The University of Ulster (Magee), Derry, Northern Ireland
Paul Mc Kevitt (Chair in Intelligent MultiMedia)


Editor information

Editors and Affiliations

Department of Speech, Music and Hearing, KTH Royal Institute of Technology, Stockholm, Sweden
Björn Granström, David House & Inger Karlsson


About this chapter

Cite this chapter

Brøndsted, T., Larsen, L.B., Manthey, M., Mc Kevitt, P., Moeslund, T.B., Olesen, K.G. (2002). Developing Intelligent Multimedia Applications. In: Granström, B., House, D., Karlsson, I. (eds) Multimodality in Language and Speech Systems. Text, Speech and Language Technology, vol 19. Springer, Dordrecht. https://doi.org/10.1007/978-94-017-2367-1_7


DOI: https://doi.org/10.1007/978-94-017-2367-1_7
Publisher Name: Springer, Dordrecht
Print ISBN: 978-90-481-6024-2
Online ISBN: 978-94-017-2367-1
eBook Packages: Springer Book Archive
