ΦDmDialog: A speech-to-speech dialogue translation system

Machine Translation

Abstract

Developing interpreting telephony, or a speech-to-speech translation system, is one of the ultimate goals of speech recognition, natural language processing, artificial intelligence and machine translation. This paper describes ΦDmDialog, a speech-to-speech dialogue translation system. ΦDmDialog is one of the first experimental systems to perform speech-to-speech translation and the first to demonstrate the possibility of simultaneous interpretation. It is a hybrid parallel system that integrates parallel marker-passing and connectionist networks. Other characteristics of the system include a simultaneous interpretation capability, mixed-initiative discourse understanding, cost-based ambiguity resolution and an integration of case-based and constraint-based processing. ΦDmDialog has been implemented and has been publicly demonstrated since March 1989. The current implementation translates Japanese into English and operates on ATR's conference registration domain. Massively parallel implementations have been carried out on various machines and have attained high performance.


Additional information

Part of the work reported here (the massively parallel implementation) is supported by the National Science Foundation under grant MIP-90/09109. I want to thank Hideto Tomabechi, Teruko Mitamura, Lori Levin, Masaru Tomita, Jaime Carbonell, Alex Waibel, James McClelland and Hitoshi Iida for fruitful discussions and continuing support; also, Tetsuya Higuchi for the IXM2 implementation and Dan Moldovan and members of the SNAP project for the SNAP implementations. Several anonymous referees for this journal offered helpful suggestions. I would like to express special thanks to Mitsuko Saito, who trained me as a simultaneous interpreter. Without my intuitions as a simultaneous interpreter, this work would not have started.

About this article

Cite this article

Kitano, H. ΦDmDialog: A speech-to-speech dialogue translation system. Mach Translat 5, 301–338 (1990). https://doi.org/10.1007/BF00376645

  • DOI: https://doi.org/10.1007/BF00376645
