Effective Speaker Tracking Strategies for Multi-party Human-Computer Dialogue

  • Vladimir Popescu
  • Corneliu Burileanu
  • Jean Caelen
Part of the Studies in Computational Intelligence book series (SCI, volume 217)


Human-computer dialogue is already a rather mature research field [10] that has led to several commercial applications, either service- or task-oriented [11]. Nevertheless, several issues remain open where unrestricted, spontaneous dialogue is concerned: barge-in (users interrupting the system or each other) must be handled properly, which makes Voice Activity Detection a crucial component [13]. Moreover, when multi-party interactions are allowed (i.e., the machine engages in dialogue with several users simultaneously), additional robustness constraints arise: the speakers must be tracked reliably, so that each utterance is attributed to the speaker who produced it. This attribution is required for a reliable analysis of the input utterances [2].






References

  1. Barras, C.: Reconnaissance de la parole continue: adaptation au locuteur et contrôle temporel dans les modèles de Markov cachés. PhD thesis, University of Paris VI, Paris (1996)
  2. Branigan, H.: Research on Language and Computation 4, 153–177 (2006)
  3. Caelen, J., Xuereb, A.: Interaction et pragmatique - jeux de dialogue et de langage. Hermès Science, Paris (2007)
  4. Christensen, H.: Speaker adaptation of hidden Markov models using maximum likelihood linear regression. MA thesis, University of Aalborg, Denmark (1996)
  5. Ginzburg, J., Fernandez, R.: From Dialogue to Multilogue.... In: Proc. of ACL (2005)
  6. Huang, X., Acero, A., Hon, H.-W.: Spoken language processing: a guide to theory, algorithm and system development. Prentice Hall, New Jersey (2001)
  7. Landragin, F.: Dialogue homme-machine multimodal. Hermès Science, Paris (2005)
  8. Larsson, S., Traum, D.: Natural Language Engineering 1(1) (2000)
  9. Leggetter, C.J., Woodland, P.C.: Computer Speech and Language 9, 171–185 (1995)
  10. McTear, M.F.: ACM Computing Surveys 34(1), 90–169 (2002)
  11. Minker, W., Bennacef, S.: Parole et dialogue homme-machine. CNRS Editions, Paris (2001)
  12. Motlicek, P., Burget, L., Černocký, J.: Non-parametric speaker turn segmentation of meeting data. In: Proc. Eurospeech, Lisbon (2005)
  13. Murani, N., Kobayashi, T.: Systems and Computers in Japan 34(13), 103–111 (2003)
  14. Popescu, V., Burileanu, C.: Parallel implementation of acoustic training procedures for continuous speech recognition. In: Burileanu, C. (ed.) Trends in speech technology. Romanian Academy Publishing House, Bucharest (2005)
  15. Popescu, V., Burileanu, C., Rafaila, M., Calimanescu, R.: Parallel training algorithms for continuous speech recognition, implemented in a message passing framework. In: Proc. Eusipco, Florence (2006)
  16. Popescu-Belis, A., Zufferey, S.: Contrasting the Automatic Identification of Two Discourse Markers in Multi-Party Dialogues. In: Proc. of SigDial, Antwerp (2007)
  17. Ravishankar, M.: Efficient algorithms for speech recognition. PhD thesis, Carnegie Mellon University, Pittsburgh (1996)
  18. Sato, S., Segi, H., Onoe, K., Miyasaka, E., Isono, H., Imai, T., Ando, A.: Electronics and Communications in Japan 88(2), 41–51 (2004)
  19. Trudgill, P.: Sociolinguistics: an introduction to language and society, 4th edn. Penguin Books, London
  20. Yamada, M., Baba, A., Yoshizawa, S., Mera, Y., Lee, A., Saruwatari, H., Shikano, K.: Electronics and Communications in Japan 89(3), 48–58 (2005)
  21. Yamada, S., Baba, A., Yoshizawa, S., Lee, A., Saruwatari, H., Shikano, K.: Electronics and Communications in Japan 88(8), 30–41 (2005)
  22. Young, S., Evermann, G., Kershaw, D., Moore, G., Odell, J., Ollason, D., Povey, D., Valtchev, V., Woodland, P.: The HTK book. Cambridge University, United Kingdom (2005)
  23. Zhang, Z., Furui, S., Ohtsuki, K.: On-line incremental speaker adaptation for broadcast news transcription. Speech Communication 37, 271–281 (2002)

Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • Vladimir Popescu (1, 2)
  • Corneliu Burileanu (2)
  • Jean Caelen (1)

  1. Grenoble Institute of Technology, France
  2. “Politehnica” University of Bucharest, Romania
