Effective Speaker Tracking Strategies for Multi-party Human-Computer Dialogue

  • Vladimir Popescu
  • Corneliu Burileanu
  • Jean Caelen
Part of the Studies in Computational Intelligence book series (SCI, volume 217)


Human-computer dialogue is a rather mature research field [10] that has already led to several commercial applications, either service-oriented or task-oriented [11]. Nevertheless, several issues remain to be tackled where unrestricted, spontaneous dialogue is concerned: barge-in (when users interrupt the system or each other) must be handled properly, which makes Voice Activity Detection a crucial point [13]. Moreover, when multi-party interactions are allowed (i.e., the machine engages in dialogue with several users simultaneously), additional robustness constraints arise: the speakers have to be tracked properly, so that each utterance is mapped to the speaker who produced it. This is needed in order to perform a reliable analysis of input utterances [2].




  1. Barras, C.: Reconnaissance de la parole continue: adaptation au locuteur et contrôle temporel dans les modèles de Markov cachés. PhD thesis, University of Paris VI, Paris (1996)
  2. Braningan, H.: Research on Language and Computation 4, 153–177 (2006)
  3. Caelen, J., Xuereb, A.: Interaction et pragmatique - jeux de dialogue et de langage. Hermès Science, Paris (2007)
  4. Christensen, H.: Speaker adaptation of hidden Markov models using maximum likelihood linear regression. MA thesis, University of Aalborg, Denmark (1996)
  5. Ginzburg, J., Fernandez, R.: From Dialogue to Multilogue.... In: Proc. of ACL (2005)
  6. Huang, X., Acero, A., Hon, H.-W.: Spoken language processing: a guide to theory, algorithm and system development. Prentice Hall, New Jersey (2001)
  7. Landragin, F.: Dialogue homme-machine multimodal. Hermès Science, Paris (2005)
  8. Larsson, S., Traum, D.: Natural Language Engineering 1(1) (2000)
  9. Leggetter, C.J., Woodland, P.C.: Computer Speech and Language 9, 171–185 (1995)
  10. McTear, M.F.: ACM Computing Surveys 34(1), 90–169 (2002)
  11. Minker, W., Bennacef, S.: Parole et dialogue homme-machine. CNRS Editions, Paris (2001)
  12. Motlicek, P., Burget, L., Cernoký, J.: Non-parametric speaker turn segmentation of meeting data. In: Proc. Eurospeech, Lisbon (2005)
  13. Murani, N., Kobayashi, T.: Systems and Computers in Japan 34(13), 103–111 (2003)
  14. Popescu, V., Burileanu, C.: Parallel implementation of acoustic training procedures for continuous speech recognition. In: Burileanu, C. (ed.) Trends in speech technology. Romanian Academy Publishing House, Bucharest (2005)
  15. Popescu, V., Burileanu, C., Rafaila, M., Calimanescu, R.: Parallel training algorithms for continuous speech recognition, implemented in a message passing framework. In: Proc. Eusipco, Florence (2006)
  16. Popescu-Belis, A., Zufferey, S.: Contrasting the Automatic Identification of Two Discourse Markers in Multi-Party Dialogues. In: Proc. of SigDial, Antwerp (2007)
  17. Ravishankhar, M.: Efficient algorithms for speech recognition. PhD thesis, Carnegie Mellon University, Pittsburgh (1996)
  18. Sato, S., Segi, H., Onoe, K., Miyasaka, E., Isono, H., Imai, T., Ando, A.: Electronics and Communications in Japan 88(2), 41–51 (2004)
  19. Trudgill, P.: Sociolinguistics: an introduction to language and society, 4th edn. Penguin Books, London
  20. Yamada, M., Baba, A., Yoshizawa, S., Mera, Y., Lee, A., Saruwatari, H., Shikano, K.: Electronics and Communications in Japan 89(3), 48–58 (2005)
  21. Yamada, S., Baba, A., Yoshizawa, S., Lee, A., Saruwatari, H., Shikano, K.: Electronics and Communications in Japan 88(8), 30–41 (2005)
  22. Young, S., Evermann, G., Kershaw, D., Moore, G., Odell, J., Ollason, D., Povey, D., Valtchev, V., Woodland, P.: The HTK book. Cambridge University, United Kingdom (2005)
  23. Zhang, Z., Furui, S., Ohtsuki, K.: On-line incremental speaker adaptation for broadcast news transcription. Speech Communication 37, 271–281 (2002)

Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • Vladimir Popescu (1, 2)
  • Corneliu Burileanu (2)
  • Jean Caelen (1)
  1. Grenoble Institute of Technology, France
  2. “Politehnica” University of Bucharest, Romania
