Overlap in Meetings: ASR Effects and Analysis by Dialog Factors, Speakers, and Collection Site

  • Özgür Çetin
  • Elizabeth Shriberg
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4299)

Abstract

We analyze speaker overlap in multiparty meetings in terms of both automatic speech recognition (ASR) performance and the distribution of overlap with respect to various factors (collection site, speakers, dialog acts, and hot spots). Unlike most previous work on overlap or crosstalk, our ASR error analysis uses an approach that allows comparison of the same foreground speech with and without naturally occurring overlap, using a state-of-the-art meeting recognition system. We examine a total of 101 meetings. For the analysis of ASR, we use 26 meetings from the NIST meeting transcription evaluations and discover a number of interesting phenomena. First, overlaps tend to occur at high-perplexity regions in the foreground talker’s speech. Second, overlap regions tend to have higher perplexity than nonoverlap regions if trigrams or 4-grams are used, but unigram perplexity within overlaps is considerably lower than that of nonoverlaps. Third, word error rate (WER) after an overlap is consistently lower than that before the overlap, apparently because the foreground speaker reduces perplexity shortly after being overlapped. These appear to be robust findings, because they hold in general across meetings from different collection sites, even though meeting style and absolute rates of overlap vary by site. Further analyses of overlap with respect to speakers and meeting content were conducted on a set of 75 additional meetings collected and annotated at ICSI. These analyses reveal interesting relationships between overlap and dialog acts, as well as between overlap and “hot spots” (points of increased participant involvement). Finally, results from this larger data set show that individual speakers have widely varying rates of being overlapped.
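
For readers unfamiliar with the WER metric mentioned in the abstract, the following is a minimal sketch of the standard definition: the word-level edit distance (substitutions, deletions, and insertions) between a reference transcript and a recognizer hypothesis, divided by the number of reference words. The function name `word_error_rate` and the whitespace tokenization are assumptions made for illustration; the abstract does not describe the scoring tool or text normalization the authors actually used.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER = (substitutions + deletions + insertions) / reference length.

    Illustrative only: tokenization is plain whitespace splitting, which is
    an assumption and not taken from the paper.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # (initially none) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words matches an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,                      # deletion of a reference word
                curr[j - 1] + 1,                  # insertion of a hypothesis word
                prev[j - 1] + substitution_cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)
```

Under these assumptions, a before-versus-after comparison like the one described in the abstract would amount to calling this function separately on the reference and hypothesis words falling in the region before an overlap and in the region after it.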

Keywords

Automatic Speech Recognition; Word Error Rate; Recognition Condition; Automatic Speech Recognition System; Individual Speaker
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Özgür Çetin (1)
  • Elizabeth Shriberg (1, 2)
  1. International Computer Science Institute, Berkeley, USA
  2. SRI International, Menlo Park, USA
