Automatic Labeling Inconsistencies Detection and Correction for Sentence Unit Segmentation in Conversational Speech

  • Sébastien Cuendet
  • Dilek Hakkani-Tür
  • Elizabeth Shriberg
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4892)


In conversational speech, irregularities such as overlaps and disruptions make it difficult to decide what constitutes a sentence. Thus, despite very precise guidelines on how to label conversational speech with dialog acts (DA), labeling inconsistencies are likely to appear. In this work, we present various methods to detect labeling inconsistencies in the ICSI meeting corpus. We show that automatically detecting and removing the inconsistent examples from the training data significantly improves sentence segmentation accuracy. We then manually analyze 200 of the noisy examples detected by the system and observe that only 13% of them are labeling inconsistencies, while the rest are errors made by the classifier. The errors naturally cluster into 5 main classes; for each class, we give hints on how the system can be improved to avoid these mistakes.
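The abstract does not spell out the detection methods, so the following is only a minimal sketch of one generic idea from the boosting literature, not necessarily what the authors did: examples that AdaBoost keeps up-weighting across rounds are hard to fit and are therefore candidates for label noise. The toy 1-D data, the threshold-stump weak learner, the function names `average_boosting_weights` and `flag_suspicious`, and the `factor` cutoff are all assumptions made for illustration.

```python
import math

def average_boosting_weights(xs, ys, rounds=3):
    """Run discrete AdaBoost with 1-D threshold stumps and return, for each
    example, its weight averaged over the distributions used in each round."""
    n = len(xs)
    w = [1.0 / n] * n
    totals = [0.0] * n
    thresholds = sorted(set(xs))
    for _ in range(rounds):
        # Accumulate the distribution the current weak learner is trained on.
        totals = [acc + wi for acc, wi in zip(totals, w)]
        best_err, best_preds = None, None
        for thr in thresholds:
            for sign in (1, -1):
                preds = [sign if x >= thr else -sign for x in xs]
                err = sum(wi for wi, p, y in zip(w, preds, ys) if p != y)
                if best_err is None or err < best_err:
                    best_err, best_preds = err, preds
        # Clamp to avoid division by zero or log of zero on separable data.
        err = min(max(best_err, 1e-10), 1 - 1e-10)
        alpha = 0.5 * math.log((1 - err) / err)
        # Up-weight misclassified examples, down-weight the rest, renormalize.
        w = [wi * math.exp(-alpha * y * p) for wi, y, p in zip(w, ys, best_preds)]
        z = sum(w)
        w = [wi / z for wi in w]
    return [acc / rounds for acc in totals]

def flag_suspicious(avg_weights, factor=2.0):
    """Indices whose average weight exceeds `factor` times the uniform weight."""
    n = len(avg_weights)
    return [i for i, wi in enumerate(avg_weights) if wi > factor / n]

# Toy data (assumed): label is -1 below x = 5 and +1 otherwise,
# except that the label at index 2 has been flipped on purpose.
xs = list(range(10))
ys = [-1, -1, 1, -1, -1, 1, 1, 1, 1, 1]

avg_w = average_boosting_weights(xs, ys, rounds=3)
flagged = flag_suspicious(avg_w, factor=2.0)
kept = [(x, y) for i, (x, y) in enumerate(zip(xs, ys)) if i not in flagged]
print(flagged, len(kept))
```

A high weight only says that an example is hard for the classifier, not that its label is wrong, which fits the abstract's observation that most flagged examples turned out to be classifier errors rather than labeling inconsistencies.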


Keywords: automatic relabeling, error correction, boosting, sentence segmentation, noisy data





Copyright information

© Springer-Verlag Berlin Heidelberg 2008

Authors and Affiliations

  • Sébastien Cuendet (1)
  • Dilek Hakkani-Tür (1)
  • Elizabeth Shriberg (1, 2)
  1. International Computer Science Institute (ICSI), Berkeley, USA
  2. Speech Technology and Research Laboratory, SRI International, Menlo Park, USA
