Abstract
In conversational speech, irregularities such as overlaps and disruptions make it difficult to decide what constitutes a sentence. Thus, despite very precise guidelines on how to label conversational speech with dialog acts (DAs), labeling inconsistencies are likely to appear. In this work, we present various methods to detect labeling inconsistencies in the ICSI meeting corpus. We show that automatically detecting and removing the inconsistent examples from the training data significantly improves sentence segmentation accuracy. We then manually analyze 200 of the noisy examples detected by the system and observe that only 13% of them are labeling inconsistencies, while the rest are errors made by the classifier. The errors naturally cluster into 5 main classes, and for each class we give hints on how the system can be improved to avoid these mistakes.
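The general idea of flagging suspicious training labels and filtering them out before retraining can be illustrated with a minimal sketch. This is not the method used in the paper, which the abstract does not specify. It assumes a generic cross-validation filter with a nearest-centroid classifier standing in for the real segmentation model, and the names `flag_suspect_labels` and `remove_suspects` are hypothetical. As the abstract's manual analysis suggests, examples flagged this way mix genuine labeling inconsistencies with plain classifier errors.

```python
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def centroid(rows: List[Vector]) -> List[float]:
    """Component-wise mean of a non-empty list of feature vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def predict(x: Vector, centroids: Dict[int, List[float]]) -> int:
    """Return the label whose centroid is closest to x."""
    return min(centroids, key=lambda label: sq_dist(x, centroids[label]))


def flag_suspect_labels(X: List[Vector], y: List[int], k: int = 5) -> List[int]:
    """Flag examples whose label disagrees with a model trained on the other folds.

    Assumption: a stand-in nearest-centroid classifier; any classifier could be
    substituted here.
    """
    suspects = []
    for fold in range(k):
        held_out = [i for i in range(len(X)) if i % k == fold]
        train = [i for i in range(len(X)) if i % k != fold]
        # Fit one centroid per label seen in the training folds.
        centroids = {
            label: centroid([X[i] for i in train if y[i] == label])
            for label in {y[i] for i in train}
        }
        for i in held_out:
            if predict(X[i], centroids) != y[i]:
                suspects.append(i)
    return sorted(suspects)


def remove_suspects(
    X: List[Vector], y: List[int], suspects: List[int]
) -> Tuple[List[Vector], List[int]]:
    """Return the training set with the flagged examples dropped."""
    drop = set(suspects)
    keep = [i for i in range(len(X)) if i not in drop]
    return [X[i] for i in keep], [y[i] for i in keep]
```

In such a setup, the cleaned set returned by `remove_suspects` would be used to retrain the model, and the flagged indices could be handed to a human annotator for review.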
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Cuendet, S., Hakkani-Tür, D., Shriberg, E. (2008). Automatic Labeling Inconsistencies Detection and Correction for Sentence Unit Segmentation in Conversational Speech. In: Popescu-Belis, A., Renals, S., Bourlard, H. (eds) Machine Learning for Multimodal Interaction. MLMI 2007. Lecture Notes in Computer Science, vol 4892. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-78155-4_13
DOI: https://doi.org/10.1007/978-3-540-78155-4_13
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-78154-7
Online ISBN: 978-3-540-78155-4
eBook Packages: Computer Science (R0)