Abstract
This paper presents a novel technique for classifying sentences as Dialogue Acts, based on structural information contained in function words. It focuses on distinguishing questions from non-questions, a generally useful task in agent-based systems. The proposed technique extracts salient features by replacing each function word with a numeric token and each content word with a standard numeric wildcard token. The Decision Tree, a well-established classification technique, is used as the classifier. Experiments indicate the potential for highly effective classification: significant results were achieved on a challenging dataset before any optimisation of the feature extraction.
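The feature extraction step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function-word list, the token numbering, the wildcard value `CONTENT_TOKEN`, the padding value `PAD_TOKEN`, the fixed vector length and the helper name `encode` are all assumptions made for illustration, since the abstract does not specify them. The resulting fixed-length vectors are the kind of input a Decision Tree learner could be trained on.

```python
import re

# Hypothetical, deliberately tiny function-word list. The list and the
# numbering used in the paper are not given in the abstract.
FUNCTION_WORDS = ["a", "the", "is", "are", "do", "does", "what",
                  "where", "who", "you", "to", "of", "in", "it"]

# Assumption: each function word maps to its own numeric token (1, 2, 3, ...).
TOKEN_OF = {word: i + 1 for i, word in enumerate(FUNCTION_WORDS)}

CONTENT_TOKEN = 0   # assumed wildcard token shared by every content word
PAD_TOKEN = -1      # assumed filler so that all vectors have equal length


def encode(sentence, length=8):
    """Turn a sentence into a fixed-length list of numeric tokens."""
    # Lower-case and keep alphabetic words only; punctuation is discarded.
    words = re.findall(r"[a-z']+", sentence.lower())
    # Function words keep their identity; content words collapse to the wildcard.
    tokens = [TOKEN_OF.get(word, CONTENT_TOKEN) for word in words]
    # Truncate or pad to the fixed length expected by a classifier.
    tokens = tokens[:length]
    return tokens + [PAD_TOKEN] * (length - len(tokens))


if __name__ == "__main__":
    print(encode("Where is the station?"))
    print(encode("The station is closed."))
```

In this sketch the question and the non-question differ only in the order and identity of their function-word tokens, which is the structural information a classifier would have to exploit; the question mark itself is discarded.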
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
O’Shea, J., Bandar, Z., Crockett, K. (2010). A Machine Learning Approach to Speech Act Classification Using Function Words. In: Jędrzejowicz, P., Nguyen, N.T., Howlet, R.J., Jain, L.C. (eds) Agent and Multi-Agent Systems: Technologies and Applications. KES-AMSTA 2010. Lecture Notes in Computer Science, vol 6071. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-13541-5_9
DOI: https://doi.org/10.1007/978-3-642-13541-5_9
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-13540-8
Online ISBN: 978-3-642-13541-5
eBook Packages: Computer Science (R0)