Knowledge and Information Systems, Volume 9, Issue 2, pp 131–156

Call classification using recurrent neural networks, support vector machines and finite state automata

  • Sheila Garfield
  • Stefan Wermter
Regular Paper

Abstract

Our objective is spoken-language classification for helpdesk call routing using a scanning understanding approach and intelligent-system techniques. In particular, we examine simple recurrent networks, support-vector machines and finite-state transducers for their potential in this spoken-language-classification task, and we describe an approach to the classification of recorded operator-assistance telephone utterances. The main contribution of the paper is a comparison of a variety of techniques in the domain of call routing. Support-vector machines and transducers are shown to have some potential for spoken-language classification, but the performance of the neural networks indicates that a simple recurrent network performs best for helpdesk call routing.
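As a rough illustration of the kind of model the abstract refers to, the following is a minimal sketch of an Elman-style simple recurrent network that reads an utterance one word at a time and emits a class distribution after each word. It is not the authors' implementation: the vocabulary, the class labels, the layer size, the one-hot word encoding and the function names `make_srn` and `classify` are all hypothetical, the weights are random and untrained, and reading "scanning" as word-by-word processing is an assumption made here for the example.

```python
import math
import random


def make_srn(vocab, n_hidden, classes, seed=0):
    """Build random, untrained weights for a simple recurrent network (illustrative only)."""
    rng = random.Random(seed)

    def matrix(rows, cols):
        return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

    return {
        "vocab": {word: i for i, word in enumerate(vocab)},
        "classes": list(classes),
        "W_in": matrix(n_hidden, len(vocab)),     # input word -> hidden layer
        "W_ctx": matrix(n_hidden, n_hidden),      # context (previous hidden) -> hidden layer
        "W_out": matrix(len(classes), n_hidden),  # hidden layer -> class scores
    }


def classify(net, words):
    """Process the utterance word by word; return one class distribution per word."""
    n_hidden = len(net["W_ctx"])
    context = [0.0] * n_hidden  # context layer starts empty
    outputs = []
    for word in words:
        idx = net["vocab"].get(word)  # unknown words contribute no input activation
        hidden = []
        for j in range(n_hidden):
            total = sum(net["W_ctx"][j][k] * context[k] for k in range(n_hidden))
            if idx is not None:
                total += net["W_in"][j][idx]  # a one-hot input selects a single column
            hidden.append(math.tanh(total))
        scores = [sum(row[k] * hidden[k] for k in range(n_hidden)) for row in net["W_out"]]
        peak = max(scores)
        exps = [math.exp(s - peak) for s in scores]  # numerically stable softmax
        norm = sum(exps)
        outputs.append([e / norm for e in exps])
        context = hidden  # copy the hidden layer into the context layer for the next word
    return outputs


# Hypothetical vocabulary and call classes, chosen only for this example.
net = make_srn(["i", "want", "to", "pay", "my", "bill"], n_hidden=4,
               classes=["billing", "fault", "other"])
steps = classify(net, "i want to pay my bill".split())
final = steps[-1]
print(net["classes"][final.index(max(final))])
```

Because the weights are untrained, the printed label is arbitrary; the sketch only shows how the context layer carries information from earlier words into the classification of later ones.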

Keywords

Classification, Finite-state automata, Recurrent neural networks, Spontaneous language, Support-vector machines

Copyright information

© Springer-Verlag London Ltd. 2005

Authors and Affiliations

  • Sheila Garfield (1)
  • Stefan Wermter (1)
  1. University of Sunderland, School of Computing and Technology, Sunderland, United Kingdom
