
“How am I Doing?”: A New Framework to Effectively Measure the Performance of Automated Customer Care Contact Centers

Chapter in Advances in Speech Recognition

Abstract

Satisfying callers’ goals and expectations is the primary objective of every customer care contact center. However, quantifying how successfully interactive voice response (IVR) systems satisfy callers’ goals and expectations has historically proven difficult. Such difficulties in assessing automated customer care contact centers can be traced to two assumptions made by most stakeholders in the call center industry:

1. Performance can be effectively measured by deriving statistics from call logs; and

2. The overall performance of an IVR can be expressed by a single numeric value.

This chapter introduces an IVR assessment framework which confronts these misguided assumptions head-on and shows how they can be overcome. Our new framework for measuring the performance of IVR-driven call centers incorporates both objective and subjective measures. Using the concepts of hidden and observable measures, we demonstrate how it is possible to produce reliable and meaningful performance metrics which provide insight into multiple aspects of IVR performance.
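To preview that distinction informally: an observable measure can be computed from every call log, whereas a hidden measure must be estimated, for example from a manually annotated sample of calls. The sketch below illustrates only this split; the record fields and the two metrics are hypothetical, not the chapter’s actual framework.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class CallRecord:
    call_id: str
    automated: bool                # observable: logged for every call
    task_success: Optional[bool]   # hidden: known only for annotated calls

def automation_rate(calls: List[CallRecord]) -> float:
    """Observable measure: fraction of calls completed without a human agent."""
    return sum(c.automated for c in calls) / len(calls)

def estimated_task_success(calls: List[CallRecord]) -> float:
    """Hidden measure: task success estimated from the annotated subsample."""
    annotated = [c for c in calls if c.task_success is not None]
    return sum(c.task_success for c in annotated) / len(annotated)

calls = [
    CallRecord("a", automated=True, task_success=True),
    CallRecord("b", automated=True, task_success=None),   # never annotated
    CallRecord("c", automated=False, task_success=False),
]
print(f"automation rate:        {automation_rate(calls):.2f}")
print(f"estimated task success: {estimated_task_success(calls):.2f}")
```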


Notes

1. During a side conversation at the AVIxD workshop held in August 2009 in New York, Mark Stallings told of a GI stationed in Iraq who called an American IVR while rounds of explosions tore through the air.

2. To make things even more complicated, there are cases where the distinction between observable and hidden becomes fuzzy: dialog systems may acknowledge that they do not know the facts with certainty and, therefore, work with beliefs, i.e., with probability distributions over the observable facts. For instance, a system may ask a caller for his first name, but instead of accepting the first-best hypothesis of the speech recognizer (e.g., “Bob”), it keeps the entire n-best list and the associated confidence scores (e.g., “Bob”: 50%; “Rob”: 20%; “Snob”: 10%, etc.). Such spoken dialog systems are referred to as belief systems [2, 28, 29]. (A toy sketch of such a belief update appears after these notes.)

3. This can be crucial considering that speech recognition applied to real-world spoken dialog systems can produce word error rates of 30% or higher even after careful tuning [5]. (A sketch of the standard word error rate computation appears after these notes.)

4. See Sect. 7.3 for more details on grammars used in IVRs.

5. “A 19-minute call takes 19 minutes to listen to” is one of ISCA and IEEE fellow Roberto Pieraccini’s famous aphorisms.

6. To give an example: we recently heard a call in which the caller said “Cannot send e-mail” in a call-routing application and was forwarded to an automated Internet troubleshooting application. This application supposedly fixed the problem by successfully walking the caller through the steps of sending an e-mail to himself. Asked whether there was anything else he needed help with, the caller said “yes” and was connected back to the call router, where he was asked to describe the reason for his call. He again said “Cannot send e-mail.” Instead of the system recognizing that the first pass through the troubleshooting application had evidently not fixed the caller’s problem, he was routed there again and walked through the same steps as the first time. Eventually, realizing that he was caught in an infinite loop, the caller requested human-agent assistance. Here, the caller’s opt-out was directly related to a logical flaw in the application. (A sketch of a simple guard against such loops appears after these notes.)
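The belief idea in note 2 can be made concrete with a toy update rule. The sketch below is only an illustration, not the authors’ implementation: the pointwise-multiplication update and the small probability floor for unseen values are simplifying assumptions (see [2, 28, 29] for principled belief tracking).

```python
def update_belief(prior, nbest, floor=0.01):
    """Combine a prior belief (value -> probability) with a new n-best list
    (value -> ASR confidence) by pointwise multiplication, then renormalize.
    Deliberately naive; real belief tracking is more involved [2, 28, 29]."""
    values = set(prior) | set(nbest)
    scores = {v: prior.get(v, floor) * nbest.get(v, floor) for v in values}
    total = sum(scores.values())
    return {v: s / total for v, s in scores.items()}

# First turn ("What's your first name?"): the n-best list from note 2.
belief = {"Bob": 0.50, "Rob": 0.20, "Snob": 0.10}
# Second turn (e.g., a confirmation question): new evidence favoring "Rob".
belief = update_belief(belief, {"Rob": 0.60, "Bob": 0.30})
print(max(belief, key=belief.get), belief)
```

After the second turn, “Rob” has gained probability mass even though “Bob” may remain the top hypothesis; deciding when the belief is sharp enough to commit is precisely what such systems must do instead of accepting the first-best result.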
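The word error rate (WER) mentioned in note 3 is conventionally defined as the word-level Levenshtein distance, i.e., substitutions plus deletions plus insertions, divided by the number of reference words. A self-contained sketch with invented example strings:

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by reference length.
    Note that WER can exceed 1.0 for hypotheses much longer than the reference."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = minimum edits to turn ref[:i] into hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            substitution = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            deletion = d[i - 1][j] + 1
            insertion = d[i][j - 1] + 1
            d[i][j] = min(substitution, deletion, insertion)
    return d[len(ref)][len(hyp)] / len(ref)

# 2 substitutions + 1 insertion over 4 reference words -> WER = 0.75
print(word_error_rate("i cannot send e-mail", "i can send the mail"))
```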
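Finally, the routing loop described in note 6 suggests an obvious safeguard: remember where a call has already been sent and escalate rather than repeat. The sketch below is hypothetical and not prescribed by the chapter; classify() stands in for a real statistical call router (cf. [6]), and all destination names are invented.

```python
ESCALATE = "human_agent"

def classify(reason: str) -> str:
    # Stand-in for a real call classifier mapping an utterance to a destination.
    return "internet_troubleshooting" if "e-mail" in reason else ESCALATE

def route(reason: str, visited: set) -> str:
    """Route a call, but never send it to the same automated application twice:
    a repeated request signals the earlier automation did not fix the problem."""
    destination = classify(reason)
    if destination in visited:
        return ESCALATE
    visited.add(destination)
    return destination

visited = set()
print(route("Cannot send e-mail", visited))  # -> internet_troubleshooting
print(route("Cannot send e-mail", visited))  # -> human_agent (loop avoided)
```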

References

1. Acomb, K., Bloom, J., Dayanidhi, K., Hunter, P., Krogh, P., Levin, E., and Pieraccini, R. (2007). Technical Support Dialog Systems: Issues, Problems, and Solutions. In Proc. of the HLT-NAACL, Rochester, USA.

2. Bohus, D. and Rudnicky, A. (2005). Constructing Accurate Beliefs in Spoken Dialog Systems. In Proc. of the ASRU, San Juan, Puerto Rico.

3. Danieli, M. and Gerbino, E. (1995). Metrics for Evaluating Dialogue Strategies in a Spoken Language System. In Proc. of the AAAI Spring Symposium on Empirical Methods in Discourse Interpretation and Generation, Torino, Italy.

4. Evanini, K., Hunter, P., Liscombe, J., Suendermann, D., Dayanidhi, K., and Pieraccini, R. (2008). Caller Experience: A Method for Evaluating Dialog Systems and Its Automatic Prediction. In Proc. of the SLT, Goa, India.

5. Evanini, K., Suendermann, D., and Pieraccini, R. (2007). Call Classification for Automated Troubleshooting on Large Corpora. In Proc. of the ASRU, Kyoto, Japan.

6. Gorin, A., Riccardi, G., and Wright, J. (1997). How May I Help You? Speech Communication, 23(1/2).

7. Hone, K. and Graham, R. (2000). Towards a Tool for the Subjective Assessment of Speech System Interfaces (SASSI). Natural Language Engineering, 6(3/4).

8. Kamm, C., Litman, D., and Walker, M. (1998). From Novice to Expert: The Effect of Tutorials on User Expertise with Spoken Dialogue Systems. In Proc. of the ICSLP, Sydney, Australia.

9. Knight, S., Gorrell, G., Rayner, M., Milward, D., Koeling, R., and Lewin, I. (2001). Comparing Grammar-Based and Robust Approaches to Speech Understanding: A Case Study. In Proc. of the Eurospeech, Aalborg, Denmark.

10. Levin, E. and Pieraccini, R. (2006). Value-Based Optimal Decision for Dialog Systems. In Proc. of the SLT, Palm Beach, Aruba.

11. McGlashan, S., Burnett, D., Carter, J., Danielsen, P., Ferrans, J., Hunt, A., Lucas, B., Porter, B., Rehor, K., and Tryphonas, S. (2004). VoiceXML 2.0. W3C Recommendation. http://www.w3.org/TR/2004/REC-voicexml20-20040316.

12. Melin, H., Sandell, A., and Ihse, M. (2001). CTT-Bank: A Speech Controlled Telephone Banking System – An Initial Evaluation. Technical report, KTH, Stockholm, Sweden.

13. Merriam-Webster (1998). Merriam-Webster’s Collegiate Dictionary. Merriam-Webster, Springfield, USA.

14. Minker, W. and Bennacef, S. (2004). Speech and Human-Machine Dialog. Springer, New York, USA.

15. Noeth, E., Boros, M., Fischer, J., Gallwitz, F., Haas, J., Huber, R., Niemann, H., Stemmer, G., and Warnke, V. (2001). Research Issues for the Next Generation Spoken Dialogue Systems Revisited. In Proc. of the TSD, Zelezna Ruda, Czech Republic.

16. Papineni, K., Roukos, S., Ward, T., and Zhu, W. J. (2002). BLEU: A Method for Automatic Evaluation of Machine Translation. In Proc. of the ACL, Philadelphia, USA.

17. Polifroni, J., Hirschman, L., Seneff, S., and Zue, V. (1992). Experiments in Evaluating Interactive Spoken Language Systems. In Proc. of the DARPA Workshop on Speech and Natural Language, Harriman, USA.

18. Quinlan, J. (1993). C4.5: Programs for Machine Learning. Morgan Kaufmann, San Francisco, USA.

19. Rabiner, L. (1989). A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition. Proc. of the IEEE, 77(2).

20. Raux, A., Langner, B., Black, A., and Eskenazi, M. (2005). Let’s Go Public! Taking a Spoken Dialog System to the Real World. In Proc. of the Interspeech, Lisbon, Portugal.

21. Shriberg, E., Wade, E., and Price, P. (1992). Human-Machine Problem Solving Using Spoken Language Systems (SLS): Factors Affecting Performance and User Satisfaction. In Proc. of the DARPA Workshop on Speech and Natural Language, Harriman, USA.

22. Suendermann, D., Hunter, P., and Pieraccini, R. (2008a). Call Classification with Hundreds of Classes and Hundred Thousands of Training Utterances and No Target Domain Data. In Proc. of the PIT, Kloster Irsee, Germany.

23. Suendermann, D., Liscombe, J., Dayanidhi, K., and Pieraccini, R. (2009a). A Handsome Set of Metrics to Measure Utterance Classification Performance in Spoken Dialog Systems. In Proc. of the SIGdial Workshop on Discourse and Dialogue, London, UK.

24. Suendermann, D., Liscombe, J., and Pieraccini, R. (2010). How to Drink from a Fire Hose: One Person Can Annoscribe 693 Thousand Utterances in One Month. In Proc. of the SIGDIAL, 11th Annual Meeting of the Special Interest Group on Discourse and Dialogue, Tokyo, Japan.

25. Suendermann, D., Liscombe, J., Evanini, K., Dayanidhi, K., and Pieraccini, R. (2008b). C5. In Proc. of the SLT, Goa, India.

26. Suendermann, D., Liscombe, J., Evanini, K., Dayanidhi, K., and Pieraccini, R. (2009c). From Rule-Based to Statistical Grammars: Continuous Improvement of Large-Scale Spoken Dialog Systems. In Proc. of the ICASSP, Taipei, Taiwan.

27. Williams, J. (2006). Partially Observable Markov Decision Processes for Spoken Dialogue Management. PhD thesis, Cambridge University, Cambridge, UK.

28. Williams, J. (2008). Exploiting the ASR N-Best by Tracking Multiple Dialog State Hypotheses. In Proc. of the Interspeech, Brisbane, Australia.

29. Young, S., Schatzmann, J., Weilhammer, K., and Ye, H. (2007). The Hidden Information State Approach to Dialog Management. In Proc. of the ICASSP, Hawaii, USA.

Author information

Correspondence to David Suendermann.

Copyright information

© 2010 Springer Science+Business Media, LLC

About this chapter

Cite this chapter

Suendermann, D., Liscombe, J., Pieraccini, R., Evanini, K. (2010). “How am I Doing?”: A New Framework to Effectively Measure the Performance of Automated Customer Care Contact Centers. In: Neustein, A. (eds) Advances in Speech Recognition. Springer, Boston, MA. https://doi.org/10.1007/978-1-4419-5951-5_7

  • DOI: https://doi.org/10.1007/978-1-4419-5951-5_7

  • Publisher Name: Springer, Boston, MA

  • Print ISBN: 978-1-4419-5950-8

  • Online ISBN: 978-1-4419-5951-5

  • eBook Packages: Engineering, Engineering (R0)
