VoiceMarks: restructuring hierarchical voice menus for improving navigation

  • Pourang Irani
  • Peer Shajahan
  • Christel Kemke


Interactive Voice Response (IVR) systems, or touch-tone telephony interfaces, are nowadays a common medium of interaction between organizations or companies and their customers, allowing users to access or enter specific company-based information. These telephony interfaces typically involve the use of hierarchically structured voice menus, through which a user has to navigate in order to locate a specific desired menu item. This navigation process is often inefficient and time-consuming, leaving users at times frustrated and annoyed. In this paper, we describe the foundation of VoiceMarks, a system designed to improve the ease and efficiency of navigation in menu-based voice interfaces. The system features personalized menus through the use of voicemarks, in a process similar to bookmarking, but adapted to voice interfaces. VoiceMarks are essentially bookmarked nodes in the voice menu hierarchy, which are stored for the respective user in a directly accessible, personal menu. We developed and tested VoiceMarks interfaces for two applications: a bus schedule information system and a cinema ticket purchase system. A comparative study of VoiceMarks and traditional interfaces of these applications showed that VoiceMarks can significantly improve the interaction between users and systems, in terms of time and number of keystrokes needed to locate a menu item, as well as regarding user satisfaction. In general, users responded very positively to the VoiceMarks interface. In addition, the study pointed to some useful modifications of VoiceMarks, which should be considered before employing the system in a commercial setting.
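The mechanism the abstract describes (bookmarking a node in the touch-tone menu hierarchy so the user can later jump to it from a personal menu with a single keystroke) can be sketched in a few lines. This is a minimal illustration, not the authors' implementation; all class and method names are hypothetical.

```python
# Hypothetical sketch: a touch-tone menu tree in which the user can bookmark
# the node they reach, then jump back to it directly from a personal menu.

class MenuNode:
    def __init__(self, prompt, children=None):
        self.prompt = prompt            # text read out by the IVR system
        self.children = children or {}  # keypad digit -> child MenuNode

    def navigate(self, keys):
        """Follow a sequence of keypad digits down the hierarchy."""
        node = self
        for key in keys:
            node = node.children[key]
        return node


class VoiceMarks:
    """Per-user personal menu of bookmarked nodes (the 'voicemarks')."""
    def __init__(self):
        self.marks = []                 # ordered list of saved key paths

    def save(self, keys):
        keys = list(keys)
        if keys not in self.marks:      # avoid duplicate bookmarks
            self.marks.append(keys)

    def jump(self, root, slot):
        """One keystroke in the personal menu replaces the whole key path."""
        return root.navigate(self.marks[slot])


# Example: a tiny bus-schedule hierarchy, as in the paper's first application.
root = MenuNode("Main menu", {
    "1": MenuNode("Routes", {
        "1": MenuNode("Route 60 schedule"),
        "2": MenuNode("Route 75 schedule"),
    }),
    "2": MenuNode("Fares"),
})

vm = VoiceMarks()
vm.save("12")                           # bookmark Routes -> Route 75
print(vm.jump(root, 0).prompt)          # prints "Route 75 schedule"
```

The point of the sketch is the keystroke economy the study measures: reaching the bookmarked leaf normally costs one digit per level of the hierarchy, whereas the personal menu reduces it to a single selection regardless of depth.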


Voice user interfaces · Personalized touch-tone menus · Telephony bookmarks · Touch-tone interface navigation




Copyright information

© Springer Science+Business Media, LLC 2008

Authors and Affiliations

  1. University of Manitoba, Winnipeg, Canada
