One family, many voices: Can multiple synthetic voices be used as navigational cues in hierarchical interfaces?
Many commercial applications use synthetic speech to convey information. In many cases the structure of that information is hierarchical (e.g. menus). In this article, we describe the results of two experiments that examine the possibility of conveying hierarchies (a family of trees) using multiple synthetic voices. We postulate that if hierarchical structures can be conveyed using synthetic speech, then navigation in these hierarchies can be improved. In the first experiment, hierarchies containing 10 nodes, with a depth of 3 levels, were created, and synthetic voices were used to represent the nodes. A within-subjects study (N = 12) was conducted to compare multiple synthetic voices against a single synthetic voice for locating the positions of nodes in a hierarchy. The multiple synthetic voices were created by manipulating synthetic voice parameters according to a set of design principles. Results of the first experiment showed that subjects performed the tasks significantly better with multiple synthetic voices than with a single synthetic voice. To investigate the effect of multiple synthetic voices on more complex hierarchies, a second experiment was conducted: a hierarchy of 27 nodes was created and a between-subjects study (N = 16) was carried out. The results of this experiment showed that participants recalled 84.38% of the nodes accurately. Results from these studies imply that multiple synthetic voices can be used effectively to represent, and to provide navigation cues within, interfaces structured as hierarchies.
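The abstract does not detail the design principles used to derive the voices, but the core idea — assigning each node in a tree a distinct synthetic voice by varying text-to-speech parameters with its position — can be sketched as follows. This is a hypothetical illustration only: the parameter names (`pitch_hz`, `rate_wpm`, `gender`) and the specific mapping (depth lowers pitch, sibling index raises speaking rate, gender alternates by level) are assumptions, not the authors' actual principles.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    """A node in a menu hierarchy."""
    label: str
    children: list = field(default_factory=list)

def assign_voices(node, depth=0, branch=0, voices=None):
    """Map each node to a distinct voice setting (illustrative scheme):
    depth controls pitch, sibling index controls rate, gender alternates
    per level. Returns {label: {parameter: value}}."""
    if voices is None:
        voices = {}
    voices[node.label] = {
        "pitch_hz": 220 - 30 * depth,    # lower pitch at deeper levels
        "rate_wpm": 160 + 15 * branch,   # faster rate for later siblings
        "gender": "female" if depth % 2 == 0 else "male",
    }
    for i, child in enumerate(node.children):
        assign_voices(child, depth + 1, i, voices)
    return voices

# A 4-node example hierarchy, 3 levels deep.
root = Node("Main", [Node("Settings", [Node("Audio")]), Node("Help")])
print(assign_voices(root))
```

Under a scheme like this, a listener could in principle infer a node's depth from pitch and its sibling position from rate, which is the kind of cue the experiments evaluate.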
International Journal of Speech Technology, Volume 9, Issue 1–2, pp. 1–15
Keywords: Multiple synthetic voices; Auditory interfaces; Navigation cues